Abstract
Development of the ASME Materials Properties Database was initiated in the early 2010s to support the ASME Codes and Standards. As information technologies advance at an accelerated pace with the artificial intelligence era on the horizon, the ASME Materials Properties Database must be further modernized from a database to a knowledgebase to ride the wave of the digital information revolution and effectively support the ASME Codes and Standards in the new era. This paper provides an overview of the ASME Materials Properties Database and discusses a roadmap for its future development to facilitate understanding of and participation from different sectors of the Codes and Standards community. It first reviews the basic concepts of data, information, knowledge, database, and database system, as well as the pros and cons of different types of data management. It then discusses the path forward for the desired evolution of the database into a self-explanatory and machine-readable knowledgebase that is consistent with human cognitive processes for the Codes and Standards development. Such a knowledgebase would also provide resources for data processing and analysis, with the eventual goal of streamlining the Codes and Standards development from the initial inquiry, throughout data submission, analysis, …, to Codes and Standards rule establishment for final publication.
1 Introduction
Engineering materials, including metals, graphite, ceramics, and composites, are essential to industrial development and are, therefore, an integral part of the ASME Codes and Standards. To support the continued advancement and maintenance of the Codes and Standards, ASME has been developing an online database named “ASME Materials Properties Database” since 2012 with technical support from the Oak Ridge National Laboratory under the supervision of the materials database working group (MDWG) of the Boiler and Pressure Vessel Code (BPVC) Section II Committee on Materials [1,2].
Historically, the Codes and Standards have been published in hardcopy books and, more recently, in electronic media such as portable document format (PDF) files. As the world rapidly approaches the artificial intelligence (AI) era, changes to the Codes and Standards become inevitable. To usher in the new era, ASME has started to position itself as the leading organization in the upcoming global transition. In February 2022, a seminar “ASME Digitalization Codes & Standards: The Future of Standards” was held by the ASME Technical and Engineering Communities Sector and the Digitization Technology Group. Presentations were made by two trendsetting standards committees, the ASME Model Based Enterprise Standards Committee and the ASME Plant System Design Standards Committee. Their objective is to go beyond the PDF publication and incorporate the Codes and Standards into accepted datasets to assist with integrated information flow.
Accordingly, the MDWG must follow suit and consider what the Codes and Standards may look like in 50 years and what must be planned now in order to meet the needs and requirements of the digital transformation and evolution of the Codes and Standards. This paper is intended to conduct a thorough discussion on the ASME Materials Properties Database development, starting from a brief review of its current status and prospects, some basic concepts, and database types, followed by data modernization needs and requirements, and finally the objectives and an eventual goal on the path forward in database structural and functional development, as well as data collection and management strategies.
It is hoped that by providing this discussion, users and stakeholders of the Codes and Standards can gain a good understanding of the efforts and, whenever possible and adequate, offer their support and contribution to the ASME Materials Properties Database development to ensure continued success of the Codes and Standards in the AI era.
2 Current Status of the Database
In the initial plan for the ASME Materials Properties Database project, a piecewise development strategy was set out to allow evolution of the system to be adaptable to emerging needs and requirements so that it could start from BPVC Section II but eventually support the entire ASME Codes and Standards [3]. As of today, a total of fourteen compartments have been developed as listed below. Each compartment is constructed based on a specific database schema design that matches the structure of its managed information subdomain. These constructed compartments continue to evolve as new needs and requirements emerge from the database operation.
Operation Instructions
ASME Records
Material Indices [developing]
Data Package/Metallic Materials
Pedigree/Metallic Materials
Test Data/Creep
Test Data/Tensile
Permit and Restriction
Data Analysis
Software
BPVC Section II Part D/PDF
References
Terminologies
Link Management
The Operation Instructions compartment provides step-by-step directions in videos, photos, slides, texts, and/or PDF documents for navigating the database and using the built-in functionalities and software tools of the database management system. The ASME Records compartment manages the records and associated information generated from materials codification activities of the quarterly Code Week meetings. The Material Indices compartment is intended to list all materials qualified for the Codes and Standards, and furthermore, hyperlink to all relevant data in other compartments to provide an overview of the types and quantity of information available in the database for a given material. The Data Package/Metallic Materials compartment is used to store alloy test data files received from Codes and Standards development inquiries in their as-received formats, which enables convenient and quick retention of the information to facilitate data collection. The Pedigree/Metallic Materials compartment manages the background information, such as chemistries, product forms, treatments, grain size, etc., of given alloy batches used to generate the test data files in the Data Package/Metallic Materials compartment. The Test Data/Creep and Test Data/Tensile compartments are designed to manage self-explanatory and machine-readable creep and tensile test data, respectively, processed from the as-received test data files in the Data Package/Metallic Materials compartment. The Permit and Restriction compartment stores the necessary permissions and/or restrictions from data providers on the data they provided to the database.
The Data Analysis compartment allows a data analyst to create a virtual drawing table for his/her specific Codes and Standards development task so that he/she can lay out the essential information for the task and also hyperlinks to all the needed data, references, tools, as well as the analysis results for convenient access, and more importantly, leave behind a comprehensive record with hyperlink trails that enables peers and posterity to check and/or understand details of his/her development process. The Software compartment is designed to manage software applications created for data analysis, data collection, data reduction, etc., used in Codes and Standards development. The BPVC Section II Part D/PDF compartment preserves historical editions of the BPVC Section II Part D properties tables (II-D Tables) in PDF documents. The References compartment manages the reference documents involved in Codes and Standards development. The Terminologies compartment provides definitions of vocabularies that may become ambiguous and prone to misinterpretation in the database for users of different technical backgrounds. The Link Management compartment documents all the hyperlink mechanisms created in the database and is intended for database developers' access only to facilitate their development work.
The database is developed under the MDWG charter that stipulates: “The MDWG will facilitate the development and maintenance of the content of the ASME Materials Properties Database. The responsibilities will include evaluating and making recommendations on the type of data to be uploaded, the functionality and features for the database, criteria and format for the data, and user/access types.” The MDWG meets quarterly or as needed to review and discuss the development.
3 Codes and Standards in the Artificial Intelligence Era
To optimize further development of the ASME Materials Properties Database and ensure effective support to the Codes and Standards in the rapidly approaching AI era, it is necessary to first prognosticate on what the Codes and Standards may look like in the next 50 years. As previously mentioned, in preparing for the future the two trendsetting standards committees have set their objective to incorporate the Codes and Standards into accepted datasets to assist with integrated information flow. It can be expected that, with this objective achieved, the future Codes and Standards should possess the following two characteristics:
Published in machine-readable forms that allow efficient use and processing of the information by simulation and design software applications
Enabled traceability from the Codes and Standards rules and design parameters to the initial datasets accepted for developing the rules and design parameters throughout the entire development steps in between, and vice versa
The first characteristic will greatly facilitate automated data exchange for applications of AI technologies in design and construction projects that use the Codes and Standards. The second one will not only enhance transparency in the Codes and Standards development process, and thus build a good understanding of and strong user confidence in the design parameters and rules, but also help eliminate latent inconsistencies among different parts of the Codes and Standards as well as disconnections between the design parameters/rules and the initial data and assumptions/bases from which the design parameters and rules are derived.
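The two-way traceability described above can be pictured as a directed graph in which datasets feed analyses and analyses feed design parameters. The following Python fragment is a minimal sketch under that assumption; the node names and links are hypothetical and are not taken from the ASME Materials Properties Database.

```python
from collections import defaultdict

# Hypothetical provenance links: each pair reads "source feeds into product".
# The node names are illustrative only and not taken from the database.
LINKS = [
    ("dataset:creep_heat_A", "analysis:larson_miller_fit"),
    ("dataset:creep_heat_B", "analysis:larson_miller_fit"),
    ("analysis:larson_miller_fit", "parameter:allowable_stress"),
    ("dataset:tensile_heat_A", "analysis:trend_curve"),
    ("analysis:trend_curve", "parameter:allowable_stress"),
]

def build_index(links):
    """Build forward (source -> products) and backward (product -> sources) maps."""
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)
    return forward, backward

def trace(start, index):
    """Return every node reachable from start by following the index edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in index.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

forward, backward = build_index(LINKS)
# From a design parameter back to everything it was derived from.
upstream = trace("parameter:allowable_stress", backward)
# From an initial dataset forward to everything that depends on it.
downstream = trace("dataset:creep_heat_A", forward)
```

Walking the backward map answers which data a rule rests on, and walking the forward map answers which rules a dataset affects. This is the "and vice versa" direction of the second characteristic.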
Coverage of the Codes and Standards may also be expanded in the next 50 years. Currently, ASME is looking into new initiative actions with a concentration in five core areas, including bioengineering, robotics, clean energy, manufacturing, and pressure technology. The database will also expand accordingly as needed.
4 Basic Concepts and Database Types
Before the discussion of the path forward in materials data modernization for the Codes and Standards in the AI era, a few basic concepts and database types need to be reviewed.
4.1 Basic Concepts
Data: Symbols or signs, representing observations or the product of observations
Information: Data that are endowed with meaning and purpose for the recipient
Knowledge: Information in a structured form, consistent with human cognitive processes as opposed to simple lists of data items
Database: An organized collection of files or data that are stored in hardcopy archives or a computer system
Database System: a.k.a. database management system, a computer software application that interacts with the user, other applications, and the database itself to manage data
When a database contains not simply data items, but a collection of information in a structured form that is consistent with human cognitive processes and maintains information interrelationships enabling a full understanding of the information for deductive and inductive reasoning like a subject matter expert, it becomes a knowledgebase, although it may still be habitually called a database. Correspondingly, its computer software application must be able to manage the complicated information interrelationships of the knowledge instead of simply lists of data items.
4.2 Database Types.
Although various collections of data and/or files may be considered as databases, from the operation and functionality perspective, or in other words, judged on the sophistication levels of data management, they can be roughly categorized into the following four types/levels excluding the anachronistic hardcopy archive collections.
(1) Upload electronic data files for users to browse and find the information they need
(2) Organize data files in a searchable fashion and maintain relationships among data chunks
(3) Manage data/metadata and detail interrelationships of the data/metadata in self-explanatory and machine-readable records in customized database schemas
(4) Enable automated data exchange between database system and simulation software applications
Development to level 1 produces a “data dump.” It is economical and easy for database developers, but as more and more data files are stored, it inevitably grows into a data swamp that is very difficult for users to navigate. Many databases are, therefore, developed to level 2 to facilitate navigation, but they often prove prone to misinterpretation and misuse of data because ambiguities and information gaps in some data are unavoidable, particularly when the data providers are no longer available to clarify them. Misinterpretation and misuse of data in the development of Codes and Standards can put engineering applications at risk and may also endanger the public. Development to level 3 can eliminate these problems but demands much more effort from the database developers; in particular, creating self-explanatory data records for a given subject matter requires subject matter expertise. Level 4 must be built on level 3 with custom application programming interfaces (APIs) to enable great efficiency and accuracy in data transfer, simulation, and machine learning.
5 Data Modernization Requirements
From the discussion above, it has become obvious that to effectively support the Codes and Standards in the upcoming AI era, it is the materials knowledge, instead of merely materials data, which needs to be managed by the ASME Materials Properties Database.
It should further be noted that data is the carrier of information, and for it to be managed as knowledge, data must be organized into a meaningful and purposeful structure that is consistent with human cognitive processes. To achieve such knowledge management, a database must be constructed with schemas that map the purposeful information structure of the intended knowledge domain and subdomains.
It follows that for the knowledge domain and subdomains of developing and maintaining the Codes and Standards, the schemas of the ASME Materials Properties Database must map the entire information structure and information flow from the initial Codes and Standards inquiry data package, through various data analysis operations, to the final product of Codes and Standards rules and design parameters for publication.
Furthermore, to ensure quality and efficiency of the Codes and Standards development and maintenance in the AI era, data preserved in the ASME Materials Properties Database should possess the following two essential characteristics:
Self-explanatory
Machine-readable
Being self-explanatory here means that the information in the database can be clearly understood by users without ambiguity or likelihood of misinterpretation. In particular, posterity, who will no longer be able to consult the original data provider, can unequivocally comprehend the information with confidence and derive the knowledge needed for their Codes and Standards development and maintenance activities without misusing the data.
To achieve the self-explanatory capability in the ASME Materials Properties Database, the following approach should be taken in the database schema design. Ontological analyses are first conducted on the intended knowledge domain to identify its constituents and basic elements of the information. The analysis results can then be used to help delineate the information structure of the intended knowledge domain and subdomains. For example, the knowledge domain for the development and maintenance of BPVC Section II Part D properties tables (II-D Tables) is found to include subdomains of tensile properties, creep properties, alloy pedigrees and so on. Each subdomain involves a specific group of attributes. According to this structure, specific database compartments, or tables in the database jargon, can then be designed for managing the data of corresponding subdomains, e.g., a tensile data compartment for tensile property data, and a creep data compartment for creep property data. Based on the information elements and substructure of a given subdomain, a data record template can be designed, with attributes, a.k.a. entries, each having its specific meaning in the custom record layout that is consistent with human cognitive processes. Terminologies used in the attributes can be further equipped with online definition menus wherever ambiguity and multiple interpretations may become a concern. Furthermore, all the related information elements in the database are hyperlinked to maintain their interrelationships with easy navigability for developing a full understanding of the contents. With all these component designs, the overall database schema is created to ensure that database users can easily comprehend the information based on its locations in the database schema, the attribute terminologies, the online definition menus, and the hyperlinks to related data and/or metadata.
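The schema elements described above (compartment, record template, attributes with definitions, and links to related records) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the database's actual schema: the class names, the units, the placeholder definition text, and the linked record identifier are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Attribute:
    name: str
    unit: str          # "" when dimensionless or textual
    definition: str    # text an online definition menu would display

@dataclass
class RecordTemplate:
    compartment: str
    attributes: dict = field(default_factory=dict)

    def add(self, attr):
        self.attributes[attr.name] = attr

@dataclass
class Record:
    template: RecordTemplate
    values: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # identifiers of related records

    def set(self, name, value):
        # Only attributes defined by the template are accepted, so every
        # stored value has a known meaning and unit.
        if name not in self.template.attributes:
            raise KeyError(f"{name!r} is not defined in {self.template.compartment}")
        self.values[name] = value

    def describe(self, name):
        attr = self.template.attributes[name]
        unit = f" {attr.unit}" if attr.unit else ""
        return f"{attr.name} = {self.values[name]}{unit} ({attr.definition})"

# Hypothetical tensile template and record; all values are made up.
tensile = RecordTemplate("Test Data/Tensile")
tensile.add(Attribute("Test temperature", "C", "placeholder definition text"))
tensile.add(Attribute("Yield stress", "MPa", "placeholder definition text"))

record = Record(tensile, links=["Pedigree/Metallic Materials: heat A1"])
record.set("Test temperature", 20.0)
record.set("Yield stress", 250.0)
summary = record.describe("Yield stress")
```

Because a value can only be stored under an attribute that the template defines, the meaning, unit, and definition travel with every number, which is the sense of "self-explanatory" used in this section.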
Being machine-readable here means the information is preserved in a fashion that its data can be individually and collectively processed by the built-in functionalities and software tools of the database management system. To achieve such machine-readability, data should be preserved and managed at the Lowest Information Element Level. An analogy is provided in Table 1 with three popular formats to facilitate explaining this concept. As is well known, in the PDF format, a word, a number, or a phrase, i.e., information elements at the lowest meaningful level, can neither be individually managed nor processed. It is impossible for a user to move such an information element around in the document or conduct a mathematical or logical operation on it. In the DOC format, which is the format of Microsoft Word, the information elements can be individually moved around or edited but cannot be mathematically or logically operated on. In the XLS format, which is the format of a spreadsheet, the information elements are contained in individual cells and can, thus, be individually and collectively managed and processed using the built-in software functionalities and tools.
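The contrast between text locked in a document and data held at the Lowest Information Element Level can be illustrated with a minimal Python sketch. The input line, its layout, and the field names are hypothetical, and real data packages would need far more robust handling.

```python
import re

# A hypothetical as-received line, comparable to text locked inside a document.
raw = "Heat A1; test temperature 600 C; test stress 150 MPa; rupture life 1200 h"

# Assumed layout of the line above; other layouts are rejected.
PATTERN = re.compile(
    r"Heat (?P<heat>\w+); test temperature (?P<temp_c>[\d.]+) C; "
    r"test stress (?P<stress_mpa>[\d.]+) MPa; rupture life (?P<life_h>[\d.]+) h"
)

def to_elements(line):
    """Break one text line down into individually typed information elements."""
    match = PATTERN.fullmatch(line)
    if match is None:
        raise ValueError("line does not follow the assumed layout")
    fields = match.groupdict()
    return {
        "heat": fields["heat"],
        "temp_c": float(fields["temp_c"]),
        "stress_mpa": float(fields["stress_mpa"]),
        "life_h": float(fields["life_h"]),
    }

elements = to_elements(raw)
# Once separated, each element supports mathematical operations directly.
temp_k = elements["temp_c"] + 273.15
```

As a single string, the line can only be read by a person. As separate typed elements, each value can be converted, compared, or combined by software.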
To achieve the machine-readability in the ASME Materials Properties Database, the following approach should be taken in the database construction. Based on the schema design, the database is constructed using various virtual containers such as compartments and record templates with attributes each of a specific purpose and software functionalities. Then, data packages are analyzed to break down to the Lowest Information Element Level and the resulting information elements are uploaded into corresponding attribute virtual containers, which are already organized by the schema design to be consistent with human cognitive processes. Since the virtual containers are electronically functional, the uploaded data can be individually and collectively processed using the built-in functionalities and software tools of the database management system, which enables control and manipulation of every information element. Operations, such as data extraction, reformatting, mathematical calculation, logical operation, etc., can be conveniently conducted. Automated data exchanges between the database and external databases or simulation software can also be enabled through custom APIs.
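As a rough illustration of attribute containers that are "electronically functional," the sketch below stores a few creep test elements in an in-memory SQLite table and then runs a logical selection and a mathematical operation on them. SQLite, the table layout, the column names, and the values are all assumptions chosen for the example; the source does not state which database technology is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column per attribute, so every information element has its own container.
conn.execute(
    "CREATE TABLE creep_test ("
    " heat TEXT, temp_c REAL, stress_mpa REAL, rupture_life_h REAL)"
)
# Hypothetical values for illustration only.
rows = [
    ("A1", 600.0, 150.0, 1200.0),
    ("A1", 650.0, 100.0, 900.0),
    ("B2", 600.0, 150.0, 1500.0),
]
conn.executemany("INSERT INTO creep_test VALUES (?, ?, ?, ?)", rows)

# Logical operation: select by test condition. Mathematical operation: average.
(mean_life,) = conn.execute(
    "SELECT AVG(rupture_life_h) FROM creep_test"
    " WHERE temp_c = 600.0 AND stress_mpa = 150.0"
).fetchone()
conn.close()
```

The same query pattern extends to extraction and reformatting, since each attribute is addressable on its own.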
6 Path Forward
To modernize the materials data for the Codes and Standards to flourish in the AI era, the ASME Materials Properties Database must keep steadily advancing in areas including the database structure, system functionalities, and data amassment to fully achieve three objectives and reach a final goal [4].
6.1 Three Objectives and the Final Goal.
On the path forward, the database development will continue to strive for the final goal of streamlining the Codes and Standards development from the initial inquiry throughout data submission, analysis, rule development, …, to the eventual Codes and Standards publication through the following three major objectives.
(1) Preserve Codes and Standards development documents, materials data, metadata, and all necessary background information
(2) Track interrelationships among the documents, materials data, metadata, and all necessary background information
(3) Provide resources for data processing and analysis
Currently objective 1 is covered by the following eight compartments and more will be constructed as needs and requirements are identified in future Codes and Standards development:
ASME Records
Data Package/Metallic Materials
Pedigree/Metallic Materials
Test Data/Creep
Test Data/Tensile
Permit and Restriction
BPVC Section II Part D/PDF
References
Objective 2 is achieved by adequate arrangement of associated information elements to relational locations in the database and a network of hyperlinks. As the Data Analysis compartment demo shows in Fig. 1, the ASME Record document Cu-Zn-Pb_99-391_200012 is downloadable from attribute Related ASME Record under the database record heading of GENERAL INFORMATION. Its associated information elements are preserved under various headings with their respective attribute names and relative locations indicating their interrelationships. Whenever the locational arrangement is not effective or impossible, hyperlinks are created to connect related information elements. As the database grows in the future, more relational records and hyperlinks will be created.
Objective 3 is currently pursued through the following five compartments. They will be further expanded to meet the emerging needs and requirements in Codes and Standards development and maintenance. New compartments will also be added as needed to make the development and maintenance of the Codes and Standards increasingly efficient.
Operation Instructions
Material Indices
Data Analysis
Software
Terminologies
6.2 Database Structural Development.
At the present time, the following new compartments are under consideration for development.
6.2.1 Code and Standard Methods.
It has been a lesson learned the hard way that some methods and rules employed in developing the Codes and Standards are applied as rote operations, without the rationales behind them being fully understood by most developers and users. For example, in the BPVC stress tables, why must a 1.1 factor be applied to tensile strength, but not to yield strength, above room temperature? And how was the Favg factor, applied to the average stress for rupture in 100,000 h, conceptualized? The answer to the first question was barely retained before a senior Code developer in the know passed away [5], and the answer to the second is still being sought. More similar items are at risk of being lost forever. It has become obvious that preservation of such knowledge is not only important for ensuring consistency in Codes and Standards development and maintenance but also essential for informed further development and/or improvement of the methods and rules. It follows that to prevent losing the knowledge and reinventing the wheel, a dedicated compartment with effective search capability must be created.
6.2.2 Test Data/Fatigue.
In addition to continued optimization of the existing compartments of Test Data/Creep and Test Data/Tensile, a compartment whose schema maps the information structure of fatigue property data must be developed to support Codes and Standards concerning material fatigue behavior.
6.2.3 Test Data/Fatigue Crack Growth.
Because the fatigue crack growth is characterized with a set of attributes different from those for fatigue life, a compartment whose schema maps the information structure of fatigue crack growth property data must be developed to support Codes and Standards concerning material fatigue crack growth behavior.
6.2.4 Data Package/Graphite Materials.
As graphite materials are increasingly involved in Codes and Standards for high temperature structural applications, a dedicated compartment for preserving the graphite data files from Codes and Standards inquiries is needed.
6.2.5 Data Package/Composite Materials.
As with graphite materials, composite materials are increasingly used in Codes and Standards, and therefore a dedicated compartment for preserving the composite data files from Codes and Standards inquiries is needed.
6.2.6 Pedigree/Graphite Materials.
Since engineering structural graphite materials are usually specific in their provenance and processing history, a compartment of pedigree is essential for managing such information for knowledgeable analysis and use of their test data.
6.2.7 Pedigree/Composite Materials.
The properties of composite materials are extremely sensitive to their manufacturing processes and constituent materials. Adequately managing such information is crucial for enabling correct understanding of their property data and confident use of the materials.
6.2.8 Boiler and Pressure Vessel Code Section II Part D/Digital.
The existing BPVC Section II Part D/PDF compartment was developed to meet the urgent need of preserving historical PDF documents of the BPVC Section II Part D tables before they became irretrievable. To enable integration of the Codes and Standards into accepted datasets, such tables in the future must be managed in a machine-readable fashion. The BPVC Section II Part D/Digital compartment needs to be developed to satisfy this requirement.
6.2.9 Project Management.
Developing a self-explanatory and machine-readable database that effectively manages knowledge instead of merely data requires significant efforts and time. The process will involve considerable details management and sometimes long-term activities coordination. To ensure all the developers stay on the same page for orderly progress, a Project Management compartment is necessary.
It is noted that the compartments discussed above are based on the current perception of the needs and requirements for the database. As the Codes and Standards development and maintenance continues, new needs and requirements will emerge, and accordingly new compartments will be designed and constructed. For example, to support the Plant System Design Standard (PSDS), particularly the probabilistic design methods (PDM), fracture (brittle and ductile) and tearing data are of great interest. In addition, environmental effects (stress corrosion cracking, for example) and load ordering effects increasingly demand Codes and Standards coverage. New technologies, such as additive manufacturing, also require new Codes and Standards.
6.3 System Functional Development.
As a machine-readable database intended to manage knowledge rather than merely data, i.e., a knowledgebase, its effectiveness will heavily depend on the capabilities of its database management system. To achieve the final goal of streamlining the Codes and Standards development, more built-in functionalities and software tools must be created. At the present time, development of the following applications is under consideration.
6.3.1 Data Extraction Templates.
In developing a given Codes and Standards item, it is common that multiple sets of data for a specific group of attributes are of interest. For example, to derive the Maximum Allowable Stress and the Design Stress Intensity values consistent with the BPVC time-independent criteria using the trend curve method, multiple datasets for the group of tensile property attributes listed in the left column of Table 2 are of interest. To derive those values consistent with the BPVC time-dependent criteria using the Larson-Miller, Orr-Sherby-Dorn, and Manson-Haferd methods, multiple datasets for the group of creep property attributes listed in the right column of Table 2 are of interest. Extracting multiple datasets of interest from each attribute in a large number of data records can be very time-consuming and error-prone. Since data in the Test Data/Creep and Test Data/Tensile compartments are machine-readable, built-in templates can be created to automatically extract the data of interest and output them to a spreadsheet in a consistent and desirable format for convenient use. Furthermore, additional attributes of interest, such as full stress–strain curves and creep curves, can be efficiently added to these templates to meet the needs of more complicated analyses, and new templates can be created when demand arises.
Tensile property attributes (time-independent) | Creep property attributes (time-dependent) |
---|---|
Heat/lot | Heat/lot |
Test temperature | Test temperature |
Yield stress | Test stress |
Ultimate tensile stress | Rupture life |
Tensile strain | Minimum creep rate |
Reduction of area | Creep strain |
 | Reduction of area |
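A data extraction template of the kind described above could be sketched as a fixed list of attribute names applied to every record, with the output written in spreadsheet-readable CSV form. The attribute names follow the tensile column of Table 2. The function name, the record layout, and the sample values are hypothetical.

```python
import csv
import io

# Tensile attributes from the left column of Table 2, in output order.
TENSILE_TEMPLATE = [
    "Heat/lot", "Test temperature", "Yield stress",
    "Ultimate tensile stress", "Tensile strain", "Reduction of area",
]

def extract(records, template):
    """Return CSV text with one row per record and one column per attribute."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=template,
        extrasaction="ignore",  # attributes outside the template are skipped
        restval="",             # missing attributes are left blank
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical records; the values are made up for illustration.
records = [
    {"Heat/lot": "A1", "Test temperature": 20, "Yield stress": 300,
     "Ultimate tensile stress": 520, "Tensile strain": 35,
     "Reduction of area": 70, "Operator": "not part of the template"},
    {"Heat/lot": "A1", "Test temperature": 400, "Yield stress": 240,
     "Ultimate tensile stress": 450, "Tensile strain": 30},
]
csv_text = extract(records, TENSILE_TEMPLATE)
```

Keeping the template as plain data means that adding an attribute of interest only requires extending the list. A creep template would be a second list with the right-column attributes.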
6.3.2 Data Operation Apps.
In maintaining and using the database, various data operations are conducted, such as data collection and preparation for uploading, excessive test curve data points reduction, minimum creep rate determination, to name a few. To achieve efficiency, and more importantly, consistency for high quality Codes and Standards development, various data operation apps need to be developed to provide custom software tools in the database system.
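As one example of such an app, minimum creep rate determination could be sketched as taking the smallest slope between successive points of a creep curve. The finite-difference approach, the function name, and the sample curve are assumptions made for illustration. This is not a method prescribed by the Codes and Standards, and real curves would usually need smoothing first.

```python
def minimum_creep_rate(times_h, strains):
    """Smallest finite-difference slope along a creep curve (strain per hour)."""
    if len(times_h) != len(strains) or len(times_h) < 2:
        raise ValueError("need two or more matching time/strain points")
    rates = []
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        rates.append((strains[i] - strains[i - 1]) / dt)
    return min(rates)

# Hypothetical creep curve: primary, secondary, then tertiary stage.
times_h = [0, 100, 200, 300, 400]
strains = [0.0, 0.004, 0.006, 0.008, 0.014]
rate = minimum_creep_rate(times_h, strains)
```

Running the same routine on every dataset is what provides the consistency mentioned above, since each analyst no longer picks the minimum by eye.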
6.3.3 Data Exchange Interfaces.
It is expected that more and more commercial software applications, such as those for computer-aided design, finite element analysis, and probability sampling techniques will be used in engineering developments associated with the Codes and Standards. The ASME Codes and Standards committees will also develop their own special software applications, e.g., the Swindeman software applications developed by the BPVC Committee on Materials for design stress derivation. As we approach the AI era, new technologies such as data-driven modeling and machine learning are anticipated as well in using the digitized Codes and Standards. All these applications require efficient data input for their operation and their output often needs a depository for preservation. To satisfy such data flow requirements and enable automated data exchanges between the ASME Materials Properties Database and these software applications, custom APIs will be developed.
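The data flow described above can be sketched as a round trip through a neutral text format. The fragment below uses JSON and a hypothetical record layout. The function names and the layout are assumptions and do not describe any existing ASME interface.

```python
import json

def export_record(record):
    """Serialize one record to JSON text, with units carried alongside values."""
    return json.dumps(record, sort_keys=True)

def import_record(text, required):
    """Parse JSON text and verify that every required attribute is present."""
    record = json.loads(text)
    missing = [name for name in required if name not in record["attributes"]]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    return record

# Hypothetical creep record; the values are made up for illustration.
record = {
    "compartment": "Test Data/Creep",
    "attributes": {
        "Test temperature": {"value": 600, "unit": "C"},
        "Test stress": {"value": 150, "unit": "MPa"},
        "Rupture life": {"value": 1200, "unit": "h"},
    },
}
payload = export_record(record)
restored = import_record(payload, ["Test temperature", "Test stress"])
```

Checking required attributes on import lets a receiving application reject incomplete input before it reaches a simulation.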
6.3.4 Attribute Definition Menus.
As the ASME Materials Properties Database continues to grow in its coverage of the Codes and Standards, it will inevitably encounter attributes that carry different meanings when used by different Codes and Standards committees, as well as attributes that carry one meaning but are expressed in different terminologies. Misunderstanding of an attribute can often result in misinterpretation and misuse of its data, which may lead to serious consequences. To avoid such problems, online definition menus must be developed that allow users to access the correct definition on the spot with a simple mouse click. An example of such online definition menus is given in Fig. 2, where attribute Static Long Term Properties is hypertexted and its definition is displayed after the hyperlink is clicked.
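The two cases named above, one attribute with several meanings and one meaning with several names, can be handled by a lookup keyed on both the term and the committee context. The sketch below is hypothetical throughout: the committee names, the synonym pair, and the definition texts are placeholders, not actual Codes and Standards definitions.

```python
# Placeholder definitions keyed by (canonical attribute, committee).
DEFINITIONS = {
    ("Yield stress", "Committee X"): "Placeholder: definition adopted by Committee X.",
    ("Yield stress", "Committee Y"): "Placeholder: definition adopted by Committee Y.",
}
# Alternative terminologies mapped to one canonical attribute name (assumed pair).
SYNONYMS = {"Yield strength": "Yield stress"}

def define(term, committee):
    """Return the definition of a term as used by the given committee."""
    canonical = SYNONYMS.get(term, term)
    try:
        return DEFINITIONS[(canonical, committee)]
    except KeyError:
        raise LookupError(f"no definition of {term!r} for {committee}") from None

text_x = define("Yield strength", "Committee X")
text_y = define("Yield stress", "Committee Y")
```

Raising an error for an undefined combination, rather than returning a default, avoids silently showing a user the wrong committee's definition.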
6.4 Data Amassment.
Besides developing effective database structure and advanced software functionalities, amassing high-quality materials data is indisputably another important factor that determines the effectiveness of the ASME Materials Properties Database. Unfortunately, for historical reasons, a large portion of the data used to develop the existing Codes and Standards is no longer available. Consequently, it has become a matter of great importance and urgency to amass high quality data, which will involve a significant effort in two aspects including identifying data sources and establishing an efficient workflow for data collection and preservation.
6.4.1 Data Sources.
Conventionally, a material's data package is submitted to ASME when an inquiry is made for a Code Case development. This will continue to be an important source for data collection. However, the data accumulation rate of such collections is very slow because the quantity of each submission is often limited. Furthermore, it has been found that the quality, especially the completeness, of such data packages often varies depending on the inquirer's budget and the originality of the data. Some desirable attributes are often excluded from the data package preparation to reduce the inquirer's project cost, and data obtained secondhand or passed through multiple hands often contain inadvertent human errors. Clearly, more active approaches are needed to address the problems of this passive data collection method.
Much of the materials data useful for the Codes and Standards can be found in the research and development (R&D) reports generated by national laboratories of the Department of Energy (DOE), and some of the reports are available in the public domain. At the request of ASME, DOE provided a written clarification in 2021 that such data can be collected and kept by ASME for use in development of the Codes and Standards [6]. This clarification has opened a door for ASME Materials Properties Database developers to actively collect R&D reports generated by DOE national laboratories in the public domain for data preservation and use.
In 2022, the White House issued new guidance to ensure federally funded research data equitably benefits all of America [7]. This guidance has further expanded the data collection sources for the ASME Materials Properties Database. It is expected that many datasets desired for the Codes and Standards that are not in the public domain may be obtained under this new guidance once some operational details are worked out.
Another active approach to amassing materials data is to acquire data from industry. Many materials manufacturers have generated large quantities of materials data for their products. Some employees of these companies are active Codes and Standards developers and have the means of acquiring the desired data and permissions from their employers for the ASME Materials Properties Database. Preserving the industry-generated data in the ASME Materials Properties Database is beneficial not only to the Codes and Standards but also to these industries, as their products can be better understood and used with the supporting data securely accessible by the Codes and Standards committees for years to come.
6.5 Data Collection and Preservation.
As previously discussed, to modernize materials data for the Codes and Standards, the ASME Materials Properties Database must manage not just materials data but materials knowledge by preserving the materials data in a machine-readable and self-explanatory fashion consistent with human cognitive processes. To achieve such preservation and management, data packages, usually collected as data files in various formats, must be processed into self-explanatory and machine-readable database records through an efficient and consistent procedure, and the database system should provide adequate structure to facilitate this process.
In the early development of the ASME Materials Properties Database, the database structure was designed with two major parts, i.e., the Data File Warehouse and the Digital Database [1]. The former was intended to enable efficient retention of data collected in various formats from different sources by uploading the data files as collected, with minimal database development effort. The latter was designed as a relational database to manage digitized data processed from the as-collected data files. Hyperlinks would connect Digital Database records to their original data files in the Data File Warehouse to maintain the pedigree and allow convenient provenance and information verification whenever needed. In future development for data collection and preservation, the concept of these two parts will remain unchanged, but their functions will be distributed into different compartments. Some compartments, such as those for data packages and documents, e.g., Data Package/Metallic Materials, ASME Records, and BPVC Section II Part D/PDF, will provide the function of the Data File Warehouse, while others, such as those for machine-readable data records, e.g., Test Data/Creep, Test Data/Tensile, and Pedigree/Metallic Materials, will play the role of the Digital Database. The two functional parts together will provide the capability needed to facilitate data collection and preservation operations effectively.
With these two functional parts and hyperlinks between them, the information flow from the initial data files collection, through the self-explanatory and machine-readable data records creation, to the final Codes and Standards publication can be securely maintained and documented when an efficient and consistent procedure for data processing is implemented.
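The link between the two functional parts can be sketched as a pair of record types joined by file identifiers. This is a minimal illustration under stated assumptions: the compartment names are those given in the text, while the class names, field names, identifiers, file name, and values are hypothetical and do not describe the actual database schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceFile:
    """An as-collected file kept in a Data File Warehouse compartment."""
    file_id: str
    compartment: str  # e.g., "Data Package/Metallic Materials"
    filename: str


@dataclass
class DigitalRecord:
    """A machine-readable record kept in a Digital Database compartment."""
    record_id: str
    compartment: str  # e.g., "Test Data/Creep"
    values: dict
    source_file_ids: list = field(default_factory=list)


def trace_provenance(record, warehouse):
    """Return the source files linked to a record; raise if a link is broken."""
    missing = [fid for fid in record.source_file_ids if fid not in warehouse]
    if missing:
        raise KeyError(f"broken provenance links: {missing}")
    return [warehouse[fid] for fid in record.source_file_ids]


# Placeholder identifiers and values for illustration only.
warehouse = {
    "F-001": SourceFile("F-001", "Data Package/Metallic Materials", "package.pdf"),
}
record = DigitalRecord("R-001", "Test Data/Creep", {"Test Temperature": 600.0}, ["F-001"])
sources = trace_provenance(record, warehouse)
```

Checking that every link resolves is a simple way to confirm that a digitized record can still be verified against its as-collected file.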
To establish an efficient and consistent data processing procedure, two aspects should be considered, i.e., the workflow and the data quality metric.
Several crucial steps must be included in the workflow. For a given data package, its terminologies must first be unified with those used in the attributes of the database. The unification can not only minimize misinterpretation and misuse of the data, but also enhance database searchability. In the unification process, any ambiguities in the meaning of the terminologies must be clarified in a timely manner with the data provider. Next, information gaps must be identified and filled with additional data collection. Meanwhile, the interrelationships among information elements must be delineated for developing positional and/or hyperlink connections in the database schema. After all these preparations, one self-explanatory and machine-readable record can be created for structural and functional testing and improvement as needed. Finally, all data of the data package can be processed following suit in a batch.
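The first steps of this workflow (terminology unification, flagging of ambiguous terms for clarification with the data provider, and identification of information gaps) can be sketched as follows. The provider column headings, the mapping, and the values are hypothetical; only the database attribute names are taken from the tensile example used in this paper.

```python
# Hypothetical mapping from a provider's column headings to database attributes.
TERM_MAP = {
    "UTS": "Ultimate Tensile Strength",
    "YS": "Yield Strength",
    "Temp": "Test Temperature",
}
REQUIRED = ["Ultimate Tensile Strength", "Yield Strength", "Test Temperature"]


def unify_terms(raw_row, term_map):
    """Rename provider terms to database attributes; collect unresolved terms."""
    unified, unresolved = {}, []
    for name, value in raw_row.items():
        if name in term_map:
            unified[term_map[name]] = value
        elif name in term_map.values():
            unified[name] = value  # already uses the database terminology
        else:
            unresolved.append(name)  # to be clarified with the data provider
    return unified, unresolved


def find_gaps(unified_row, required):
    """List required attributes for which the row carries no data."""
    return [attr for attr in required if unified_row.get(attr) is None]


def process_package(rows, term_map, required):
    """Apply the same unification and gap check to every row of a package."""
    results = []
    for row in rows:
        unified, unresolved = unify_terms(row, term_map)
        results.append({
            "record": unified,
            "unresolved_terms": unresolved,
            "gaps": find_gaps(unified, required),
        })
    return results


# Placeholder rows; the second uses a heading that the mapping does not cover.
rows = [
    {"UTS": 600.0, "YS": 300.0, "Temp": 25.0},
    {"UTS": 550.0, "Rp0.2": 280.0, "Temp": 100.0},
]
report = process_package(rows, TERM_MAP, REQUIRED)
```

In practice, the first row would serve as the pilot record for structural and functional testing before the remaining rows are processed as a batch.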
Developing a quality metric for datasets has always been a difficult endeavor because it involves many complicated factors, such as calibration of the data generation equipment, accuracy of measurements, completeness of data and metadata, and data quantity in the dataset, to name a few. Furthermore, a given dataset can be considered of good quality for one application but of low quality for another. Nevertheless, data quality is always recognized to be important for Codes and Standards development. For example, for many statistical methods employed in Codes and Standards development, data quality can significantly affect the quantified uncertainty. Therefore, continued efforts must be made in quality metric development.
Currently, stipulations regarding data collection, such as Appendix 5 of BPVC Section II Part D [8] and Appendix HBB-Y of BPVC Section III Division 5 [9], provide guidelines on the types and quantity of data that need to be collected but offer only vague and limited measures of data quality. As a rudimentary and practical starting point, whether a dataset contains data for attributes of differing desirability can be taken as a measure of quality from the perspective of information completeness. In a given dataset, data for its attributes normally fall into three desirability categories: mandatory, conditional mandatory, and optional. Take a tensile test dataset as an example. The data for attributes Ultimate Tensile Strength, Yield Strength, and Test Temperature belong to the mandatory category because the dataset will be useless without them; the data for attributes Specimen Heat Treatment and Gage Punch Mark Length are conditional mandatory because they are essential for correct interpretation and use of the dataset when, respectively, the specimen was heat treated in addition to its pedigree material treatment and gage punch marks were used for elongation measurement and strain calculation; and the data for attributes Tensile Fracture Stress and Proportional Limit are optional because they are nice to have but not indispensable for most analyses in Codes and Standards development. With the data desirability categories, a metric of data quality in information completeness can be developed based on the data available versus the data desirable in each category. Such a measurement may provide a quantitative tool to help resolve an inevitable conflict between the data provider and the data user, in which the former tends to collect data for as few attributes as possible to reduce cost and effort while the latter is inclined to have data for more attributes to ensure fully informed analyses.
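One possible reading of "data available versus data desirable in each category" is a per-category fraction, sketched below. The attribute names and categories follow the tensile example above; the scoring formula, the function name, the handling of conditions, and the record values are assumptions made for illustration and do not represent an adopted ASME metric.

```python
# Desirability categories from the tensile test example in the text.
CATEGORIES = {
    "mandatory": ["Ultimate Tensile Strength", "Yield Strength", "Test Temperature"],
    "conditional mandatory": ["Specimen Heat Treatment", "Gage Punch Mark Length"],
    "optional": ["Tensile Fracture Stress", "Proportional Limit"],
}


def completeness(record, categories, applicable_conditional=None):
    """Fraction of desirable attributes that carry data, per category.

    applicable_conditional: the conditional mandatory attributes whose
    condition holds for this record; None means all of them apply.
    """
    scores = {}
    for category, attrs in categories.items():
        if category == "conditional mandatory" and applicable_conditional is not None:
            attrs = [a for a in attrs if a in applicable_conditional]
        if not attrs:
            scores[category] = None  # nothing desirable in this category
            continue
        present = sum(1 for a in attrs if record.get(a) is not None)
        scores[category] = present / len(attrs)
    return scores


# Placeholder record: gage punch marks were used but their length was not
# reported, and the specimen received no additional heat treatment.
record = {
    "Ultimate Tensile Strength": 600.0,
    "Yield Strength": 300.0,
    "Test Temperature": 25.0,
    "Proportional Limit": 250.0,
}
scores = completeness(record, CATEGORIES, applicable_conditional={"Gage Punch Mark Length"})
```

Reporting one score per category, rather than a single blended number, keeps a shortfall in mandatory data from being masked by an abundance of optional data.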
7 Summary
To support ASME's objective of going beyond the traditional PDF publication and incorporating the Codes and Standards into accepted datasets to assist with integrated information flow for the rapidly approaching AI era, creation of a roadmap for future development of the ASME Materials Properties Database is discussed. To modernize the materials data, the ASME Materials Properties Database must manage materials knowledge instead of merely the materials data, which is only the carrier of information that must be organized into a purposeful structure consistent with human cognitive processes. To achieve such knowledge management, data should be preserved in a self-explanatory and machine-readable fashion.
Self-explanation can be achieved through database schemas designed on the basis of ontological analyses of the intended knowledge domain, together with online terminology definition menus and hyperlinked maintenance of data interrelationships. Machine-readability can be achieved by managing data at the Lowest Information Element Level, which allows data to be processed individually and collectively using built-in functionalities and software tools of the database management system.
On the path forward, the ASME Materials Properties Database will strive for its goal of streamlining the Codes and Standards development from the initial inquiry to the eventual Codes and Standards publication through preserving all development information, tracking data interrelationships, and providing resources for data processing and analysis. To achieve this goal, new database compartments will be developed, built-in functionalities and software tools will be created, and materials data will be amassed. The data amassment will leverage all possible sources and opportunities, particularly the White House's 2022 guidance to ensure federally funded research data equitably benefits all of America. To facilitate the collection of high-quality data, a practical quality metric will be developed and implemented.
Acknowledgment
The author would like to thank the members of the Boiler and Pressure Vessel Committee on Materials, the Plant System Design Standards Committee, and the Model Based Enterprise Standards Committee who reviewed this paper for publication.
Funding Data
ASME Standards Technology, LLC (Contract No. NFE-19-07881).
U.S. Department of Energy, Office of Nuclear Energy Science and Technology (Contract No. DE-AC05-00OR22725; Funder ID: 10.13039/100000015).
Oak Ridge National Laboratory, managed by UT-Battelle, LLC (Funder ID: 10.13039/100006228).
Notice
This paper has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The U.S. government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this paper, or allow others to do so, for U.S. government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.
Nomenclature
- AI = artificial intelligence
- API = application programming interface
- ASME = American Society of Mechanical Engineers
- BPVC = Boiler and Pressure Vessel Code
- CAD = computer-aided design
- DOE = Department of Energy
- FEA = finite element analysis
- II-D Tables = BPVC Section II Part D properties tables
- MDWG = materials database working group
- ORNL = Oak Ridge National Laboratory
- PDF = portable document format
- R&D = research and development
- TEC = technical and engineering communities