
Open Web Data Integration with Archive Information

Web records encompass various types of information, documents, or databases transmitted from a server to a browser using Hypertext Transfer Protocol (HTTP) upon activation of a URL. Key components of these records can incorporate other record types, necessitating the delineation of their...

Open Data Network for Internet Documents


The National Archives and Records Administration (NARA) holds records in several web-related file formats, including Cascading Style Sheets (CSS) and Extensible Hypertext Markup Language (XHTML). These formats, along with others, fall under the Web Records, Software and Code, and Structured Data categories in NARA's holdings.

For instance, XHTML versions 1.0 and 1.1, as well as CSS versions 1.0, 2.0, 2.1, 2.2, and an unspecified version, are each listed in a table with their NARA Format ID and the URL of the corresponding Linked Open Data. Extensible Forms Description Language (XFDL) and the CDX Internet Archive Index also fall under these categories.

No specific details could be found about a "Web Records Preservation Plan" used as testing criteria in format transformations for web records, but its general shape can be inferred from records management principles and preservation practice.

A Web Records Preservation Plan, in the context of format transformations, would typically serve as a documented strategy outlining how web-based records are preserved over time, especially when migrating or converting formats to maintain accessibility, usability, integrity, and authenticity.

Such a plan would involve identifying the types of web records, their characteristics, and their formats; defining retention periods and preservation goals based on legal, business, and historical requirements; setting criteria and methodologies for format transformation testing to ensure the fidelity and usability of web records after transformation; documenting the processing, validation, and quality assurance steps performed during transformation; specifying storage, backup, and access controls that protect record integrity; and scheduling regular review and update cycles to accommodate technological change and evolving standards.
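One common way to protect record integrity in storage is a fixity check: a checksum is recorded when a file is ingested and recomputed later to confirm the bytes have not changed. The following is a minimal sketch of that idea, not a description of NARA's procedure; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at ingest.

    Hypothetical helper: a real workflow would read the recorded digest
    from its own preservation metadata store.
    """
    return sha256_of(data) == recorded_digest.strip().lower()


if __name__ == "__main__":
    stored = b"<html><body>Archived page</body></html>"
    digest_at_ingest = sha256_of(stored)
    print(verify_fixity(stored, digest_at_ingest))         # unchanged bytes
    print(verify_fixity(stored + b" ", digest_at_ingest))  # altered bytes
```

A digest mismatch only shows that the bytes changed; deciding whether the change was an authorized transformation is a matter for the plan's documentation steps.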

Testing criteria in format transformations verify that converted web records remain complete, unaltered in meaning, and usable in target environments. Testing can include checks on metadata preservation, content rendering, functional links, multimedia handling, and compliance with standards like WARC (Web ARChive format).
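Two of the checks named above, functional links and unaltered content, can be illustrated with a small comparison between an original HTML record and its transformed copy. This is a sketch under stated assumptions rather than an established test suite: the names `summarize` and `compare_records` are hypothetical, only `<a href>` links are considered, and text is compared after whitespace normalization.

```python
from html.parser import HTMLParser


class _Extractor(HTMLParser):
    """Collects hyperlink targets and visible text from an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = set()
        self.text_parts = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def summarize(html):
    """Return (set of link targets, whitespace-normalized visible text)."""
    parser = _Extractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.text_parts).split())
    return parser.links, text


def compare_records(original_html, transformed_html):
    """Report links lost in transformation and whether visible text matches."""
    orig_links, orig_text = summarize(original_html)
    new_links, new_text = summarize(transformed_html)
    return {
        "missing_links": sorted(orig_links - new_links),
        "text_preserved": orig_text == new_text,
    }


if __name__ == "__main__":
    before = '<p>Hello <a href="/a">one</a> <a href="/b">two</a></p>'
    after = '<div><p>Hello <a href="/a">one</a> two</p></div>'
    print(compare_records(before, after))
```

Checks on rendering, multimedia handling, and WARC compliance need dedicated tooling and are outside what this sketch covers.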

The Digital Preservation Framework as Linked Open Data includes the same elements as the version of the Preservation Plans available on GitHub. NARA's Linked Open Data is published in Terse RDF Triple Language (Turtle), a plain-text serialization of the Resource Description Framework (RDF), so it can be opened in any text editor.
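Because Turtle is plain text, even simple tooling can inspect it. The sketch below lists the `@prefix` declarations in a Turtle document using only a regular expression. The sample data, including the `ex:` namespace and the format entry, is hypothetical and is not taken from NARA's files; a full RDF parsing library would be the appropriate choice for real work, since this pattern ignores other valid forms such as SPARQL-style `PREFIX` lines.

```python
import re

# Matches Turtle prefix directives such as:
#   @prefix ex: <http://example.org/> .
PREFIX_RE = re.compile(
    r"^\s*@prefix\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>\s*\.",
    re.MULTILINE,
)

# Hypothetical sample; identifiers are illustrative only.
SAMPLE = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/formats/> .

ex:format-0001 rdfs:label "Hypothetical format entry" .
"""


def list_prefixes(turtle_text):
    """Map each declared prefix name to its namespace IRI."""
    return {name: iri for name, iri in PREFIX_RE.findall(turtle_text)}


if __name__ == "__main__":
    for name, iri in list_prefixes(SAMPLE).items():
        print(f"{name}: {iri}")
```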

Other file formats under the Web Records category in NARA holdings include Adobe AIR files and Cascading Style Sheets 1.0, both with their respective NARA Format IDs and Linked Open Data available at the corresponding URLs.

For fuller detail on this topic, it may be necessary to consult specialized guidelines or standards for the digital preservation of web records, such as those from the International Internet Preservation Consortium (IIPC), the Library of Congress's digital preservation program, or ISO standards on digital records preservation.

Technology underpins the preservation, transformation, and long-term storage of web records. A Web Records Preservation Plan, as a strategic document outlining these processes, would rely on it to define retention periods, ensure fidelity and usability after transformation, and maintain the integrity of web records.
