An overview of code developed to assist work at the Bentley Historical Library
Most are tools for working with EAD files, but several other useful projects are described as well.
Utility tools used in many of the other scripts. They may be necessary for other scripts to run.
- ASpace/python API interface: A Python-based convenience interface for interacting with the ASpace API
- EAD utilities: Two convenience classes for working with EAD files. One gathers common operations on a single EAD file in one place; the other makes it easy to run actions across entire directories of EADs
- EAD cleanup: Scripts for prettifying the Bentley's EADs
Tools for cleaning and normalizing EAD data
- ASpaceify Extents: A large-scale script to split extents into component ASpace parts, normalize terms, and write them back to their original EAD files
- Authority Reconciliation: Script to reconcile our local controlaccess terms with those in the LCNAF
- Removing unitdates from unittitles: Many unittitles had unitdates embedded in them, which ASpace blindly strips out without regard to context. This script pre-empts that with some slightly more intelligent removal logic.
- Cleaning empty unittitles: Our EADs originally had a number of empty unittitle fields. This script fixes them.
- Attribute normalization: Convenience script to normalize attribute values based on given criteria
- Container label normalization: Makes all container labels singular
- Expanding container ranges: Given a single c0x item described with a range of containers (e.g., boxes 1-10), this script creates individual c0x entries for each item in that original range.
- Extent extraction from unittitles: A number of unittitles included their extents as a parenthetical. This script extracts those and creates the corresponding extent tags
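
The container range expansion described above can be illustrated with a minimal sketch. This is not the Bentley's actual script: it assumes the range appears as the text of a `did/container` element in the form `1-10`, that components use numbered `c01` to `c12` tags without namespaces, and the function name `expand_container_ranges` is hypothetical.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Assumed range format: two integers separated by a hyphen, e.g. "1-10"
RANGE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")
# Numbered EAD component tags c01 through c12
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])")


def expand_container_ranges(parent):
    """Replace each component whose container holds a range with one copy per number."""
    for child in list(parent):
        # Depth-first, so nested ranges are expanded before the child is cloned
        expand_container_ranges(child)
        if not COMPONENT.fullmatch(child.tag):
            continue
        container = child.find("did/container")
        if container is None or container.text is None:
            continue
        match = RANGE.match(container.text)
        if match is None:
            continue
        start, end = int(match.group(1)), int(match.group(2))
        if end < start:
            continue  # leave suspicious ranges untouched for manual review
        position = list(parent).index(child)
        parent.remove(child)
        for offset, number in enumerate(range(start, end + 1)):
            clone = copy.deepcopy(child)
            clone.find("did/container").text = str(number)
            parent.insert(position + offset, clone)
    return parent
```

Calling `expand_container_ranges(ET.parse(path).getroot())` on a hypothetical `path` would modify the tree in place. Each clone keeps the original component's title and other metadata, so only the container number differs between the new entries.
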
Tools to map data from one system to another
- Agents from EADs to ASpace: Extracts all agents from a directory of EAD files, generates ASpace-compliant JSON for all of them, then posts that data to an ASpace instance.
- Accessions from BEAL to ASpace: Transforms BEAL accession exports (as exported by Dallas' scripts) into ASpace JSON, then posts the transformed data.
Tools to summarize or characterize specific sets of data
- UMich publications in HathiTrust: Summarizes all U-Michigan publications found in the HathiTrust's digital library by publication series
- Web log exploration: A tool to explore the web logs for our online finding aid collections
- Removable media summaries: Creates a detailed inventory of digital removable media
- Characterizing c0x series paths: Exports a list of all series paths with occurrence counts across all EADs (e.g., Series -> File -> File -> Item)
- Summarizing all tag/attribute value pairs: Exports a list of the counts of all pairs of tag/attribute values found in all EADs.
- Self-nesting tag detection: Finds all instances of tags that have the same tag type as a child.
- Missing boxes check: Finds potential missing boxes in a finding aid (for example, if the entire finding aid contains boxes 1 and 3 but never a box 2, this flags that EAD file).
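
As an illustration of the missing boxes check, here is a minimal sketch rather than the actual script. It assumes boxes are recorded as `container` elements with `type="box"` (case-insensitive) whose text is either a single number or a range such as `5-6`; the function name `missing_boxes` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed container text: a single number ("3") or a simple range ("5-6")
BOX_TEXT = re.compile(r"(\d+)(?:\s*-\s*(\d+))?")


def missing_boxes(root):
    """Return the box numbers absent between the lowest and highest box seen."""
    seen = set()
    for container in root.iter("container"):
        if (container.get("type") or "").lower() != "box":
            continue
        match = BOX_TEXT.fullmatch((container.text or "").strip())
        if match is None:
            continue  # skip non-numeric labels such as "OS1"
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        seen.update(range(start, end + 1))
    if not seen:
        return []
    return sorted(set(range(min(seen), max(seen) + 1)) - seen)
```

An EAD file would be flagged whenever the returned list is non-empty. Gaps are only candidates for review, since a collection may legitimately skip box numbers.
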