Recent initiatives in accessioning born-digital archives have focused on removable media, for example by using forensic tools to create disk images (e.g., 1, 2, 3, 4). However, there has been little discussion of the born-digital archiving needs of institutional archives. In institutional settings, terabytes of records with permanent value often reside on large, unstructured network drives, frequently alongside active records. For example, a National Archives of the UK blog post mentions that up to two-thirds of government information is held on unstructured shared drives, with some departments holding up to 190 terabytes of information.
Tools for identifying batches of inactive records, such as the records of departed staff members or of initiatives that ended long ago, are often lacking, and those that exist are designed more for IT departments managing disk space. To address this need, I created Archives Finder, a script that aims to resolve some of the issues with existing tools for locating batches of inactive records. Archives Finder searches across large, unstructured network drives for the largest possible groupings of records that are a user-defined number of years old. It also includes a “fuzzy math” feature that lets the user specify that only a certain percentage of the files need to be X years old. By default, 95% of the files must be 7 years old, but both values can be readily modified. The results are output as a CSV file that can be viewed in MS Excel.
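To make the threshold idea concrete, the following is a minimal Python sketch of how such a search might work. It is not the Archives Finder source code. The function names (`find_inactive_dirs`, `write_csv`), the use of last-modified time as the measure of age, and the reading of "largest possible grouping" as the topmost folder whose entire subtree meets the threshold are all assumptions made for illustration.

```python
import csv
import os
import time

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def find_inactive_dirs(root, min_age_years=7, threshold=0.95, now=None):
    """Return sorted (path, total_files, old_files) tuples for the topmost
    directories in which at least `threshold` of all files in the subtree
    were last modified more than `min_age_years` ago.

    Hypothetical helper: age is judged by modification time (an assumption).
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_years * SECONDS_PER_YEAR

    counts = {}    # directory -> (total files, old files) for its whole subtree
    children = {}  # directory -> list of immediate subdirectory paths

    # Walk bottom-up so each directory can add up its subdirectories' counts.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = old = 0
        for name in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable or vanished file: skip it
            total += 1
            if mtime < cutoff:
                old += 1
        subdirs = [os.path.join(dirpath, d) for d in dirnames]
        for sub in subdirs:
            # Symlinked directories are not walked, so they may be missing.
            sub_total, sub_old = counts.get(sub, (0, 0))
            total += sub_total
            old += sub_old
        counts[dirpath] = (total, old)
        children[dirpath] = subdirs

    # Descend from the root, stopping at the first directory on each branch
    # that meets the threshold, so only the largest groupings are reported.
    results = []
    stack = [root]
    while stack:
        path = stack.pop()
        total, old = counts.get(path, (0, 0))
        if total and old / total >= threshold:
            results.append((path, total, old))
        else:
            stack.extend(children.get(path, []))
    return sorted(results)


def write_csv(results, out_path):
    """Write the results to a CSV file that a spreadsheet program can open."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["directory", "total_files", "old_files"])
        writer.writerows(results)
```

Under these assumptions, a call such as `write_csv(find_inactive_dirs(r"\\server\share", min_age_years=10, threshold=0.9), "inactive.csv")` would list the folders in which at least 90% of the files are more than 10 years old; the share path here is a placeholder.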