In this summer’s born-digital archives course, students have been working on a project that involves records on obsolete media (5.25-inch diskettes, 3.5-inch floppies, Zip disks, MiniDV tapes, etc.) as well as inactive records on network storage that originated in a variety of antiquated file formats (e.g., WordPerfect, email in MS Outlook Express format, etc.). Students are divided into three teams to tackle the project: a Digital Forensics team (working primarily with obsolete media), a Digital Preservation team (working primarily with format migration), and a Curation and Description team (working primarily on appraisal, arrangement, and description). The collection comes from the Pratt School of Information’s own files and will eventually become available through the School’s on-site archives.
More information can be found in the course syllabus.
I wanted to share some new scripts that I have recently developed. These include:
BagIt Validation Script
For a given directory, this script validates all the “BagIt” bags in it and sends an email to a designated address with the status of the bags. BagIt is a standard, with accompanying software originally developed by the Library of Congress, that is used to confirm the integrity of collections of files (e.g., no files deleted, no files tampered with, no files suffering from bit rot or other corruption). Written in Python and tested on Windows.
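The core check can be sketched with the Python standard library alone. This is a minimal illustration, not the script itself: the function names (find_bags, validate_bag, report) are hypothetical, it only compares payload files against the manifest-<algorithm>.txt files, and it skips the rest of what a full BagIt validation covers (tag manifests, bag-info.txt, percent-encoded paths). The email step is left out; report() only builds the text that a message body could carry.

```python
import hashlib
from pathlib import Path


def find_bags(root):
    """Return every directory under root (inclusive) that holds a bagit.txt file."""
    root = Path(root)
    return sorted(p.parent for p in root.rglob("bagit.txt"))


def file_digest(path, algorithm):
    """Hash a file in chunks so large payload files do not fill memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_bag(bag_dir):
    """Compare payload files against the bag's manifests; return a list of problems."""
    bag_dir = Path(bag_dir)
    problems = []
    manifests = sorted(bag_dir.glob("manifest-*.txt"))
    if not manifests:
        return ["no payload manifest found"]
    for manifest in manifests:
        # File name "manifest-sha256.txt" gives the algorithm name "sha256"
        algorithm = manifest.stem.split("-", 1)[1]
        listed = set()
        for line in manifest.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            expected, rel_path = line.split(None, 1)
            rel_path = rel_path.strip()
            listed.add(rel_path)
            target = bag_dir / rel_path
            if not target.is_file():
                problems.append(f"missing: {rel_path}")
            elif file_digest(target, algorithm) != expected.lower():
                problems.append(f"checksum mismatch: {rel_path}")
        # Payload files that no manifest line accounts for
        for path in sorted((bag_dir / "data").rglob("*")):
            rel = path.relative_to(bag_dir).as_posix()
            if path.is_file() and rel not in listed:
                problems.append(f"not in manifest: {rel}")
    return problems


def report(root):
    """Build a plain-text status summary, suitable for the body of an email."""
    lines = []
    for bag in find_bags(root):
        problems = validate_bag(bag)
        status = "OK" if not problems else "INVALID (" + "; ".join(problems) + ")"
        lines.append(f"{bag}: {status}")
    return "\n".join(lines)
```

Calling report() on a directory returns one line per bag, which could then be sent with Python’s smtplib or any other mail mechanism.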
File Normalization tools: WordPerfect to PDF
Born-digital archives work almost always seems to turn up WordPerfect (WPD) files. This script goes through a directory, including all subdirectories, and creates PDF versions of all WPD files using MS Word for Windows. Requires Windows XP+ and MS Word for Windows.
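The general approach can be sketched in Python as follows. This is an illustration under several assumptions, not the script itself: it assumes the pywin32 package is installed, that the installed copy of Word can both open WordPerfect files and save PDFs, and that Word opens the files without prompting. The function names plan_conversions and convert_all are hypothetical.

```python
from pathlib import Path

WD_FORMAT_PDF = 17  # Word's wdFormatPDF constant


def plan_conversions(root):
    """Pair every .wpd file under root with a same-named .pdf path beside it."""
    root = Path(root)
    sources = sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".wpd"
    )
    return [(src, src.with_suffix(".pdf")) for src in sources]


def convert_all(root):
    """Open each WPD file in Word and save a PDF copy (Windows with Word only)."""
    import win32com.client  # pywin32; imported here so planning works anywhere

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    converted = []
    try:
        for src, dst in plan_conversions(root):
            if dst.exists():
                continue  # never overwrite an existing PDF
            doc = word.Documents.Open(str(src.resolve()), ReadOnly=True)
            try:
                doc.SaveAs(str(dst.resolve()), FileFormat=WD_FORMAT_PDF)
                converted.append(dst)
            finally:
                doc.Close(False)  # close without saving changes to the original
    finally:
        word.Quit()
    return converted
```

Keeping the planning step separate from the Word automation makes it possible to review the list of files before any conversion starts, and the original WPD files are opened read-only so they are left untouched.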
Below you will find the upcoming course projects that will be undertaken by my students in the Fall 2016 and Summer 2016 classes:
Fall 2016 – LIS 668-01 Projects in Moving Image and Sound Archives
The course project in this class will involve digitally reformatting the public access program Dyke TV and exhibiting it to the public, in collaboration with the Lesbian Herstory Archives. Below you will find some information about the program written by Erica Titkemeyer (2013):
In 1993, Dyke TV began as an access television show created by members of the New York City lesbian community (specifically Linda Chapman, Ana Simo, and Mary Patierno) at Prince St. and Broadway in Manhattan. The purpose was to produce news segments by, for, and about lesbian individuals and communities throughout the United States. The founders more specifically wished to document “rising lesbian activism and to provide a viable platform for lesbian voices to enter the realm of popular culture.” By the time the series came to an end thirteen years later in 2006, the production had reached a total of 78 public access channels, produced at least 322 total shows, and planted its office among the lesbian community in Park Slope, Brooklyn.
The project will involve working with a video collection on U-Matic videotape, which is endangered because of a declining number of units available for playing the format. Past student work digitized from LHA can be found at http://herstories.prattinfoschool.nyc.
Continue reading “Upcoming course projects (Fall and Summer 2016 semesters)”
Recent initiatives in accessioning born-digital archives have focused on removable media, such as using forensic tools to image media (e.g., 1, 2, 3, 4). However, there has been little discussion of the born-digital archiving needs of institutional archives. In institutional settings, terabytes of records with permanent value often reside on large, unstructured network drives, frequently alongside active records. For example, a National Archives of the UK blog post mentions that up to two-thirds of government information is held on unstructured shared drives, with some departments holding up to 190 terabytes of information.
Tools to identify batches of inactive records, such as the records of departed staff members or initiatives that have long ended, are often lacking, and those that exist are designed more for IT departments to manage disk space. To address this need, I created the script Archives Finder, which aims to overcome some of the limitations of existing tools for locating batches of inactive records. Archives Finder searches across large, unstructured network drives for the largest possible grouping of records that are at least a user-defined number of years old. It also includes a “fuzzy math” feature that allows the user to specify that only a certain threshold of files need to be X years old. By default, 95% of files must be 7 years old, but these values can be readily modified. The results are output as a CSV file that can be readily viewed in MS Excel.
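One way to picture the search is the following Python sketch. It is an illustration, not the Archives Finder code: it assumes that a file’s age is judged by its last-modified time and that “largest possible grouping” means the topmost directory whose files, counted recursively, meet the threshold. The function names and CSV column names are hypothetical.

```python
import csv
import os
import time

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def count_files(root, cutoff):
    """Map each directory to (total files, files modified before cutoff), recursively."""
    counts = {}
    # Bottom-up walk: children are tallied before the parent that sums them.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = old = 0
        for name in filenames:
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                continue  # unreadable file: leave it out of the tally
            total += 1
            old += mtime < cutoff
        for name in dirnames:
            child_total, child_old = counts.get(os.path.join(dirpath, name), (0, 0))
            total += child_total
            old += child_old
        counts[dirpath] = (total, old)
    return counts


def find_inactive(root, years=7, threshold=0.95, now=None):
    """Return the topmost directories where at least `threshold` of files are `years` old."""
    now = time.time() if now is None else now
    counts = count_files(root, now - years * SECONDS_PER_YEAR)
    results = []
    # Top-down walk: once a directory qualifies, its subdirectories are skipped.
    for dirpath, dirnames, _ in os.walk(root):
        total, old = counts.get(dirpath, (0, 0))
        if total and old / total >= threshold:
            results.append((dirpath, total, old))
            dirnames[:] = []  # prune: this is already the largest grouping
    return results


def write_csv(results, out_path):
    """Write one row per qualifying directory to a CSV file."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["directory", "total_files", "old_files", "percent_old"])
        for dirpath, total, old in results:
            writer.writerow([dirpath, total, old, round(100 * old / total, 1)])
```

Lowering the threshold lets a folder qualify even when a few files in it were touched recently, which is the situation the “fuzzy math” setting is meant for.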
You can download the script, which runs on Windows machines, at GitHub.
This semester is off to a nice start. In LIS 665 Projects in Digital Archives, students will be working to arrange, describe, and digitize portions of a collection of architectural photography (with some landscape and craft photography) donated to the School by the estate of Bill Maris. You can check out the finding aid created by students last Fall here. One of Maris’ digitized photographs is shown here, depicting President Jimmy Carter making furniture.
In LIS 625 Management of Archives and Special Collections, students will continue arranging and describing a collection of records on the history of the school. The finding aid created by students last semester can be found here. You can find both course syllabi below:
Syllabus – LIS 665 Projects in Digital Archives
Syllabus – LIS 625 Management of Archives & Special Collections
[Update 8/6/17 – The book is now for sale at the SAA bookstore!]
I am pleased to announce that I am working on a new book project titled Moving Image and Sound Collections for Archivists to be published by the press of the Society of American Archivists.
Most archivists encounter and most archives contain some form of moving image and sound material. These can include recordings of events on video, oral histories captured on audiotape, and films created by independent filmmakers. The purpose of this book is to provide practical guidance to the archivist on how to preserve and make accessible the moving image and sound record. Although the moving image archivist may find value in this book, it is specifically targeted at the general archivist who may deal primarily in paper-based collections and needs additional guidance, and at the student archivist with an interest in building out this expertise.
Continue reading “New book project: Moving Image and Sound Collections for Archivists”
I had a hunch that webpages were deploying less text than they used to. I put together a study that looks at the use of text on webpages since 1999, using the Internet Archive’s archived webpages in the Wayback Machine. I found that there has indeed been a decline, beginning around 2005.
You can read the paper online at Information Research:
The rise and fall of text on the Web: a quantitative study of Web archives
Update (Oct 17, 2015): I have also blogged about this study on the CILIP blog and the Web Archives for Historians blog.
Anyone who has ever taken one of my classes knows that I like Omeka. It is very easy to use and has many nice features. However, one limitation is that it can be difficult to do rapid data entry on many items. For example, if you use the Dropbox plugin to bring in several hundred pictures and you want to quickly add metadata for them, or if some of the metadata fields are exactly the same (like author or photographer), there is no way (at least that I am aware of) to easily add this data. Thus, I created what I call the quick metadata entry form, which allows you to update the Title, Description, Date, Creator, and Rights fields, and allows you to make all values in a collection the same for a given field. Unfortunately, it was not developed as an Omeka plugin.
Continue reading “Quick Metadata Entry in Omeka”
In this blog post, I am going to offer a way to extract large batches of email newsletters from Constant Contact for the purposes of creating email archives, with each message saved as a PDF.
First, some background. I have recently finished an email archiving project for the History & Archives of Front Runners New York. The club had sent newsletters by postal mail since the early 1980s, but transitioned to email newsletters around 2004, and has been using Constant Contact since 2007 for its newsletter software. They had managed to retain all the messages in Constant Contact, though not all of the embedded images.
Constant Contact does not have an easy way to export sent messages in bulk. Thus, I created a script that leverages the Constant Contact API to export messages and the related metadata. For each message, it creates a PDF that begins with a full-length image of the email message, followed by a JSON export of the message metadata and the text version of the email message (if available). This allows the look of the message to be retained while also making it text-searchable.
Continue reading “Archiving Email Newsletters, or getting your Newsletters out of Constant Contact”
The Fall semester is right around the corner, and I thought I would share my upcoming course projects. In this semester’s LIS 668 Projects in Moving Image and Sound Archives, we will be working on digitizing, curating, and making available a collection of video and sound recordings around the topic of Women, AIDS, and ACT UP, in collaboration with the Lesbian Herstory Archives. In LIS 665 Projects in Digital Archives, we will continue an oral history digitization and curation project with the Archives of the Center for Puerto Rican Studies at Hunter College, CUNY, around the topic of the Puerto Rican diaspora. And lastly, in LIS 625 Management of Archives and Special Collections, we will work on an archives processing and exhibition project around the history of Pratt SILS, which is celebrating its 125th Anniversary this year. Feel free to download the syllabi:
LIS 625 Management of Archives and Special Collections
LIS 665 Projects in Digital Archives
LIS 668 Projects in Moving Image and Sound Archives