New tool to export photos from Facebook pages and groups

A few years ago, Facebook famously started allowing users to download their own data, providing them with a zip file of all their photos and status updates. However, it has never offered such a feature for pages or groups. This tool allows you to download all the photo albums for a page that you like or a group that you manage. It creates a metadata CSV file for all the photos, and provides you with a script that you can run on your local computer to download all the images. You can try it out at:

The source code is also available on GitHub.

Summer 2016 course project: born-digital archives

In this summer’s session of the born-digital archives course, students have been working on a project that includes records on obsolete media (5.25-inch diskettes, 3.5-inch floppies, Zip disks, MiniDV tapes, etc.) as well as inactive records on network storage that originate in a variety of antiquated file formats (e.g., WordPerfect, email in MS Outlook Express format, etc.). Students are divided into three teams to tackle the project: a Digital Forensics team (working primarily with obsolete media), a Digital Preservation team (working primarily with format migration), and a Curation and Description team (working primarily on appraisal, arrangement and description). The collection comes from Pratt School of Information’s own files, and will eventually become available through the School’s on-site archives.

More information can be found in the course syllabus.

New open-source scripts

I wanted to share some new scripts that I have recently developed. These include:

BagIt Validation Script
For a given directory, this script validates all the “BagIt” bags in it, and sends an email to a designated address with the status of the bags. BagIt is a standard, with accompanying software, originally developed by the Library of Congress that is used to confirm the integrity of collections of files (e.g., no files deleted, no files tampered with, no files suffering from bit rot or other corruption). Written in Python and tested on Windows.
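As a rough illustration of what validating a bag involves, here is a minimal sketch that checks each bag's payload files against its checksum manifest. It is not the script itself: the function names `validate_bag` and `validate_all` are hypothetical, it assumes a `manifest-sha256.txt` manifest, and it leaves out the email step and the other checks a full BagIt validator performs.

```python
import hashlib
from pathlib import Path


def validate_bag(bag_dir, algorithm="sha256"):
    """Return a list of problems found in one bag; an empty list means it passed."""
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    if not manifest.is_file():
        return [f"missing {manifest.name}"]

    problems = []
    listed = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <relative path>".
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        listed.add(rel_path)
        payload_file = bag / rel_path
        if not payload_file.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        actual = hashlib.new(algorithm, payload_file.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    # Payload files that the manifest does not mention are also a problem.
    data_dir = bag / "data"
    if data_dir.is_dir():
        for path in sorted(data_dir.rglob("*")):
            if path.is_file():
                rel = path.relative_to(bag).as_posix()
                if rel not in listed:
                    problems.append(f"unlisted file: {rel}")
    return problems


def validate_all(parent_dir):
    """Map each subdirectory that contains bagit.txt to its list of problems."""
    return {
        child.name: validate_bag(child)
        for child in sorted(Path(parent_dir).iterdir())
        if (child / "bagit.txt").is_file()
    }
```

The dictionary returned by `validate_all` could then be formatted into the body of a status email, for example with Python's `smtplib`.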

File Normalization tools: WordPerfect to PDF
Doing born-digital archives work almost always seems to turn up WordPerfect (WPD) files. This script will go through a directory, including all subdirectories, and create PDF versions of all WPD files using MS Word for Windows. Requires Windows XP+ and MS Word for Windows.
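The directory walk at the heart of this kind of tool can be sketched as follows. This is only an illustration under my own assumptions: `plan_conversions` is a hypothetical name, the PDF is assumed to sit next to the original, and the actual conversion step (which depends on MS Word) is deliberately left out.

```python
import os


def plan_conversions(root, skip_existing=True):
    """List (source, target) pairs for every WPD file under root."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            base, ext = os.path.splitext(name)
            if ext.lower() != ".wpd":
                continue
            source = os.path.join(dirpath, name)
            # Assumption: the PDF is written alongside the original file.
            target = os.path.join(dirpath, base + ".pdf")
            if skip_existing and os.path.exists(target):
                continue
            pairs.append((source, target))
    return pairs
```

Each pair would then be handed to whatever performs the conversion; skipping targets that already exist makes it safe to re-run the walk after an interruption.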

Upcoming course projects (Fall and Summer 2016 semesters)

Below you will find the upcoming course projects that will be undertaken by my students in the Fall 2016 and Summer 2016 classes:

Fall 2016 – LIS 668-01 Projects in Moving Image and Sound Archives
The course project in this class will involve digitally reformatting the public access program Dyke TV and exhibiting it to the public, in collaboration with the Lesbian Herstory Archives. Below you will find some information about the program written by Erica Titkemeyer (2013):

In 1993, Dyke TV began as an access television show created by members of the New York City lesbian community (specifically Linda Chapman, Ana Simo, and Mary Patierno) at Prince St. and Broadway in Manhattan. The purpose was to produce news segments by, for, and about lesbian individuals and communities throughout the United States. The founders more specifically wished to document “rising lesbian activism and to provide a viable platform for lesbian voices to enter the realm of popular culture.” By the time the series came to an end thirteen years later in 2006, the production had reached a total of 78 public access channels, produced at least 322 total shows, and planted its office among the lesbian community in Park Slope, Brooklyn.

The project will involve working with a video collection on U-Matic videotape, which is endangered because of a declining number of units available for playing the format. Past student work digitized from LHA can be found at

New script: Archives Finder

Recent initiatives in accessioning born-digital archives have focused on removable media, such as using forensic tools to image media (e.g., 1, 2, 3, 4). However, there has been little discussion of the born-digital archiving needs of institutional archives. In institutional settings, terabytes of records with permanent value often reside on large, unstructured network drives, often alongside active records. For example, a National Archives of the UK blog post mentions that up to two-thirds of government information is held on unstructured shared drives, with some departments holding up to 190 terabytes of information.

Tools to identify batches of inactive records, such as the records of departed staff members or initiatives that have long ended, are often lacking, and those that exist are designed more for IT departments to manage disk space. To address this need, I created the script Archives Finder. It searches across large, unstructured network drives for the largest possible grouping of records that are at least a given number of years old, as defined by the user. It also includes a “fuzzy math” feature that allows the user to specify that only a certain threshold of files need to be X years old. By default, 95% of the files must be at least 7 years old, but these values can be readily modified. The results are output as a CSV file that can be readily viewed in MS Excel.
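To make the threshold idea concrete, here is a minimal sketch of one way such a search could work. It is not the Archives Finder code: the function name `find_inactive_dirs` is hypothetical, file age is assumed to mean last-modified time, and CSV output is omitted.

```python
import os
import time

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def find_inactive_dirs(root, min_age_years=7, threshold=0.95, now=None):
    """Return (directory, total_files, old_files) for the highest-level
    directories in which at least `threshold` of all files are old enough."""
    now = time.time() if now is None else now
    cutoff = now - min_age_years * SECONDS_PER_YEAR

    # Pass 1: walk bottom-up so each directory can add up its subtree.
    counts = {}
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = old = 0
        for name in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable file: leave it out of the counts
            total += 1
            if mtime <= cutoff:
                old += 1
        for sub in dirnames:
            sub_total, sub_old = counts.get(os.path.join(dirpath, sub), (0, 0))
            total += sub_total
            old += sub_old
        counts[dirpath] = (total, old)

    # Pass 2: walk top-down and stop at the first directory that qualifies,
    # so the largest grouping is reported rather than each of its children.
    results = []
    for dirpath, dirnames, _filenames in os.walk(root):
        total, old = counts.get(dirpath, (0, 0))
        if total and old / total >= threshold:
            results.append((dirpath, total, old))
            dirnames[:] = []  # do not descend into a reported directory
    return results
```

The rows returned here could be written out with Python's `csv` module. Reporting only the topmost qualifying directory keeps the output short on drives with deep folder trees.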

You can download the script, which runs on Windows machines, from GitHub.

Summer School!

Hello there. This summer I will be teaching Projects in Digital Archives for the fifth year in a row. This semester, we will be working with a selection of personal materials from Ms. Liza Loop. Ms. Loop is looking to create the History of Computing in Learning and Education (HCLE) Virtual Museum, and has spent her career in Silicon Valley’s computing industry with an interest in uses of computing for education and learning.

The collection that we will be working with is both born-digital and analog: 5.25-inch floppy disks, 3.5-inch floppy disks, Hi8 video and Betamax video (which is the bulk). Our goal is to re-animate these materials using methods relevant to a modern archival environment (e.g., digitizing analog material, imaging obsolete media, making it intelligible/runnable, etc.), and to provide value to the HCLE initiative.

Although we will not be working with The Oregon Trail (screenshot above), it is one of the better-known and most often remembered educational games. I also remember playing a lot of Number Munchers… and Carmen Sandiego (all on the Apple IIe, which may mean that I am really old or that my school was slow to adopt new technology, or both).

You can also download the course syllabus (PDF).