Special Interest Group Meeting Notes: Digital Preservation SIG 7/31/24

Notes from the most recent meetings of special interest groups at Southeastern

July 31, 2024

The Digital Preservation SIG met on July 31, 2024 via Zoom
Topic: Creating an Inventory of Your Digital Content

Meeting Notes

Today’s topic: Creating an inventory! 

 

Inaugural meeting of the Digital Preservation Special Interest Group.

Digital preservation can be very overwhelming. But talking through some of the issues, especially with people in the same boat, can help make it easier. That is the goal of this SIG! Suggest we meet every 8-10 weeks to start. 

 

Creating an inventory of your digital content is a good first step on the digital preservation journey. Here’s a short video introduction from the Digital Preservation Coalition (video is also on the Getting Started page of their DigPres Handbook).

 

Use a tool you are already familiar with (doesn’t have to be more complicated than a spreadsheet). Ex. MS Excel, Google Sheets, Airtable, etc. A spreadsheet or database is preferred over a document (Word, PDF), as it allows you to filter your data.

 

Record digital content at a high level - just a snapshot of what you have, broadly grouped. Go for breadth over depth.

 

Why do we do this? 

  • Great and important management tool. Should be updated as you collect, acquire or act on digital content.
  • Helps with planning what you need to do with certain content and how you are going to do it (software, hardware, storage).
  • Good advocacy tool if you need to communicate needs with decision-makers. Can help them better understand what you are trying to manage and what you need.
  • Provides an internal finding aid to your organization’s digital content.

 

There are templates available on the web. Some basics that people say are good to have: 

  • Name of the group of items (collection name, project name, directory name, etc.)
  • Brief description of the group
  • Current storage location(s) - include if they are available online (NY Heritage, Flickr, local repository, etc.)
  • File formats in that group of items (TIFF, JPEG, PDF, WAV, MP3, TXT, etc.)
  • Number of files in the group
  • Current storage size of the group (MB, GB, TB)
  • Creation date(s) of the digital files (some people also record the dates of the originals, if the files were created by digitizing physical media)
  • Person or Department responsible for the creation and/or management of the files (if applicable)
  • Born-digital resources vs. digitized (born-digital files are generally more at risk and should be prioritized)
  • Broad categories, if applicable: audio, video, images, documents, newspapers, etc.
  • File naming is really important! You might want to note if file names need to be cleaned up or edited
  • Copyright issues? If you don’t own the rights to copy/preserve the materials, you might want to document that
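
For anyone who would rather not count files by hand, here is a minimal Python sketch that fills in the measurable columns from the list above (formats, number of files, storage size, dates) for each top-level folder and writes them to a CSV that opens in Excel or Google Sheets. The column names and the function names summarize_group and write_inventory are hypothetical, chosen only for illustration. It records last-modified dates, which are not always the true creation dates, and it leaves the description, responsible person, born-digital and rights columns blank to be filled in by a person.

```python
import csv
import os
from collections import Counter
from datetime import date

# Hypothetical column names, loosely following the basics listed above.
FIELDS = [
    "group_name", "description", "storage_location", "file_formats",
    "file_count", "size_mb", "earliest_modified", "latest_modified",
    "responsible", "born_digital_or_digitized", "rights_notes",
]


def summarize_group(folder):
    """Return one inventory row (a dict) summarizing every file under folder."""
    formats = Counter()
    total_bytes = 0
    oldest = None
    newest = None
    for root, _dirs, files in os.walk(folder):
        for name in files:
            info = os.stat(os.path.join(root, name))
            ext = os.path.splitext(name)[1].lstrip(".").upper() or "(none)"
            formats[ext] += 1
            total_bytes += info.st_size
            # Last-modified date: an approximation, not a true creation date.
            modified = date.fromtimestamp(info.st_mtime)
            if oldest is None or modified < oldest:
                oldest = modified
            if newest is None or modified > newest:
                newest = modified
    row = dict.fromkeys(FIELDS, "")  # manual columns stay blank
    row.update({
        "group_name": os.path.basename(os.path.normpath(folder)),
        "storage_location": os.path.abspath(folder),
        "file_formats": ", ".join(sorted(formats)),
        "file_count": sum(formats.values()),
        "size_mb": round(total_bytes / 1_000_000, 2),
        "earliest_modified": oldest.isoformat() if oldest else "",
        "latest_modified": newest.isoformat() if newest else "",
    })
    return row


def write_inventory(folders, csv_path):
    """Write one row per folder to csv_path and return the rows."""
    rows = [summarize_group(folder) for folder in folders]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

To try it, call write_inventory with a list of folder paths and an output file name, for example write_inventory(["D:/scans/collection_a"], "inventory.csv") (the path here is made up). This keeps the inventory at the "breadth over depth" level: one row per group, not one row per file.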

 

Southeastern example:

  • Southeastern doesn’t have any digital collections of our own, beyond organizational records. But we assist with the creation (we loan scanners, laptops, hard drives), uploading and cataloging of archival materials. As a result we have copies of a lot of our members' files.
  • Jen created an inventory of these items. It records the holding institution, location of the material (what computer, hard drive, cloud storage, etc.), description of content, dates of the files, number of files, file formats, storage size, and any extra notes that could be useful in understanding the materials and in working with members to decide what to do with these copies.

 

Discussion:

 

One person has to transfer an entire archival collection to another institution, so there has to be a complete finding aid. An inventory would show a future archives team exactly where everything is located: what’s on paper, what’s been digitized, what’s born-digital, etc. However, it does seem that it needs to be more granular in certain cases. Ex: with hundreds of CDs and DVDs, there is a difference between tracking the discs and tracking the digital items on them. In this case, you probably need an inventory of everything, no matter what the format is.

 

When you indicate storage, do you indicate storage in all locations?

  • If you have content that is duplicated in multiple locations you should record all locations.
  • Once you make decisions about content (e.g., pulling files off of disks), that should be recorded on the spreadsheet.

 

Even if your digital content is already stored in one place, an inventory is helpful because it’s a document that everyone can look at, whether or not they have access to the digital files.

 

One archivist is part of a collaborative group using Archive-It to build an oral history website. Could that be another column in terms of storage?

  • If anything is available online, include that as well! 
  • You are not necessarily storing TIFFs (or other high-res master files) online, but the content is stored/available online.

 

It can be paralyzing because you want to do it right. But you should do what works best for your organization! 

 

A lot of digital files are on an external hard drive. Have been considering getting a second one. How do most people store their files and have redundancy for them?

  • Two copies are better than one! 
  • You should be prepared to replace an external drive every few years. 
  • Should have a copy that lives physically someplace else. Example, we store one drive at the library and another drive is held at the local historical society.
  • If content is kept under a personal account, it can disappear when the person leaves or a subscription lapses. Ex. document transcriptions were on a personal website, and the family stopped paying for the domain after the transcriber passed.
  • The golden rule is 3-2-1: three copies, two kinds of storage media, one copy in a different location.
  • One organization has a copy on a local server, another copy in Dropbox, and a third copy in Amazon Glacier. Server is in Ohio in case there is a weather event on the east coast. Good to have it in a different location in case there is a flood, fire, burglary, etc.
  • Southeastern does something similar. The digitized historical newspaper files are on a physical drive at the office, production copies in Virginia (AWS), a third set in Amazon Glacier. 
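
If your inventory already records where each copy lives, the 3-2-1 rule above can be checked with a few lines of Python. This is only a rough sketch: the StoredCopy record, the check_3_2_1 function, and the media and site labels are all hypothetical, and how you classify "kinds of storage media" is a judgment call for your organization. The sample data loosely follows the newspaper example above.

```python
from dataclasses import dataclass


@dataclass
class StoredCopy:
    label: str  # free-text name for this copy
    media: str  # kind of storage, e.g. "external drive", "server", "cloud"
    site: str   # where the copy physically lives


def check_3_2_1(copies, home_site):
    """Return a list of plain-language gaps; an empty list means 3-2-1 is met."""
    gaps = []
    if len(copies) < 3:
        gaps.append(f"only {len(copies)} copies; the rule asks for 3")
    if len({copy.media for copy in copies}) < 2:
        gaps.append("fewer than 2 kinds of storage media")
    if not any(copy.site != home_site for copy in copies):
        gaps.append("no copy is stored away from " + home_site)
    return gaps


# Labels below are illustrative assumptions, not an official classification.
newspapers = [
    StoredCopy("working files", "external drive", "office"),
    StoredCopy("production copies", "cloud", "AWS Virginia"),
    StoredCopy("dark archive", "cloud", "Amazon Glacier"),
]
print(check_3_2_1(newspapers, home_site="office"))  # prints []
```

A single external drive at the library would come back with three gaps, which is a quick way to show decision-makers what is missing.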

 

Southeastern’s Digital Dark Archive service uses Amazon Glacier. This could function as your “offsite” copy. 

 

USB is generally pretty universal, but a Zip drive is not the best option for storage.

 

One org has been putting event photos and other photos on Flickr. It’s hard to know how to ask the social media team how the items are currently stored. Any suggestions for how to advise people in your organization to store something more efficiently?

  • Another org had a conversation with a communications employee. Anything they’re currently using is theirs to manage. It’s analogous to physical materials – they only go to archives when they’re not needed anymore.
  • Talked about the importance of keeping files organized with descriptive folders, and having a back-up somewhere. 
  • Can’t compel anyone to do anything. You have to build a relationship first, and then they will ask you for advice once they know what you do. 

 

Concerns about digital photography: people take so many photographs that you end up with multiple near-identical photos with only incremental changes.

  • Is picking the “best” of near-identical photographs something that AI could eventually do?
  • This could also be a good use for students / interns. 

 

How do people select file names?

  • It’s a really important thing for preservation. 
  • Good best practices: no spaces, no special characters.
  • Some description of what the item is.
  • Can put a text file in the folder with more description. This is recommended for faculty who are organizing their files. This is helpful because the description doesn’t all have to go into the file name.
  • If you have to include dates in file names, use yyyy-mm-dd format. It sorts dates really well! 
  • There is a bulk rename utility that can be helpful for renaming files en masse: https://youtu.be/lc-VtNVRrh4?si=n_SfqtNCN_nn5Gsp 
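
As an alternative to a renaming utility, the naming practices above (no spaces, no special characters, yyyy-mm-dd dates) can be sketched in a few lines of Python. The function names clean_file_name and dated_file_name are hypothetical. Replacing spaces with underscores and lowercasing the extension are assumptions, not rules from the discussion, and this simple version also drops accented letters. It only builds the new name; it does not rename anything on disk.

```python
import os
import re
from datetime import date


def clean_file_name(name):
    """Return name with spaces replaced and special characters removed."""
    stem, ext = os.path.splitext(name)
    stem = re.sub(r"\s+", "_", stem.strip())    # spaces become underscores
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)  # drop special characters
    stem = re.sub(r"_{2,}", "_", stem).strip("_")
    ext = re.sub(r"[^A-Za-z0-9.]", "", ext).lower()
    return stem + ext


def dated_file_name(day, name):
    """Prefix a cleaned name with a yyyy-mm-dd date so names sort by date."""
    return f"{day.isoformat()}_{clean_file_name(name)}"


print(clean_file_name("Board Meeting (final) #2.PDF"))
# prints Board_Meeting_final_2.pdf
print(dated_file_name(date(2024, 7, 31), "SIG meeting notes.txt"))
# prints 2024-07-31_SIG_meeting_notes.txt
```

Because the year comes first, a plain alphabetical sort of these names is also a chronological sort, which is why the yyyy-mm-dd format is recommended.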

 

One challenge of Google Drive is sharing files outside of your organization. Sometimes it can provide access to items beyond what you initially intended to share. 

  • Sharing links - someone found the link!
  • It’s also a data integrity issue, especially if someone modifies it or deletes it. 

 

Some ideas for future meetings

  • Policies and plans for sustaining the work at an organizational level
  • How to convince people in your organization that this work is important and ongoing
  • File fixity (demo of free tool)! 

 

DHPSNY has some webinars around this topic. They have archived webinars on their website and on YouTube: https://www.youtube.com/@DHPSNY/videos

 

DHPSNY also provides Strategic Planning assistance. Strategic plans can and should include digital preservation. Working with DHPSNY on a strategic plan could be a good way to get organizational support for digital preservation.

Southeastern NY Library Resources Council
21 South Elting Corners Road | Highland, NY 12528
Phone: (845) 883-9065
www.senylrc.org