
Special Interest Group Meeting Notes: Digital Preservation SIG 8/13/2025

Notes from the most recent meetings of special interest groups at Southeastern

August 13, 2025

The Digital Preservation SIG met on August 13, 2025 via Zoom
Topic: Archivematica, digital preservation processing/packaging software used at Southeastern

Archivematica Demo

Archivematica is not a digital repository; it doesn't provide storage of digital files. It's used to process and package files before they go to storage. It's built on digital preservation standards and runs on Linux. It's Open Source and incorporates a couple dozen open source tools.

Some of the things it does:

  • Checks files for viruses
  • Duplicates files
  • Preservation reformatting (for select files; MS Word to PDF, for example). The originally submitted files don't change.
  • Assigns a unique ID to each file
  • Identifies and characterizes files (extracts all of the file info and properties embedded within a file)
  • Validates files (are they well formed and not corrupted?)
  • Assigns checksums
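Assigning IDs and checksums is what makes later fixity checks possible. As a rough illustration (not Archivematica's own code), the sketch below gives each file in a folder a random UUID and a SHA-256 checksum using only Python's standard library. The function names, the record layout, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import tempfile
import uuid
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_files(folder: Path) -> dict[str, dict[str, str]]:
    """Assign each file under `folder` a random UUID and a SHA-256 checksum.

    Keys are POSIX-style paths relative to `folder`. This record layout is
    hypothetical and only illustrates the idea.
    """
    records = {}
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        records[path.relative_to(folder).as_posix()] = {
            "uuid": str(uuid.uuid4()),
            "sha256": sha256_of(path),
        }
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "letter.txt").write_bytes(b"Dear Sir,")
        for name, facts in register_files(Path(tmp)).items():
            print(name, facts["uuid"], facts["sha256"])
```

Reading in chunks keeps memory use flat even for very large scans or audio files.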

Processed files are then packaged (in a 7zip file). Archivematica produces an XML file for each submission that includes all of the extracted information about the files, preservation metadata, and brief descriptive metadata.
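To picture the per-submission XML file, here is a toy version built with Python's standard `xml.etree.ElementTree`. It is a sketch under assumptions: the element names and layout are invented for illustration, while Archivematica's real file follows the METS and PREMIS standards and carries far more detail. The 7zip packaging step is left out.

```python
import xml.etree.ElementTree as ET


def build_manifest(package_name: str, records: dict[str, dict[str, str]]) -> str:
    """Serialize per-file facts (checksum, format, etc.) as a small XML document.

    Element and attribute names are hypothetical. Each key in a file's facts
    becomes a child element, so keys must be valid XML names.
    """
    root = ET.Element("package", name=package_name)
    for rel_path, facts in sorted(records.items()):
        file_el = ET.SubElement(root, "file", path=rel_path)
        for key, value in sorted(facts.items()):
            ET.SubElement(file_el, key).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values only; real entries would come from identification tools.
    demo = {
        "scans/page1.tif": {"format": "TIFF", "sha256": "placeholder-digest"},
        "notes/finding-aid.pdf": {"format": "PDF", "sha256": "placeholder-digest"},
    }
    print(build_manifest("newspaper-batch-01", demo))
```

Keeping this information in one text file next to the content means it can be read and checked without the software that created it.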

Q & A and Discussion

Question: Can you clarify why the newer version of the iPhone image file was flagged in Archivematica? 

Answer: Archivematica runs on an older version of Linux because of the complexity of its suite of tools, so those tools may not handle newer file formats. For this demo, Zack just removed that file from the set and reprocessed the remaining files. In real life, he would convert files outside of Archivematica if he had to. The Library of Congress Recommended Formats website is a good source for learning about file formats he's never worked with and about which format is best for preservation.


Question: Related to the previous question: oftentimes in the Open Source world, updates take a toll on the software's performance. How engaged/active is the Archivematica community? Has this been discussed? Is there a space for discussions/questions ("hey, Archivematica isn't able to work with this file type....")?

Answer: I haven't seen any community discussions about the new iPhone .heic format because it's only been out about a year so archives probably aren't having to preserve them. But the Archivematica community is very active. Sometimes my questions are very unique to my workflow and I don't get any answers. But in other cases, I get a lot of answers and help from the community when it comes to file identification and customizing rules to handle different formats. Archivematica is based on so many different tools (dependencies), but they do a good job of keeping it updated on a regular basis.


Question: You said Amazon Glacier checks the health of the files. Have you ever received notification that something is wrong? 

Answer: There is no reporting from Amazon. Zack does pull things back periodically and test them, and he's never had a problem. He has experienced bit rot on our Amazon EBS server (not Glacier; think of EBS more like Google Drive). EBS has no fixity checking, unlike Glacier. It hasn't been a lot: 5 or 6 files out of 900,000 (newspaper files). Many cloud storage services just provide storage; they don't provide monitoring/repair. Glacier, while not transparent (which is a criticism of using Glacier, since transparency is important in digital preservation), does offer monitoring/repair, and Amazon claims 99.999999999% ("eleven nines") durability, i.e. an extremely high likelihood of getting out what you put in. The monitoring is at the package level: the package is one file, and that's what's being checked and, if necessary, repaired from the other copies of the package.
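The fixity checking that EBS lacks boils down to recomputing each file's checksum and comparing it with the value recorded at ingest. A minimal sketch, assuming the stored record is a plain dictionary of SHA-256 values (a hypothetical format, not how Glacier or Archivematica actually store them):

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(folder: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Recompute checksums and compare them with the stored values.

    `manifest` maps a relative path to its expected SHA-256 hex digest
    (a hypothetical layout). Paths are sorted into "ok", "changed"
    (possible bit rot or an unrecorded edit) and "missing".
    """
    report: dict[str, list[str]] = {"ok": [], "changed": [], "missing": []}
    for rel_path, expected in sorted(manifest.items()):
        path = folder / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(path) != expected:
            report["changed"].append(rel_path)
        else:
            report["ok"].append(rel_path)
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "page1.tif").write_bytes(b"original scan")
        (root / "page2.tif").write_bytes(b"second scan")
        manifest = {name: sha256_of(root / name) for name in ("page1.tif", "page2.tif")}
        (root / "page2.tif").write_bytes(b"secind scan")  # simulate a damaged byte
        print(fixity_check(root, manifest))
```

Run on a schedule, a check like this turns silent bit rot (such as the 5 or 6 damaged newspaper files) into a list of files to restore from another copy.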

 

Question: Have any of you seen born digital content being loaded onto NY Heritage or being submitted to the Dark Archive? It's something that I've been thinking about as my organization considers born digital acquisitions, digital archives, digital preservation policies, etc.

Answer: Most of what we take into the Dark Archive was created through digitization projects. There might be some born digital PDFs and audio files, but more born digital content is likely coming. Once you all start collecting it, we'll be here to help preserve it.


Question: I was surprised to see that the JPEG in your demo set got converted to a TIFF. I remember that was the default Archivematica rule a few years ago, but I thought the community decided to change the rule and just duplicate to JPEG, because TIFF creates a large file and converting to TIFF doesn't gain any quality (the JPEG has already been compressed).

Answer: We kept that rule. I felt better about keeping that rule. Amazon Glacier storage is cheap enough that we can support that (and we don't take in a lot of JPEGs anyway.)
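The normalization rules discussed here (JPEG to TIFF, MS Word to PDF, new formats like .heic flagged) can be pictured as a lookup table. This is a hypothetical sketch keyed on file extensions for simplicity; the table contents are taken from examples in these notes, and Archivematica's real rules key on formats it identifies from file contents, not on extensions.

```python
from pathlib import Path
from typing import Optional

# Hypothetical rule table built from the examples in these notes.
# None means "already an acceptable preservation format; keep as-is".
PRESERVATION_RULES: dict[str, Optional[str]] = {
    ".jpg": "TIFF",
    ".jpeg": "TIFF",
    ".doc": "PDF",
    ".docx": "PDF",
    ".tif": None,
    ".tiff": None,
    ".pdf": None,
}


def plan_action(filename: str) -> str:
    """Decide what to do with one file; the original is always kept."""
    ext = Path(filename).suffix.lower()
    if ext not in PRESERVATION_RULES:
        # Unknown format (for example a newer .heic photo): set aside for a person.
        return "flag for review"
    target = PRESERVATION_RULES[ext]
    if target is None:
        return "keep original only"
    return f"keep original + normalize to {target}"
```

Under the newer community rule the questioner describes, the `.jpg` and `.jpeg` entries would map to `"JPEG"` (a JPEG duplicate) instead of `"TIFF"`; keeping TIFF trades storage space for the comfort of an uncompressed preservation copy.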


After the Q & A, Zack mentioned that he recently set up a Network Attached Storage (NAS) device. It holds two hard drives with 1.8 TB of storage. One drive mirrors the other, so if one goes bad, there's another copy. It does check for bit rot. It's more expensive than a regular hard drive, but could be a good local solution for you all. It connects to our network, and I can see when people have changed files.
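A two-drive mirror only helps with bit rot if a damaged copy can be told apart from a good one. The sketch below shows the idea at the file level, assuming a known-good SHA-256 for each file; the function name and approach are illustrative only, since NAS devices generally handle this inside the RAID or file-system layer rather than with a script like this.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a whole file (fine for a sketch; large files would be chunked)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scrub_pair(copy_a: Path, copy_b: Path, expected: str) -> str:
    """Check two mirrored copies of one file against a known-good SHA-256.

    If exactly one copy matches, overwrite the damaged copy with the good one.
    If neither matches, nothing is changed and a copy from elsewhere is needed.
    """
    a_ok = copy_a.is_file() and sha256_of(copy_a) == expected
    b_ok = copy_b.is_file() and sha256_of(copy_b) == expected
    if a_ok and b_ok:
        return "both copies healthy"
    if a_ok:
        shutil.copy2(copy_a, copy_b)
        return "repaired copy B from copy A"
    if b_ok:
        shutil.copy2(copy_b, copy_a)
        return "repaired copy A from copy B"
    return "both copies damaged: restore from another location"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp, "disk1.tif"), Path(tmp, "disk2.tif")
        a.write_bytes(b"scan data")
        b.write_bytes(b"scan dat\x00")  # simulate bit rot on the second drive
        good = hashlib.sha256(b"scan data").hexdigest()
        print(scrub_pair(a, b, good))  # first pass repairs the damaged copy
        print(scrub_pair(a, b, good))  # second pass finds both copies healthy
```

The last branch is why a mirror alone is not enough: if both copies go bad, only a copy stored somewhere else can recover the file.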

Someone set up a Synology DS220j at home. Even if you're not tech savvy, it's easy to use, and they walk you through everything. It does health checks. I've learned a lot about how servers work by doing this at home.

Final note about Archivematica: it works much better in a local environment than in a virtual one, because you're moving a lot of data around. Running it locally is also safer, because it runs on an older version of Linux.

Comment: We really need to be storing files in multiple places, rather than keeping them in one place. I was assured by my IT department that our new cloud storage is secure, but I'm going to put a copy in Southeastern's Dark Archive for more peace of mind.

Future meeting topic: Web Archiving, Archive-It (and other tools). The Wayback Machine can be used to make sure certain web pages are captured before they disappear.

Southeastern NY Library Resources Council
21 South Elting Corners Road | Highland, NY 12528
Phone: (845) 883-9065
www.senylrc.org