
Special Interest Group Meeting Notes: HRVH 6/20/24

Notes from the most recent meetings of special interest groups at Southeastern

June 20, 2024

The HRVH Users Group met on June 20, 2024, via Zoom.
Topic: Storage and Management of Digital Files

Notes

One organization just started working with Catalog-It. They have decided to pursue a dual strategy of publishing both on New York Heritage and through Catalog-It. About Catalog-It:

  • Can store high-resolution images
  • Automatically resizes images for delivery to the public at lower resolution
  • Retains the full-resolution file
  • $550 per year (Pricing)
  • https://www.catalogit.app/
  • Can handle multi-page documents; not sure how efficient it will be for newspapers

 

Q: When working on digital preservation, do you need to retroactively change all your file names? How often are you checking your digital masters to make sure they’re still good? 

  • File naming can be a challenge.
  • Some are storing lots of files in different locations, which makes file naming harder to navigate.
  • Files are in Google Drive and on external hard drives, and it’s time to merge them. File naming is the biggest challenge. Should we use the metadata fields in HRVH, with consistent language for format, file name, shelf number, etc.? It’s chaos when you have done it in bits and pieces.
  • Every file name you create should be unique! 
  • Southeastern uses a tool to rename files in batches: https://www.bulkrenameutility.co.uk/
  • One example: an organization had to migrate to PastPerfect from a previous management system. Tens of thousands of digital images are on a server that is nearing end-of-life, and there is supposed to be a conversation about next steps.
  • Servers have to be replaced, and external drives need to be replaced regularly as well.
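
Bulk Rename Utility is a desktop tool, and the sketch below is not how it works internally. It is only a minimal Python illustration, assuming the files sit directly in one folder, of a typical batch rename: adding the folder name to each file name. The function names `plan_renames` and `apply_renames` are hypothetical. The sketch previews the changes before making them and refuses to create duplicate names, since every file name should be unique.

```python
from pathlib import Path


def plan_renames(folder: Path) -> list[tuple[Path, Path]]:
    """List (old, new) pairs that prefix each file with its folder's name.

    Nothing is renamed yet, so the plan can be reviewed first.
    """
    prefix = folder.name.replace(" ", "_")  # keep spaces out of file names
    plan = []
    for path in sorted(folder.iterdir()):
        # Skip subfolders and files that already carry the prefix.
        if path.is_file() and not path.name.startswith(prefix + "_"):
            plan.append((path, path.with_name(f"{prefix}_{path.name}")))
    return plan


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Carry out a plan, refusing to create duplicates or overwrite files."""
    targets = [new for _, new in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("plan would produce duplicate file names")
    clashes = [new for new in targets if new.exists()]
    if clashes:
        raise FileExistsError(f"already exist: {clashes}")
    for old, new in plan:
        old.rename(new)
```

Printing each `old.name` and `new.name` pair from `plan_renames(Path(...))` before calling `apply_renames` gives a chance to catch mistakes, and saving that list documents what was changed.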

 

Organizations invested in creating a lot of digital files (scan everything!) and are now faced with storing and managing those files. It’s good to digitize when you have a plan for access and when the original materials are at risk. Practice good selection when you’re deciding what to digitize.

 

One institution is using Flickr to “store” graduation photos. Is that a wise decision? Another organization is also using Flickr to showcase recent library history/events. It’s a good option for items that aren’t a good fit for New York Heritage.

 

One library is using a server and is starting to go through its contents to determine what is relevant. This became an issue when server space started running short. It’s important to think about how scanning can be scaled up in a manageable way.

 

Some organizations are using Google Drive for storage. Google Drive doesn’t show the number of files in a folder or the folder’s total storage size, which is challenging from a digital preservation standpoint. There are add-ons that can do this, but there is concern about using them (potential for malware, etc.).
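
One add-on-free workaround, assuming the Drive folder is available on a local disk (for example through Google Drive for desktop, or after downloading it), is a short script that uses only Python’s standard library to count files and add up their sizes. This is a sketch, not a Google tool, and `folder_stats` and `print_report` are hypothetical names. Google-native Docs and Sheets appear locally as small link files, so their sizes here would not be meaningful.

```python
import os
from pathlib import Path


def folder_stats(root: Path) -> dict[Path, tuple[int, int]]:
    """Map every folder under root to (number of files, total bytes), subfolders included."""
    stats: dict[Path, tuple[int, int]] = {}
    # Walk bottom-up so each folder's subfolder totals are already known.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        folder = Path(dirpath)
        count = len(filenames)
        size = sum((folder / name).stat().st_size for name in filenames)
        for sub in dirnames:
            sub_count, sub_size = stats.get(folder / sub, (0, 0))
            count += sub_count
            size += sub_size
        stats[folder] = (count, size)
    return stats


def print_report(root: Path) -> None:
    """Print one line per folder: file count, size in MB, and path."""
    for folder, (count, size) in sorted(folder_stats(root).items()):
        print(f"{count:>8} files {size / 1_000_000:>10.1f} MB  {folder}")
```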

 

One library is experimenting with Bagger/BagIt to package and store files.
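
BagIt (the packaging specification behind Bagger, published as RFC 8493) puts files in a `data/` folder next to a `bagit.txt` declaration and a manifest listing each file’s checksum. The minimal Python sketch below writes that basic structure so the idea is concrete. It skips optional parts of the spec (such as `bag-info.txt` and tag manifests) and special-character handling in file names, and its validation does not flag extra files missing from the manifest, so for real work a maintained tool such as Bagger or the Library of Congress’s bagit-python is the safer choice. `make_minimal_bag` and `validate_minimal_bag` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large images don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source: Path, bag_dir: Path) -> None:
    """Copy source into bag_dir/data, then write the declaration and manifest."""
    data_dir = bag_dir / "data"
    shutil.copytree(source, data_dir)  # raises if bag_dir/data already exists
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()  # e.g. data/photos/img001.tif
        lines.append(f"{sha256_of(path)}  {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")


def validate_minimal_bag(bag_dir: Path) -> list[str]:
    """Return manifest entries whose file is missing or no longer matches."""
    problems = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, relative = line.split(maxsplit=1)
        path = bag_dir / relative
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(relative)
    return problems
```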

 

One organization has a collection of digital newspaper articles, currently stored on Google Drive and on a server. They are looking for ways to make the articles more available to the public; right now the articles serve as the source for blog posts.

  • If the articles were pulled from subscription newspaper databases, does the organization have the rights to publish them? It could be helpful to consult a copyright attorney. The blog posts are likely OK.

 

One organization has many different files originating from different places. Some are emailed images shared by the public, not necessarily scans from the collection. Does anyone have a protocol for naming files?

  • One organization uses a serialized approach, with none of the numbers repeating. No descriptive information in the file names.
  • One library bases file names on the source material: if it’s a letter, diary, etc., that is usually included. Names start with a type code (doc, photo, etc.), followed by the subject and then the date; page numbers are added for multi-page documents. Items with no date are labeled “undated.” The scheme works well for rapid reference right now.
  • When it comes to file naming there are a lot of nuances and unique needs. However, there are some best practices that you can follow (no special characters other than hyphens and underscores; no spaces).
  • A bulk file renaming program can help rename files quickly when needed. Southeastern uses Bulk Rename Utility (there are others) and has trained some members to use it. If, in the future, you need more than serialized file names, a renaming program can easily add the folder name to files.
  • If you have dates in file names, use the yyyy-mm-dd format.
  • File naming should be structured, consistent, and DOCUMENTED! Documentation is very important.
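
The two naming schemes described above, along with the character and date rules, can be sketched in Python as follows. This is an illustration under assumptions, not either organization’s actual system: the prefix, the type codes (`doc`, `photo`), the six-digit serial width, the `p002` page format, and the function names are all hypothetical choices. Whatever rules you pick, writing them down is what turns them into a protocol.

```python
import re
from datetime import date
from typing import Optional

# Letters, digits, hyphens and underscores only (no spaces or other special
# characters), plus a single dot before the file extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def serial_name(prefix: str, number: int, ext: str, width: int = 6) -> str:
    """Serialized name with no descriptive information, e.g. 'hist_000042.tif'."""
    return f"{prefix}_{number:0{width}d}.{ext}"


def descriptive_name(
    type_code: str,
    subject: str,
    when: Optional[date],
    ext: str,
    page: Optional[int] = None,
) -> str:
    """Type code, subject, date as yyyy-mm-dd (or 'undated'), then optional page."""
    # Collapse spaces and punctuation in the subject into single hyphens.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", subject).strip("-").lower()
    parts = [type_code, slug, when.isoformat() if when else "undated"]
    if page is not None:
        parts.append(f"p{page:03d}")
    return "_".join(parts) + "." + ext


def is_safe(name: str) -> bool:
    """Check a name against the no-spaces, hyphens-and-underscores-only rule."""
    return SAFE_NAME.fullmatch(name) is not None
```

For example, `descriptive_name("doc", "Smith Family letter", date(1923, 5, 4), "tif", page=2)` returns `doc_smith-family-letter_1923-05-04_p002.tif`, while `is_safe("Smith Family (1).jpg")` is `False`.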

 

Fixity checking 

  • Fixity checking is how you make sure your files haven’t been changed, corrupted, moved, or altered in any way.

  • You run your files through a program that calculates a checksum for each file, a value often referred to as a digital fingerprint. Then you periodically check the files again: if a checksum no longer matches, or a file is missing, you know the file has been altered, corrupted, or deleted.

  • One library does fixity checking every month, and everything gets checked over the span of a year. It uses QuickHash GUI (https://www.quickhash-gui.org/), which is free, open-source software, and is willing to share its procedures and provide a demo.

  • There is no consensus on how often fixity checking should be done; practice is really all over the map.

  • If something goes wrong, this is why it’s important to have multiple copies: if a fixity check shows that a file has changed, you can replace it with a good copy.

  • Some of the more expensive digital preservation systems incorporate fixity checking and repair. 

  • Using stand-alone fixity software is not out of reach for smaller institutions. 
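
The workflow described above (fingerprint the files, recheck on a schedule, replace failures from a good copy) can be sketched with Python’s standard library. This is a minimal illustration, not QuickHash GUI or any institution’s actual procedure: SHA-256 as the checksum, a JSON manifest, splitting files into twelve monthly batches, and all function names are assumptions made for the example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def checksum(path: Path) -> str:
    """SHA-256 'digital fingerprint' of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Fingerprint every file under root, keyed by its path relative to root."""
    return {
        p.relative_to(root).as_posix(): checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], path: Path) -> None:
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")


def load_manifest(path: Path) -> dict[str, str]:
    return json.loads(path.read_text(encoding="utf-8"))


def monthly_subset(manifest: dict[str, str], month: int) -> list[str]:
    """Every 12th file starting at `month` (1-12), so all files are covered in a year."""
    return sorted(manifest)[month - 1 :: 12]


def check_fixity(root: Path, manifest: dict[str, str], names: list[str]) -> dict[str, str]:
    """Return {name: 'missing' or 'changed'} for each checked file that fails."""
    failures = {}
    for name in names:
        path = root / name
        if not path.is_file():
            failures[name] = "missing"
        elif checksum(path) != manifest[name]:
            failures[name] = "changed"
    return failures


def repair_from_copy(root: Path, backup: Path, manifest: dict[str, str], name: str) -> bool:
    """Restore a failed file from another copy, but only if that copy still matches."""
    good = backup / name
    if not good.is_file() or checksum(good) != manifest[name]:
        return False
    (root / name).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(good, root / name)
    return True
```

Store the manifest outside the folder being checked (otherwise it fingerprints itself), and keep a copy of it with your backups.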

 

Southeastern uses Amazon Glacier, which duplicates data and automatically checks and repairs it; cloud services like Amazon’s offer this as part of the service. However, these services don’t always provide audits or reports. Amazon might be working on this, because so many cultural heritage institutions are using Glacier to support digital preservation.

 

Beyond reliability, you need to think about who has access to the data, what the backups are, and who is accountable for it. If you’re uploading data to a service, you have to think about how it can be recovered.

 

Financial side: you need to assign a value to your data. How much would it cost to recreate the work that has been done? This can be a way to advocate for the software, hardware, services you need to protect your investment.
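
A back-of-the-envelope version of that calculation multiplies the number of items by the time each one took and by a labor rate. The Python sketch below is one simple way to frame it, not a standard formula; the parameters and the function name are assumptions, and a real estimate should use your own counts and rates.

```python
def recreation_cost(
    items: int,
    minutes_per_item: float,
    hourly_rate: float,
    overhead_per_item: float = 0.0,
) -> float:
    """Estimate what it would cost to redo the digitization work from scratch.

    Labor is items x minutes per item / 60 x hourly rate; overhead_per_item can
    cover anything charged per item (equipment time, vendor fees, and so on).
    """
    labor_hours = items * minutes_per_item / 60
    return labor_hours * hourly_rate + items * overhead_per_item
```

With hypothetical figures of 10,000 images at 5 minutes each (scanning plus metadata) and $25 per hour, `recreation_cost(10_000, 5, 25)` gives about 833 hours of labor, or roughly $20,833.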

 

DPOE-N offers emergency hardware grants. One library was able to get a grant for a hard drive. They also offer professional development grants. https://www.dpoe.network/emergency-hardware-support

https://www.dpoe.network/professional-development-support/ 

They are a great resource for learning about digital preservation. 

 

Helpful class on digital preservation: https://libraryjuiceacademy.com/shop/course/183-introduction-digital-preservation/ 

 

Digital Preservation Handbook is a great resource (includes video tutorials throughout): https://www.dpconline.org/handbook 

 

A good place to start: create a high-level inventory of what you have. This is especially helpful if you have materials in different locations. Record formats, numbers of items, etc. This gives you a sense of how much you have and what you need to do to preserve it. All planning decisions flow from this document.
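
For files already on drives or servers, the counting part of such an inventory can be scripted. The Python sketch below assumes that file extension is a good-enough stand-in for format (a real inventory might use a format-identification tool instead) and that each location is reachable as a folder path; the function names and location labels are hypothetical. It writes a CSV that you can extend by hand with physical holdings and notes.

```python
import csv
from collections import Counter
from pathlib import Path

FIELDS = ["location", "format", "files", "bytes"]


def inventory(locations: dict[str, Path]) -> list[dict]:
    """Count files and total bytes per format (file extension) in each location."""
    rows = []
    for label, root in locations.items():
        counts = Counter()
        sizes = Counter()
        for path in root.rglob("*"):
            if path.is_file():
                fmt = path.suffix.lower().lstrip(".") or "(none)"
                counts[fmt] += 1
                sizes[fmt] += path.stat().st_size
        for fmt in sorted(counts):
            rows.append({"location": label, "format": fmt, "files": counts[fmt], "bytes": sizes[fmt]})
    return rows


def write_inventory(rows: list[dict], out_path: Path) -> None:
    """Save the inventory as a CSV that can be opened in a spreadsheet."""
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```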

 

Be cautious of low-cost hard drives: if they seem too good to be true, they probably are. https://www.grc.com/validrive/the-report.htm

 

Do you store like (physical) materials together or keep items together with their collections? 

  • Good practice to keep collections together. 

 

Would it be good to continue these discussions and create a special interest group on this topic? People are interested! Southeastern will create a SIG on Digital Preservation, and its meetings will be announced via our newsletter. People who attended the previous SIG will also be notified of the next meeting.

 

Sign up for Southeastern’s newsletter here: https://airtable.com/appF5045dT9RSe7HP/shrF1StKqcdSVugMT 

Southeastern NY Library Resources Council
21 South Elting Corners Road | Highland, NY 12528
Phone: (845) 883-9065
www.senylrc.org