Written by: Ryan Edge
Primary Source: Digital Scholarship Collaborative Sandbox
In lieu of telling you where this budding Media Preservation program and I are at in our fourth month together, I’m going to share a few basic concepts of the field, before steering slightly toward digital scholarship and a few tools/resources that might excite you. In other words, I’ll keep it light, and will share projects and outcomes at a later date (very soon).
First, some obligatory background: Media Preservation, as a field, is at this time pushing at full steam to migrate vast numbers of analog and physical digital recordings stored on obsolete and endangered formats. Media like these are rapidly deteriorating on the shelves of libraries and archives worldwide, and—like those in our own Special Collections—typically contain rare or unique content of high research value.
The urgency around mass digitization initiatives stems from the very real notion that not all of our AV artifacts can or will be saved. We are fighting against the physical degradation of media objects (e.g. delaminating lacquer discs, shedding magnetic tape), but also factors of technological obsolescence. For obsolete formats, access becomes more difficult and costly as functional playback devices disappear, just as the tools, supplies, and expertise required to sustain these technologies become more obscure and thus more prohibitively expensive. Magnetic tapes (i.e. audio and video tapes), for instance, comprise the majority of AV objects in Special Collections. The consensus among media preservationists is that these formats generally have less than fifteen years left before the two-headed threat of “degralescence” (a term coined by Indiana University’s Mike Casey) renders these media irrecoverable.
And while the colossal demands of uncompressed video data will temper any Moore’s Law-style optimism about digital storage hardware, costs will continue to decrease gradually over time, just as consistent protocols for digital file submission and organization will reduce unnecessary waste. Regardless, digital media are enormously rich resources, in addition to being enormous. We are capturing far more potential research data in these monolithic files than we realize, and this potential will only grow in the communities we serve as capacities for search, manipulation, and analysis grow.
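To put "colossal" in rough numbers, here is a back-of-the-envelope sketch in Python. The figures are my assumptions for illustration, not measurements from our collections: standard-definition NTSC video (720x486 at 30000/1001 frames per second) with 10-bit 4:2:2 sampling, which averages 20 bits per pixel. It counts picture data only.

```python
from fractions import Fraction

def uncompressed_video_bytes_per_hour(width, height, bits_per_pixel, fps):
    """Return the raw picture data size, in bytes, for one hour of video."""
    bits_per_frame = width * height * bits_per_pixel
    bytes_per_second = Fraction(bits_per_frame, 8) * fps
    return float(bytes_per_second * 3600)

# Assumed example: NTSC standard definition, 10-bit 4:2:2 sampling.
# 4:2:2 averages two samples per pixel, so 2 * 10 = 20 bits per pixel.
sd_hour = uncompressed_video_bytes_per_hour(720, 486, 20, Fraction(30000, 1001))
print(f"{sd_hour / 1e9:.1f} GB per hour")
```

Under those assumptions, a single hour of tape comes out to roughly 94 GB before audio, container overhead, or any padding the codec adds, so real files run somewhat larger.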
Whether or not the appropriate tools are ready to meet grand expectations of researchers depends greatly on the nature of the work. Legacy text and statistical data have found a new lease on life through computational analysis in recent years. I realize audiovisual sources will continue to get short shrift next to these classical building block formats in digital scholarship. Yet recorded sound and moving images can contain all these elements and more—it just requires more work upfront to convert the encoded AV information into something researchers can interpret.
Here are some audiovisual-centric tools that I use and suspect some of you have heard of; this list is by no means comprehensive. All of these are free, and most are open source. They are command line interface (CLI) tools unless noted as having a graphical user interface (GUI).
Access | Play
- VLC Media Player – Audio and video player GUI. This is the most robust media player out there. Leveraging the exhaustive libavcodec library, VLC is capable of handling nearly any format you throw at it. Available for Mac and PC.
Reformat | Edit | Manipulate
- FFmpeg – Comprehensive suite of AV tools: transcoder, editor, player, analyzer, and validator. Converts, records, and plays audio and video of nearly any format. FFmpeg comes bundled with an unparalleled number of codec libraries, as well as ancillary transcoding and authentication functions. Many other well-known software packages employ FFmpeg (albeit with restrictions), including HandBrake, QCTools, ffmpegX (an outdated GUI; don’t mistake the two), and nearly any other open source or web application that touches audio or video. FFmpeg is my favorite tool and I seem to use it every day; it can perform nearly every function detailed in this list. Available for Mac and PC.
- MPEG Streamclip – Video transcoder, editor, and player GUI. “MPEG” is kind of a misnomer as the tool supports the encoding and export of many other video codecs and formats. Streamclip can help you to quickly trim, divide, and join videos, export audio tracks and individual frames, while also supporting high-quality uncompressed or HD video encoding. Available for Mac and PC.
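For a taste of what FFmpeg looks like in practice, here is a minimal Python sketch that assembles two typical command lines: one that makes an H.264/AAC access copy from a preservation master, and one that trims a clip without re-encoding. The function names, file names, and quality setting are hypothetical choices of mine, and the first command assumes an FFmpeg build that includes the libx264 encoder.

```python
import shlex
import subprocess

def access_copy_command(source, target, crf=20):
    """Build an ffmpeg argument list for an H.264/AAC MP4 access copy."""
    return [
        "ffmpeg",
        "-n",                       # never overwrite an existing output file
        "-i", source,               # input: the preservation master
        "-c:v", "libx264",          # encode video as H.264
        "-crf", str(crf),           # constant-quality setting (lower = better)
        "-pix_fmt", "yuv420p",      # 8-bit 4:2:0 for broad player support
        "-c:a", "aac",              # encode audio as AAC
        "-movflags", "+faststart",  # move the index up front for web playback
        target,
    ]

def trim_command(source, target, start, duration):
    """Build an ffmpeg argument list that copies out a clip without re-encoding."""
    return ["ffmpeg", "-n", "-ss", start, "-i", source,
            "-t", duration, "-c", "copy", target]

cmd = access_copy_command("master.mov", "access.mp4")
print(shlex.join(cmd))
# To run it (requires ffmpeg on the PATH and a real input file):
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the command easy to inspect or log before anything touches a file. Note that the stream-copy trim can only cut on keyframes, so the clip boundaries may land slightly off the requested times.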
Analysis | Processing | Validation
- MediaInfo – AV format-specific technical metadata extractor and identifier. Unlike more widely used characterization tools like JHOVE and DROID, MediaInfo supports the analysis of components and tags unique to audio and video files. Available for Mac and PC; also available on Mac as a lightweight GUI for a reasonable price (~$2).
- ExifTool – Metadata extractor, identifier, and editor. ExifTool supports many metadata formats, and has been a part of general digital preservation ingest workflows for some time, but has recently increased support for audiovisual files. Available for Mac and PC; GUI available for Windows only.
- QCTools or “Quality Control Tools” – Video quality assurance GUI. Enables visual analysis of digital video and detection of corruption or visual artifacts. This has been particularly useful for those scanning for interstitial errors post-digitization, but that sells it short. Available for Mac and PC. More details on QCTools, its applications and updates (current version 0.7), can be found through the Bay Area Video Coalition, which developed the tool along with an excellent team of media preservationists. Another excellent BAVC project is the A/V Artifact Atlas, an online resource used to identify and diagnose artifacts and errors in media and analog-to-digital workflows.
- BWF MetaEdit – Metadata embedder, validator, and extractor for Broadcast WAVE Format (BWF) audio files. More details about BWF and metadata chunks can be found through FADGI, which developed MetaEdit with AVPreserve. Available for Mac and PC.
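As one example of how these metadata tools slot into a scripted workflow, the Python sketch below asks the MediaInfo CLI for a JSON report and pulls out the type and format of each track. It assumes a reasonably recent MediaInfo that supports `--Output=JSON`; the helper names are hypothetical, and the sample report is hand-written and heavily abbreviated, with a made-up file name and values.

```python
import json
import subprocess

def mediainfo_json(path):
    """Run the MediaInfo CLI on a file and return its JSON report as a dict."""
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def summarize_tracks(report):
    """Return (track type, format) pairs from a MediaInfo JSON report."""
    tracks = (report.get("media") or {}).get("track", [])
    return [(t.get("@type"), t.get("Format")) for t in tracks]

# Hand-written sample in the general shape of a MediaInfo report;
# the file name and values are invented for illustration.
sample = json.loads("""
{"media": {"@ref": "interview.mp4", "track": [
  {"@type": "General", "Format": "MPEG-4"},
  {"@type": "Video", "Format": "AVC", "Width": "720", "Height": "480"},
  {"@type": "Audio", "Format": "AAC", "SamplingRate": "48000"}
]}}
""")
print(summarize_tracks(sample))
```

With MediaInfo installed, `summarize_tracks(mediainfo_json("your_file.mov"))` would produce the same kind of list for a real file, which is handy for spotting files that do not match the specifications you expected from a digitization batch.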
Transcribe | Search
- CMUSphinx – Speech recognition toolkit (developed by Carnegie Mellon University). CMUSphinx’s primary functions include speech transcription, closed captioning, (live) speech translation, and voice search. It also supports keyword spotting, alignment, and pronunciation evaluation. Supports English, French, Mandarin, German, Dutch, Russian, as well as the ability to model others. CMUSphinx is leveraged in many other applications that support voice control. Available for Mac and PC.
Supercuts (for kicks, laffs, yuks)
- Videogrep / Audiogrep – Twin projects by Sam Lavigne. Each is essentially a Python script that can automatically assemble a “supercut” when passed a word, phrase, or grammatical structure (e.g. “[gerund] [determiner] [adjective] [noun]”). In the case of Videogrep, this is achieved by searching an accompanying subtitle track (.srt text file). Audiogrep, on the other hand, must first use a component of CMUSphinx (PocketSphinx) to index the speech. The script then crawls the text files, jumps to the corresponding timecodes in the video/audio file, and stitches the matching segments together. Available for Mac and PC. More on Videogrep and Audiogrep.
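The core idea behind a subtitle-driven supercut can be sketched in a few lines of Python. This is not Videogrep's actual code, just a simplified illustration: it parses SubRip (.srt) text into timed captions and returns the time spans whose caption contains a given word. The function names and the sample captions are mine.

```python
import re

CUE_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)

def parse_srt(text):
    """Parse SubRip text into a list of (start, end, caption) tuples."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if match:
                start = f"{match.group(1)}.{match.group(2)}"
                end = f"{match.group(3)}.{match.group(4)}"
                # Everything after the timing line is caption text.
                caption = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((start, end, caption))
                break
    return cues

def find_word(cues, word):
    """Return (start, end) spans for cues whose caption contains the word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [(start, end) for start, end, caption in cues if pattern.search(caption)]

sample_srt = """1
00:00:01,000 --> 00:00:04,200
Magnetic tape does not last forever.

2
00:00:05,000 --> 00:00:07,500
Neither do the machines that play it.

3
00:00:09,250 --> 00:00:12,000
So we migrate every tape we can.
"""

hits = find_word(parse_srt(sample_srt), "tape")
print(hits)
```

Each returned span is a pair of timecodes that a tool like FFmpeg could use to cut the matching clip; the real scripts also handle the cutting and stitching, and Audiogrep has to generate the transcript first.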
Harnessing information contained in audio, video, and other rich ancillary bitstreams will never be as straightforward as, say, text mining. In fact, audiovisual search is facilitated by textual data, most often as a sidecar metadata file containing transcribed speech anchored by timecodes. So if you want to search spoken word recordings, you must first index the audio, either by hand or with speech-to-text software (or by employing a combination of the two). Speech-to-text and image recognition software is gradually improving, but certainly has not permeated the digital scholarship arena. AV data has not yet scaled up to meet most researchers’ expectations of “distant viewing,” but there are many applications (above) and services that are making strides today. (An aside: it’s perversely beautiful, in a way, that the subtle nuances of recorded sound and human speech continue to elude algorithmic recognition, despite all advances in communication.)
I’d be happy to answer any questions you may have, or to discuss any projects that might come to mind in these fields of media!
Ryan Edge, your Media Preservation Librarian