The challenges of preserving digital content

A problem archivists have been raising for a while now: with the majority of content going digital, and with the rapid pace of change in storage media and formats, it is becoming harder to preserve content even when it is not what would be considered old by the standards of other historic documents created by humanity.

Case in point: the effort required to preserve even recent movies, as described in this article on IEEE Spectrum. As the article mentions, we've already lost access to 90% of US movies made during the silent era and about 50% of movies made before 1950. I suspect that the numbers for the European film industry might be even worse thanks to World War II. Keep in mind, though, that those are numbers for movies stored on a comparatively durable medium (and yes, I know that early nitrate film is about as flammable as they come).

One of the most poignant quotes concerns Pixar's challenge when trying to re-render Finding Nemo in 3D nine years after its initial release:

The fact that the studio had lost access to its own film after less than a decade is a sobering commentary on the challenges of archiving computer-generated work.

Even consumer households will face the same issue sooner or later when it comes to preserving family photos, home movies and other digital content. Just looking around my office, there are terabytes of data floating around, a fair amount of it photographs and video. And of course this blog doesn't have a dead-tree version either, nor have I ever had an article published in a software development magazine. Not to mention that the ones I would have liked to publish in (C/C++ Users Journal and Dr. Dobb's) are now dead and gone as well. Their paper products are hopefully still being preserved, but how long will we be able to read their digital archives?

If we had infinite space to store physical objects we could try to preserve the computers the content is stored on, but that’s not realistically possible either.

For me personally, I'm trying to make sure that my relatively recent content migrates with me when I update my computers and remains accessible. So far I've been lucky: my roughly 10 years of digital photos are all still usable. Even with that precaution in place I will try to shoot more film again, but only after checking that my decade-old Minolta film scanner still works. Fortunately I also own a flatbed scanner that can scan film up to 5″x4″, and as long as it remains functional and software like Hamrick's VueScan supports it, I should be OK with a dual analog/digital strategy. Plus, places like Freestyle Photo seem to carry more film these days, including reissued stocks, than they did a few years ago.

Mutt regex pattern modifiers

I still use the mutt email client when I'm remoted into some of my FreeBSD servers. It might not be the most eye-pleasing email client ever, but it's powerful, lightweight and fast.

Mutt has a very powerful feature that allows you to tag messages via regular expressions. It has a couple of special pattern modifiers that allow you to apply the regex to certain mail headers only. I can never remember them, so I'm starting a list of the ones I use most in the hope that I'll either memorize them eventually or can refer back to this post. The full documentation can be found here, so this is only a cheat sheet that reflects my personal usage of the mutt regex pattern modifiers.

~f – match from
~t – match to
~c – match cc
~C – match to or cc
~e – match sender
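
As a usage sketch (the addresses below are made up): in the message index, press T (tag-pattern) or l (limit), then enter a pattern such as one of these; patterns separated by a space combine with an implicit AND:

```
~f alice@example\.com                       # everything from alice
~C announce@example\.com                    # everything to or cc'd to that list
~f alice@example\.com ~t me@example\.com    # from alice AND addressed to me
```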

Take the 10% code reduction challenge!

It might sound paradoxical, but in general, writing more code is easier than writing less code that accomplishes the same goals. Even if your code starts out clean, compact and beautiful, the code added later to cover the corner cases nobody thought of usually puts an end to the design being elegant and beautiful. Agile programming offers a solution, namely constant refactoring, but who has time for that? That's why I occasionally give myself the 10% code reduction challenge, and I encourage you to do the same.
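
To make the idea concrete, here is the kind of reduction the challenge tends to turn up; a toy example, not taken from any real codebase:

```python
from collections import Counter

# Before: a hand-rolled frequency count, the kind of code that
# accretes over time and survives review because it "works".
def word_counts_before(text):
    counts = {}
    for word in text.split():
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

# After: the standard library already does this; the loop and
# its branch disappear entirely.
def word_counts_after(text):
    return dict(Counter(text.split()))
```

The two functions behave identically, but the second one has nothing left to harbor an off-by-one or a forgotten corner case.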


Timo’s occasional link roundup, late July edition

A couple of interesting articles about debugging. Debugging doesn't seem to get much attention when people are taught programming; I assume you're supposed to acquire the skill by osmosis. It actually deserves far more attention, because it's one of the skills that separates highly productive developers from, well, not so productive ones.

Why I’m Productive in Clojure. I’ve long been a fan of Lisp and Lisp-like languages, even though I wasn’t originally that happy with having Lisp inflicted on me at university, because it was weird and back then I didn’t much appreciate non-mainstream languages. These days I do, because that’s where you usually find the better expressiveness and the ideas supposedly too strange for mainstream languages. I guess that makes me a language hipster.

And while we’re on the subject of Lisp-like languages: I’d never heard of Julia before, but this blog post made me wonder if it should be on my list of languages to look at.

We have a Nest thermostat, and I wasn’t too keen when I heard that Google bought them. I’ll probably have to look into securing it (aka stopping the data leakage). While I understand the “your data for our free service” trade from an economics perspective, I do take some issue with the “we’ll sell you a very expensive device and your data still leaks out to us” model. Nests aren’t exactly cheap to begin with.

Debugging on a live system that shouldn’t be live. Been there, done that, on a trading system…

Netflix and network neutrality, as seen from the other side. I’m an advocate of regulating ISPs (especially the larger ones) as public utilities and essentially enforcing network neutrality that way. Netflix has obviously been going on about network neutrality for a while now, but the linked article does make me wonder if those supposed “pay to play” payments were actually more like payments for server hosting. You know, like the charges that we mere mortals also have to pay if we want to stick a server into someone’s data centre.


Someone is building a BBC Micro emulator in JavaScript

For those of us who remember: a long time ago, the BBC Micro was the home computer with the fastest BASIC implementation available, and it was pretty legendary in European home computing circles. It didn’t sell that much outside the UK, mostly because of its price. It was also the target system for the original implementation of Elite. Matt Godbolt is building an emulator in JavaScript. The first post of his series can be found here.

It’s amazing how far we have come since I started playing with computers, yet they’re still not fast enough.

Accessing the recovery image on a Dell Inspiron 530 when you can’t boot into the recovery partition

My hardware “scrap pile” contained a Dell Inspiron 530 – not the most glamorous of machines, and rather out of date too, but it works and it runs a few pieces of software that I don’t want to regularly reboot my Mac for. Problem was, I had to rebuild it because it had multiple OSs installed and none of them worked. Note to self: don’t mix 32-bit and 64-bit Windows on the same partition and expect it to work flawlessly.

I did still have the recovery partition, but it wasn’t accessible from the boot menu any more. Normally you’re supposed to use the advanced boot menu to access it, but I couldn’t figure out how to boot into it. There is a Windows folder on the partition, but no obvious boot loader. I also didn’t want to pay Dell for a set of recovery disks, mainly because they would have cost more than the machine is worth to me.

Poking around the recovery partition turned up a Windows image file that looked like it contained the factory OS image – its name, “Factory.wim”, kinda gave that away – and the necessary imaging tool from Microsoft, called imagex.exe.

All I needed was a way to actually run them from an OS that wasn’t blocking the main disk, so I grabbed one of my Windows 7 disks, booted into installation repair mode and fired up a command prompt.

After I made sure that I was playing with the correct partition, I formatted the main Windows partition and then used imagex to apply Factory.wim to the newly cleansed partition. This restored the machine to factory settings even though I hadn’t been able to boot into the recovery partition to use the “official” factory reset.
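
For reference, the steps looked roughly like this. The drive letters are from my machine and will differ on yours, the image index of 1 is a guess you should verify with imagex /info, and the format step of course destroys everything on the target partition:

```
rem Windows 7 repair-mode command prompt; on this machine D: was
rem the recovery partition and C: the main Windows partition
format C: /fs:ntfs /q
D:\imagex.exe /info D:\Factory.wim
D:\imagex.exe /apply D:\Factory.wim 1 C:\
```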

Oh, and if the above sounds like gibberish to you, I’d recommend not blindly following these vague instructions; they will wipe all the data on the machine.

As a bonus task, you also get to uninstall all the crapware loaded on the machine. Fortunately it looks like everything can be uninstalled from the control panel. While you’re installing and uninstalling, make sure you update the various wonderful pieces of software that come with the machine as they’ll be seriously outdated.

Setting up the Ghost blogging system on FreeBSD

Ah, a meta blogging post. Sorry, I try to keep these to a minimum…

For those who haven’t been caught up in the hype yet, Ghost is a new blogging system that is much more minimal than WordPress and the other more popular systems. It’s designed to be much smaller and faster (plus it uses a lot of cool tools like Node.js, Handlebars, etc.).

I recently tried to set up the 0.3.3 release on FreeBSD and overall it was straightforward. Node.js is available as a port – just make sure that you install the regular node port instead of node-devel, as the latter installs node 0.11 while Ghost wants 0.10.

The only hiccup I encountered was that building the sqlite node module failed, but this post suggested an appropriate workaround: pointing npm at the existing install of sqlite.
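
From memory, the sequence was roughly the following; a sketch, not gospel – the port path and the /usr/local sqlite prefix are assumptions based on my setup:

```
# install node 0.10 from ports (www/node, not www/node-devel)
cd /usr/ports/www/node && make install clean

# in the unpacked Ghost 0.3.3 directory, point the sqlite3 module
# at the sqlite already installed under /usr/local instead of
# letting npm try (and fail) to build its own copy
npm install --production --sqlite=/usr/local
npm start
```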

First impressions are very good: populating the Ghost blog from this WordPress installation was very easy using the Ghost plugin for WordPress. The content made it over OK; however, Ghost doesn’t support comments, so the export and re-import loses the existing comments. I’d also have to integrate an external comment system.

Overall I’m pretty excited about Ghost as a blog platform so I’ll be keeping an eye on it. For now though this blog will stay on WordPress.

About a month without Google Reader

As a bit of an RSS junkie – see previous post – I had to go looking for alternatives to Google Reader. I’ve been a feedly user on and off for a few years, but I was never that taken with it. It mostly does what it says on the tin, and having various tablet apps available for feedly is a good thing, but it tends to run into issues with high-volume feeds (Craigslist feeds, I’m looking at you). Mind you, the recurring Craigslist feed issue seems to be more of a problem with Craigslist itself than with feedly.

In the end I decided to go with a self-hosted installation of Tiny Tiny RSS. The UI isn’t as spectacular as feedly’s – OK, out of the box it’s downright boring, and I haven’t got around to playing with the theming yet – but it does what it says on the tin, has no problems with Craigslist feeds and so far just works. Importing the OPML from Google Reader was a snap once I figured out how to export it from Google Reader, and as a bonus I didn’t have to rebuild the folder structure that I had built up over time. Plus, as I self-host it, as long as I’m paying my hosting bill I don’t have to worry about the service being discontinued.