Publishers: Be sure to create a media archive before it’s too late

If you publish online, whether on a blog, in a report or guest column, or as a journalist, be sure to create a media archive of your work before it’s too late. Most of that work will eventually be lost when the publisher migrates to a new CMS, gets acquired, or goes out of business.

Once that happens, there’s very little chance your content will live on somewhere else. I’ve been working in media for more than three decades, and almost nothing that I worked on in the 1990s or 2000s still exists online. This includes magazine articles, newspaper articles, blog posts, columns, and special reports. Some of this content even won awards from SABEW and other organizations.

LexisNexis once had some print articles as part of its research services, but that’s a closed system. The Internet Archive does some great work archiving old media sites, but the quality is uneven: images and dynamic content are frequently broken, and even text content may not be preserved if IA’s crawlers missed the URL.

The Internet Archive faces an uncertain future itself in light of its recent courtroom losses. I suspect that if the judgment bankrupted the organization, a Silicon Valley angel, government agency, or private firm would swoop in to save it or purchase the assets (I hope the buyer would be a beneficial service, as opposed to a data broker or other parasite).

The article below dates from my time as Managing Editor of The Industry Standard, from 2008 to 2010. When IDG shuttered that initiative in the wake of the 2008 financial crisis, the website went dark (and I ended up at MIT). Hundreds of articles were migrated to other IDG sites, but those media archives were wound down or lost. The Internet Archive has some of the articles, but the images and dynamic content are broken.

I saved PDF copies of these articles, though … anyone need some LLM training data?

Industry Standard 2008 media archive
