New online - March 2023
New online - February 2023
The curse of HTML mail
It’s been most of a year since I last posted here, but I wanted to rant about HTML mail, and this is the right blog for it. People complain about the intrusiveness of Web tracking, but email tracking is even worse. I’ve noticed this especially after subscribing to a couple of Substack newsletters. They’re sent as HTML, and whenever possible, I click the link to the equivalent Web page, which is less intrusive. Every link in a Substack newsletter is a tracking link, with the odd exception of the link to the Substack page.
The links in a Substack newsletter don’t go to the target page but to a Substack redirection URL. Their purpose is to let Substack know about everything you click on. There are no terms or privacy policy in the email telling you what Substack uses the information for.
Substack has a privacy policy on its website, but there’s no direct way to get to it. The policy says Substack collects personally identifiable information, including your name, address, picture, and phone number, and shares it with “affiliates.” Other services, such as Mailchimp, do much the same. Some HTML email services put “web bugs,” single-pixel images, into their mail. If your client displays images, the service knows each time you open the message.
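As a rough illustration of how a web bug can be spotted in a message's HTML, here is a minimal Python sketch using only the standard library. It assumes the tracker declares its size with plain `width` and `height` attributes of 0 or 1; an image sized through CSS or served at full size would slip past it. The names `TinyImageFinder` and `find_tiny_images` and the sample URLs are made up for the example.

```python
from html.parser import HTMLParser


class TinyImageFinder(HTMLParser):
    """Collect <img> tags whose declared width and height are both 0 or 1."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Assumption: the tracker uses literal width/height attributes.
        if attributes.get("width") in ("0", "1") and attributes.get("height") in ("0", "1"):
            self.suspects.append(attributes.get("src"))


def find_tiny_images(html_text):
    """Return the src of every image that looks like a single-pixel web bug."""
    finder = TinyImageFinder()
    finder.feed(html_text)
    finder.close()
    return finder.suspects


sample = (
    '<p>Hello</p>'
    '<img src="https://tracker.example/open?id=abc" width="1" height="1">'
    '<img src="https://example.com/logo.png" width="200" height="50">'
)
print(find_tiny_images(sample))
```

Only the first image in the sample is flagged, since the logo declares a normal size.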
The tracking links are tailored to you, so clicking a link in an email is less private than opening a page on a site you haven’t logged into.
Tracking links make it difficult or impossible to tell where a link is actually going. Substack links use an encoding that doesn’t show the actual target in plain text, even if you view the message source.
You can read Substack messages as plain text; they’re sent as multipart messages with a plaintext version. With some newsletters, this doesn’t work too badly, but others are so interspersed with long URLs that they’re painful to read.
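For readers who want to pull out the plaintext part themselves, the following is a small sketch using Python's standard `email` package. It assumes you have the raw message source as a string; the function name `plain_text_body` and the demo message are invented for illustration and have nothing to do with any real newsletter.

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default


def plain_text_body(raw_message):
    """Return the text/plain part of a raw message, or None if there is none."""
    message = message_from_string(raw_message, policy=default)
    part = message.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else None


# Build a demo multipart/alternative message with plain and HTML versions.
demo = EmailMessage()
demo["Subject"] = "Demo newsletter"
demo.set_content("Hello in plain text.")
demo.add_alternative("<p>Hello in <b>HTML</b>.</p>", subtype="html")

print(plain_text_body(demo.as_string()))
```

Passing `("plain",)` as the preference list means the HTML alternative is never returned, so a message with no plaintext part yields `None` rather than markup.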
There is one way email is less bad than websites. Few modern email client applications, if any, will run JavaScript in email. Some early ones did, but opening a message from a malicious spammer and letting it run JavaScript would be a security disaster. If you read your email in a Web client, though, it will usually run its own JavaScript (the client’s, not the sender’s). It could also modify the links to add its own tracking.
The security risks of HTML email are widely known. Before the format was widely used, the idea of spreading malware by email was a joke. Now people are advised not to open email from suspicious-looking senders, with good reason. The battle is lost, and email for personal communication has gone into steep decline.
Thunderbird and some other clients offer “simple HTML” as a compromise. It does basic formatting but doesn’t display images. If you have to open HTML messages, that’s the safest way.
Personally, I view all my email as text when it’s possible. If a message is unreadable that way, I discard it unless it’s really important.
New online - November 2022
New online - October 2022
EAP Cataloguer Vacancy
New online - September 2022
EAP video
New online - August 2022
The Marvels of the Manaki Brothers
Webinars for Applicants – Round 18
West African Manuscripts Crowdsourcing Project Fellowship: Call now open
New online - July 2022
EAP Regional Hub Event at Jadavpur University, 14 September 2022
Job Opportunity
The Secret Service text message situation
The disappearance of the Secret Service’s text messages from January 6, 2021 is a data preservation issue, so I’m briefly reviving this blog from its long sleep to analyze it the best I can.
What we know
“Text messages” sent between Secret Service phones on January 6, 2021, during the unrest in Washington, DC, became unavailable within the bureau. News reporting has gotten so bad that it’s hard to find out just what this means; this CNN article contains more detail than most of the reports I’ve found.
The DHS Inspector General requested text records from the phones of 24 individuals in the Secret Service. These people included the heads of the details for the president and vice president. Only one record was given in response, and the bureau said no additional records were available. Ten phones had metadata indicating the transfer of text messages but didn’t have the messages’ content. On July 20, 2022, the Inspector General announced a criminal investigation into the lost messages.
The Secret Service has stated that it lost messages as the result of a “system migration,” which occurred sometime between January 6 and February 26. It further claims that “none of the texts it [the Office of Inspector General] was seeking had been lost in the migration.” In other words, it’s saying there were no lost messages within the investigation’s scope.
Messaging and data retention
That’s not a lot to go on. Depending on whom you believe, we could be looking at anything from inconsequential sloppiness to a deliberate cover-up. But let’s see what we can get out of it.
“Text messages” usually means SMS messaging, but I haven’t found anything that explicitly says so. SMS messages are encrypted, but not end-to-end; they’re vulnerable to man-in-the-middle and spoofing attacks. If the Secret Service values the “secret” in its name and it’s guarding against tech-savvy terrorists, I’d think it should use something more secure. But in the absence of other information, I’ll assume SMS. (But see below; iMessage may also have been used.)
A government agency dealing with sensitive data needs a data retention policy. It needs to make sure information doesn’t get lost and doesn’t get into unauthorized hands. The Federal Records Act requires such policies in many cases. SMS messages are normally retained only on the sender’s and recipient’s devices, so a data retention policy needs to focus there. If both the sender’s and recipient’s phones were destroyed and their text messages were never backed up, the data could be gone for good. However, it appears this isn’t what happened.
Data backup prior to migration was left up to individual Secret Service agents. This amounts to no retention policy. Even if everyone made a good-faith effort to do a backup, the saved messages would be all over the place, some of them stored on insecure servers, some irrecoverably lost.
A Washington Post article comments: “Cybersecurity professionals said that policy was ‘highly unusual,’ ‘ludicrous,’ a ‘failure of management’ and ‘not something any other organization would ever do.’” The article suggests some agents may have used iMessage on iPhones rather than SMS. It includes this extremely interesting bit:
In a letter to the House select committee investigating the insurrection, Secret Service officials said they began planning in the fall of 2020 to move all devices onto Microsoft Intune, a “mobile device management” service, known as an MDM, that companies and other organizations can use to centrally manage their computers and phones.
That sounds as if it wasn’t a matter of tossing old phones on the fire but merely installing some new software. A software installation isn’t supposed to wipe out existing data by default. It certainly shouldn’t delete it so thoroughly that forensic software can’t find at least some of the lost data.
The situation invites comparison to Hillary Clinton’s unauthorized use of a private email server while she was Secretary of State from 2009 to 2013, which became a major issue in 2016. Some people overreacted to it, even calling for her execution, but the situations are similar in their failure to handle sensitive government records properly. The present situation is much more likely to involve the actual and possibly deliberate loss of vital information.
There’s a saying: “Never attribute to malice what can be explained by stupidity.” Is the Secret Service message black hole the result of a cover-up or gross negligence? Hopefully we’ll find out soon.
New online - June 2022
(From the Endangered Archives Blog: Lynda Barraclough on histories in peril)
Digitising Haalpulaar Islamic Manuscripts (EAP1245 Project)
(From the Endangered Archives Blog: Lynda Barraclough on histories in peril)
The Argoknot project: JSON song data
I’ve got a new project which I ought to blog about somewhere, and it’s related to file formats, so it’s going here.
There have been projects to archive information about filk songs. They’ve tended toward wikis such as the Filk Discography Wiki, which contains information about filk recordings. Many filk albums have gone out of publication and might otherwise be forgotten, and the wiki keeps them in the cultural memory. Wikis are fine, and they’re easy to participate in with little technical knowledge. They’re also fragile; if the hosting for a wiki goes away, it might find a new home, but it might disappear if no one takes prompt action.
Structured information has advantages. It’s easy for anyone with a little file storage to keep a copy and give it to others. People can create their own repositories, perhaps of songs which they have published. It’s easy to search them and extract information, e.g., all the songs by an author. This isn’t to say that we should abandon wikis, but having structured information as well strengthens the effort. With a little work, it can be fed to wikis.
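To make the "all the songs by an author" case concrete, here is a minimal Python sketch. The field names `title` and `author`, the function `songs_by_author`, and the sample records are all placeholders chosen for the example; they are not an agreed nomenclature.

```python
import json


def songs_by_author(json_text, author):
    """Return the titles of all songs in a JSON array credited to one author."""
    songs = json.loads(json_text)
    # Assumption: each song is an object with "title" and "author" keys.
    return [song["title"] for song in songs if song.get("author") == author]


catalog = json.dumps([
    {"title": "First Song", "author": "Author A"},
    {"title": "Second Song", "author": "Author B"},
    {"title": "Third Song", "author": "Author A"},
])

print(songs_by_author(catalog, "Author A"))
```

The same file can be copied, mirrored, or filtered by anyone with a text editor and a few lines of code, which is the point of keeping the data structured.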
This is why I’ve created the Argoknot project. It’s a Python-based project to process song data in JSON format. As of this post, it can do one thing: convert CSV files to JSON. I’m planning to add the ability to convert XML files that use the MODS schema. There is a pile of such files in the MASSFILC Filk Book Index.
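The general shape of a CSV-to-JSON conversion can be sketched in a few lines of standard-library Python. This is not Argoknot's actual code or schema, just an illustration of the idea; the function name `csv_to_json` and the column names are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    # Each CSV row becomes one object keyed by the header names.
    rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    return json.dumps(rows, indent=2, ensure_ascii=False)


sample_csv = "title,author\nFirst Song,Author A\nSecond Song,Author B\n"
print(csv_to_json(sample_csv))
```

Every value comes out as a string here; a real converter would likely need rules for multiple authors, dates, and missing fields.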
One of the project’s aims is to create a JSON nomenclature for the filk community. That will let other projects work with the same JSON files to create websites, import into wikis, or do lots of other things.
What I’m doing here is just a start, and it won’t get far without the participation of others. I encourage others in the filk community to join the effort, whether working directly on Argoknot, offering suggestions on how to organize the data, or creating other coding projects.
New online - April 2022
(From the Endangered Archives Blog: Lynda Barraclough on histories in peril)