Sibelius

Obsolete Thor - 12 April 2024 - 7:57am

Music notation software is among the earliest software for desktop computers. SCORE arrived in 1987, Finale in 1988, Capella in 1992, and Sibelius in 1993. Many others came and went during this time. Music notation software was much more than a typical word processor or desktop publishing system. It needed specialized fonts to display the notation and had to handle many variables for different instruments, giving individuals an inexpensive way to create complicated compositions.

Sibelius [SI] + [BAY] + [LEE] + [UHS] was originally developed for the Acorn system in 1986, then released on Windows and Macintosh in 1998-99. The software became very popular and in 2006 was purchased by the software giant AVID. It was used widely enough to get a preservation assessment by the British Library in 2017 and a draft format description by the Library of Congress, written by the amazing Ashley!

Both reviews of the format emphasize the proprietary nature of the file format which has been used since the early versions. Aside from the early Acorn release, the Windows and Macintosh versions used a binary format with the SIB extension. They are actually quite easy to identify.

hexdump -C Sibelius-s01.sib | head
00000000 0f 53 49 42 45 4c 49 55 53 00 00 40 00 02 00 a4 |.SIBELIUS..@....|
00000010 f1 ed 00 00 00 30 00 00 00 02 00 00 00 01 00 00 |.....0..........|
00000020 00 2a 00 00 00 00 00 00 00 00 0f 53 49 42 45 4c |.*.........SIBEL|
00000030 49 55 53 00 00 40 00 02 00 00 00 00 00 00 00 3a |IUS..@.........:|
00000040 00 00 00 00 38 a1 28 06 b3 d2 2f 66 03 04 16 4e |....8.(.../f...N|
00000050 5f 5c 8d f3 95 27 3e f1 2a 1b 68 de 08 81 e8 9a |_\...'>.*.h.....|
00000060 ea 1c bf dd 54 0e 92 8d 4d be e3 34 ed 42 78 36 |....T...M..4.Bx6|
00000070 d2 e1 67 7b 8d f7 98 6a 3a 70 c4 8b 0b 08 7b 26 |..g{...j:p....{&|
00000080 f9 45 00 00 00 00 48 71 7c 4c 98 df 0b 38 7d 9d |.E....Hq|L...8}.|
00000090 2a 2d 84 9c a4 39 0f 4d da a2 cc 97 ad 3d b0 55 |*-...9.M.....=.U|

This is exactly how PRONOM and other identification methods determine they are Sibelius files. PRONOM has assigned the format fmt/696 and is looking for the hexadecimal bytes 0F534942454C495553.
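As a rough illustration of what that signature amounts to, the check below compares the start of a file against the fmt/696 byte sequence. It is a minimal Python sketch, not how PRONOM or any identification tool is actually implemented; the function name `looks_like_sibelius` is made up here, and anchoring the match at offset 0 is an assumption based on the dump above.

```python
# 0x0F followed by the ASCII string "SIBELIUS", as quoted for fmt/696.
SIBELIUS_MAGIC = bytes.fromhex("0F534942454C495553")


def looks_like_sibelius(header: bytes) -> bool:
    """Return True if the data begins with the Sibelius byte sequence.

    Assumption: the sequence sits at offset 0, as in the sample dump.
    """
    return header.startswith(SIBELIUS_MAGIC)


# First row of the Sibelius-s01.sib dump shown above.
sample = bytes.fromhex("0f 53 49 42 45 4c 49 55 53 00 00 40 00 02 00 a4")
print(looks_like_sibelius(sample))
```

To test a file on disk, reading the first 16 bytes in binary mode and passing them to the function is enough.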

The problem with this identification method is that all Sibelius files are identified as such, regardless of version. As mentioned by Ashley, the version of the software used is highly important, as new features were added all the time, making backwards compatibility difficult. Add in the fact that there were different editions of each version, which would limit these features even more, and I can see how a musician could get very frustrated. If you created a score in Sibelius 5 and tried to open it in the Sibelius 5 Student version, you may find your composition lacking in many ways. The only way to avoid compatibility issues is to always open in the latest “Ultimate” version. Sibelius Ultimate can open all versions of the SIB format back to version 2. The software even has an export feature that lets you export back to a previous version, stripping what is necessary to ensure compatibility.

[Screenshot: Sibelius export to previous version]

For those with a bunch of SIB files in their archives, how would you know which software version created the file? Well, let’s take a closer look at the bytes and see if we can find some patterns. Let it be known, I am not reverse engineering the format, just looking for patterns which will allow for proper identification!

I am not the first person to ask this question; many others want to know the versions of their SIB files. Thankfully, others have found some clues on which bytes hold the version information. It seems we can determine the version based on 4 bytes shortly after the SIBELIUS string, specifically bytes 10-13 (counting from 0).

hexdump -C Sibelius2-s01.sib | head
00000000 0f 53 49 42 45 4c 49 55 53 00 00 08 00 22 00 47 |.SIBELIUS....".G|
00000010 98 4c 00 00 00 3a 00 00 00 00 4e 81 49 34 41 2c |.L...:....N.I4A,|
00000020 fa 76 62 f9 71 53 a9 93 0f 54 1e 20 6c 63 61 4d |.vb.qS...T. lcaM|
00000030 f7 b2 b0 a7 5d bd 82 3a 0d 86 02 8b f2 89 d2 a0 |....]..:........|
00000040 83 1f 8d e0 37 1b ed 1c 6a 8b 82 08 4b 6d 64 60 |....7...j...Kmd`|
00000050 71 59 e8 aa ef b1 3c df 5c 25 0a 9f 66 50 69 de |qY....<.\%..fPi.|
00000060 2a d3 4e 2a cd 97 88 06 67 5f 50 64 0f 8f 86 2b |*.N*....g_Pd...+|
00000070 08 0d 3f f7 80 26 e0 63 f6 7d 4e f8 e7 c0 3f fc |..?..&.c.}N...?.|
00000080 7a 77 ea b3 4a b9 30 59 13 47 6e 09 0a 0b ae 3c |zw..J.0Y.Gn....<|
00000090 c1 93 85 f6 41 f8 58 22 4b 92 35 3f b2 f5 3f 9d |....A.X"K.5?..?.|

Starting from what others have gathered and adding more recent versions, I have come up with a list.

Version | Hex 10-13
Sibelius 1.2 | 00 00 00 0E
Sibelius 2.x | 00 08 xx xx
Sibelius 3.x | 00 0A xx xx
Sibelius 4.x | 00 1B xx xx
Sibelius 5.0 | 00 2D 00 03
Sibelius 5.1 | 00 2D 00 0D
Sibelius 5.2.x – 5.4 | 00 2D 00 10
Sibelius 6.0.x | 00 36 00 01
Sibelius 6.1 | 00 36 00 17
Sibelius 6.2 | 00 36 00 1E
Sibelius 7.0 | 00 39 00 0C
Sibelius 7.0.1 – 7.0.2 | 00 39 00 0E
Sibelius 7.0.3 | 00 39 00 13
Sibelius 7.1.0 | 00 39 00 15
Sibelius 7.1.2 – 7.1.3 | 00 39 00 16
Sibelius 7.5.x | 00 3D 00 0E
Sibelius 8.0.0 – 8.0.1 | 00 3D 00 10
Sibelius 8.1.x | 00 3E 00 00
Sibelius 8.2 | 00 3E 00 01
Sibelius 8.3 | 00 3E 00 02
Sibelius 8.4.x | 00 3E 00 06
Sibelius 8.5.x | 00 3E 00 07
Sibelius 8.6.x, 8.7.0, 8.7.1 | 00 3F 00 00
Sibelius 8.7.2, 2018.1, 2018.4.x, 2018.5, 2018.6, 2018.7 | 00 3F 00 01
Sibelius 2018.11, 2018.12 | 00 3F 00 02
Sibelius 2019.1 | 00 3F 00 04
Sibelius 2019.4.x, 2019.5, 2019.7, 2019.9 | 00 3F 00 06
Sibelius 2019.12 | 00 3F 00 07
Sibelius 8.6-2019.12 | 00 3F 00 0A
Sibelius 2020.1 | 00 3F 00 0B
Sibelius 2020.3, 2020.6 | 00 40 00 01
Sibelius 2020.9 | 00 40 00 02
Sibelius 2022.5 | 00 40 00 03
Sibelius 2022.11 | 00 41 00 02
Sibelius 2022.12 | 00 42 00 00
Sibelius 2023.3 | 00 42 00 01
Sibelius 2023.8 | 00 43 00 07
Sibelius 2024.3.1 | 00 44 00 01

That is a lot of versions, and I feel there may be some gaps that still need to be identified. It appears that the first two bytes are the major version and the second two bytes are the minor version, although a few major version values span several software versions. With this chart, one could be very specific in identifying which Sibelius version wrote the file, but for archiving purposes it seems we can group many of these by capturing just the major version. The export screenshot above seems to break down the significant changes and group similar formats together, the biggest group being 8.6 through 2019.12. A comparison of the “Student” and “First” formats doesn’t reveal any obvious bytes that indicate the edition, so for now they are all lumped together.
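To make the major/minor reading concrete, here is a small Python sketch that pulls bytes 10-13 out of a header and looks them up. It assumes the four bytes are two big-endian 16-bit values, which is my interpretation of the list and not a documented fact about the format. The names `version_bytes` and `describe` are invented for this example, and the lookup holds only a few rows from the list.

```python
import struct

SIBELIUS_MAGIC = bytes.fromhex("0F534942454C495553")  # 0x0F followed by "SIBELIUS"

# A handful of rows copied from the list above; deliberately not exhaustive.
KNOWN_VERSIONS = {
    (0x002D, 0x0003): "Sibelius 5.0",
    (0x0036, 0x001E): "Sibelius 6.2",
    (0x0040, 0x0002): "Sibelius 2020.9",
    (0x0044, 0x0001): "Sibelius 2024.3.1",
}

# Major values whose minor bytes vary ("xx xx" in the list).
MAJOR_ONLY = {0x0008: "Sibelius 2.x", 0x000A: "Sibelius 3.x", 0x001B: "Sibelius 4.x"}


def version_bytes(header: bytes):
    """Return (major, minor) read from offsets 10-13, or None if not Sibelius."""
    if len(header) < 14 or not header.startswith(SIBELIUS_MAGIC):
        return None
    # Assumption: two unsigned 16-bit big-endian integers.
    return struct.unpack_from(">HH", header, 10)


def describe(header: bytes) -> str:
    pair = version_bytes(header)
    if pair is None:
        return "not recognised as Sibelius"
    major, minor = pair
    if pair in KNOWN_VERSIONS:
        return KNOWN_VERSIONS[pair]
    if major in MAJOR_ONLY:
        return MAJOR_ONLY[major]
    return f"unlisted (major 0x{major:04X}, minor 0x{minor:04X})"


# First rows of the two sample dumps shown earlier.
print(describe(bytes.fromhex("0f 53 49 42 45 4c 49 55 53 00 00 40 00 02 00 a4")))
print(describe(bytes.fromhex("0f 53 49 42 45 4c 49 55 53 00 00 08 00 22 00 47")))
```

Grouping by `major` alone, as suggested for archiving, would only need the first value of the pair.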

There is one other similar format which needs to be mentioned. Sibelius Scorch was a product made to share scores online. It has been replaced by Sibelius Cloud Publishing, but for a while it was the best way to share a score with others in a way that protected the original. I have no idea how the files were made, but scorestreet.net and sibeliusmusic.com were sites you could upload your score to for sharing. Some SCO files appear to have a PDF embedded within them for proper printing.

hexdump -C smd_h_0000000000097761.sco | head
00000000 0f 43 43 53 43 4f 52 43 48 00 00 36 00 1e 00 c0 |.CCSCORCH..6....|
00000010 d4 55 00 00 00 30 00 00 00 01 00 00 00 01 00 00 |.U...0..........|
00000020 00 22 0f 43 43 53 43 4f 52 43 48 00 00 36 00 1e |.".CCSCORCH..6..|
00000030 00 00 00 00 00 00 00 3a 00 00 00 00 03 56 11 b9 |.......:.....V..|
00000040 70 dc fe 90 50 48 30 df eb 39 88 23 8e 88 78 bf |p...PH0..9.#..x.|
00000050 da ab ab 5b e2 13 98 89 66 eb 94 67 8d 16 00 00 |...[....f..g....|
00000060 00 00 cf 6f 0c 67 85 ec 57 90 e5 c1 ea 8a eb 9f |...o.g..W.......|
00000070 c8 13 d2 1d 75 bd a5 9f eb b9 ef 1d 25 79 45 2c |....u.......%yE,|
00000080 05 bb 74 41 e8 8f 27 6a 01 07 d0 f5 3b 17 ce 87 |..tA..'j....;...|
00000090 7b c2 82 d9 41 6b 82 2f d8 b8 17 32 fa d3 59 05 |{...Ak./...2..Y.|

I am not sure of the best way to handle all the different versions within the PRONOM registry. I went ahead and made a few signatures based on the export dialog of Sibelius 2024. Even with a few combined, it leaves us with 17 new PUIDs. Maybe further discussion can refine these down a bit more? Regardless, each file can be associated with a specific Sibelius version, making it easier to open and migrate if needed without fear of opening in the wrong version. Take a look at some samples and my signatures on my GitHub page and let me know if there is a better way.

Shorten

Obsolete Thor - 5 April 2024 - 5:30am

I was recently going through some of my old CD-Rs and came across this 11-year-old fun memory.

I remember going to this 2003 Toad the Wet Sprocket concert in Salt Lake City with some friends. I had seen this band perform before, but this was the first time I was able to get a recording of the show. Normally, having a recording of a concert by a well-known band was a little shady, but some bands not only allow recording of their live concerts, they encourage it. There have been a few bands over the years with this philosophy. One most have heard of is the Grateful Dead; because of all the tape trading, the band’s numerous concerts will live on forever.

The scene of recording concerts is still alive and well, and if you are into recording and sharing, it is expected that you share in a lossless audio format. Listeners of lossless audio are definitely in the minority among all those who listen to music on the daily. Most of us have been placated with the infinite playlists on services like Apple Music, Spotify, and Amazon Music. Most probably don’t care about owning music anymore, but for the few who consider themselves audiophiles, having a lossless audio file is the only choice.

When it comes to formats, there are a few lossless formats to choose from, and they all come with some advantages as well as some downsides. WAV files contain the full PCM audio stream, and while internet bandwidth today can handle full uncompressed audio, it can still be beneficial to use some compression for archiving or sharing over the web.

The most common lossless format today is the Free Lossless Audio Codec, or FLAC, but there are also quite a few who like the Apple Lossless Audio Codec (ALAC). Both offer many advantages, especially support for metadata, cuesheets, and embedded album cover art. But many years ago, another lossless format was most often used for bootleg recordings and audio sharing.

Shorten was one of the first lossless formats, developed by Tony Robinson in 1993 for SoftSound. It could cut the size of a typical 16-bit WAV file in half. It achieved this by predicting each sample from the previous ones and storing only the prediction error with Huffman coding, which, kinda like the entropy coding stage of a JPEG, gives the most common values the shortest codes. Today FLAC and ALAC have replaced this format and offer improved features and support. Many audio players have dropped support for Shorten, making it difficult to use this old format.

The Shorten format uses the .SHN extension. It is one of the formats listed on the Library of Congress Sustainability of Digital Formats site with the ID fdd000199, although a couple of links there don’t appear to work, as the entry hasn’t been updated since 2011. Support for this format has ended, and many of the links found on various websites are broken, usually those referencing the etree wiki, much of which is archived on the Internet Archive.

Let’s take a look at what makes up a lossless compressed SHN file. A quick look at a sample header:

hexdump -C test.shn | head
00000000 61 6a 6b 67 02 fb b1 70 09 f9 25 59 52 a4 d1 a8 |ajkg...p..%YR...|
00000010 dd cf 85 5a 01 57 a0 d5 a8 b6 6b 6d d2 41 10 80 |...Z.W....km.A..|
00000020 40 20 10 18 04 0a 01 44 d6 40 20 11 0d 8c 0a 01 |@ .....D.@ .....|
00000030 04 80 44 20 16 4b 0d d2 c3 b8 f8 55 a0 11 80 59 |..D .K.....U...Y|
00000040 98 56 1d b1 79 51 9f 39 f1 12 d2 d3 75 5c cd 08 |.V..yQ.9....u\..|
00000050 06 25 68 6b 52 5e 9f 4c 39 cd c1 32 c4 0d a9 b7 |.%hkR^.L9..2....|
00000060 69 34 56 f0 96 fa 46 89 a2 6e 8c ba d5 d0 58 de |i4V...F..n....X.|
00000070 f5 44 5b aa 61 82 c7 85 88 37 d6 ee cb ab 4e 44 |.D[.a....7....ND|
00000080 91 19 b7 38 d4 20 ae 98 98 d1 2c 4a 4e 88 dd 3e |...8. ....,JN..>|
00000090 36 68 1b 59 a8 7d 84 23 76 0a 84 21 a1 cd 80 8e |6h.Y.}.#v..!....|

The first four bytes seem to be consistent among my samples. It makes me wonder if the ASCII values have something to do with the author, Anthony (Tony) J. Robinson. In the source code for the shorten software, the file shorten.h defines the ASCII “ajkg” as the magic header for the SHN format, and the same value is found in current ffmpeg code. The tools don’t have much to say about these files, though.

mediainfo test.shn
General
Complete name    : test.shn
Format           : Shorten
Format version   : 2
File size        : 3.17 MiB

Audio
Format           : Shorten
Compression mode : Lossless

ffprobe -i test.shn
Input #0, shn, from 'test.shn':
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Audio: shorten, 44100 Hz, 2 channels, s16p
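The “ajkg” check is simple enough to sketch in a few lines of Python. In the sample dump the byte right after the magic is 02, which lines up with the “Format version : 2” that mediainfo reports, so the sketch treats that byte as a version number. That reading is an assumption drawn from this one sample, and `shorten_version` is a name made up for the example.

```python
SHN_MAGIC = b"ajkg"  # magic header defined in shorten.h


def shorten_version(header: bytes):
    """Return the byte following 'ajkg', or None if the magic is absent.

    Assumption: that byte is the format version (it is 0x02 in the sample).
    """
    if len(header) < 5 or header[:4] != SHN_MAGIC:
        return None
    return header[4]


# First eight bytes of the test.shn dump shown above.
sample = bytes.fromhex("61 6a 6b 67 02 fb b1 70")
print(shorten_version(sample))
```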

Using the older SHNTOOL, we can get more information.

shntool info test.shn
-------------------------------------------------------------------------------
File name:                    test.shn
Handled by:                   shn format module
Length:                       0:32.23
WAVE format:                  0x0001 (Microsoft PCM)
Channels:                     2
Bits/sample:                  16
Samples/sec:                  44100
Average bytes/sec:            176400
Rate (calculated):            176400
Block align:                  4
Header size:                  44 bytes
Data size:                    5697720 bytes
Chunk size:                   5697756 bytes
Total size (chunk size + 8):  5697764 bytes
Actual file size:             3325489
File is compressed:           yes
Compression ratio:            0.5836
CD-quality properties:
  CD quality:                 yes
  Cut on sector boundary:     no
  Sector misalignment:        1176 bytes
  Long enough to be burned:   yes
WAVE properties:
  Non-canonical header:       no
  Extra RIFF chunks:          no
Possible problems:
  File contains ID3v2 tag:    no
  Data chunk block-aligned:   yes
  Inconsistent header:        no
  File probably truncated:    unknown
  Junk appended to file:      unknown
  Odd data size has pad byte: n/a
Extra shn-specific info:
  Seekable:                   yes
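Most of the numbers in that report follow from a handful of inputs, and it can help to see the arithmetic spelled out. The Python sketch below recomputes them from the values shntool printed. Two things are assumptions on my part: that the compression ratio is the actual file size divided by the total WAVE size (that choice reproduces 0.5836), and that the 2352-byte figure is the audio CD sector size used for the misalignment check.

```python
# Values copied from the shntool report above.
SAMPLE_RATE = 44100
CHANNELS = 2
BITS_PER_SAMPLE = 16
DATA_SIZE = 5697720        # bytes of PCM audio once decompressed
ACTUAL_FILE_SIZE = 3325489 # bytes of the .shn file on disk
HEADER_SIZE = 44           # canonical WAVE header
CD_SECTOR = 2352           # bytes per audio CD sector (1/75 second)

block_align = CHANNELS * BITS_PER_SAMPLE // 8   # bytes per sample frame
bytes_per_sec = SAMPLE_RATE * block_align
chunk_size = DATA_SIZE + HEADER_SIZE - 8        # RIFF size field excludes 8 bytes
total_size = chunk_size + 8
seconds = DATA_SIZE / bytes_per_sec
misalignment = DATA_SIZE % CD_SECTOR
# Assumption: ratio = compressed size / total uncompressed WAVE size.
ratio = ACTUAL_FILE_SIZE / total_size

print(block_align, bytes_per_sec, chunk_size, total_size)
print(round(seconds, 3), misalignment, round(ratio, 4))
```

The audio works out to about 32.3 seconds. The “0:32.23” length is probably 32 seconds plus 23 CD frames, since the 1176 leftover bytes are exactly half a sector, but that is my reading of the output and not something the report states.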

Many Shorten audio files are found out there in archives and on file sharing sites, so even though the format isn’t used to create new files, it will still be around for a while. My GitHub has my signature proposal and a couple of samples.

Canvas

Obsolete Thor - 29 March 2024 - 6:21am

When it comes to design software, there were many options over the years, many released with a lot of hype and others disappearing not long after release. Few lasted long enough without being gobbled up by big names such as Adobe. One of those is Canvas by Deneba Systems.

First released in 1987, it is still available over at Canvas GFX. It’s amazing it was never bought by one of the big names (Adobe, Corel, Aldus, etc.). It remained under Deneba Systems until 2003, when it was bought by ACD Systems, but kept the name Deneba Canvas for a time. The later versions were not popular with everyone, and Mac support was dropped, but the software continued. A while back I was looking through a few of my old ZIP disks and found some software my father used in the mid 1980s. He had a copy of Canvas version 2 for Macintosh. At that time I was more interested in playing games on our family’s Macintosh 128k than using design software.

Over the years I have come across many Canvas documents. With each version released, changes were made to the file format used to store the drawings and artwork, and often to the extension as well. Some are easily identifiable and others have some confusing structures. Let’s look into it.

Version | Platform | Extension | Description
Canvas 1-3 & artWORKS | Macintosh | none | no strong pattern
Canvas 3.5 | Mac & Windows | CVS | Similar to v1-3
Canvas 5 | Mac & Windows | CV5 | CANVAS5 string
Canvas 6-8 | Mac & Windows | CNV | CANVAS6 string
Canvas 9-X | Mac & Windows | CVX | Similar to 6-8
Canvas Draw | Mac | CVD | Different than others
Canvas Image File | | CVI | DAD5PROX

The first three versions of Canvas were Macintosh only and in those early days there was no extension, just a Type / Creator indicating to the Finder how to open them. Deneba Systems used the Creator codes DAD2, DAD5, through DADX.

The first versions are quite frustrating. I have gathered samples from versions 2, 3, 3.5 and artWORKS version 1. Even with numerous samples, there are no patterns I can discern from them. I even reached out to the current CanvasX technical support for answers. They wanted to be helpful, but their reply didn’t offer much help:

With “CVS” or ‘drw2’ for mac, the header contains ranges inside a structure, and other data like if it was compressed. When we see if it’s a valid file we check the ranges. There is no easy way to determine what hex values would be written because of flipping, Intel vs (PPC or 68K). Unfortunately, the research needed to identify the Hex value will require the original code for version 3.5 which we do not have access to easily. Canvas 3.5 code is 16 bit… this would also be an issue.

Let’s take a look at a couple samples:

hexdump -C Canvas2.1-Sample | head
00000000 00 00 03 06 00 00 3d 9c 00 00 00 2a 00 00 00 0a |......=....*....|
00000010 00 00 00 76 00 00 00 36 00 00 00 2e 00 00 00 1e |...v...6........|
00000020 00 00 00 12 00 00 00 42 00 00 00 1a 00 00 00 82 |.......B........|
00000030 00 00 00 3c 00 66 00 01 00 00 3d 9c 00 48 00 00 |...<.f....=..H..|
00000040 40 02 90 00 00 00 00 00 00 00 00 00 00 00 00 00 |@...............|
00000050 00 01 00 00 01 00 00 00 00 20 00 40 00 60 00 80 |......... .@.`..|
00000060 00 c0 01 40 01 80 01 c0 02 40 02 80 00 00 00 00 |...@.....@......|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05 |................|
00000080 00 00 00 00 00 01 00 10 00 00 00 01 00 03 3f fc |..............?.|
00000090 80 00 00 00 00 00 00 00 00 07 00 01 00 01 00 0b |................|

hexdump -C Canvas2-s02 | head
00000000 00 00 03 b2 00 00 07 ec 00 00 00 2a 00 00 00 0a |...........*....|
00000010 00 00 00 76 00 00 00 36 00 00 00 2e 00 00 00 1e |...v...6........|
00000020 00 00 00 12 00 00 00 42 00 00 00 1a 00 00 00 82 |.......B........|
00000030 00 00 00 3c 00 66 00 01 00 00 07 ec 00 48 00 00 |...<.f.......H..|
00000040 40 02 90 00 00 00 00 00 00 00 00 00 00 00 00 00 |@...............|
00000050 00 01 01 00 01 00 00 00 00 20 00 40 00 60 00 80 |......... .@.`..|
00000060 00 c0 01 40 01 80 01 c0 02 40 02 80 00 00 00 00 |...@.....@......|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05 |................|
00000080 00 00 00 00 00 01 00 10 00 00 00 01 00 03 3f fc |..............?.|
00000090 80 00 00 00 00 00 00 00 00 07 00 01 00 01 00 0b |................|

hexdump -C Canvas3.04 | head
00000000 00 00 02 5a 00 00 00 1c 00 00 00 2a 00 00 00 0a |...Z.......*....|
00000010 00 00 00 76 00 00 00 36 00 00 00 2e 00 00 00 1e |...v...6........|
00000020 00 00 00 12 00 00 00 42 00 00 00 1a 00 00 00 82 |.......B........|
00000030 00 00 00 3c 00 68 00 02 00 00 00 1c 00 48 00 00 |...<.h.......H..|
00000040 40 02 90 00 00 00 00 00 00 00 00 00 00 00 00 00 |@...............|
00000050 00 01 01 00 01 03 00 00 00 20 00 40 00 60 00 80 |......... .@.`..|
00000060 00 c0 01 40 01 80 01 c0 02 40 02 80 00 00 00 00 |...@.....@......|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 01 00 00 00 01 00 10 00 00 00 01 00 03 3f fc |..............?.|
00000090 80 00 00 00 00 00 00 00 00 07 00 01 00 01 00 0b |................|

hexdump -C Canvas5-3.5-Sample1.CVS | head
00000000 00 00 01 58 00 00 01 30 00 00 00 2a 00 00 00 00 |...X...0...*....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000030 00 00 00 00 00 69 00 02 00 00 01 30 00 48 00 00 |.....i.....0.H..|
00000040 40 02 90 00 00 00 00 00 00 00 00 00 00 00 00 00 |@...............|
00000050 00 01 01 01 00 00 00 00 00 20 00 40 00 60 00 80 |......... .@.`..|
00000060 00 c0 01 40 01 80 01 c0 02 40 02 80 00 00 00 00 |...@.....@......|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 01 00 00 00 01 00 10 00 00 00 01 00 03 3f fc |..............?.|
00000090 80 00 00 00 00 00 00 00 00 07 00 01 00 01 00 01 |................|

hexdump -C C3-5-S01.CVS | head
00000000 78 11 00 00 10 00 00 00 2a 00 00 00 0a 00 00 00 |x.......*.......|
00000010 26 00 00 00 26 00 00 00 26 00 00 00 26 00 00 00 |&...&...&...&...|
00000020 96 00 00 00 2a 00 00 00 2e 00 00 00 32 00 00 00 |....*.......2...|
00000030 00 00 00 00 01 6b 01 00 50 14 00 00 28 00 00 00 |.....k..P...(...|
00000040 6e 00 00 00 5b 00 00 00 01 00 04 00 00 00 00 00 |n...[...........|
00000050 e8 13 00 00 12 0b 00 00 12 0b 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 80 00 00 80 00 00 |................|
00000070 00 80 80 00 80 00 00 00 80 00 80 00 80 80 00 00 |................|
00000080 c0 c0 c0 00 80 80 80 00 00 00 ff 00 00 ff 00 00 |................|
00000090 00 ff ff 00 ff 00 00 00 ff 00 ff 00 ff ff 00 00 |................|

In the version 2 & 3 samples you can see some patterns, which I thought would allow for proper identification, but looking at more samples I found differences. One pattern I was hopeful might be consistent was the hex values “002000400060008000C00140018001C002400280”, but there are some which don’t match this pattern. If the file is truly compressed, it will be hard to know which values would be consistent among all files. I have over 8,000 samples and have a signature that only excludes around 20, so it will have to do for now.
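For anyone who wants to try that byte run against their own files, here is a minimal Python sketch. In the big-endian samples dumped above the run starts at offset 0x58, so the check defaults to that position. The offset is read off those few dumps and is an assumption, the function name `has_early_canvas_pattern` is invented here, and this is not the signature I ended up proposing.

```python
# The 20-byte run quoted above.
EARLY_CANVAS_PATTERN = bytes.fromhex("002000400060008000C00140018001C002400280")
PATTERN_OFFSET = 0x58  # assumption: where the run starts in the dumps shown


def has_early_canvas_pattern(data: bytes, anywhere: bool = False) -> bool:
    """Check for the early Canvas byte run.

    With anywhere=True the first 512 bytes are searched instead of
    requiring the run at the fixed offset.
    """
    if anywhere:
        return EARLY_CANVAS_PATTERN in data[:512]
    end = PATTERN_OFFSET + len(EARLY_CANVAS_PATTERN)
    return data[PATTERN_OFFSET:end] == EARLY_CANVAS_PATTERN


# Synthetic header: zero padding up to 0x58, then the run.
fake_header = bytes(PATTERN_OFFSET) + EARLY_CANVAS_PATTERN + bytes(16)
print(has_early_canvas_pattern(fake_header))
```

Files that fail the fixed-offset check can be retried with `anywhere=True` to see whether the run has simply moved.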

Starting with version 5 we get more identifiable headers, though there is some oddness in some samples. With an ASCII string like “CANVAS5”, it should be easy, right? Not so fast: in version 5 you can compress the file, which removes the easily identifiable “CANVAS5” string. Some compressed files have a small string at the tail end, but others do not.

hexdump -C Canvas5-Sample1.CV5 | head
00000000 02 00 00 80 00 00 00 00 00 00 00 4e 96 00 00 4e |...........N...N|
00000010 96 18 02 00 00 00 0e a8 da 43 41 4e 56 41 53 35 |.........CANVAS5|
00000020 00 01 00 00 00 00 00 05 03 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 21 00 00 00 21 00 00 00 79 00 00 |.....!...!...y..|
00000040 00 03 00 00 01 6b 00 00 00 03 00 00 00 01 ff ff |.....k..........|
00000050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

hexdump -C Canvas5-Sample3-cmp.CV5 | head
00000000 02 00 00 80 00 00 00 00 08 00 00 80 00 00 00 03 |................|
00000010 5c ff ff ff ff 00 00 40 22 00 00 03 50 10 00 89 |\......@"...P...|
00000020 07 60 bd 0f f0 00 00 10 03 04 10 56 00 20 05 00 |.`.........V. ..|
00000030 e0 18 02 10 35 04 30 4e 05 30 72 07 f0 a8 0d a1 |....5.0N.0r.....|
00000040 17 11 81 19 05 50 5c 00 60 0f 00 10 80 02 90 80 |.....P\.`.......|
00000050 03 f0 56 05 50 55 05 b0 75 12 51 29 05 e0 55 05 |..V.PU..u.Q)..U.|

hexdump -C Canvas5-Sample3-cmp.CV5 | tail
00001ff0 00 00 00 01 08 a5 ab c0 00 00 00 00 3f 89 2c 58 |............?.,X|
00002000 00 00 00 00 08 a5 ab 80 00 00 00 00 ff d4 11 e4 |................|
00002010 00 00 00 00 08 a5 ab 90 00 02 3e d8 ff d3 12 cc |..........>.....|
00002020 00 00 00 00 00 00 00 00 00 02 3e d8 00 01 00 09 |..........>.....|
00002030 00 00 00 00 00 00 00 00 00 00 00 00 08 a5 ab f8 |................|
00002040 00 00 00 00 43 4e 56 35 |....CNV5|

Canvas 6 uses a new extension but has a similar file structure, again with compression as an option. Some of the compressed files written on Windows have a reversed trailer string, “5VNC”. Many compressed Canvas 5 files look identical to compressed Canvas 6 files, which complicates identification.

hexdump -C Canvas6-Sample.CNV | head
00000000 01 00 80 00 00 90 07 cd 07 00 80 00 00 00 80 00 |................|
00000010 00 17 01 00 00 59 f5 0e 00 43 41 4e 56 41 53 36 |.....Y...CANVAS6|
00000020 00 01 00 00 00 00 06 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 21 7a 00 00 00 7a 00 00 00 03 00 |.....!z...z.....|
00000040 00 00 6e 01 00 00 03 00 00 00 01 00 00 00 ff ff |..n.............|
00000050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

hexdump -C Canvas6-Sample1-c.CNV | head
00000000 01 00 80 00 00 58 ea 2b 00 c2 1d 00 00 d0 09 00 |.....X.+........|
00000010 00 00 00 0f 2e 00 00 0b 07 00 00 09 c4 10 00 01 |................|
00000020 00 00 03 00 20 04 00 70 ff 00 80 05 00 c0 06 06 |.... ..p........|
00000030 50 20 03 00 0f 06 10 6b 00 a0 12 01 00 48 07 20 |P .....k.....H. |
00000040 6d 07 30 40 06 40 11 06 00 0b 05 00 10 00 10 71 |m.0@.@.........q|
00000050 01 40 21 00 00 59 01 00 0f 05 10 00 00 e1 14 00 |.@!..Y..........|

hexdump -C Canvas6-Sample1-c.CNV | tail
000016a0 00 00 00 12 f6 00 00 c0 f0 12 00 3c d0 80 7c 58 |...........<..|X|
000016b0 2f 14 00 00 00 00 00 bc f4 8d 00 0f 00 00 00 00 |/...............|
000016c0 f1 12 00 7f 00 00 00 f8 2e 14 00 bc f4 8d 00 1c |................|
000016d0 f2 12 00 04 f3 12 00 fc d1 80 7c 09 04 00 00 00 |..........|.....|
000016e0 00 00 40 00 f2 12 00 ff ff ff ff 00 f1 12 00 1c |..@.............|
000016f0 f1 12 00 bc f4 8d 00 00 00 00 40 35 56 4e 43 |..........@5VNC|

While most have the “CANVAS6” string near the beginning, quite a few are missing the CNV5/5VNC string at the end. Instead, many have the string “%SI-0200” near the end, which I use in my signature suggestion. This structure remained the same from version 6 to 8.

hexdump -C Canvas8-S01.CNV | head
00000000 02 00 00 80 00 00 12 b8 80 00 00 11 19 00 00 11 |................|
00000010 19 18 02 00 00 00 0e f5 59 43 41 4e 56 41 53 36 |........YCANVAS6|
00000020 00 01 00 00 00 00 00 08 01 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 21 00 00 00 00 00 00 00 00 00 00 |.....!..........|
00000040 00 03 00 00 00 00 00 00 00 03 00 00 00 01 00 00 |................|
00000050 00 01 ff ff ff ff 00 00 00 02 00 00 00 02 00 00 |................|

But... there are plenty without these strings, with just the “%SI-0200” near the end.

hexdump -C TELEGRPH.CNV | head
00000000 02 00 00 80 00 00 00 00 08 00 00 80 00 00 00 3d |...............=|
00000010 f2 ff ff ff ff 00 00 75 76 00 00 3d e6 10 00 ff |.......uv..=....|
00000020 00 00 b3 0d 90 a9 03 b0 8a 07 f0 98 07 60 80 08 |.............`..|
00000030 d0 35 01 c0 58 01 e0 59 04 80 b8 03 90 38 02 f0 |.5..X..Y.....8..|
00000040 e2 00 20 0b 03 70 1d 03 20 36 0f 30 00 01 80 09 |.. ..p.. 6.0....|

hexdump -C TELEGRPH.CNV | tail
00006850 2b 2c f9 ae 30 00 00 00 20 00 00 00 01 00 00 00 |+,..0... .......|
00006860 0f 00 00 00 10 00 00 00 1e 00 00 00 07 00 00 00 |................|
00006870 64 65 6e 65 62 61 00 00 00 00 01 4c 25 53 49 2d |deneba.....L%SI-|
00006880 30 32 30 30 6d 61 63 00 00 00 00 00 00 00 00 00 |0200mac.........|
00006890 00 00 00 00 |....|
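Since no single marker is present in every file, one way to explore a pile of samples is to report which of the strings seen so far show up near the start or end of each file. The Python sketch below does only that. It is not my proposed signature, the 64-byte window is an arbitrary assumption that happens to cover the dumps above, and `canvas_markers` is a name made up for the example.

```python
def canvas_markers(data: bytes, window: int = 64) -> dict:
    """Report which Canvas 5-8 marker strings appear near the file edges.

    Assumption: the markers fall within `window` bytes of the start or end,
    which is true of the sample dumps shown but not guaranteed.
    """
    head, tail = data[:window], data[-window:]
    return {
        "CANVAS5": b"CANVAS5" in head,
        "CANVAS6": b"CANVAS6" in head,
        # Trailer in either byte order, as seen in the compressed samples.
        "CNV5_trailer": data.endswith(b"CNV5") or data.endswith(b"5VNC"),
        "SI_0200": b"%SI-0200" in tail,
    }


# Synthetic stand-ins that mimic the layouts in the dumps above.
uncompressed = bytes(25) + b"CANVAS6" + bytes(100) + b"%SI-0200mac" + bytes(13)
compressed_win = bytes(200) + b"5VNC"
print(canvas_markers(uncompressed))
print(canvas_markers(compressed_win))
```

Tallying these flags across a collection gives a quick picture of how many files carry each marker before committing to a signature.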

In version 9 and later the extension changes to CVX. The format is similar, with the “CANVAS6” string at a slightly different offset, and it is still used by the current version of Canvas X.

hexdump -C Canvas9-Sample1.cvx | head
00000000 00 00 00 00 00 00 00 00 00 00 02 00 00 80 00 07 |................|
00000010 d1 84 d0 00 00 80 00 00 00 80 00 18 02 00 00 00 |................|
00000020 0f b7 ef 43 41 4e 56 41 53 36 00 01 00 00 00 00 |...CANVAS6......|
00000030 00 09 00 00 00 03 34 00 00 00 04 00 00 00 00 00 |......4.........|
00000040 00 00 00 3c 42 45 47 49 4e 5f 50 52 45 56 49 45 |...<BEGIN_PREVIE|
00000050 57 5f 54 41 47 3e 21 00 00 00 75 00 00 00 79 00 |W_TAG>!...u...y.|
00000060 00 00 03 00 00 01 6b 00 00 00 03 00 00 00 01 ff |......k.........|
00000070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

hexdump -C Canvas9-Sample1-compressed.cvx | tail
00004090 00 00 e0 20 00 57 80 00 00 00 00 00 0a 13 00 09 |... .W..........|
000040a0 00 00 04 00 00 00 00 01 00 00 00 00 bf ff e0 80 |................|
000040b0 bf ff e0 40 01 8c 5e 00 02 4a 22 d0 00 00 01 60 |...@..^..J"....`|
000040c0 bf ff e0 40 00 5c 08 18 00 00 00 00 00 0d 84 80 |...@.\..........|
000040d0 43 61 6e 76 61 73 39 2d 53 61 6d 70 6c 65 31 2d |Canvas9-Sample1-|
000040e0 63 6f 6d 70 72 65 73 73 65 64 2e 63 76 78 00 18 |compressed.cvx..|
000040f0 bf ff e0 70 0a 12 6a a0 02 43 22 b4 00 0c aa 9c |...p..j..C".....|
00004100 bf ff e0 80 00 00 00 01 00 00 00 00 00 0d 84 80 |................|
00004110 bf ff e0 b0 43 4e 56 35 |....CNV5|

hexdump -C CanvasX2019-S01.cvx | head
00000000 00 00 00 00 00 00 00 00 00 00 01 00 80 00 00 00 |................|
00000010 6e ab 03 00 80 00 00 00 80 00 00 17 01 00 00 ef |n...............|
00000020 b7 0f 00 43 41 4e 56 41 53 36 00 01 00 00 00 00 |...CANVAS6......|
00000030 09 00 00 4d 01 00 00 eb 4c 00 00 41 00 00 00 31 |...M....L..A...1|
00000040 52 45 56 03 00 00 00 01 00 00 00 00 00 00 00 00 |REV.............|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

This collection of file formats is very hard to make sense of. There are some really great, consistent patterns in many samples, but lots of exceptions. Super confusing. This software has had a long run, with the latter years staying pretty stagnant in terms of new development. It is worth defining and creating a signature for the consistent patterns; then we can dial in the variants over time.

The signatures I have built miss about 23 files in versions 1-3 out of the ~9000 samples I have and for Canvas 5, only some of the compressed files are currently not identified. But so far all my CNV and CVX files identify correctly, so probably good for now.

CanvasX dropped support for the Macintosh, but did release an entirely different product called Canvas X Draw, which does support the Macintosh. Here is what a CVD file looks like:

hexdump -C CanvasXDraw7-Sample1.cvd | head
00000000 25 43 61 6e 76 61 73 43 56 44 09 31 2e 30 25 bb |%CanvasCVD.1.0%.|
00000010 54 48 65 61 64 65 72 00 00 00 00 00 00 00 00 00 |THeader.........|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 bb 52 4d 61 63 4f 53 56 65 72 73 69 6f 6e 20 |..RMacOSVersion |
00000040 31 30 2e 31 33 2e 36 20 28 42 75 69 6c 64 20 31 |10.13.6 (Build 1|
00000050 37 47 31 34 30 34 32 29 31 30 2e 32 33 30 34 08 |7G14042)10.2304.|
00000060 00 00 00 70 6c 61 74 66 6f 72 6d 0a 73 00 00 00 |...platform.s...|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 05 00 00 00 02 00 00 00 00 00 00 |................|
00000090 00 08 00 00 00 6f 73 0a 73 00 00 00 00 00 00 00 |.....os.s.......|

There is also the matter of the Canvas Image, which the User Guide calls a proxy image. These are raster images placed within Canvas documents. They should be easy to identify.

hexdump -C Canvas5-Sample1.CVI | head
00000000 00 00 00 01 44 41 44 35 50 52 4f 58 00 00 09 99 |....DAD5PROX....|
00000010 00 00 00 11 00 00 00 2d 00 00 00 03 00 00 00 08 |.......-........|
00000020 00 48 00 00 00 00 00 06 00 03 00 08 00 00 00 11 |.H..............|
00000030 00 00 00 2d 00 03 00 03 00 48 00 00 00 48 00 00 |...-.....H...H..|
00000040 00 00 00 00 00 00 00 00 00 00 00 11 00 00 00 2d |...............-|
00000050 00 00 00 02 00 00 00 08 00 00 00 01 00 00 00 11 |................|
00000060 00 00 00 2d ff ff ff ff ff ff ff ff ff ff ff ff |...-............|
00000070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
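The two companion formats, CVD and CVI, both have plain text markers in the first few bytes, so a check for them is short. This Python sketch uses the positions seen in the single sample of each shown above (offset 0 for “%CanvasCVD” and offset 4 for “DAD5PROX”); those offsets are assumptions until more samples confirm them, and `classify_canvas_aux` is an invented name.

```python
def classify_canvas_aux(header: bytes):
    """Label a header as CVD or CVI based on the sample dumps, else None."""
    # Assumption: the CVD marker sits at offset 0.
    if header.startswith(b"%CanvasCVD"):
        return "Canvas X Draw document (CVD)"
    # Assumption: the CVI marker sits at offset 4, after four leading bytes.
    if header[4:12] == b"DAD5PROX":
        return "Canvas image (CVI)"
    return None


# First rows of the CVD and CVI dumps shown above.
cvd = bytes.fromhex("25 43 61 6e 76 61 73 43 56 44 09 31 2e 30 25 bb")
cvi = bytes.fromhex("00 00 00 01 44 41 44 35 50 52 4f 58 00 00 09 99")
print(classify_canvas_aux(cvd))
print(classify_canvas_aux(cvi))
```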

Phew, if you held on for this whole post you must really like confusing file format structures. This format has been on my mind on and off for about 6 years. Hopefully these signatures will work for the vast majority of the Canvas files found in archives and personal systems. As always here is my GitHub with the signatures I am proposing and a few samples to get you confused.

MAGIX

Obsolete Thor - 22 March 2024 - 6:13am

There are probably many reasons why a software developer might want to create a proprietary format to store their files in. The software may require special features that don’t fit into an existing format. I would hope a developer would try to use existing formats, or even better open formats, but for many reasons, which probably include profits, they choose to re-invent the wheel often.

MAGIX is a German company which started making software in 1994. In 2001 they developed their first video editing software which was called Movie Edit Pro. The software seems to be well received and is still in use today.

Like most video editing software, Movie Edit Pro uses project files to store all the edits and links to video files. These are usually small and text based, with many using XML as the project format. Not MAGIX; they decided to go with a different yet known format for their project files.

hexdump -C MAGIX15-s01.MVP | head
00000000  52 49 46 46 6c 37 01 00  53 45 4b 44 4d 56 50 48  |RIFFl7..SEKDMVPH|
00000010  08 00 00 00 00 00 00 00  00 00 00 00 4c 49 53 54  |............LIST|
00000020  0c 16 01 00 4d 56 50 4c  4c 49 53 54 00 16 01 00  |....MVPLLIST....|
00000030  56 49 50 4c 53 56 49 50  0c 07 00 00 00 dc 05 00  |VIPLSVIP........|
00000040  00 00 00 00 20 00 00 00  0c 00 00 00 80 bb 00 00  |.... ...........|
00000050  10 00 00 00 29 6b 55 e2  53 f8 3d 40 00 00 f0 42  |....)kU.S.=@...B|
00000060  01 00 00 00 bd 04 ef fe  00 00 01 00 06 00 08 00  |................|
00000070  00 00 01 00 06 00 08 00  00 00 01 00 3f 00 00 00  |............?...|
00000080  28 00 00 00 04 00 04 00  01 00 00 00 00 00 00 00  |(...............|
00000090  00 00 00 00 00 00 00 00  bd 8f 32 01 d0 02 00 00  |..........2.....|

Yes, they used the RIFF container format for their projects. It seems an odd choice for a project file, although RIFF itself is well suited to video production: AVI is another video format which uses the RIFF container. The MVP project file uses the ID SEKD with the format MVPH. Earlier versions of Movie Edit Pro used a different extension.
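To make the ID and format fields concrete, here is a minimal Python sketch that reads them from the first 16 bytes of a file. It assumes the layout visible in the dump above (the "RIFF" marker, a little-endian 32-bit size, then two four-byte codes); the function name `read_riff_header` is made up for illustration and is not part of any MAGIX tool.

```python
import struct

def read_riff_header(data: bytes):
    """Return (size, riff_id, fmt) from the first 16 bytes, or None if not RIFF."""
    if len(data) < 16 or data[0:4] != b"RIFF":
        return None
    # Bytes 4-7: little-endian size of everything after this field.
    (size,) = struct.unpack_from("<I", data, 4)
    # Bytes 8-11 are what the post calls the ID, bytes 12-15 the format.
    riff_id = data[8:12].decode("ascii", "replace")
    fmt = data[12:16].decode("ascii", "replace")
    return size, riff_id, fmt

# First 16 bytes of the MVP sample shown above.
header = bytes.fromhex("524946466c37010053454b444d565048")
print(read_riff_header(header))
```

For the MVP sample this reports a size of 79724 with the codes SEKD and MVPH.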

hexdump -C MAGIXv11-s01.MVD | head
00000000  52 49 46 46 38 57 00 00  53 45 4b 44 53 56 49 50  |RIFF8W..SEKDSVIP|
00000010  70 00 00 00 00 dc 05 00  00 00 00 00 04 00 00 00  |p...............|
00000020  02 00 00 00 80 bb 00 00  10 00 00 00 8e 23 d6 e2  |.............#..|
00000030  53 f8 3d 40 00 00 f0 42  01 00 00 00 bd 04 ef fe  |S.=@...B........|
00000040  00 00 01 00 00 00 06 00  00 00 04 00 00 00 06 00  |................|
00000050  00 00 04 00 3f 00 00 00  28 00 00 00 04 00 04 00  |....?...(.......|
00000060  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  c8 1b 32 01 d0 02 00 00  e0 01 00 00 52 d7 da fb  |..2.........R...|
00000080  54 55 f5 3f 4c 49 53 54  04 00 00 00 70 68 79 73  |TU.?LIST....phys|
00000090  4c 49 53 54 d0 3d 00 00  74 72 6b 73 4c 49 53 54  |LIST.=..trksLIST|

The MVD format used on an earlier version of Movie Edit Pro is also a RIFF, and with the ID of SEKD, but has a format of SVIP.

RIFFpad can break down the chunks we see in an MVP file. Each of the LIST chunks has its own subchunks as well. I assume this is how the editing software stores each video/audio track reference, etc. So I give it to MAGIX for at least using an understandable format to store their projects.
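For anyone without RIFFpad, the same kind of breakdown can be sketched in a few lines of Python. This is only an illustration: it assumes the files follow standard RIFF rules (little-endian sizes, chunks padded to an even length, LIST chunks holding a four-byte list type followed by subchunks), which I have not verified for every MAGIX file. The demo data is synthetic, and the "abcd" chunk in it is hypothetical.

```python
import struct

def walk_chunks(data: bytes, start: int = 12, end=None, depth: int = 0):
    """Yield (depth, chunk_id, list_type, size) for each chunk from `start`."""
    end = len(data) if end is None else min(end, len(data))
    pos = start
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4].decode("ascii", "replace")
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = pos + 8
        if chunk_id == "LIST" and body + 4 <= end:
            # A LIST chunk starts with a four-byte list type, then subchunks.
            list_type = data[body:body + 4].decode("ascii", "replace")
            yield depth, chunk_id, list_type, size
            yield from walk_chunks(data, body + 4, body + size, depth + 1)
        else:
            yield depth, chunk_id, None, size
        # Standard RIFF pads each chunk to an even length.
        pos = body + size + (size & 1)

def chunk(cid: bytes, payload: bytes) -> bytes:
    """Build one chunk: four-byte id, little-endian size, payload."""
    return cid + struct.pack("<I", len(payload)) + payload

# Tiny synthetic file shaped like the start of the MVP sample.
inner = chunk(b"LIST", b"MVPL" + chunk(b"abcd", b"\x01\x02"))
demo = chunk(b"RIFF", b"SEKD" + chunk(b"MVPH", bytes(8)) + inner)
for depth, cid, list_type, size in walk_chunks(demo):
    print("  " * depth + cid, list_type or "", size)
```

The walker starts at offset 12 because it skips the "RIFF" marker, the size field, and the four-byte ID that follows them.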

MAGIX has also used RIFF in many of its supporting formats. So far I have found mfx, afx, ifx, cfx, ctf, tfx, ufx, mmt, mmm, hdp, each having their own format:

hexdump -C 101_Loud.mfx | head
00000000  52 49 46 46 a8 6f 00 00  53 45 4b 44 4d 41 46 58  |RIFF.o..SEKDMAFX|
00000010  00 00 00 00 4c 49 53 54  94 6f 00 00 41 55 46 58  |....LIST.o..AUFX|
00000020  4c 49 53 54 88 6f 00 00  41 46 58 45 46 58 48 44  |LIST.o..AFXEFXHD|
00000030  20 00 00 00 00 00 25 0d  00 00 00 00 02 00 00 00  | .....%.........|
00000040  01 00 00 00 00 00 00 00  03 18 00 00 00 00 00 00  |................|
00000050  00 00 00 00 4c 49 53 54  54 6f 00 00 41 46 58 44  |....LISTTo..AFXD|
00000060  4c 49 53 54 50 6a 00 00  41 46 58 45 46 58 48 44  |LISTPj..AFXEFXHD|
00000070  20 00 00 00 00 00 25 0d  00 00 00 00 05 00 00 00  | .....%.........|
00000080  01 00 00 00 00 00 00 00  03 18 00 00 00 00 00 00  |................|
00000090  00 00 00 00 4c 49 53 54  1c 6a 00 00 41 46 58 44  |....LIST.j..AFXD|

I am not sure of the best way to manage all of these in terms of identification, as I am not sure what the purpose of each format is. Maybe for now I’ll make a generic signature to catch them all as a MAGIX File.

Extension  ID    FORMAT
AFX        SEKD  SAFX
CFX        SEKD  SCFX
CTF        SEKD  SVIP
HDP        SEKD  SHDP
IFX        SEKD  SIFX
MFX        SEKD  MAFX
MMM        SEKD  SVIP
MMT        SEKD  SVIP
MVD        SEKD  SVIP
MVP        SEKD  MVPH
MXM        MXMD  mxmi
TFX        SEKD  STFX
UFX        SEKD  SVIP
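The table lends itself to a simple lookup. The sketch below is one hypothetical way to turn it into a "generic MAGIX file" check in Python: it reads the ID and format from a RIFF header and returns the extensions the table lists for that pair. The MXM row is left out on purpose, since its ID is not SEKD and the table alone does not say whether it is a RIFF at all. The name `identify_magix` is invented for this example.

```python
# (ID, FORMAT) pairs from the table above, mapped to the extensions seen.
MAGIX_TYPES = {
    ("SEKD", "SAFX"): ["afx"],
    ("SEKD", "SCFX"): ["cfx"],
    ("SEKD", "SHDP"): ["hdp"],
    ("SEKD", "SIFX"): ["ifx"],
    ("SEKD", "MAFX"): ["mfx"],
    ("SEKD", "MVPH"): ["mvp"],
    ("SEKD", "STFX"): ["tfx"],
    # Several extensions share the SVIP format code.
    ("SEKD", "SVIP"): ["ctf", "mmm", "mmt", "mvd", "ufx"],
}

def identify_magix(data: bytes):
    """Return candidate extensions, [] for an unlisted SEKD file, None otherwise."""
    if len(data) < 16 or data[0:4] != b"RIFF":
        return None
    riff_id = data[8:12].decode("latin-1")
    fmt = data[12:16].decode("latin-1")
    if riff_id != "SEKD":
        return None
    # An empty list means: looks like a MAGIX RIFF, but the format is not in the table.
    return MAGIX_TYPES.get((riff_id, fmt), [])

# First 16 bytes of the 101_Loud.mfx sample shown above.
print(identify_magix(bytes.fromhex("52494646a86f000053454b444d414658")))
```

Because SVIP is shared by five extensions, the header alone cannot tell those apart; the extension or deeper chunks would be needed.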

But, when it comes to their proprietary MAGIX Video format, I think they may have pushed things a little too far. Meet the MXV format:

hexdump -C MAGIXv11-s01.mxv | head
00000000  4d 58 52 49 46 46 36 34  9a cb 2b 00 00 00 00 00  |MXRIFF64..+.....|
00000010  4d 58 4a 56 49 44 36 34  4d 58 4a 56 48 32 36 34  |MXJVID64MXJVH264|
00000020  70 00 00 00 00 00 00 00  70 00 00 00 03 00 00 00  |p.......p.......|
00000030  42 93 2b 00 00 00 00 00  f0 00 00 00 00 00 00 00  |B.+.............|
00000040  7b 2e 00 00 4b 00 00 00  01 00 00 00 00 00 00 00  |{...K...........|
00000050  8e 23 d6 e2 53 f8 3d 40  80 02 00 00 e0 01 00 00  |.#..S.=@........|
00000060  80 02 00 00 e0 01 00 00  04 00 00 00 43 15 00 00  |............C...|
00000070  f0 00 00 00 00 00 00 00  28 19 00 00 00 00 00 00  |........(.......|
00000080  55 55 55 55 55 55 f5 3f  00 00 00 00 00 00 00 00  |UUUUUU.?........|
00000090  7f dd 05 00 00 00 00 00  4d 58 4a 56 48 44 36 34  |........MXJVHD64|

I am not sure what I am looking at, is it a RIFF? Is it a RIFF variant like RF64? MAGIX claims the format is:

This is the MAGIX video format for quicker processing with MAGIX products. It offers very low loss of quality, but it cannot be played via conventional DVD players.

MAGIX Video Pro X6

A look around the internet doesn’t bring much up in reference to this format, just my recent page on the format wiki. A search for MXRIFF64 brings up nothing. But a closer look at other strings within the MXV file reveals we are probably looking at some sort of MPEG format.
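A quick way to take that closer look at the strings is to list every tag that resembles the ones in the header. The Python sketch below assumes, based only on the four tags visible in the dump (MXRIFF64, MXJVID64, MXJVH264, MXJVHD64), that tags are eight ASCII bytes beginning with "MX". That pattern is a guess, not a documented rule, and the function names are invented here.

```python
import re

# Assumption: tags are 8 ASCII bytes starting with "MX", as in the dump above.
TAG = re.compile(rb"MX[A-Z0-9]{6}")

def is_mxv(data: bytes) -> bool:
    """True if the data starts with the magic string seen in the sample."""
    return data.startswith(b"MXRIFF64")

def mxv_tags(data: bytes):
    """Return (offset, tag) pairs for every MX-style tag found in the data."""
    return [(m.start(), m.group().decode("ascii")) for m in TAG.finditer(data)]

# First 32 bytes of the MXV sample shown above.
head = bytes.fromhex(
    "4d585249464636349acb2b0000000000"
    "4d584a56494436344d584a5648323634"
)
print(is_mxv(head), mxv_tags(head))
```

Run against a whole file, a listing like this gives a rough map of where the tagged sections begin, which is a reasonable starting point before reaching for a full demuxer.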

I was able to locate a project on GitHub which claims to be able to demux the MXV format. The software is written in Go and appears to indicate this format is chunk based, with most of the chunks figured out. So if you find yourself stuck with some MXV files and don’t want to use the latest from MAGIX, this might be the tool for you.

This demuxer also has an interesting file you can download. It is called a “GRAMMAR” file and can be loaded into hex viewers like Synalyze It!, which can then show the parts of a file you load. It’s a great way to explore a format!

None of these formats are found in PRONOM. Project files are not usually kept in archives, but it would be good to know about the RIFF files if they do turn up. The video format is for sure something the archival world should know about. MediaInfo is currently not aware of this format, but adding support seems like it might be an easy task.

As usual, you can see some samples and my proposal signatures on my GitHub.

Digitising The Histories of Islamic West Africa

Endangered Archives Blog - 19 March 2024 - 2:29pm
Our project, Digital Preservation of Fuuta Jalon Scholars’ Arabic and Ajami Materials in Senegal and Guinea, is funded by a grant from the Endangered Archives Programme (EAP1430). It seeks to digitally preserve 50,000 pages of endangered Arabic and Ajami manuscripts (texts written with modified Arabic script) produced by Fuuta Jalon... Endangered Archives

Designer

Obsolete Thor - 15 March 2024 - 6:55am

Micrografx / Corel Designer

Many software titles we have all used began life under a different brand or even title. Larger software companies gobble up smaller developers, some brands merge, and others change names for whatever reason. Adobe has bought many smaller companies over the years, sometimes developing the acquired software and other times burying the software to avoid competition. PageMaker was bought to give InDesign life, and many Macromedia titles were incorporated or shelved. Such is life in the software world.

In understanding a file format, you often need to follow this trail backwards to understand when file formats changed and compatibility was dropped. Often the formats remained the same, but the extension changed. Or the software name changed and formats were updated, but the extension remained the same. There can also be multiple titles which all use a common format, further complicating the identification of the formats.

Let’s look closer at a title which changed names and file formats a few times over the years. Micrografx was founded in 1982 and was pretty well known for its innovation in computer graphics. They released many titles over the years, but one of the first was In*A*Vision, graphics software for Windows 1.0, in 1986. This software used a format with the .PIC extension. A couple of years later, version 2 was renamed to Micrografx Designer and used the .DRW extension. This extension was also used by Micrografx Draw, another similar program.

Micrografx Designer continued to be released until version 9, which is when it was purchased by Corel. Corel continued to release new versions, although it is said the software was just a variation of CorelDRAW, and Designer is now part of the CorelDRAW Technical Suite. Other Micrografx software such as Picture Publisher was discontinued and customers were encouraged to use Corel’s PaintShop Pro instead. Somewhere in the middle of all this, Micrografx spun off a separate business unit called iGrafx, which Designer was marketed under for a short time.

Let’s break down the names, extensions used, and format type.

  • In*A*Vision & Draw, binary format, PIC extension
  • Micrografx Designer & Draw, binary format, DRW extension
  • Micrografx Designer version 4, RIFF format, DS4 & MGX extension
  • Micrografx Designer versions 6-9, OLE Container format, DSF extension
  • Micrografx/Corel Designer versions 10-12, RIFF format, DES extension
  • Corel Designer version X4-Current, ZIP/XML format, DES extension

According to the 2021 Corel DESIGNER User Guide:

Corel DESIGNER (DES, DSF, DS4, or DRW)

You can import Corel DESIGNER files. Files from version 10 and later have the filename extension .des. Files from Micrografx versions 6 to 9 have the filename extension .dsf. Version 4 files have the filename extension .ds4. The .drw filename extension is used for a Micrografx 2.x or 3.x file. Micrografx template files (DST) are also supported.

The PRONOM registry has a few of these formats documented with signatures, but not all. Let’s see where the gaps are.

PUID       Format Name                     Format Version  Extension
x-fmt/151  Micrografx Designer                             dsf
x-fmt/296  Micrografx Designer             3.1             drw
x-fmt/47   Micrografx Draw                 1-2             drw
x-fmt/294  Micrografx Draw                 3               drw
x-fmt/295  Micrografx Draw                 4               drw, drt
fmt/1907   Micrografx Icon File                            icn
fmt/1481   Micrografx In-A-Vision Drawing                  pic

So from the PRONOM list, it appears we have good identification on the original PIC and DRW formats. Then the Designer DSF OLE container is taken care of as well. That leaves us with DS4 and DES formats.

hexdump -C DS41-S01.DS4 | head
00000000  52 49 46 46 6e 07 00 00  4d 47 58 20 69 74 70 64  |RIFFn...MGX itpd|
00000010  04 00 00 00 00 02 00 80  70 72 6f 70 23 00 00 00  |........prop#...|
00000020  1f 00 00 30 02 00 00 00  08 00 2c 40 44 00 11 20  |...0......,@D.. |
00000030  20 00 01 10 80 e0 00 00  91 08 21 e0 5c 82 90 72  | .........!.\..r|
00000040  05 ff c0 00 4c 49 53 54  10 04 00 00 64 69 74 6e  |....LIST....ditn|
00000050  74 68 6e 6c 03 04 00 00  57 01 00 30 00 00 08 00  |thnl....W..0....|
00000060  08 00 00 41 04 00 01 20  a4 00 82 10 72 14 40 48  |...A... ....r.@H|
00000070  00 58 20 84 04 32 10 40  00 12 c8 98 18 22 63 90  |.X ..2.@....."c.|
00000080  2b 91 32 36 47 08 20 c0  23 e4 80 90 92 22 46 49  |+.26G. .#...."FI|
00000090  09 29 26 24 e4 a0 94 92  a2 56 4b 09 69 2e 25 e4  |.)&$.....VK.i.%.|

Micrografx Designer 4 apparently uses the RIFF container format. The RIFF format is used with many different types of formats. The most common is the WAV format. CorelDRAW also uses the RIFF format so it makes sense they would use it as they took over from Micrografx.

Each RIFF format has a four-byte type identifier after the first eight bytes which identify the RIFF. The DS4 file uses the code “MGX ” to identify itself, which also appears to be used with their clipart format, MGX. We can use the same identification method we use for other RIFFs to identify this format.

hexdump -C Corel-DES10Sample.des | head
00000000  52 49 46 46 8a 57 00 00  44 45 53 41 76 72 73 6e  |RIFF.W..DESAvrsn|
00000010  02 00 00 00 7e 04 4c 49  53 54 54 0c 00 00 69 63  |....~.LISTT...ic|
00000020  63 70 69 63 63 64 48 0c  00 00 00 00 0c 48 4c 69  |cpiccdH......HLi|
00000030  6e 6f 02 10 00 00 6d 6e  74 72 52 47 42 20 58 59  |no....mntrRGB XY|
00000040  5a 20 07 ce 00 02 00 09  00 06 00 31 00 00 61 63  |Z .........1..ac|
00000050  73 70 4d 53 46 54 00 00  00 00 49 45 43 20 73 52  |spMSFT....IEC sR|
00000060  47 42 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |GB..............|
00000070  f6 d6 00 01 00 00 00 00  d3 2d 48 50 20 20 00 00  |.........-HP  ..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Starting with version 10 of Corel Designer, the RIFF format is used again, with a different type. Version 10 uses “DESA”, and then for version 10.5:

hexdump -C Corel-DES10.5Sample.des | head
00000000  52 49 46 46 cc 57 00 00  44 45 53 42 76 72 73 6e  |RIFF.W..DESBvrsn|
00000010  02 00 00 00 b0 04 4c 49  53 54 54 0c 00 00 69 63  |......LISTT...ic|
00000020  63 70 69 63 63 64 48 0c  00 00 00 00 0c 48 4c 69  |cpiccdH......HLi|
00000030  6e 6f 02 10 00 00 6d 6e  74 72 52 47 42 20 58 59  |no....mntrRGB XY|
00000040  5a 20 07 ce 00 02 00 09  00 06 00 31 00 00 61 63  |Z .........1..ac|
00000050  73 70 4d 53 46 54 00 00  00 00 49 45 43 20 73 52  |spMSFT....IEC sR|
00000060  47 42 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |GB..............|
00000070  f6 d6 00 01 00 00 00 00  d3 2d 48 50 20 20 00 00  |.........-HP  ..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The next version after 10.5 is version 12, and it shows the type “DESC”:

hexdump -C Corel-DES12-Sample.des | head
00000000  52 49 46 46 ce 57 00 00  44 45 53 43 76 72 73 6e  |RIFF.W..DESCvrsn|
00000010  02 00 00 00 e2 04 4c 49  53 54 54 0c 00 00 69 63  |......LISTT...ic|
00000020  63 70 69 63 63 64 48 0c  00 00 00 00 0c 48 4c 69  |cpiccdH......HLi|
00000030  6e 6f 02 10 00 00 6d 6e  74 72 52 47 42 20 58 59  |no....mntrRGB XY|
00000040  5a 20 07 ce 00 02 00 09  00 06 00 31 00 00 61 63  |Z .........1..ac|
00000050  73 70 4d 53 46 54 00 00  00 00 49 45 43 20 73 52  |spMSFT....IEC sR|
00000060  47 42 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |GB..............|
00000070  f6 d6 00 01 00 00 00 00  d3 2d 48 50 20 20 00 00  |.........-HP  ..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

After version 12, Corel started using numbering consistent with their other products, the first being X4.

hexdump -C Corel-DESX4-Sample.des | head
00000000  50 4b 03 04 14 00 00 08  00 00 f8 bb c9 4e c3 4b  |PK...........N.K|
00000010  9c d1 2d 00 00 00 2d 00  00 00 08 00 00 00 6d 69  |..-...-.......mi|
00000020  6d 65 74 79 70 65 61 70  70 6c 69 63 61 74 69 6f  |metypeapplicatio|
00000030  6e 2f 78 2d 76 6e 64 2e  63 6f 72 65 6c 2e 64 65  |n/x-vnd.corel.de|
00000040  73 69 67 6e 65 72 2e 64  6f 63 75 6d 65 6e 74 2b  |signer.document+|
00000050  7a 69 70 50 4b 03 04 14  00 00 08 00 00 f8 bb c9  |zipPK...........|
00000060  4e 6f 38 b6 64 98 13 00  00 98 13 00 00 14 00 00  |No8.d...........|
00000070  00 63 6f 6e 74 65 6e 74  2f 72 69 66 66 44 61 74  |.content/riffDat|
00000080  61 2e 63 64 72 52 49 46  46 90 13 00 00 44 45 53  |a.cdrRIFF....DES|
00000090  45 76 72 73 6e 02 00 00  00 82 05 4c 49 53 54 54  |Evrsn......LISTT|

Well, it looks like things changed: starting with X4 the format moved to a ZIP container. Let’s take a peek inside.

Path = Corel-DESX4-Sample.des
Type = zip
Physical Size = 8714
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-06-09 22:31:47 .....           45           45  mimetype
2019-06-09 22:31:47 .....         5016         5016  content/riffData.cdr
2019-06-09 22:31:47 .....       196662          239  metadata/thumbnails/thumbnail.bmp
2019-06-09 22:31:47 .....       151606          698  metadata/thumbnails/page1.bmp
2019-06-09 22:31:47 .....          596          259  metadata/textinfo.xml
2019-06-09 22:31:47 .....         4977         1314  metadata/metadata.xml
2019-06-09 22:31:47 .....           53           55  links.xml
------------------- ----- ------------ ------------  ------------------------
2019-06-09 22:31:47             358955         7626  7 files

Looks like the container holds a RIFF inside along with some thumbnails, metadata, and other things. The mimetype file simply holds “application/x-vnd.corel.designer.document+zip”. The riffData.cdr however looks like this:

hexdump -C Corel-DESX4-Sample/content/riffData.cdr | head
00000000  52 49 46 46 90 13 00 00  44 45 53 45 76 72 73 6e  |RIFF....DESEvrsn|
00000010  02 00 00 00 82 05 4c 49  53 54 54 0c 00 00 69 63  |......LISTT...ic|
00000020  63 70 69 63 63 64 48 0c  00 00 00 00 0c 48 4c 69  |cpiccdH......HLi|
00000030  6e 6f 02 10 00 00 6d 6e  74 72 52 47 42 20 58 59  |no....mntrRGB XY|
00000040  5a 20 07 ce 00 02 00 09  00 06 00 31 00 00 61 63  |Z .........1..ac|
00000050  73 70 4d 53 46 54 00 00  00 00 49 45 43 20 73 52  |spMSFT....IEC sR|
00000060  47 42 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |GB..............|
00000070  f6 d6 00 01 00 00 00 00  d3 2d 48 50 20 20 00 00  |.........-HP  ..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Another RIFF, and it seems to be in the same sequence, but going from version 12 to X4 we seem to have skipped “DESD”. Maybe there was a developer version in between as they transitioned. Version X5 looks similar and has the RIFF sequence “DESF”. When we get to X6 the structure changes.

Path = Corel-DESX6-Sample.des
Type = zip
Physical Size = 8568
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-06-09 22:31:21 .....           45           45  mimetype
2019-06-09 22:31:21 .....        12153         1098  content/data/data1.dat
2019-06-09 22:31:21 .....          439          224  content/data/masterPage.dat
2019-06-09 22:31:21 .....          613          265  content/data/page1.dat
2019-06-09 22:31:21 .....           34           28  content/dataFileList.dat
2019-06-09 22:31:21 .....          960          279  content/root.dat
2019-06-09 22:31:21 .....       196662          239  metadata/thumbnails/thumbnail.bmp
2019-06-09 22:31:21 .....       151606          698  metadata/thumbnails/page1.bmp
2019-06-09 22:31:21 .....          427          208  color/color.xml
2019-06-09 22:31:21 .....          596          259  metadata/textinfo.xml
2019-06-09 22:31:21 .....          103          100  color/docPalette.xml
2019-06-09 22:31:21 .....        14920         1444  styles/document.cdss
2019-06-09 22:31:21 .....         5500         1462  metadata/metadata.xml
2019-06-09 22:31:21 .....           53           55  links.xml
------------------- ----- ------------ ------------  ------------------------
2019-06-09 22:31:21             384111         6404  14 files

The mimetype remains the same, but we see additional files within the structure. Also the riffData.cdr file is missing. Looking at each file we can see the root.dat file is a RIFF and follows the same sequence.

hexdump -C Corel-DESX6-Sample/content/root.dat | head
00000000  52 49 46 46 b8 03 00 00  44 45 53 47 66 76 65 72  |RIFF....DESGfver|
00000010  10 00 00 00 ff ff ff ff  08 00 00 00 5e 06 02 00  |............^...|
00000020  00 00 10 00 76 72 73 6e  10 00 00 00 ff ff ff ff  |....vrsn........|
00000030  02 00 00 00 5e 06 00 00  00 00 00 00 4c 49 53 54  |....^.......LIST|
00000040  7c 00 00 00 64 6f 63 20  6d 63 66 67 10 00 00 00  ||...doc mcfg....|
00000050  00 00 00 00 83 20 00 00  00 00 00 00 00 00 00 00  |..... ..........|
00000060  70 72 65 66 10 00 00 00  00 00 00 00 e6 0e 00 00  |pref............|
00000070  83 20 00 00 00 00 00 00  70 74 72 74 10 00 00 00  |. ......ptrt....|
00000080  00 00 00 00 10 00 00 00  69 2f 00 00 00 00 00 00  |........i/......|
00000090  4c 49 53 54 04 00 00 00  66 69 6c 74 4c 49 53 54  |LIST....filtLIST|
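Checking the container by hand gets tedious, so here is a small Python sketch of the same inspection using the standard zipfile module. It assumes the two layouts seen in these samples, with the RIFF stored either as content/riffData.cdr (X4, X5) or content/root.dat (X6 and later); other versions may differ. The function name `inspect_designer_zip` is invented for this example.

```python
import io
import zipfile

RIFF_MEMBERS = ("content/riffData.cdr", "content/root.dat")

def inspect_designer_zip(source):
    """Return (mimetype, riff_member, riff_type) for a ZIP-based DES file.

    `source` is a path or a file-like object. Entries not found come back as None.
    """
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        mimetype = None
        if "mimetype" in names:
            mimetype = zf.read("mimetype").decode("ascii", "replace")
        for member in RIFF_MEMBERS:
            if member in names:
                head = zf.read(member)[:12]
                if head[:4] == b"RIFF":
                    # Bytes 8-11 hold the RIFF type, e.g. DESE or DESG.
                    return mimetype, member, head[8:12].decode("ascii", "replace")
    return mimetype, None, None

# Synthetic stand-in for an X6-style file, built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/x-vnd.corel.designer.document+zip")
    zf.writestr("content/root.dat", b"RIFF\xb8\x03\x00\x00DESGfver")
buf.seek(0)
print(inspect_designer_zip(buf))
```

Reading the mimetype entry first mirrors how other ZIP-based formats are usually identified, with the embedded RIFF type then narrowing down the version.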

As we get to a more recent version, we can see the pattern continues.

hexdump -C Designer2022-s01/content/root.dat | head
00000000  52 49 46 46 88 06 00 00  44 45 53 4e 66 76 65 72  |RIFF....DESNfver|
00000010  10 00 00 00 ff ff ff ff  08 00 00 00 60 09 02 00  |............`...|
00000020  00 00 18 00 76 72 73 6e  10 00 00 00 ff ff ff ff  |....vrsn........|
00000030  02 00 00 00 60 09 00 00  00 00 00 00 4c 49 53 54  |....`.......LIST|
00000040  30 01 00 00 64 6f 63 20  6d 63 66 67 10 00 00 00  |0...doc mcfg....|
00000050  00 00 00 00 08 1f 00 00  00 00 00 00 00 00 00 00  |................|
00000060  70 72 65 66 10 00 00 00  00 00 00 00 ae 07 00 00  |pref............|
00000070  08 1f 00 00 00 00 00 00  70 74 72 74 10 00 00 00  |........ptrt....|
00000080  00 00 00 00 10 00 00 00  b6 26 00 00 00 00 00 00  |.........&......|
00000090  4c 49 53 54 4c 00 00 00  66 6e 74 74 66 6f 6e 74  |LISTL...fnttfont|

The last sample I have is for Corel Designer 2022, but there could be more. I created new signatures for all the samples I have; you can see them in my GitHub as usual. I decided to group some of the versions together to simplify things a bit, but if anyone thinks they should be broken out into individual versions, let me know.
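The RIFF types seen across these samples can be collected into a small lookup. The Python sketch below records only the pairs shown in this post (DESA through DESN); the letters in between are left out because no samples for them appear here, and the function name `designer_version` is made up for illustration.

```python
# RIFF type -> Designer version, taken only from the samples in this post.
DES_TYPES = {
    "DESA": "10",
    "DESB": "10.5",
    "DESC": "12",
    "DESE": "X4",
    "DESF": "X5",
    "DESG": "X6",
    "DESN": "2022",
}

def designer_version(riff_head: bytes):
    """Map the first 12 bytes of a Designer RIFF to a version label.

    Returns None if the data is not a DES RIFF, or "unknown" for a DES type
    that is not in the table above.
    """
    if len(riff_head) < 12 or riff_head[0:4] != b"RIFF":
        return None
    riff_type = riff_head[8:12].decode("ascii", "replace")
    if not riff_type.startswith("DES"):
        return None
    return DES_TYPES.get(riff_type, "unknown")

# First 12 bytes of the version 10 sample shown earlier.
print(designer_version(bytes.fromhex("524946468a57000044455341")))
```

For the ZIP-based versions the same function applies to the embedded riffData.cdr or root.dat rather than to the .des file itself.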

Writing Center

Obsolete Thor - 8 March 2024 - 8:03am

In honor of #Marchintosh, I threatened in an earlier post to discuss The Writing Center, one of the many writing programs marketed by the Learning Company for the Mac. This one was developed by Datapak Software, Inc and I think they wanted to watch the world burn.

This format was different enough from the Student Writing Center and the “Ultimate Writing & Creativity Center” to need its own post. Moreover, I am pretty sure the developers of this software were actively trying to frustrate anyone trying to document the format. Let me explain.

In the early Macintosh world, extensions were very rarely used. Current systems use extensions to link the file to an application which can open the file. On the Mac, the system would use special attributes called Type / Creator codes. These codes were registered with Apple so they would be unique to a specific software and type of file. The codes used the FourCC system and unfortunately Apple never released a full list of codes used. Some folks over the years have tried to document as many as they can. Many used simple understandable codes; for example, a Microsoft Word document has a Type / Creator of W6BN / MSWD. The creator code of MSWD is very readable, and the type code W6BN is unique to a document from version 6 of Microsoft Word.

This Sample Report file from The Writing Center, when investigated with the ResEdit tool, shows interesting Type / Creator codes. If we look at the hexadecimal values for the codes, the first four bytes are the Type code and the second set of four bytes is the Creator code.

xattr -p com.apple.FinderInfo "Sample Report"
0000 0A 57 50 31 0A 1A 57 50 01 00 00 00 00 00 00 00 .WP1..WP........
getfileinfo "Sample Report"
file: "Sample Report"
type: "\nWP1"
creator: "\n\^ZWP"
attributes: avbstclInmedz
created: 10/13/1990 00:10:54
modified: 07/25/1991 11:58:20

The first thing to know is that the encoding for all Type / Creator codes is MacRoman, so if we look up the hexadecimal code “0A” we learn it is the Line Feed character. Why in the world would you use the line feed character? The developers must have had a sense of humor, or are psychopaths, and I’m leaning toward the latter. Trying to put this character into any sort of spreadsheet or text based document with other codes throws everything off! When I try to use a spreadsheet with a group of codes and then use a script to look them up on the command line, I get crazy formatting. Not to mention the second character in the creator code is “1A”, which is the substitute character.
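One way around the spreadsheet problem is to hex-escape anything that is not plain printable ASCII before writing a code down. The Python sketch below is just one possible convention, not an established standard: it keeps printable ASCII as-is and writes every other byte (control characters, MacRoman high bytes, and the backslash itself) as \xNN. The helper names are invented for this example.

```python
def split_finder_info(finder_info: bytes):
    """Return (type_code, creator_code) from the first 8 FinderInfo bytes."""
    return finder_info[0:4], finder_info[4:8]

def safe_code(code: bytes) -> str:
    """Keep printable ASCII (except backslash); hex-escape everything else."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F and b != 0x5C else f"\\x{b:02X}"
        for b in code
    )

# FinderInfo bytes from the xattr output above.
info = bytes.fromhex("0A5750310A1A57500100000000000000")
type_code, creator_code = split_finder_info(info)
print(safe_code(type_code), safe_code(creator_code))
```

For this sample the codes come out as \x0AWP1 and \x0A\x1AWP, which stay on one line in a spreadsheet or a shell script.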

This is just one example of crazy characters being used in Type / Creator codes. Stay tuned for more on these in future discussions.

Even though the Type / Creator codes are very useful in identification of this format, the Finder attribute is often lost. This can happen if the file is moved off an HFS disk, usually over a network or through the internet. Then all we have is the binary data fork and a file with no extension. So finding a signature to identify this format is useful.

hexdump -C "Sample Report" | head
00000000  00 12 cf fc 00 00 05 78  00 00 00 00 01 18 01 eb  |.......x........|
00000010  ff ff ff c4 ff ff ff c4  00 00 02 82 00 00 02 28  |...............(|
00000020  00 00 00 00 00 00 00 00  00 00 05 76 00 00 00 30  |...........v...0|
00000030  00 00 02 70 00 aa 00 00  05 76 00 00 00 30 00 00  |...p.....v...0..|
00000040  02 70 00 aa 00 00 00 00  00 00 00 00 00 00 00 00  |.p..............|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 12  |................|
00000070  d1 2c 00 00 05 3f 00 00  00 00 01 00 06 47 65 6e  |.,...?.......Gen|
00000080  65 76 61 00 00 00 00 00  00 00 00 00 00 00 00 00  |eva.............|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 0c  |................|
hexdump -C WC-s01 | head
00000000  03 df cd 9c 00 00 00 09  00 00 00 00 02 c3 02 64  |...............d|
00000010  00 00 00 00 00 00 00 00  00 00 00 59 00 00 02 64  |...........Y...d|
00000020  00 00 00 00 00 00 00 00  00 00 00 07 00 00 00 00  |................|
00000030  00 00 00 00 00 79 00 00  00 07 00 00 00 00 00 00  |.....y..........|
00000040  00 00 00 79 00 00 00 00  00 00 00 00 00 00 00 00  |...y............|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 03 df  |................|
00000070  cd 78 00 00 00 00 00 00  00 00 01 00 06 47 65 6e  |.x...........Gen|
00000080  65 76 61 00 00 00 00 00  00 00 00 00 00 00 00 00  |eva.............|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 0c  |................|

Looking at the hexadecimal values of the header of a couple of samples doesn’t initially look promising. The first few bytes are very different, meaning there are no magic bytes at the beginning of the file. In fact the only thing the same is the mention of the Geneva font used in the document. Looking further into the files:

hexdump -C "Sample Report"
00000000  00 12 cf fc 00 00 05 78  00 00 00 00 01 18 01 eb  |.......x........|
...
000000b0  00 00 00 00 00 00 00 02  84 28 ff ff 00 00 00 00  |.........(......|
000000c0  00 17 4e 26 00 12 d2 fc  00 00 00 00 00 12 d0 88  |..N&............|
hexdump -C WC-s01
00000000  03 df cd 9c 00 00 00 09  00 00 00 00 02 c3 02 64  |...............d|
...
000000b0  00 00 00 00 00 00 00 02  84 28 ff ff 00 00 00 00  |.........(......|
000000c0  03 e3 a5 70 03 df cd 8c  00 00 00 00 03 df cd 64  |...p...........d|
hexdump -C Stationery
00000000  00 12 d2 e8 00 00 00 02  00 00 00 00 01 17 01 ec  |................|
...
000000b0  00 00 00 00 00 00 00 02  84 20 ff ff 00 00 00 00  |......... ......|
000000c0  00 17 56 f8 00 12 cd f8  00 00 00 00 00 12 ce 40  |..V............@|

The only bytes I could find near the beginning that seemed semi consistent are those in the row at offset 0xB0 above (02 84 28 FF FF in two of the samples, 02 84 20 FF FF in the Stationery file). I did however notice some consistent bytes at the end of each of the files.

hexdump -C "Sample Report" | tail
00007250  e5 00 02 e5 00 02 e5 00  02 e5 00 02 e5 00 02 e5  |................|
00007260  00 02 e5 00 02 e5 00 02  e5 00 02 e5 00 ff 00 07  |................|
00007270  00 00 00 05 04 31 2e 30  30 00 09 00 00 00 05 04  |.....1.00.......|
00007280  31 2e 30 30 00 08 00 00  00 05 04 31 2e 30 30 00  |1.00.......1.00.|
00007290  0a 00 00 00 05 04 31 2e  30 30 00 0b 00 00 00 02  |......1.00......|
000072a0  00 00 00 0c 00 00 00 10  00 00 00 00 00 00 00 00  |................|
000072b0  00 00 00 01 00 00 00 01  00 11 00 00 00 08 00 2b  |...............+|
000072c0  00 03 01 52 01 fd 00 13  00 00 00 02 00 00 7f ff  |...R............|
000072d0  00 00 00 00 00 00 72 dc  7f ff ff ff              |......r.....|
hexdump -C WC-s01 | tail
000003c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000003d0  01 00 00 80 0c 00 08 00  05 00 00 00 00 01 d2 03  |................|
000003e0  ee dc 3e 00 00 00 00 00  07 00 00 00 01 00 00 09  |..>.............|
000003f0  00 00 00 01 00 00 08 00  00 00 01 00 00 0a 00 00  |................|
00000400  00 01 00 00 0b 00 00 00  02 00 00 00 0c 00 00 00  |................|
00000410  10 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
00000420  01 00 11 00 00 00 08 00  2b 00 c7 02 fd 03 3a 00  |........+.....:.|
00000430  13 00 00 00 02 00 00 7f  ff 00 00 00 00 00 00 04  |................|
00000440  45 7f ff ff ff                                    |E....|
hexdump -C Stationery | tail
000039a0  00 02 e3 00 02 e3 00 02  e3 00 02 e3 00 02 e3 00  |................|
000039b0  02 e3 00 02 e3 00 02 e3  00 02 e3 00 02 e3 00 ff  |................|
000039c0  00 07 00 00 00 05 04 31  2e 30 30 00 09 00 00 00  |.......1.00.....|
000039d0  05 04 31 2e 30 30 00 08  00 00 00 05 04 31 2e 30  |..1.00.......1.0|
000039e0  30 00 0a 00 00 00 05 04  31 2e 30 30 00 0b 00 00  |0.......1.00....|
000039f0  00 02 00 00 00 0c 00 00  00 10 00 00 00 00 00 00  |................|
00003a00  00 00 00 00 00 01 00 00  00 01 00 11 00 00 00 08  |................|
00003a10  00 2b 00 03 01 51 01 fe  00 13 00 00 00 02 00 00  |.+...Q..........|
00003a20  7f ff 00 00 00 00 00 00  3a 2e 7f ff ff ff        |........:.....|

The four bytes at the end of each file by themselves would not be a good signature as there are many formats which end with a few “FF” sequences. But maybe combined with bytes near the beginning, a signature might be found. I added a couple samples to my Github page if you would like to take a look. In order to retain the extended attributes, I encoded the files as MacBinary.

lsar -L "Sample Report.bin"
Sample Report.bin: MacBinary
Sample Report:
Name: Sample Report
Size: 29.4 KB (29,404 bytes)
Compressed size: 29.4 KB (29,440 bytes)
Last modified: Thursday, July 25, 1991 at 12:58:20 PM
Created: Saturday, October 13, 1990 at 1:10:54 AM
Mac OS type code: ?WP1 (0x0a575031)
Mac OS creator code: ??WP (0x0a1a5750)
Mac OS Finder flags: 0x0100
Index in file: 0
Length of embedded data: 29404
Start of embedded data: 128
Original archive entry:
Is an embedded MacBinary file: Yes
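Putting the pieces together, here is a hedged Python sketch that unwraps a MacBinary file and then runs the loose checks discussed above on the data fork. The MacBinary offsets (type at 65, creator at 69, data fork length at 83, data starting at 128) follow the published MacBinary header layout. The Writing Center checks are only what the three dumps suggest: 02 84 at offset 0xB7, FF FF at 0xBA, and a 7F FF FF FF trailer. The "length" check reflects something visible in the tail dumps, where the four bytes before the trailer equal the file length in all three samples. None of this is a confirmed signature.

```python
import struct

def macbinary_unwrap(data: bytes):
    """Return (type_code, creator_code, data_fork) from MacBinary data, or None."""
    if len(data) < 128 or data[0] != 0 or not 1 <= data[1] <= 63:
        return None
    type_code, creator_code = data[65:69], data[69:73]
    # Data fork length is a big-endian 32-bit value at offset 83.
    (fork_len,) = struct.unpack_from(">I", data, 83)
    return type_code, creator_code, data[128:128 + fork_len]

def writing_center_checks(fork: bytes) -> dict:
    """Run the three loose checks suggested by the dumps; all are assumptions."""
    if len(fork) < 0xBC:
        return {"head": False, "tail": False, "length": False}
    return {
        # Byte 0xB9 varies (28 or 20 in the samples), so it is skipped.
        "head": fork[0xB7:0xB9] == b"\x02\x84" and fork[0xBA:0xBC] == b"\xff\xff",
        "tail": fork.endswith(b"\x7f\xff\xff\xff"),
        # Seen in all three samples: file length stored just before the trailer.
        "length": fork[-8:-4] == struct.pack(">I", len(fork)),
    }

# Synthetic stand-in for a wrapped sample (not real Writing Center data).
fork = bytearray(0x100)
fork[0xB7:0xBC] = b"\x02\x84\x28\xff\xff"
fork[-8:] = struct.pack(">I", len(fork)) + b"\x7f\xff\xff\xff"
header = bytearray(128)
header[1] = 13
header[2:15] = b"Sample Report"
header[65:69], header[69:73] = b"\nWP1", b"\n\x1aWP"
struct.pack_into(">I", header, 83, len(fork))
wrapped = bytes(header) + bytes(fork)

type_code, creator_code, body = macbinary_unwrap(wrapped)
print(type_code, creator_code, writing_center_checks(body))
```

Returning the three results separately, rather than a single yes or no, makes it easier to see which of the guesses holds up as more samples turn up.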

Melco

Obsolete Thor - 1 March 2024 - 7:58am

I came across another CD-ROM the other day with some fun embroidery formats. It includes the HUS format I recently posted on, plus a few more.

Like I mentioned before, this is a format genre which is not normally seen in the archival world, but it is fun to take a peek into the world of embroidery formats. The HUS format from Husqvarna was a unique proprietary format, but looking at another in this set, we see a common container format.

filename : 'CH1604.ofm'
filesize : 25600
modified : 2002-04-29T05:58:26-06:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'fmt/111'
    format  : 'OLE2 Compound Document Format'
    version :
    mime    :
    class   : 'Text (Structured)'
    basis   : 'byte match at 0, 30'

First, what is an OFM file? It is the native format for Melco branded embroidery machines. Melco has been around since 1972, but I’m sure the format is much newer. The fact that it is in an OLE container would indicate it was created in the mid-1990s.

Looking inside the OLE container:

Path = CH1604.ofm
Type = Compound
Physical Size = 25600
Extension = compound
Cluster Size = 512
Sector Size = 64
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....        19171        19456  EdsIV Object
                    .....         2502         2560  Design Icon
                    .....          130          192  Design Status
------------------- ----- ------------ ------------  ------------------------
                                 21803        22208  3 files

The EdsIV Object seems specific. Looking back at the web archive it looks like EDS IV was software available for the Melco products. In a user manual there are three formats associated with the software:

  • .CND – Condensed Format
  • .EXP – Expanded Format
  • .OFM – Project (Layout format)

The EdsIV Object file is unique and will work well for identification. There also seem to be some common patterns within the file that can further support the correct identification.

hexdump -C "EdsIV Object" | head
00000000  03 00 00 00 03 00 00 00  00 00 00 00 00 00 ff ff  |................|
00000010  0b 00 0c 00 43 50 72 6a  44 65 66 61 75 6c 74 73  |....CPrjDefaults|
00000020  05 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 f0 3f  28 00 00 00 01 00 00 00  |.......?(.......|
00000040  7f 00 00 00 00 00 00 00  00 00 39 40 00 00 00 00  |..........9@....|
00000050  00 00 10 40 00 00 00 00  00 00 00 00 00 00 00 00  |...@............|
00000060  00 00 00 00 00 00 00 00  00 00 59 40 04 00 00 00  |..........Y@....|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 80 51 40  |..............Q@|
00000080  00 00 00 00 00 00 3e 40  00 00 00 00 00 00 2e 40  |......>@.......@|
00000090  00 00 00 00 00 80 56 40  00 00 00 00 00 80 51 40  |......V@......Q@|
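As a rough illustration of using the stream name for identification, the Python sketch below checks for the OLE2 magic bytes and then looks for the name “EdsIV Object” as UTF-16LE text, which is how OLE directory entries store names. It is a crude byte search rather than a proper walk of the compound file directory, so treat it as a first-pass filter only; the function name `looks_like_melco_ofm` is invented for this example.

```python
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")

def looks_like_melco_ofm(data: bytes) -> bool:
    """Crude check: OLE2 header plus a directory entry named 'EdsIV Object'."""
    if not data.startswith(OLE_MAGIC):
        return False
    # OLE directory entry names are stored as UTF-16LE.
    return "EdsIV Object".encode("utf-16-le") in data

# Synthetic stand-in: magic bytes, padding, then the encoded stream name.
fake = OLE_MAGIC + bytes(504) + "EdsIV Object".encode("utf-16-le") + bytes(40)
print(looks_like_melco_ofm(fake))
```

A container-aware signature would match the directory entry itself, but a byte search like this is often enough to sort a pile of unknown files.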

The CND and EXP formats are a different matter. I ran TrIDScan across all the CND samples and it could not detect a single pattern common to all of them.

python tridscan.py *.csd
TrIDScan/Py v2.02 - (C) 2015-2016 By M.Pontello
File(s) to scan found: 60
Scanning for patterns...
Checking file 1/60 './Cf0103.csd'
Checking file 2/60 './Cr0005.csd'
Pattern(s) found: 11
Checking file 3/60 './Fd0106.csd'
tridscan.py: Error: no patterns found!

Being a condensed format, I gather it might have some compression which makes for a difficult binary file to identify.

The EXP format on the other hand has a short pattern at the beginning:

hexdump -C CF0103.EXP | head
00000000 80 02 00 00 80 02 18 e7 80 02 19 e6 80 02 19 e6 |................|
00000010 80 02 19 e7 80 02 19 e6 80 02 19 e6 80 02 19 e6 |................|
00000020 80 02 19 e7 80 02 19 e6 80 02 19 e6 80 02 18 e7 |................|
00000030 00 00 fc 00 04 00 fc 00 04 ff fc 01 ed 00 ec 00 |................|
00000040 21 21 df de da 01 15 14 15 15 15 15 eb eb eb eb |!!..............|
00000050 eb eb da 00 17 17 17 17 17 18 17 17 ea e9 e9 e9 |................|
00000060 e9 e8 e9 e9 ed 00 ec 00 18 18 19 19 18 19 19 19 |................|
00000070 18 18 e8 e8 e8 e7 e7 e7 e8 e7 e8 e8 fa 01 20 00 |.............. .|
00000080 21 00 20 01 21 00 20 00 f8 1e f7 1e f7 1f f7 1e |!. .!. .........|
00000090 da 00 e6 e5 e5 e5 e5 e4 e5 e5 1a 1b 1b 1b 1b 1c |................|

Melco currently distributes different software for use with their embroidery machines. Their DesignShop software also works with the OFM format. Downloading version 11 and running it as a trial gave me access to a few OFM sample files. Let’s see if they are the same.

hexdump -C BUBBLEBOY1.ofm | head
00000000 52 49 46 46 86 e5 01 00 4f 46 4d 38 76 72 73 6e |RIFF....OFM8vrsn|
00000010 08 00 00 00 39 00 2e 00 30 00 30 00 6e 6f 74 65 |....9...0.0.note|
00000020 a8 00 00 00 ff fe ff 52 44 00 69 00 67 00 69 00 |.......RD.i.g.i.|
00000030 74 00 69 00 7a 00 65 00 72 00 20 00 3a 00 20 00 |t.i.z.e.r. .:. .|
00000040 41 00 45 00 30 00 38 00 33 00 0d 00 0a 00 46 00 |A.E.0.8.3.....F.|
00000050 61 00 62 00 72 00 69 00 63 00 20 00 3a 00 20 00 |a.b.r.i.c. .:. .|
00000060 54 00 77 00 69 00 6c 00 6c 00 20 00 0d 00 0a 00 |T.w.i.l.l. .....|
00000070 4d 00 45 00 4c 00 43 00 4f 00 20 00 2d 00 20 00 |M.E.L.C.O. .-. .|
00000080 41 00 43 00 54 00 49 00 4f 00 4e 00 20 00 49 00 |A.C.T.I.O.N. .I.|
00000090 4c 00 4c 00 55 00 53 00 54 00 52 00 41 00 54 00 |L.L.U.S.T.R.A.T.|

Well, that is very different from the earlier example. We can see right away this is a different type of file; in fact the first few bytes tell us this is another container format. The Resource Interchange File Format (RIFF) is used by many file formats, the most popular being WAVE, AVI, and CorelDRAW. It is a chunk-based format and there are a few tools we can use to look closer.

Riffpad can open the file, but claims there is some extra data at the end. It does see four chunks and it gives us the code “OFM8”, which is what identifies this particular RIFF type.
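For anyone without Riffpad, the chunk layout can be listed with a few lines of Python. This is a minimal sketch, assuming the usual RIFF layout (a 12-byte header, then chunks made of a four-byte ID, a little-endian 32-bit size, and data padded to an even length). The function name riff_chunks is hypothetical, and the walker only looks at top-level chunks.

```python
import struct


def riff_chunks(data: bytes):
    """Return (form_type, [(chunk_id, size), ...]) for the top level of a RIFF file."""
    if len(data) < 12 or data[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (riff_size,) = struct.unpack_from("<I", data, 4)
    form_type = data[8:12].decode("ascii", "replace")
    # Stop at the declared RIFF size so trailing extra data is ignored.
    end = min(len(data), 8 + riff_size)
    chunks = []
    pos = 12
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4].decode("ascii", "replace")
        (size,) = struct.unpack_from("<I", data, pos + 4)
        chunks.append((chunk_id, size))
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return form_type, chunks
```

Fed the header bytes from the sample above, this reports the form type OFM8 followed by a vrsn chunk of 8 bytes and a note chunk of 168 bytes. The vrsn payload in that sample is the UTF-16LE text “9.00”.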

I was also able to get some samples from version 10 of DesignShop and found they use the same OLE container, with the same “EdsIV Object” inside. A short passage in the EdsIV user manual indicates there is some versioning within the OFM format.

If you open an EDS III .OFM file and save it, it will be converted into an EDS IV .OFM file, which is no longer readable in EDS III.
Files saved in this version of EDS IV cannot be read by previous versions of EDS IV.

This version of EDS IV is capable of producing two types of OFM files. Files saved as “Melco Project File (.ofm)” can only be read with this version or higher versions of EDS IV. Files saved as “Melco Version 2.00 (.ofm)” can be read by any EDS IV user that has version 2.00.006 or higher software.

It never ceases to amaze me how many formats use the Compound Object Container format. It seems like more and more of them are being documented. For now, I made a signature to identify the OLE and RIFF versions of OFM. I’ll keep my eye out for the older EDS III and other related formats. As always, you can find my signatures and a sample file on my GitHub.

PowerBI

Obsolete Thor - 23 February 2024 - 7:52am

I think when most of us have some data to sort or make sense of, we tend to gravitate toward a spreadsheet: Excel or LibreOffice, or if you really like to party, OpenRefine. There are plenty of memes out there representing the frustration people have with the bugs, features and limitations of Excel specifically.

Optimist: The glass is ½ full.
Pessimist: The glass is ½ empty.
Excel: The glass is January 2nd.

— jxf@mastodon.social (@jxxf) May 7, 2022

There are more tools out there for making sense of data. One that some people have access to is Microsoft’s more advanced Power BI. Marketed as a data visualization tool, it is accessible to many with an Office 365 subscription. It offers more features than Excel and isn’t as limited in the number of rows it can handle.

Power BI was recently the topic of a Code4Lib editorial issue. The writer of an article for their journal posted two Power BI datasets which a reader later noticed had private data. After some miscommunications and misunderstandings, an open letter was drafted and received some support. Code4Lib did release a statement and lessons were learned.

One statement from the Code4Lib staff caught my eye. “The released files were in a proprietary file format, Microsoft Power BI, with which none of the editors have experience.”

We all use the tools for our jobs that we are most familiar with or that are available to us. No one can be an expert in all file formats. Some of us try, but things change so fast it is impossible. But we can do more to document formats and make them identifiable through the tools we use for digital preservation. The File Format Wiki and PRONOM have had no mention of Power BI, so let’s change that.

Microsoft Power BI was released in 2011 and has been part of the Microsoft Power Platform. Power BI can gather data from many sources. The software can be accessed in the Office 365 cloud, but also using a Desktop application. In the desktop application, all the data sources and connections are stored in a single file with the extension PBIX. But there are other related formats.

filename : 'PowerBI-Test.pbix'
filesize : 401951
modified : 2024-02-22T11:29:41-07:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/263'
    format  : 'ZIP Format'
    version :
    mime    : 'application/zip'
    class   : 'Aggregate'
    basis   : 'byte match at [[0 4] [401867 3] [401929 4]]'
    warning : 'extension mismatch'

Path = PowerBI-Test.pbix
Type = zip
Physical Size = 401951
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-02-22 18:29:40 .....            8           10  Version
2024-02-22 18:29:40 .....          488          230  [Content_Types].xml
2024-02-22 18:29:40 .....       397312       397312  DataModel
2024-02-22 18:29:40 .....         2848          882  Report/Layout
2024-02-22 18:29:40 .....          328          161  Settings
2024-02-22 18:29:40 .....          136          120  Connections
2024-02-22 18:29:40 .....        18972         1733  Report/StaticResources/SharedResources/BaseThemes/CY24SU02.json
2024-02-22 18:29:40 .....          358          357  SecurityBindings
------------------- ----- ------------ ------------  ------------------------
2024-02-22 18:29:40             420450       400805  8 files

Just like many modern Microsoft formats it is a ZIP container with a mixture of XML and JSON. There is also a DataModel file along with Settings and Connections. A quick peek at some of the contents shows us:

hexdump -C PowerBI-Test/Version | head
00000000 31 00 2e 00 32 00 38 00 |1...2.8.|

hexdump -C PowerBI-Test/DataModel | head
00000000 ff fe 53 00 54 00 52 00 45 00 41 00 4d 00 5f 00 |..S.T.R.E.A.M._.|
00000010 53 00 54 00 4f 00 52 00 41 00 47 00 45 00 5f 00 |S.T.O.R.A.G.E._.|
00000020 53 00 49 00 47 00 4e 00 41 00 54 00 55 00 52 00 |S.I.G.N.A.T.U.R.|
00000030 45 00 5f 00 29 00 21 00 40 00 23 00 24 00 25 00 |E._.).!.@.#.$.%.|
00000040 5e 00 26 00 2a 00 28 00 3c 00 42 00 61 00 63 00 |^.&.*.(.<.B.a.c.|
00000050 6b 00 75 00 70 00 4c 00 6f 00 67 00 3e 00 3c 00 |k.u.p.L.o.g.>.<.|
00000060 42 00 61 00 63 00 6b 00 75 00 70 00 52 00 65 00 |B.a.c.k.u.p.R.e.|
00000070 73 00 74 00 6f 00 72 00 65 00 53 00 79 00 6e 00 |s.t.o.r.e.S.y.n.|
00000080 63 00 56 00 65 00 72 00 73 00 69 00 6f 00 6e 00 |c.V.e.r.s.i.o.n.|
00000090 3e 00 31 00 34 00 30 00 3c 00 2f 00 42 00 61 00 |>.1.4.0.<./.B.a.|

hexdump -C PowerBI-Test/\[Content_Types\].xml | head
00000000 ef bb bf 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e |...<?xml version|
00000010 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d |="1.0" encoding=|
00000020 22 75 74 66 2d 38 22 3f 3e 3c 54 79 70 65 73 20 |"utf-8"?><Types |
00000030 78 6d 6c 6e 73 3d 22 68 74 74 70 3a 2f 2f 73 63 |xmlns="http://sc|
00000040 68 65 6d 61 73 2e 6f 70 65 6e 78 6d 6c 66 6f 72 |hemas.openxmlfor|
00000050 6d 61 74 73 2e 6f 72 67 2f 70 61 63 6b 61 67 65 |mats.org/package|
00000060 2f 32 30 30 36 2f 63 6f 6e 74 65 6e 74 2d 74 79 |/2006/content-ty|
00000070 70 65 73 22 3e 3c 44 65 66 61 75 6c 74 20 45 78 |pes"><Default Ex|
00000080 74 65 6e 73 69 6f 6e 3d 22 6a 73 6f 6e 22 20 43 |tension="json" C|
00000090 6f 6e 74 65 6e 74 54 79 70 65 3d 22 22 20 2f 3e |ontentType="" />|

So it looks like the ZIP structure follows the standard for OpenXML packages, as it contains a “[Content_Types].xml” file. Using this XML alone would clash with too many other formats. From what I could find, the “DataModel” file is what stores the data and is more specific to this format, even though the name is pretty generic. Using a string within the file would probably make identification more accurate. The “DataModel” file does have UTF-16 (double-byte Unicode) strings we can use. “STREAM_STORAGE_SIGNATURE” seems like a distinctive enough string, but it may not be unique to PBIX: the “DataModel” file appears to be in Microsoft’s “MS-XLDM” format, the “Spreadsheet Data Model File Format”.

There is a variation of the DataModel file, and I am not sure when the standard form is used versus this variation, which begins “This backup was created using XPress9 compression”. I am not sure if it comes down to versioning or to how the file is saved, but both seem to function correctly.

hexdump -C DataModel | head
00000000 54 00 68 00 69 00 73 00 20 00 62 00 61 00 63 00 |T.h.i.s. .b.a.c.|
00000010 6b 00 75 00 70 00 20 00 77 00 61 00 73 00 20 00 |k.u.p. .w.a.s. .|
00000020 63 00 72 00 65 00 61 00 74 00 65 00 64 00 20 00 |c.r.e.a.t.e.d. .|
00000030 75 00 73 00 69 00 6e 00 67 00 20 00 58 00 50 00 |u.s.i.n.g. .X.P.|
00000040 72 00 65 00 73 00 73 00 39 00 20 00 63 00 6f 00 |r.e.s.s.9. .c.o.|
00000050 6d 00 70 00 72 00 65 00 73 00 73 00 69 00 6f 00 |m.p.r.e.s.s.i.o.|
00000060 6e 00 2e 00 00 00 00 b0 07 00 76 75 00 00 2a d7 |n.........vu..*.|
00000070 86 4e 00 b0 07 00 ad ab 03 00 2c cb 06 00 00 00 |.N........,.....|
00000080 00 00 f8 6c 86 7f 00 00 00 00 68 01 56 6e 00 00 |...l......h.Vn..|
00000090 20 82 67 49 52 06 00 f6 ab fc fc fe 2d f6 da 8b | .gIR.......-...|

After a bit of digging, it seems the MS-XLDM format can also be found within an XLSX file. I found an example with these datasets. Within an XLSX there can be a file “xl/model/item.data”, and it has the same structure as DataModel within a PBIX.

hexdump -C "Customer Profitability Sample-no-PV/xl/model/item.data" | head
00000000 ff fe 53 00 54 00 52 00 45 00 41 00 4d 00 5f 00 |..S.T.R.E.A.M._.|
00000010 53 00 54 00 4f 00 52 00 41 00 47 00 45 00 5f 00 |S.T.O.R.A.G.E._.|
00000020 53 00 49 00 47 00 4e 00 41 00 54 00 55 00 52 00 |S.I.G.N.A.T.U.R.|
00000030 45 00 5f 00 29 00 21 00 40 00 23 00 24 00 25 00 |E._.).!.@.#.$.%.|
00000040 5e 00 26 00 2a 00 28 00 3c 00 42 00 61 00 63 00 |^.&.*.(.<.B.a.c.|
00000050 6b 00 75 00 70 00 4c 00 6f 00 67 00 3e 00 3c 00 |k.u.p.L.o.g.>.<.|
00000060 42 00 61 00 63 00 6b 00 75 00 70 00 52 00 65 00 |B.a.c.k.u.p.R.e.|
00000070 73 00 74 00 6f 00 72 00 65 00 53 00 79 00 6e 00 |s.t.o.r.e.S.y.n.|
00000080 63 00 56 00 65 00 72 00 73 00 69 00 6f 00 6e 00 |c.V.e.r.s.i.o.n.|
00000090 3e 00 31 00 35 00 30 00 3c 00 2f 00 42 00 61 00 |>.1.5.0.<./.B.a.|

Because this file has a different filename and is in a different path, using “DataModel” should keep identification specific to a PBIX file.
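To make that rule concrete, here is a minimal Python sketch of the check. It assumes a PBIX can be recognised by a top-level ZIP entry named exactly “DataModel” whose first bytes contain one of the two UTF-16LE strings seen in the dumps earlier. The function name and the 256-byte read window are my own choices for illustration, not part of any published signature.

```python
import zipfile

# UTF-16LE markers seen at the start of the two DataModel variants.
STREAM_SIG = "STREAM_STORAGE_SIGNATURE".encode("utf-16-le")
XPRESS9_SIG = "This backup was created using XPress9 compression".encode("utf-16-le")


def looks_like_pbix(source) -> bool:
    """source is a path or a seekable binary file object (hypothetical helper)."""
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as archive:
        # The entry must sit at the top level; xl/model/item.data does not count.
        if "DataModel" not in archive.namelist():
            return False
        with archive.open("DataModel") as member:
            head = member.read(256)
    return STREAM_SIG in head or XPRESS9_SIG in head
```

An XLSX that carries the same data model under “xl/model/item.data” is rejected here because the entry name does not match.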

The Power BI Report has a template option. This format uses the .PBIT extension and doesn’t contain any data, only a template to use with other data. The structure is roughly the same, but instead of the “DataModel” file it contains “DataModelSchema”, which appears to be a JSON file.

hexdump -C DataModelSchema | head
00000000 7b 00 0d 00 0a 00 20 00 20 00 22 00 6e 00 61 00 |{..... . .".n.a.|
00000010 6d 00 65 00 22 00 3a 00 20 00 22 00 38 00 36 00 |m.e.".:. .".8.6.|
00000020 65 00 34 00 32 00 62 00 33 00 30 00 2d 00 30 00 |e.4.2.b.3.0.-.0.|
00000030 34 00 34 00 33 00 2d 00 34 00 36 00 30 00 63 00 |4.4.3.-.4.6.0.c.|
00000040 2d 00 61 00 36 00 66 00 36 00 2d 00 36 00 66 00 |-.a.6.f.6.-.6.f.|
00000050 34 00 35 00 35 00 66 00 64 00 64 00 31 00 61 00 |4.5.5.f.d.d.1.a.|
00000060 35 00 36 00 22 00 2c 00 0d 00 0a 00 20 00 20 00 |5.6.".,..... . .|
00000070 22 00 63 00 6f 00 6d 00 70 00 61 00 74 00 69 00 |".c.o.m.p.a.t.i.|
00000080 62 00 69 00 6c 00 69 00 74 00 79 00 4c 00 65 00 |b.i.l.i.t.y.L.e.|
00000090 76 00 65 00 6c 00 22 00 3a 00 20 00 31 00 35 00 |v.e.l.".:. .1.5.|

The DataModelSchema JSON has some plain text strings which could be used for identification. Later in the file there is a string, “defaultPowerBIDataSourceVersion“.

000001c0 20 00 20 00 20 00 7d 00 2c 00 0d 00 0a 00 20 00 | . . .}.,..... .|
000001d0 20 00 20 00 20 00 22 00 64 00 65 00 66 00 61 00 | . . .".d.e.f.a.|
000001e0 75 00 6c 00 74 00 50 00 6f 00 77 00 65 00 72 00 |u.l.t.P.o.w.e.r.|
000001f0 42 00 49 00 44 00 61 00 74 00 61 00 53 00 6f 00 |B.I.D.a.t.a.S.o.|
00000200 75 00 72 00 63 00 65 00 56 00 65 00 72 00 73 00 |u.r.c.e.V.e.r.s.|
00000210 69 00 6f 00 6e 00 22 00 3a 00 20 00 22 00 70 00 |i.o.n.".:. .".p.|
00000220 6f 00 77 00 65 00 72 00 42 00 49 00 5f 00 56 00 |o.w.e.r.B.I._.V.|
00000230 33 00 22 00 2c 00 0d 00 0a 00 20 00 20 00 20 00 |3.".,..... . . .|

This seems like the best way to identify the template format.
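A template check could be sketched the same way. This minimal Python example assumes a PBIT is a ZIP with a “DataModelSchema” entry, no “DataModel” entry, and the string “defaultPowerBIDataSourceVersion” somewhere in the UTF-16LE JSON. The function name is hypothetical, and reading the whole schema into memory is a simplification that suits small samples.

```python
import zipfile

MARKER = "defaultPowerBIDataSourceVersion"


def looks_like_pbit(source) -> bool:
    """source is a path or a seekable binary file object (hypothetical helper)."""
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        # Templates carry the schema but not the data model itself.
        if "DataModelSchema" not in names or "DataModel" in names:
            return False
        raw = archive.read("DataModelSchema")
    # The sample schema is UTF-16LE without a byte order mark.
    return MARKER in raw.decode("utf-16-le", errors="replace")
```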

As usual you can find my signature proposal on my GitHub along with a couple “safe” samples.

Compact Pro

Obsolete Thor - 16 February 2024 - 7:37am

In the Classic Macintosh world back in the day, it was important to use compression tools to keep files small and to allow you to send Macintosh files through the internet. Floppy disks could only hold a small amount of data, so compression was a way to use the space effectively. I have already made posts on BINHEX and DiskDoubler, which were also used for similar purposes. The most popular compression software for the Macintosh was StuffIt, which used the .SIT and .SEA extensions. One of the other often-used tools was called Compact Pro.

Compact Pro, originally known as Compactor, was developed by Bill Goodman in the early 1990s and was quite popular. It was generally faster at compressing and decompressing files on the Macintosh. The last version was released by 1995, and by 2002 the software was officially discontinued.

Also, Macintosh files often contain a Resource Fork to go along with the data. A Compact Pro archive could hold both forks along with the creation and modification dates and the Finder Type/Creator codes. An archive could then be transferred through the internet or stored on a non-Macintosh file system without losing these key bits of information.

You can see from the image below, the compression of a PICT file retained the resource fork and finder data with an impressive 60% savings in size.

PICT File within a Compact Pro archive.

Compact Pro could also segment an archive into multiple parts. This was advantageous when copying a larger file onto a set of floppy disks, or when transferring smaller pieces through the internet to be combined later. Segments would be extracted by opening the final segment.

The other nifty feature of Compact Pro is that it could create a Self-Extracting Archive. Archiving as an SEA would compress the file into an archive contained within an application, which could extract the archive without the full Compact Pro application. This was mainly used on disks distributed with a Macintosh file system, as the application could only be run on Mac OS.

Let’s look at the actual Compact Pro file format.

hexdump -C CompactProTest.cpt | head
00000000 01 01 6f 07 00 00 00 cb 80 35 04 56 00 60 50 50 |..o......5.V.`PP|
00000010 00 50 50 00 60 05 60 50 00 00 00 00 00 00 00 00 |.PP.`.`P........|
00000020 00 00 60 00 00 00 00 00 00 00 00 00 00 00 00 00 |..`.............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30 |...............0|
00000040 00 00 04 60 00 05 00 06 00 55 40 00 00 00 00 00 |...`.....U@.....|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 60 00 00 00 |............`...|
00000070 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 |.....@..........|
00000080 00 00 00 00 00 00 00 00 05 08 00 01 20 00 00 00 |............ ...|
00000090 00 20 01 10 88 c1 04 f6 05 41 3e 47 56 e4 09 5f |. .......A>GV.._|

hexdump -C CP-s01.cpt | head
00000000 01 01 90 69 00 00 10 55 80 46 78 67 77 67 78 67 |...i...U.Fxgwgxg|
00000010 86 88 09 89 9a 70 8b 90 ba 97 0a a7 90 87 a6 bb |.....p..........|
00000020 90 8a a0 90 ab b7 aa a0 a0 80 a8 a0 98 89 00 9a |................|
00000030 99 80 98 99 69 a9 60 0a 79 ab 86 0a b7 98 a7 90 |....i.`.y.......|
00000040 98 a0 97 7a 90 00 09 00 07 77 80 00 aa 9b 00 ba |...z.....w......|
00000050 99 a0 90 00 08 08 a0 8a 08 a0 00 00 b9 b0 09 7a |...............z|
00000060 08 0a aa 90 0a aa 00 00 98 60 90 b9 9b 9a 9a 57 |.........`.....W|
00000070 a8 88 bb aa aa 00 00 77 89 7a 09 b9 89 79 9b 78 |.......w.z...y.x|
00000080 86 80 8a 96 65 55 56 66 65 17 00 02 24 35 46 47 |....eUVfe...$5FG|
00000090 57 67 67 78 88 8a 70 80 80 90 00 a0 90 a0 00 00 |Wggx..p.........|

The file format is not recognized by PRONOM, and as you can see from the headers above, identification is not easy as there are no magic bytes. The Unarchiver identifies them as Compact Pro.

lsar CP-s01.cpt
CP-s01.cpt: Compact Pro
CP.PICT

The only bytes which seem to be consistent are the first two, but “01 01” is not a signature unique to Compact Pro. From what I can tell, The Unarchiver uses a more complicated calculation involving the file size and the CRC for identification.

hexdump -C CP-s01.sea | head
00000000 01 01 8a 89 00 00 10 55 80 46 78 67 77 67 78 67 |.......U.Fxgwgxg|
00000010 86 88 09 89 9a 70 8b 90 ba 97 0a a7 90 87 a6 bb |.....p..........|
00000020 90 8a a0 90 ab b7 aa a0 a0 80 a8 a0 98 89 00 9a |................|
00000030 99 80 98 99 69 a9 60 0a 79 ab 86 0a b7 98 a7 90 |....i.`.y.......|
00000040 98 a0 97 7a 90 00 09 00 07 77 80 00 aa 9b 00 ba |...z.....w......|
00000050 99 a0 90 00 08 08 a0 8a 08 a0 00 00 b9 b0 09 7a |...............z|
00000060 08 0a aa 90 0a aa 00 00 98 60 90 b9 9b 9a 9a 57 |.........`.....W|
00000070 a8 88 bb aa aa 00 00 77 89 7a 09 b9 89 79 9b 78 |.......w.z...y.x|
00000080 86 80 8a 96 65 55 56 66 65 17 00 02 24 35 46 47 |....eUVfe...$5FG|
00000090 57 67 67 78 88 8a 70 80 80 90 00 a0 90 a0 00 00 |Wggx..p.........|

The self extracting archive has the same basic structure. I have also noticed on all the archive samples I have, the byte at offset 8 is always “80”. This could be significant.

Another thing to note, when looking at a segmented archive, the first two bytes are in sequence, 0101 for the first, 0102 for the second and so on.
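Those observations can be written down as a quick triage check in Python. This is only a sketch of the pattern seen in my samples (first byte 01, second byte a segment number starting at 01, byte at offset 8 equal to 80). It is an assumption drawn from a handful of files, not a documented rule, and it will produce false positives on unrelated data. The function name is hypothetical.

```python
def compact_pro_hint(data: bytes):
    """Return the segment number if the header fits the observed pattern, else None.

    Weak heuristic only: it mirrors what the sample headers show and is not
    how The Unarchiver decides.
    """
    if len(data) < 9:
        return None
    if data[0] != 0x01 or data[1] == 0x00 or data[8] != 0x80:
        return None
    return data[1]
```

A result of 1 would suggest a single archive or a first segment, while 2 and up would suggest a later segment.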

Compact Pro could use some further investigation. You can find quite a few samples on sites such as: https://websites.umich.edu/~archive/mac

For now, it would be good to add the CPT extension to PRONOM with the name CompactPro Archive.

Finale

Obsolete Thor - 9 February 2024 - 8:10am

The amazing Ashley recently did a little writeup on the Sibelius music notation software. I thought I would take the opportunity to talk about another piece of music notation software which needs a little update. Finale was created in 1987 for the Macintosh by a company called Coda Music and became quite popular with musicians and composers. The ability to use a computer to typeset a musical score was a huge advancement. This was all made possible by the use of music notation fonts.

Finale was originally written by Coda Music Technology, was owned for a time by Net4Music, and is now owned by MakeMusic. Over the years additional products have been developed alongside Finale.

The first version of Finale was developed for the Macintosh and didn’t have an extension. But by version 3.5 there was a comparable Windows version and the use of the extension .MUS. In order to share the files between the different platforms Finale also created an ETF file, which instead of the binary MUS the ETF is a plain text “transportable” file.

Finale 1.0 HyperCard HelpStack

Both formats are based on the Enigma or “Environment for Notation Intuitive Graphic Music Algorithms” format. These formats were last used with Finale 2012; a new format took over in 2014. Let’s start from the beginning.

hexdump -C Finale1-s01 | head
00000000 46 69 6e 61 6c 65 aa 20 31 2e 30 2e 30 20 45 4e |Finale. 1.0.0 EN|
00000010 49 47 41 20 53 74 72 75 63 74 75 72 65 73 20 43 |IGA Structures C|
00000020 6f 70 79 72 69 67 68 74 20 31 39 38 37 20 62 79 |opyright 1987 by|
00000030 20 43 6f 64 61 2e 20 41 6c 6c 20 72 69 67 68 74 | Coda. All right|
00000040 73 20 72 65 73 65 72 76 65 64 2e 20 50 61 74 65 |s reserved. Pate|
00000050 6e 74 20 50 65 6e 64 69 6e 67 00 00 00 00 00 00 |nt Pending......|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

This is a sample of the very first version of Finale. It is currently not identifiable by PRONOM. You may also notice that in this version it was called ENIGA.

hexdump -C Finale2.6.3 | head
00000000 46 69 6e 61 6c 65 28 54 4d 29 20 31 2e 38 20 43 |Finale(TM) 1.8 C|
00000010 6f 70 79 72 69 67 68 74 20 31 39 38 37 20 62 79 |opyright 1987 by|
00000020 20 43 6f 64 61 2e 20 41 6c 6c 20 72 69 67 68 74 | Coda. All right|
00000030 73 20 72 65 73 65 72 76 65 64 2e 00 00 00 00 00 |s reserved......|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000200 00 00 00 09 00 00 02 00 00 00 46 4e 50 65 74 72 |..........FNPetr|

A file from version 2.6.3 shows a different format structure, also not currently identified by PRONOM.

hexdump -C F35-s01.mus | head
00000000 45 4e 49 47 4d 41 20 42 49 4e 41 52 59 20 46 49 |ENIGMA BINARY FI|
00000010 4c 45 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |LE..............|
00000020 46 69 6e 61 6c 65 28 52 29 20 33 2e 35 20 43 6f |Finale(R) 3.5 Co|
00000030 70 79 72 69 67 68 74 20 28 63 29 20 31 39 39 35 |pyright (c) 1995|
00000040 20 43 6f 64 61 20 4d 75 73 69 63 20 54 65 63 68 | Coda Music Tech|
00000050 6e 6f 6c 6f 67 79 00 00 00 00 00 00 00 00 00 00 |nology..........|
00000060 00 02 00 00 00 00 7c 02 08 00 00 00 03 03 50 03 |......|.......P.|
00000070 46 49 4e 00 57 49 4e 00 02 04 50 03 03 03 50 03 |FIN.WIN...P...P.|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 7c 02 08 00 |............|...|
00000090 00 00 03 03 50 03 46 49 4e 00 57 49 4e 00 02 04 |....P.FIN.WIN...|

By version 3 we see the format stabilize, and this header is used until Finale 2012. There were various other products which also used the format, so there is some variation.

hexdump -C Tutorial1a.mus | head
00000000 45 4e 49 47 4d 41 20 42 49 4e 41 52 59 20 46 49 |ENIGMA BINARY FI|
00000010 4c 45 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |LE..............|
00000020 50 72 69 6e 74 4d 75 73 69 63 28 52 29 20 32 30 |PrintMusic(R) 20|
00000030 31 30 20 43 6f 70 79 72 69 67 68 74 20 31 39 39 |10 Copyright 199|
00000040 38 2d 32 30 30 39 20 4d 61 6b 65 4d 75 73 69 63 |8-2009 MakeMusic|
00000050 20 49 6e 63 2e 00 00 00 00 00 00 00 00 00 00 00 | Inc............|
00000060 00 02 0e 01 00 00 6a 02 0e 00 00 00 04 02 02 0b |......j.........|
00000070 46 49 4e 00 57 49 4e 00 03 04 02 0b 0d 02 00 0b |FIN.WIN.........|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 6d 08 0d 00 |............m...|
00000090 00 00 31 02 00 0f 4e 54 52 00 4d 41 43 00 10 02 |..1...NTR.MAC...|

The current PRONOM identification for fmt/397 is looking for the “ENIGMA BINARY FILE” bytes but also the string “Finale(R)”, so this PrintMusic variation is not identified correctly.
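One way to see which product wrote a file is to match only on “ENIGMA BINARY FILE” and then read the NUL-terminated string that starts at offset 0x20. The Python sketch below does that. The 0x20 to 0x60 window is my own reading of the dumps above, not something taken from a specification, and the function name is hypothetical.

```python
ENIGMA_MAGIC = b"ENIGMA BINARY FILE"


def enigma_product(data: bytes):
    """Return the product/copyright string from an ENIGMA binary header, or None."""
    if not data.startswith(ENIGMA_MAGIC) or len(data) < 0x21:
        return None
    # The string field appears to occupy 0x20-0x5f and is NUL-terminated.
    field = data[0x20:0x60]
    return field.split(b"\x00", 1)[0].decode("latin-1")
```

On the two samples above this returns strings beginning “Finale(R) 3.5” and “PrintMusic(R) 2010”, which is the variation a broader signature would need to allow for.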

Another format, a little rarer to see, is also part of the Finale formats collection. The Finale Performance Assessment File (.fpa) is an older format discontinued in 2007, but it has a similar structure. It was a tool similar to the current SmartMusic tool.

hexdump -C Tuba.FPA | head
00000000 46 49 4e 41 4c 45 20 50 45 52 46 4f 52 4d 41 4e |FINALE PERFORMAN|
00000010 43 45 20 41 53 53 45 53 53 4d 45 4e 54 00 00 00 |CE ASSESSMENT...|
00000020 46 69 6e 61 6c 65 28 52 29 20 32 30 30 35 20 43 |Finale(R) 2005 C|
00000030 6f 70 79 72 69 67 68 74 20 28 63 29 20 31 39 38 |opyright (c) 198|
00000040 37 2d 32 30 30 34 20 4d 61 6b 65 4d 75 73 69 63 |7-2004 MakeMusic|
00000050 21 20 49 6e 63 2e 00 6f 6c 6f 67 79 00 00 00 00 |! Inc..ology....|
00000060 00 02 06 00 00 00 68 06 09 00 00 00 16 02 00 09 |......h.........|
00000070 46 49 4e 00 57 49 4e 00 01 04 01 09 16 02 00 09 |FIN.WIN.........|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 68 07 0d 00 |............h...|
00000090 00 00 0a 01 00 0a 46 49 4e 00 57 49 4e 00 03 03 |......FIN.WIN...|

As for the Enigma Transportable File, there are a couple of variations.

hexdump -C Finale1-s02.etf | head
00000000 45 4e 49 47 4d 41 20 74 72 61 6e 73 70 6f 72 74 |ENIGMA transport|
00000010 61 62 6c 65 20 66 69 6c 65 0d 45 4e 49 47 4d 41 |able file.ENIGMA|
00000020 20 53 74 72 75 63 74 75 72 65 73 20 43 6f 70 79 | Structures Copy|
00000030 72 69 67 68 74 20 31 39 38 37 20 62 79 20 43 6f |right 1987 by Co|
00000040 64 61 2e 20 41 6c 6c 20 52 69 67 68 74 73 20 52 |da. All Rights R|
00000050 65 73 65 72 76 65 64 2e 20 50 61 74 65 6e 74 20 |eserved. Patent |
00000060 50 65 6e 64 69 6e 67 2e 0d 0d 5e 6f 74 68 65 72 |Pending...^other|
00000070 73 0d 5e 46 4e 28 30 29 20 22 50 65 74 72 75 63 |s.^FN(0) "Petruc|
00000080 63 69 22 0d 5e 49 55 28 30 29 20 31 20 30 20 2d |ci".^IU(0) 1 0 -|
00000090 38 30 20 32 20 30 20 2d 33 31 36 20 0d 5e 49 55 |80 2 0 -316 .^IU|

hexdump -C Finale37-Sample.etf | head
00000000 45 4e 49 47 4d 41 20 54 52 41 4e 53 50 4f 52 54 |ENIGMA TRANSPORT|
00000010 41 42 4c 45 20 46 49 4c 45 0d 0d 5e 68 65 61 64 |ABLE FILE..^head|
00000020 65 72 0d 5e 30 31 20 22 46 69 6e 61 6c 65 28 52 |er.^01 "Finale(R|
00000030 29 20 33 2e 37 20 43 6f 70 79 72 69 67 68 74 20 |) 3.7 Copyright |
00000040 28 63 29 20 31 39 38 37 2d 31 39 39 36 20 43 6f |(c) 1987-1996 Co|
00000050 64 61 20 4d 75 73 69 63 20 54 65 63 68 6e 6f 6c |da Music Technol|
00000060 6f 67 79 22 0d 5e 30 32 20 31 20 30 20 30 20 30 |ogy".^02 1 0 0 0|
00000070 20 0d 5e 30 33 20 31 32 30 20 31 31 20 39 20 0d | .^03 120 11 9 .|
00000080 5e 30 34 20 22 22 0d 5e 30 35 20 35 37 36 37 32 |^04 "".^05 57672|
00000090 32 30 34 20 0d 5e 30 36 20 22 46 49 4e 22 0d 5e |204 .^06 "FIN".^|

The current signature for ETF files is only able to correctly identify the later version, with the string in all caps. The fmt/398 PRONOM ID could use an alternate signature to ensure all variations are identified correctly. There are a couple of versions of the specification out there, but they do not add much to what is known.
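The gap between the two ETF variants comes down to letter case, which a small Python check can illustrate. This is a sketch assuming that a case-insensitive match on the first 25 bytes is enough to cover both samples above. The function name is hypothetical, and a real PRONOM signature would more likely list both byte sequences explicitly.

```python
ETF_MAGIC = b"enigma transportable file"


def looks_like_etf(data: bytes) -> bool:
    """Case-insensitive match on the opening line of an ETF file."""
    return data[:len(ETF_MAGIC)].lower() == ETF_MAGIC
```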

Starting in 2014, Finale started using a new file format to store its notation. The native format now uses the MUSX extension. This new format uses a ZIP container to store all the data. Let’s take a look at the inside.

Path = Finale26-s01.musx
Type = zip
Physical Size = 98608
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2022-12-19 16:28:36 .....           34           34  mimetype
2022-12-19 16:28:36 .....          252          168  META-INF/container.xml
2022-12-19 16:28:36 .....          347          218  NotationMetadata.xml
2022-12-19 16:28:36 .....         1163          821  presets/10001.preset
2022-12-19 16:28:36 .....          649          544  presets/1.preset
2022-12-19 16:28:36 .....        96140        96155  score.dat
------------------- ----- ------------ ------------  ------------------------
2022-12-19 16:28:36              98585        97940  6 files

The mimetype file appears to be “application/vnd.makemusic.notation”

The NotationMetadata.xml file stores much of the information needed and begins with the root tag.

<metadata version="26.2" xmlns="http://www.makemusic.com/2012/NotationMetadata">

It seems the presence of the NotationMetadata.xml file and the mimetype would be sufficient for identification in a container signature.

The current version of Finale can export to a few different “Music XML” versions. This includes MUSICXML, regular XML, and a compressed MXL file. The only one that needs attention is the compressed MXL file, which should get a signature in PRONOM. It already has a PUID, fmt/897, but no signature. Here is what it looks like inside the ZIP container.

Path = Finale27-s01.mxl
Type = zip
Physical Size = 4737
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-02-07 23:55:50 .....           34           34  mimetype
2024-02-07 23:55:50 D....            0            2  META-INF
2024-02-07 23:55:50 .....          202          144  META-INF/container.xml
2024-02-07 23:55:50 .....        18004         1996  Finale27-s01.musicxml
2024-02-07 23:55:52 .....        17554         1953  p1.musicxml
------------------- ----- ------------ ------------  ------------------------
2024-02-07 23:55:52              35794         4129  4 files, 1 folders

Looks like a standard identifiable MUSICXML file within the container with a mimetype of “application/vnd.recordare.musicxml”. The MUSICXML file will be impossible to use for identification because of the variable file name, but the mimetype should do just fine.
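Both ZIP-based formats can be told apart by the contents of the mimetype entry, which the following Python sketch illustrates. It assumes the two mimetype values quoted in this post and additionally requires NotationMetadata.xml for MUSX, as suggested earlier. The function name and the labels it returns are hypothetical.

```python
import zipfile

# mimetype values quoted in the post, mapped to illustrative labels
MIMETYPES = {
    "application/vnd.makemusic.notation": "Finale MUSX",
    "application/vnd.recordare.musicxml": "Compressed MusicXML (MXL)",
}


def classify_notation_zip(source):
    """Return a label for a MUSX or MXL container, or None if neither matches."""
    if not zipfile.is_zipfile(source):
        return None
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        if "mimetype" not in names:
            return None
        mimetype = archive.read("mimetype").decode("ascii", "replace").strip()
        label = MIMETYPES.get(mimetype)
        # For MUSX, also require the metadata file mentioned above.
        if label == "Finale MUSX" and "NotationMetadata.xml" not in names:
            return None
    return label
```

The MUSICXML file name inside an MXL is deliberately ignored, since it changes from file to file.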

Hopefully that covers all the major formats that need identification. I saw on a list that I will soon be working on an old Macintosh which has hundreds of Finale files, I hope these updates cover those needs! Take a look at my GitHub for my signatures and plenty of samples.

SolidWorks

Obsolete Thor - 2 February 2024 - 8:01am

The Digital Preservation Coalition recently released their tech watch report on Preserving Geospatial Data. This adds to reports on CAD, Construction, and others. One of the many areas of difficulty in digital preservation is understanding GIS, CAD, and 3D modeling software and the file formats that belong to the software titles in this space. Not only are the file formats plentiful, but the software is extensive and expensive. Documentation of the different file formats associated with each software title is lacking. These tech watch reports are super useful, but more is needed to enhance the tools we use to better identify, validate, and transform these formats in order to preserve them long term.

I was processing some data sets from a recent collection added to our Scholarly repository and came across some models in the SolidWorks part format. I was surprised to find that this format has been around since 1995 and has yet to be added to the PRONOM registry.

SolidWorks is mechanical design software used for making 3D models, which can be individual parts, parts of larger assemblies, or added to drawings, giving engineers access to 3D design on their desktops. The company was bought in 1997 by Dassault Systèmes, the makers of the CATIA CAD software. Since 1995 a new version has been released almost every year, adding new features and improvements to the format. The original versions made use of the Microsoft OLE object container, but in 2015 the format shifted to a proprietary binary format. Let’s take a look at some samples.

There are three types of SolidWorks file formats: the SolidWorks part (sldprt), the assembly (sldasm), and the drawing (slddrw). The first versions of SolidWorks used prt, asm, and drw, but “sld” was quickly added to avoid confusion with other CAD tools.

Path = flatann.sldprt
Type = Compound
Physical Size = 5851648
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21 D....                            Contents
                    .....        60844        60928  Header
                    .....        45022        45056  Preview
1997-08-05 08:34:06 D....                            ThirdPty
                    .....          237          256  [5]SummaryInformation
1997-08-05 08:34:18 D....                            _MO_VERSION_629
                    .....          157          192  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
                    .....       996343       996352  Contents/Definition
                    .....      1003198      1003520  Contents/Default
                    .....       781536       781824  Contents/DisplayLists
------------------- ----- ------------ ------------  ------------------------
1997-08-05 08:34:21            2887463      2888256  8 files, 3 folders

We can see this file is a compound (OLE) container file. It’s very useful to have a directory within the container with a version number. With this version number we can use the chart on the file format wiki to see this file was last modified by SolidWorks 97 Plus. The problem comes in when we look at an assembly file and compare.

Path = dispenser.sldasm
Type = Compound
Physical Size = 2143232
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1997-03-19 17:29:16 D....                            ThirdPty
                    .....        16812        16896  Preview
                    .....         4655         5120  Header
1997-09-04 15:30:48 D....                            Contents
                    .....      1009461      1009664  Contents/DisplayLists
                    .....        23931        24064  Contents/Definition
                    .....          237          256  [5]SummaryInformation
1997-09-04 15:35:39 D....                            _MO_VERSION_629
                    .....          107          128  _MO_VERSION_629/History
                    .....          126          128  [5]DocumentSummaryInformation
------------------- ----- ------------ ------------  ------------------------
1997-09-04 15:35:39            1055329      1056256  7 files, 3 folders

Almost the same contents, and the same version directory. The only difference in content is the Default stream in the Contents directory, which the part has and the assembly lacks. But it is hard to know whether all files show the same difference. We will have to look closer at the individual files to hopefully find what sets the different formats apart.
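The comparison above can be sketched in a few lines of Python. This is only an illustration: the entry names are typed in by hand from the two listings, the helper names mo_version and stream_diff are made up for this example, and the regular expression assumes the version directory always follows the _MO_VERSION_<number> pattern seen in these samples.

```python
import re

# Entry names copied by hand from the two 7z listings above.
PART = [
    "Contents", "Header", "Preview", "ThirdPty", "[5]SummaryInformation",
    "_MO_VERSION_629", "_MO_VERSION_629/History",
    "[5]DocumentSummaryInformation", "Contents/Definition",
    "Contents/Default", "Contents/DisplayLists",
]
ASSEMBLY = [
    "ThirdPty", "Preview", "Header", "Contents", "Contents/DisplayLists",
    "Contents/Definition", "[5]SummaryInformation", "_MO_VERSION_629",
    "_MO_VERSION_629/History", "[5]DocumentSummaryInformation",
]


def mo_version(names):
    """Return the number from the first _MO_VERSION_<n> entry, or None."""
    for name in names:
        match = re.match(r"_MO_VERSION_(\d+)$", name)
        if match:
            return int(match.group(1))
    return None


def stream_diff(a, b):
    """Return (entries only in a, entries only in b) as sorted lists."""
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))


print(mo_version(PART), mo_version(ASSEMBLY))
print(stream_diff(PART, ASSEMBLY))
```

With only one part and one assembly to compare, the single differing stream is a hint rather than a rule, so more samples would be needed before relying on it.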

The SolidWorks 2000 format added additional files to the container which can help.

Path = SW2000-s01.SLDPRT
Type = Compound
Physical Size = 20992
Extension = compound
Cluster Size = 512
Sector Size = 64

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51 D....                            _DL_VERSION_1500
                    .....         5300         5632  Preview
                    .....          481          512  Header
2024-01-16 20:00:51 D....                            Contents
2024-01-16 20:00:51 D....                            ThirdPty
                    .....            4           64  Contents/OleItems
                    .....           69          128  Contents/CMgrHdr
                    .....          343          384  Contents/CMgr
                    .....         5456         5632  Contents/Config-0
                    .....          592          640  Contents/DisplayLists__Zip
                    .....          957          960  Contents/Definition
                    .....          252          256  [5]SummaryInformation
2024-01-16 20:00:51 D....                            _MO_VERSION_1500
                    .....          840          896  _MO_VERSION_1500/Biography
                    .....           98          128  _MO_VERSION_1500/History
                    .....          148          192  [5]DocumentSummaryInformation
                    .....          120          128  ISolidWorksInformation
                    .....            6           64  _DL_VERSION_1500/DLUpdateStamp
------------------- ----- ------------ ------------  ------------------------
2024-01-16 20:00:51              14666        15616  14 files, 4 folders

The introduction of the “ISolidWorksInformation” file helps give positive identification of the SolidWorks format.

hexdump -C SW2000-s01.SLDPRT/ISolidWorksInformation
00000000 fe ff 00 00 04 0a 02 00 02 d5 cd d5 9c 2e 1b 10 |................|
00000010 93 97 08 00 2b 2c f9 ae 01 00 00 00 05 d5 cd d5 |....+,..........|
00000020 9c 2e 1b 10 93 97 08 00 2b 2c f9 ae 30 00 00 00 |........+,..0...|
00000030 48 00 00 00 02 00 00 00 02 00 00 00 18 00 00 00 |H...............|
00000040 00 00 00 00 24 00 00 00 1e 00 00 00 01 00 00 00 |....$...........|
00000050 00 00 00 00 02 00 00 00 00 00 00 00 01 00 00 00 |................|
00000060 00 02 00 00 00 0d 00 00 00 53 57 2d 46 69 6c 65 |.........SW-File|
00000070 20 4e 61 6d 65 00 00 00 | Name...|

hexdump -C SW2000-s02.SLDASM/ISolidWorksInformation
00000000 fe ff 00 00 04 0a 02 00 02 d5 cd d5 9c 2e 1b 10 |................|
00000010 93 97 08 00 2b 2c f9 ae 01 00 00 00 05 d5 cd d5 |....+,..........|
00000020 9c 2e 1b 10 93 97 08 00 2b 2c f9 ae 30 00 00 00 |........+,..0...|
00000030 6c 00 00 00 03 00 00 00 02 00 00 00 20 00 00 00 |l........... ...|
00000040 03 00 00 00 2c 00 00 00 00 00 00 00 34 00 00 00 |....,.......4...|
00000050 1e 00 00 00 01 00 00 00 00 00 00 00 0b 00 00 00 |................|
00000060 00 00 00 00 03 00 00 00 00 00 00 00 01 00 00 00 |................|
00000070 00 03 00 00 00 0e 00 00 00 41 73 73 65 6d 62 6c |.........Assembl|
00000080 79 20 74 79 70 65 00 02 00 00 00 0d 00 00 00 53 |y type.........S|
00000090 57 2d 46 69 6c 65 20 4e 61 6d 65 00 |W-File Name.|

hexdump -C SW2000-s01.SLDDRW/ISolidWorksInformation
00000000 fe ff 00 00 04 0a 02 00 02 d5 cd d5 9c 2e 1b 10 |................|
00000010 93 97 08 00 2b 2c f9 ae 01 00 00 00 05 d5 cd d5 |....+,..........|
00000020 9c 2e 1b 10 93 97 08 00 2b 2c f9 ae 30 00 00 00 |........+,..0...|
00000030 bc 01 00 00 0a 00 00 00 02 00 00 00 58 00 00 00 |............X...|
00000040 03 00 00 00 64 00 00 00 04 00 00 00 70 00 00 00 |....d.......p...|
00000050 05 00 00 00 7c 00 00 00 06 00 00 00 88 00 00 00 |....|...........|
*
000000d0 05 00 00 00 52 27 a0 89 b0 e1 d1 3f 05 00 00 00 |....R'.....?....|
000000e0 51 6b 9a 77 9c a2 cb 3f 03 00 00 00 00 00 00 00 |Qk.w...?........|
000000f0 0a 00 00 00 00 00 00 00 01 00 00 00 00 04 00 00 |................|
00000100 00 15 00 00 00 53 57 2d 53 68 65 65 74 20 46 6f |.....SW-Sheet Fo|
00000110 72 6d 61 74 20 53 69 7a 65 00 05 00 00 00 11 00 |rmat Size.......|
00000120 00 00 53 57 2d 43 75 72 72 65 6e 74 20 53 68 65 |..SW-Current She|
00000130 65 74 00 08 00 00 00 19 00 00 00 41 63 74 69 76 |et.........Activ|
00000140 65 20 73 68 65 65 74 20 70 61 70 65 72 20 77 69 |e sheet paper wi|
00000150 64 74 68 00 02 00 00 00 0d 00 00 00 53 57 2d 46 |dth.........SW-F|
00000160 69 6c 65 20 4e 61 6d 65 00 09 00 00 00 14 00 00 |ile Name........|
00000170 00 41 63 74 69 76 65 20 73 68 65 65 74 20 48 65 |.Active sheet He|
00000180 69 67 68 74 00 07 00 00 00 0e 00 00 00 53 57 2d |ight.........SW-|
00000190 53 68 65 65 74 20 4e 61 6d 65 00 0a 00 00 00 18 |Sheet Name......|
000001a0 00 00 00 41 63 74 69 76 65 20 73 68 65 65 74 20 |...Active sheet |
000001b0 70 61 70 65 72 20 73 69 7a 65 00 03 00 00 00 0f |paper size......|
000001c0 00 00 00 53 57 2d 53 68 65 65 74 20 53 63 61 6c |...SW-Sheet Scal|
000001d0 65 00 06 00 00 00 10 00 00 00 53 57 2d 54 6f 74 |e.........SW-Tot|
000001e0 61 6c 20 53 68 65 65 74 73 00 00 00 |al Sheets...|

Starting in 2015 the format changed from an OLE container to a binary file. Here is what the first few bytes look like from a 2015 file and a later 2023 file:

hexdump -C Bracket.SLDPRT | head
00000000 9f e4 18 9f 00 00 00 04 26 00 42 15 14 00 06 00 |........&.B.....|
00000010 08 00 06 00 40 a5 c3 a7 0e 51 5b 03 00 00 91 07 |....@....Q[.....|
00000020 00 00 0d 00 00 00 34 f6 e6 47 56 e6 47 37 f2 34 |......4..GV.G7.4|
00000030 d4 76 27 b5 55 5d 48 14 51 14 3e ab 2e f6 63 65 |.v'.U]H.Q.>...ce|
00000040 8b be 55 2e 42 0f 45 89 05 16 68 3a 93 eb f6 03 |..U.B.E...h:....|
00000050 ab 2e ae 89 d4 c2 3a ee ce ae 53 bb 3b cb cc 2e |......:...S.;...|
00000060 18 42 0d f8 16 41 3d 95 42 94 24 41 b0 3d 54 14 |.B...A=.B.$A.=T.|
00000070 fd 68 ad 52 0f 45 54 06 61 84 d1 0f 52 3e 44 20 |.h.R.ET.a...R>D |
00000080 48 af 6e e7 cc cc dd 3f 5d ea a5 3b dc 3d df f9 |H.n....?]..;.=..|
00000090 b9 e7 9c 7b ef b9 67 d3 69 0b 54 41 44 76 c8 d1 |...{..g.i.TADv..|

hexdump -C SW2023-s01.SLDPRT | head
00000000 f4 e9 02 fc 00 00 00 04 51 3f 60 ad 6a 35 f9 b3 |........Q?`.j5..|
00000010 14 00 06 00 08 00 a8 8c 60 c0 d0 05 00 00 74 01 |........`.....t.|
00000020 00 00 e8 02 00 00 07 00 00 00 05 27 56 67 96 56 |...........'Vg.V|
00000030 77 d6 df ea 07 e7 cf ed c6 8e 6c a1 48 70 d6 76 |w.........l.Hp.v|
00000040 cd 16 7f e9 6b 95 3a 4e bb 6e 95 cc d2 b3 69 a9 |....k.:N.n....i.|
00000050 72 6b af c7 82 38 95 6f bc 37 d2 4e a6 28 36 bd |rk...8.o.7.N.(6.|
00000060 c3 cf 85 46 0a 85 63 97 83 56 88 a1 38 02 64 14 |...F..c..V..8.d.|
00000070 00 06 00 08 00 a8 8c 60 c0 44 07 00 00 d1 01 00 |.......`.D......|
00000080 00 a2 03 00 00 22 00 00 00 34 f6 e6 47 56 e6 47 |....."...4..GV.G|
00000090 37 f2 34 f6 e6 66 96 76 d2 03 d2 25 56 37 f6 c6 |7.4..f.v...%V7..|

The newer version of the format is much different: it is a proprietary binary format with no specification, which makes it much more difficult to know which parts of the file can be used for identification. All of these newer files have the hex values “00 00 00 04” at offsets 4 through 7, which is not very unique for identification. There is another set of bytes which does seem to be consistent across all samples so far, although its location varies. The values “34 f6 e6 47 56 e6 47 37 f2” seem to be in every sample. The byte that follows is usually 34, but in my samples it can be 34, B4, 44, 64, or 33. The other formats, SLDASM and SLDDRW, also have this pattern, which might give us enough to make a good signature. At this time we may not be able to distinguish the different formats, but maybe in the future.
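As a rough illustration of how those two observations could be combined, here is a minimal Python sketch. It is not a PRONOM signature and not necessarily how my own signature is written: the function name find_solidworks_marker is invented for this example, and the check only encodes the patterns described above, which already have known exceptions.

```python
from typing import Optional

# Byte sequence seen in every post-2015 sample so far (location varies).
PATTERN = bytes.fromhex("34f6e64756e64737f2")
# Values observed for the byte directly after the pattern.
FOLLOWING = {0x34, 0xB4, 0x44, 0x64, 0x33}


def find_solidworks_marker(data: bytes) -> Optional[int]:
    """Return the offset of the marker, or None if the checks fail."""
    # Offsets 4 through 7 hold 00 00 00 04 in all samples seen so far.
    if data[4:8] != b"\x00\x00\x00\x04":
        return None
    pos = data.find(PATTERN)
    while pos != -1:
        nxt = pos + len(PATTERN)
        if nxt < len(data) and data[nxt] in FOLLOWING:
            return pos
        pos = data.find(PATTERN, pos + 1)
    return None


# First 48 bytes of Bracket.SLDPRT, copied from the hexdump above.
SAMPLE = bytes.fromhex(
    "9f e4 18 9f 00 00 00 04 26 00 42 15 14 00 06 00 "
    "08 00 06 00 40 a5 c3 a7 0e 51 5b 03 00 00 91 07 "
    "00 00 0d 00 00 00 34 f6 e6 47 56 e6 47 37 f2 34"
)
print(find_solidworks_marker(SAMPLE))  # 38, i.e. offset 0x26
```

A real signature would probably need to limit how far into the file it searches; the sketch scans whatever bytes it is given.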

More work is needed to develop signatures that can definitively identify each SolidWorks format. My initial assumptions were not completely correct, and there are a few exceptions to the patterns I felt were good enough. One unknown is the formats from SolidWorks 95 through 99 and how to properly identify them. More samples are needed. I have placed my initial signature and some samples on my GitHub. Please get in touch if you have additional samples or ideas on better identification.

AskSam

Obsolete Thor - 26 January 2024 - 7:52am

I was recently asked to look at a set of files with the extension of .ASK. A quick little search led me to find they belong to AskSam which was a free-form database software often used by researchers and libraries as early as 1985. The first few versions of Access Stored Knowledge via Symbolic Access Method were released for DOS and later Windows. The company askSam Systems disappeared around 2015.

The AskSam software competed with other personal information managers with unstructured data storage and retrieval. It was used to keep track of e-mail, special collections, letters, articles, web sites, etc. It could index all the contents and make searching and retrieval easy. By setting up fields, the data could be exported to delimited text. The software also appears to have been localized in German, but the file format is the same.

AskSam had many import filters which included:

  • Microsoft Word
  • WordPerfect
  • Text (ASCII files)
  • HTML Files (from the Internet)
  • RTF Files (Rich Text Format)
  • Eudora E-Mail
  • Microsoft Outlook
  • Microsoft Outlook Express
  • Text delimited files – Comma Separated Values, Fixed position, etc.
  • dBASE
  • FoxPro
  • Paradox
  • Microsoft Access
  • Microsoft Excel

AskSam has its own proprietary format to store the database, using the .ASK extension. The files appear to have a 256 byte header. All the DOS versions of the software use the simple BOF string of “askSam”.

hexdump -C TEST.ASK
00000000 61 73 6b 53 61 6d 00 00 00 00 00 07 0f 01 00 00 |askSam..........|
00000010 01 00 00 00 00 01 00 05 00 37 00 02 00 00 00 01 |.........7......|
00000020 33 00 32 00 00 00 00 00 50 00 00 00 00 00 00 00 |3.2.....P.......|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000d0 00 14 00 01 00 00 01 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 01 00 00 00 00 03 1d 42 00 01 00 |............B...|
000000f0 00 13 01 00 00 00 00 01 00 00 00 00 00 00 00 00 |................|
00000100 00 00 00 00 f6 00 00 00 00 54 65 73 74 01 01 01 |.........Test...|
00000110 01 01 00

When the first Windows version came out in 1993, the header changed to the logical string:

hexdump -C DOS-WIN.ASK | head
00000000 61 73 6b 77 69 6e 00 00 00 00 00 07 0f 01 00 04 |askwin..........|
00000010 01 00 00 00 01 01 00 05 01 37 03 00 00 00 00 01 |.........7......|
00000020 64 00 32 2e 01 4e 00 00 a0 00 00 00 00 00 00 00 |d.2..N..........|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 76 43 00 |.............vC.|
00000050 00 8c 00 00 00 00 00 00 00 00 00 00 00 01 00 00 |................|
00000060 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 41 72 69 61 6c 00 72 20 4e 65 77 00 00 00 00 |.Arial.r New....|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 00 5b 3a 00 10 10 10 |..........[:....|

With Version 2 for Windows we start seeing a slightly different header:

hexdump -C AS2W-S01.ASK
00000000 61 73 6b 57 69 53 00 00 00 00 00 07 0f 01 00 04 |askWiS..........|
00000010 01 00 00 00 01 01 00 05 00 37 03 00 00 00 00 01 |.........7......|
00000020 c8 00 32 2f 02 4c 00 00 a0 00 00 00 00 00 00 00 |..2/.L..........|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000070 00 43 6f 75 72 69 65 72 20 4e 65 77 00 00 00 00 |.Courier New....|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 00 5b 3a 00 10 10 14 |..........[:....|
000000a0 14 02 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 00 00 00 60 00 00 00 00 |...........`....|
000000d0 05 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 01 00 00 00 00 00 00 00 00 01 00 |................|
000000f0 00 1d 01 00 00 00 00 01 00 00 00 00 00 00 0a 00 |................|
00000100 00 00 00 00 f6 00 00 00 0a 54 65 73 74 69 6e 67 |.........Testing|
00000110 20 20 00 0a 01 09 10 c0 14 14 42 07 01 |  ........B..|

All samples from version 4 to the final version 7 have the same header. Although I know there are some features in the later versions that make them incompatible, there isn’t an easy way to identify the different versions after version 4.

hexdump -C Asksam4-s01.ask | head
00000000 61 73 6b 77 34 30 00 00 00 00 25 00 00 00 00 00 |askw40....%.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000f0 00 00 00 00 00 00 00 00 00 00 02 00 00 00 e5 38 |...............8|
00000100 0c 3a 67 31 4d 38 dd b5 9c 65 00 00 00 00 90 01 |.:g1M8...e......|
00000110 00 00 01 01 0c 43 00 00 00 00 00 00 be 00 00 00 |.....C..........|
00000120 24 14 00 00 00 00 00 00 10 14 00 00 00 00 00 00 |$...............|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000150 7b 4c 00 00 00 00 00 00 af 4f 00 00 00 00 00 00 |{L.......O......|

hexdump -C AskSam6-s01.ask | head
00000000 61 73 6b 77 34 30 00 00 00 00 38 00 00 00 00 00 |askw40....8.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000f0 00 00 00 00 00 00 00 00 00 00 02 00 00 00 21 f1 |..............!.|
00000100 ad 41 61 9f c0 39 cd 4a af 65 00 00 00 00 58 02 |.Aa..9.J.e....X.|
00000110 00 00 01 01 84 2e 00 00 00 00 00 00 be 00 00 00 |................|
00000120 24 14 00 00 00 00 00 00 50 13 00 00 00 00 00 00 |$.......P.......|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 c6 5b 00 00 |.............[..|
00000150 ba 33 00 00 00 00 00 00 53 33 00 00 00 00 00 00 |.3......S3......|

hexdump -C AskSam7-s01.ask | head
00000000 61 73 6b 77 34 30 00 00 00 00 87 04 00 00 00 00 |askw40..........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000f0 00 00 00 00 00 00 00 00 00 00 02 00 00 00 b2 fd |................|
00000100 b5 47 61 9f c0 39 5c 4b af 65 00 00 00 00 bc 02 |.Ga..9\K.e......|
00000110 00 00 01 01 db 34 00 00 00 00 00 00 be 00 00 00 |.....4..........|
00000120 24 14 00 00 00 00 00 00 50 13 00 00 00 00 00 00 |$.......P.......|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000150 aa 39 00 00 00 00 00 00 de 3c 00 00 00 00 00 00 |.9.......<......|
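The four header strings seen so far can be collected into a small lookup. The Python below is only a sketch of that idea: the function name asksam_family and the wording of the labels are mine, and the table covers just the samples shown in this post, so a version I have not seen could carry a string that is missing here.

```python
# First six bytes of the header, mapped to the releases they were seen in.
# The comparison is case-sensitive: "askwin" and "askWiS" are different.
ASKSAM_MAGIC = {
    b"askSam": "askSam for DOS",
    b"askwin": "askSam for Windows (first Windows release)",
    b"askWiS": "askSam for Windows version 2",
    b"askw40": "askSam for Windows version 4 through 7",
}


def asksam_family(header: bytes):
    """Return a label for the header's magic string, or None if unknown."""
    return ASKSAM_MAGIC.get(header[:6])


def asksam_family_of_file(path: str):
    """Read the first six bytes of a file and look them up."""
    with open(path, "rb") as handle:
        return asksam_family(handle.read(6))


# First bytes of AskSam7-s01.ask, copied from the hexdump above.
print(asksam_family(bytes.fromhex("61736b7734300000000087040000")))
```

Because versions 4 through 7 all share “askw40”, the lookup can only report the family, not the exact release.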

Even though everything after version 4 for Windows has the same header, files created in version 7 will not open in version 6. There must be some additional byte sequences which tie a file to the version that created it. I have been unable to locate the free askSam 7 viewer, but here is a link to the version 6 free viewer. It runs in the latest Windows OS. If you open an older version it will ask you to upgrade your file, so be sure to keep a copy of your original.

Once you have your ASK database opened, you can export to a few formats: RTF, or a delimited text file based on fields you have entered in the form. A word of warning: if you entered a password to protect your data from modification in an earlier version, you have to re-enter the password in order to open or upgrade the file. The viewer will not open password-protected files, so you will need the full version.

Here are two files created in AskSam 5.11 for DOS, one without a password and one with. You can see that the 16 bytes at offsets 41 through 56 (0x29 to 0x38) are zeros in the file with no password and full of values in the protected file. I’m sure someone with more skills could figure out the encryption.

hexdump -C AS5-OPEN.ASK
00000000 61 73 6b 53 61 6d 00 00 00 00 00 07 0f 01 00 00 |askSam..........|
00000010 01 00 00 00 00 01 00 05 00 37 00 02 00 00 00 01 |.........7......|
00000020 33 00 32 00 00 00 00 00 50 00 00 00 00 00 00 00 |3.2.....P.......|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000d0 00 14 00 01 00 00 01 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 01 00 00 00 00 03 1d 42 00 01 00 |............B...|
000000f0 00 13 01 00 00 00 00 01 00 00 00 00 00 00 00 00 |................|
00000100 00 00 00 00 f6 00 00 00 00 54 65 73 74 69 6e 67 |.........Testing|
00000110 01 01 00 |...|

hexdump -C AS5-PASS.ASK
00000000 61 73 6b 53 61 6d 00 00 01 00 00 07 0f 01 00 00 |askSam..........|
00000010 01 00 00 00 00 01 00 05 00 37 00 02 00 00 00 01 |.........7......|
00000020 33 00 32 00 00 00 00 00 50 66 5f 14 66 42 53 40 |3.2.....Pf_.fBS@|
00000030 42 71 29 59 6a 61 62 60 6e 00 00 00 00 00 00 00 |Bq)Yjab`n.......|
*
000000d0 00 14 00 01 00 00 01 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 01 00 00 00 00 03 1d 42 00 01 00 |............B...|
000000f0 00 13 01 00 00 00 00 01 00 00 00 00 00 00 00 00 |................|
00000100 00 00 00 00 f6 00 00 00 00 54 65 73 74 69 6e 67 |.........Testing|
00000110 01 01 00 |...|
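For triage it may be enough to know whether that field is populated. The Python sketch below does only that and makes no attempt at the encryption. It is based on these two DOS samples alone: the function names are invented, and treating a non-zero field as “protected” is an assumption. The byte at offset 8 also differs between the two files (00 versus 01) and may be a flag, but that is a guess and the sketch ignores it.

```python
# The 16 bytes at offsets 41 through 56 (0x29 to 0x38), per the two samples.
PASSWORD_FIELD = slice(0x29, 0x39)


def dos_password_field(header: bytes) -> bytes:
    """Return the suspected password field of a DOS askSam header."""
    if header[:6] != b"askSam":
        raise ValueError("not a DOS askSam header")
    return header[PASSWORD_FIELD]


def looks_protected(header: bytes) -> bool:
    """True if any byte of the suspected password field is non-zero."""
    return any(dos_password_field(header))


# First 64 bytes of each file, copied from the hexdumps above.
OPEN_HEADER = bytes.fromhex(
    "61 73 6b 53 61 6d 00 00 00 00 00 07 0f 01 00 00 "
    "01 00 00 00 00 01 00 05 00 37 00 02 00 00 00 01 "
    "33 00 32 00 00 00 00 00 50 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)
PASS_HEADER = bytes.fromhex(
    "61 73 6b 53 61 6d 00 00 01 00 00 07 0f 01 00 00 "
    "01 00 00 00 00 01 00 05 00 37 00 02 00 00 00 01 "
    "33 00 32 00 00 00 00 00 50 66 5f 14 66 42 53 40 "
    "42 71 29 59 6a 61 62 60 6e 00 00 00 00 00 00 00"
)
print(looks_protected(OPEN_HEADER), looks_protected(PASS_HEADER))
```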

You can check out my samples and my recommendation to PRONOM on my GitHub page.

FlashPix

Obsolete Thor - 19 January 2024 - 8:18am

Is there a perfect raster image format? TIFF has been around quite some time and is generally accepted as a preferred preservation format. There have been a few attempts to have a single file contain multiple resolutions for different uses, lower resolution for web and higher resolution for print. Even the semi-popular JPEG2000 added multiple resolutions to improve on the JPEG format. Kodak came up with a few ideas to do this as well. The Kodak PCD format, also known as PhotoCD or Image PAC, was one that was used for a while before it was abandoned. Another was FlashPix.

I briefly mentioned FlashPix in an earlier post about the Microsoft Picture It! format. They are extremely similar. Both have the same basic structure in a Compound Object format. Some of the FlashPix files generated by Picture It! even have the same identifiers in the CompObj header.

FlashPix was supposed to be the answer to all the problems with storing bitmap image data and how we view the web. Kodak partnered with some big names; Microsoft Corporation, Hewlett-Packard Company, and Live Picture, Inc. were among them. Kodak marketed the format and even included it as a native file format in some of its new digital cameras. The format was made official in June of 1996, with a whitepaper explaining all the benefits and architecture. There was a lot of hype, some even calling it “Not your Grandma’s format”. Many graphics software titles started to include support for the new format, including Adobe Photoshop. So what happened? Why didn’t the format catch on? Some say it was the size of storing multiple resolutions in one file; others believe it was the complicated Compound Object structure that led to its demise. Either way, the format had a lot of hype in the late 1990s, but by the year 2000 it had gone silent and all the websites went away.

FlashPix did have a big impact, and many software and hardware products were made compatible with it. There are a few stories left behind of those who scanned all their photos to the FlashPix format only to find a few years later that it was unsupported on more modern computers. There were also a few early digital cameras which could capture directly to the format. Take my Kodak DC260 zoom camera, circa 1998. By changing the Capture Preferences, I can switch between JPG and FPX.

Using exiftool we can take a look at one of the images from the camera:

exiftool P0004795.FPX
ExifTool Version Number : 12.73
File Name : P0004795.FPX
Directory : GitHub/digicam_corpus/Kodak/DC260/DC260_01
File Size : 251 kB
File Modification Date/Time : 2024:01:06 12:54:20-07:00
File Access Date/Time : 2024:01:06 13:20:46-07:00
File Inode Change Date/Time : 2024:01:06 13:04:34-07:00
File Permissions : -rwxrwxrwx
File Type : FPX
File Type Extension : fpx
MIME Type : image/vnd.fpx
Code Page : Unicode UTF-16, little endian
Data Object ID : 13BC5A58-6B90-1B6B-12C9-0800201177F8
Data Object Status : Exists, Not Purgeable
Creating Transform : Source Image
Using Transforms :
Cached Image Height : 1024
Cached Image Width : 1536
Comp Obj User Type Len : 16
Comp Obj User Type : FlashPix_Object
Visible Outputs : 1
Maximum Image Index : 1
Maximum Transform Index : 0
Maximum Operation Index : 0
Thumbnail Clip : (Binary data 18480 bytes, use -b option to extract)
Revision Number : 1
Create Date : 2024:01:06 12:53:29
Modify Date : 2024:01:06 12:53:29
Software : KODAK DIGITAL SCIENCE DC260
Image Width : 1536
Image Height : 1024
Subimage Width : 1536
Subimage Height : 1024
Subimage Color : RGB
Subimage Numerical Format : 8-bit, Unsigned
Decimation Method : None (Full-sized Image)
JPEG Tables : (Binary data 558 bytes, use -b option to extract)
Number Of Resolutions : 1
Max JPEG Table Index : 1
Scene Type : Original Scene
Software Release : KODAK DIGITAL SCIENCE DC260
Make : Eastman Kodak Company
Camera Model Name : KODAK DIGITAL SCIENCE DC260
Serial Number : 7577
Exposure Time : 1/180
F Number : 4.7
Exposure Program : Program AE
Exposure Compensation : 0
Subject Distance : 0.520 m
Metering Mode : Center-weighted average
Light Source : Unknown
Focal Length : 24.0 mm
Max Aperture Value : 4.6
Flash : No Flash
Exposure Index : 90
Sharpness Approximation : 0
File Source : Digital Camera
Sensing Method : One-chip color area
Extension Create Date : 2024:01:06 12:53:29
Extension Modify Date : 2024:01:06 12:53:29
Creating Application : Picoss
Extension Name : ijuhsimasa
Extension Persistence : Always Valid
Extension Description : Data Object Store 000001
Storage-Stream Pathname : /Data Object Store 000001
Extension Class ID : 56616000-C154-11CE-8553-00AA00A1F95B
Used Extension Numbers : 1
Screen Nail : (Binary data 4304 bytes, use -b option to extract)
Subimage Tile Count : 384
Subimage Tile Width : 64
Subimage Tile Height : 64
Num Channels : 3
Audio Stream : (Binary data 30780 bytes, use -b option to extract)
Aperture : 4.7
Image Size : 1536x1024
Megapixels : 1.6
Shutter Speed : 1/180
Preview Image : (Binary data 4164 bytes, use -b option to extract)
Focal Length : 24.0 mm

The file is also identified by the current PRONOM signatures:

sf P0004795.FPX
---
siegfried : 1.11.0
scandate : 2024-01-17T23:13:59-07:00
signature : default.sig
created : 2023-12-17T15:54:41+01:00
identifiers :
  - name : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'P0004795.FPX'
filesize : 250880
modified : 2024-01-06T12:54:20-07:00
errors :
matches :
  - ns : 'pronom'
    id : 'x-fmt/56'
    format : 'Kodak FlashPix Image'
    version :
    mime : 'image/vnd.fpx'
    class : 'Image (Raster)'
    basis : 'extension match fpx; container name CompObj with byte match at 53, 36 (signature 2/2)'
    warning :

If you notice, PRONOM has two signatures for the FlashPix format, and this image was identified with signature #2. The first signature looks for the string “FlashPix Object”, while the second looks for the CLSID, which is unique to each compound object format. FlashPix has the CLSID {56616700-c154-11ce-8553-00aa00a1f95b}. Looking at the many other samples I have, there is much variation in the use of the string and the CLSID.

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

The images from the Kodak camera use the string “FlashPix_Object”, which with its underscore doesn’t match the first signature, while others I made using the Picture It! software use a couple of variations. Many don’t use the string at all. Others use a slightly different CLSID, in both uppercase and lowercase. We will have to suggest adjustments to the current signature to identify them all.
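To get a feel for what a more tolerant match might look like, here is a small Python sketch run against the eight variations listed above. It is not the PRONOM signature and not a proposal in PRONOM syntax: the two regular expressions and the function name flashpix_evidence are my own illustration, and the CLSID pattern is deliberately loose (any hex digit in the sixth position and any value in the last two groups), which could also match things that are not FlashPix. It assumes the CLSID appears as ASCII text in the CompObj stream, as it does in the sample strings.

```python
import re

# Loose CLSID match: covers 56616000, 56616700 and 56616800, in either case,
# and the zeroed tail used by the Flashpix Toolkit sample.
FPX_CLSID = re.compile(
    rb"\{56616[0-9a-f]00-c154-11ce-[0-9a-f]{4}-[0-9a-f]{12}\}",
    re.IGNORECASE,
)
# User type string with either a space or an underscore.
FPX_STRING = re.compile(rb"flashpix[ _]object", re.IGNORECASE)

SAMPLES = [
    b"FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}",
    b"FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}",
    b"Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}",
    b"LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}",
    b"FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}",
    b"'{56616700-C154-11CE-8553-00AA00A1F95B}",
    b"Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}",
    b"Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}",
]


def flashpix_evidence(compobj: bytes):
    """Return (lowercased CLSID or None, whether the user type string matched)."""
    clsid = FPX_CLSID.search(compobj)
    text = clsid.group().decode("ascii").lower() if clsid else None
    return text, bool(FPX_STRING.search(compobj))


for sample in SAMPLES:
    print(flashpix_evidence(sample))
```

In this small set the CLSID pattern matches every sample, while the user type string matches only three of the eight, which fits the observation that the string alone is not reliable.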

Looking at the contents of the OLE container we can see some interesting things.

Path = P0004795.FPX
Type = Compound
Physical Size = 250880
Extension = compound
Cluster Size = 512
Sector Size = 64

        Size   Compressed  Name
------------ ------------  ------------------------
         188          192  [5]Data Object 000001
         272          320  [1]CompObj
         388          448  [5]Extension List
         144          192  [5]Global Info
                           Data Object Store 000001
       18704        18944  [5]SummaryInformation
         816          832  Data Object Store 000001/[5]Image Contents
         272          320  Data Object Store 000001/[1]CompObj
         988         1024  Data Object Store 000001/[5]Extension List
        1624         1664  Data Object Store 000001/[5]Image Info
        4332         4608  Data Object Store 000001/[5]Screen Nail_bd0100609719a180
                           Data Object Store 000001/Resolution 0005
                           Data Object Store 000001/Audio_bd0100609719a180
        1112         1152  Data Object Store 000001/[5]KDC_bd0100609719a180
          72          128  Data Object Store 000001/[5]SummaryInformation
         108          128  Data Object Store 000001/Audio_bd0100609719a180/[5]Audio Info
       30808        31232  Data Object Store 000001/Audio_bd0100609719a180/Audio Stream 000000
        6208         6656  Data Object Store 000001/Resolution 0005/Subimage 0000 Header
      176378       176640  Data Object Store 000001/Resolution 0005/Subimage 0000 Data
------------ ------------  ------------------------
      242414       244480  16 files, 3 folders

The main CompObj is where we find the identification information, but the Data Object Store 000001 directory is where all the image data is stored. In a multiple-resolution image we might see additional Resolution directories. You may also notice a mention of an Audio directory. Yes, this image was captured and then audio was recorded with it. Not a video, but an audio clip associated with the image. FlashPix can contain audio streams. This isn’t the first time we have seen this: HP cameras also have this function, and as it turns out the audio is stored in a FlashPix Exif extension within a JPEG.

The FlashPix native format may have disappeared, but the format lives on as an extension to Exif data, allowing you to embed audio and other media within a JPEG file. The code for FlashPix was given to ImageMagick and is maintained by them.

Presto!

Obsolete Thor - 12 January 2024 - 8:01am

Working in preservation and archiving for the last few years has caused me to change a habit most people use every day: the double-click. I am usually opening a file in a hex editor or control-clicking on a file to open it in a different software application than the default. Maybe it’s just me, but having control over opening a file is essential. The thought of double-clicking on a file and the uncertainty of what is actually happening scares me a little.

Of course, opening an application executable requires a double-click or a right-click/open process, and from there you can open the file of your choosing. Executables are runnable files because they have the required pieces for the operating system and CPU to interpret and, well, run. We need executables in order to make sense of the files we preserve. Without something to interpret the data in our files, they are just a bunch of ones and zeros.

Take a PDF for example. By itself, it is hard to make sense of the file. You need Acrobat Reader, or any number of other executable software programs to open and render the PDF.

But what if you could take a file and wrap it in an executable so it is all self-contained, the file format and an executable in one file? No separate software needed! On the surface this seems like a great idea, which is why a few software companies had this as an option. An early competitor of PDF, Common Ground, had the option to embed the DP file into a self-contained viewer. Many archive software tools have the ability to make “self-extracting” executables as well. One obvious downside is being unable to execute on a different platform or a later operating system. But at the time they were very convenient.

One software title in particular added the option to export a few different formats into a special wrapper, making them viewable on any Windows machine.

NewSoft Technology Corporation’s Presto! PageManager is document management software which can view many different file types. The software helps manage document and photo scanning and keeps everything organized. It often came bundled with home consumer scanners, such as the UMAX Astra scanner I bought years ago. With the Windows version of the software you can take one or more photos and “wrap” them into a Presto! Wrapper.

Once exported to a Presto! Wrapper, the files within have a portable viewer wrapped up with them. One double-click and Presto!, you can view, rotate, export, and print your images. The wrapper has your typical .EXE extension and identifies as such.

sf Presto6-s02.EXE
---
siegfried : 1.11.0
scandate : 2024-01-09T23:39:36-07:00
signature : default.sig
created : 2023-12-17T15:54:41+01:00
identifiers :
  - name : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'Presto6-s02.EXE'
filesize : 818301
modified : 2024-01-07T23:48:01-07:00
errors :
matches :
  - ns : 'pronom'
    id : 'fmt/899'
    format : 'Windows Portable Executable'
    version : '32 bit'
    mime : 'application/vnd.microsoft.portable-executable'
    class :
    basis : 'extension match exe; byte match at [[0 2] [232 94]]'

hexdump -C Presto6-s02.EXE | head
00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00 |MZ..............|
00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 e8 00 00 00 |................|
00000040 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 68 |........!..L.!Th|
00000050 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f |is program canno|
00000060 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 20 |t be run in DOS |
00000070 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 00 |mode....$.......|
00000080 99 72 8f bf dd 13 e1 ec dd 13 e1 ec dd 13 e1 ec |.r..............|
00000090 5e 0f ef ec dc 13 e1 ec b2 0c eb ec d6 13 e1 ec |^...............|

The preservation of executables is, in my opinion, complicated. A 32-bit executable might not even run on a computer today. Then we have to get into the software license and whether it allows us to use the software freely in perpetuity. So even though this is an executable, it is important to know that it is also a wrapper for regular images, because that gives us another option for preservation: the files wrapped inside can be exported and preserved on their own. So what makes this executable unique? Let’s look a little closer.

00005000 00 00 00 00 11 2e 40 00 00 10 40 00 80 1f 40 00 |......@...@...@.|
00005010 c0 24 40 00 00 00 00 00 00 00 00 00 00 00 00 00 |.$@.............|
00005020 50 6d 76 69 65 77 20 69 73 20 63 6c 6f 73 65 2e |Pmview is close.|
00005030 00 00 00 00 5c 00 00 00 74 6d 70 00 5c 54 45 4d |....\...tmp.\TEM|
00005040 50 00 00 00 20 4e 65 77 53 6f 66 74 20 56 69 65 |P... NewSoft Vie|
00005050 77 65 72 00 34 31 36 44 37 30 36 43 36 31 37 39 |wer.416D706C6179|
00005060 36 35 37 32 00 00 00 00 41 6d 70 6c 61 79 65 72 |6572....Amplayer|
00005070 00 00 00 00 70 6d 76 69 65 77 2e 65 78 65 00 00 |....pmview.exe..|
00005080 41 6d 70 6c 61 79 65 72 2e 65 78 65 20 67 72 65 |Amplayer.exe gre|
00005090 65 74 2e 69 64 20 56 00 41 6d 70 6c 61 79 65 72 |et.id V.Amplayer|
000050a0 2e 65 78 65 00 00 00 00 2e 2e 00 00 2e 00 00 00 |.exe............|
000050b0 5c 2a 2e 2a 00 00 00 00 4c 6f 63 61 6c 20 41 70 |\*.*....Local Ap|
000050c0 70 57 69 7a 61 72 64 2d 47 65 6e 65 72 61 74 65 |pWizard-Generate|
000050d0 64 20 41 70 70 6c 69 63 61 74 69 6f 6e 73 00 00 |d Applications..|
000050e0 57 72 61 70 70 65 72 00 43 45 78 70 76 77 44 6f |Wrapper.CExpvwDo|
000050f0 63 00 00 00 43 45 78 70 76 77 56 69 65 77 00 00 |c...CExpvwView..|

It is indeed a wrapper. The header looks like any other EXE file, but a little further into the file we can see some strings specific to the viewer. In all my samples I can see the string “NewSoft Viewer”. That might be enough to distinguish it from other executables. See some samples here.
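As a rough illustration of that idea, here is a minimal Python sketch that flags a file when it starts with the MZ header and also contains the “NewSoft Viewer” string seen in the dump. The function names are my own, and scanning the whole file for one string is an assumption about what would be distinctive enough; it is not a tested PRONOM signature.

```python
from pathlib import Path

MZ_MAGIC = b"MZ"                   # DOS/PE executable header
VIEWER_MARKER = b"NewSoft Viewer"  # string visible in the hexdump above


def looks_like_presto_wrapper(data: bytes) -> bool:
    """Heuristic: an MZ executable that also carries the viewer string."""
    return data.startswith(MZ_MAGIC) and VIEWER_MARKER in data


def check_file(path: str) -> bool:
    """Read a file from disk and apply the heuristic to its bytes."""
    return looks_like_presto_wrapper(Path(path).read_bytes())
```

Something like `check_file("Presto6-s02.EXE")` would be the intended use. Other NewSoft executables may well carry the same string, so a hit is a hint rather than proof.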

I guess part of the question is whether identifying specific software executables is needed in preservation. Aren’t they all executables that should be treated similarly? This isn’t the first executable of this type I have seen. A while back I came across another piece of home software which let you make a slideshow, complete with audio, and wrap it into an executable to put on a disk, so playback was easy for the user and nothing additional was needed. The software is called Family Album Creator, use at your own risk.

PNG Plus

Obsolete Thor - 5 January 2024 - 8:06am

Usually in the software world file formats are fairly efficient; the structure is meant to provide a way to store the data of the software being used. There is rarely a reason to add anything unnecessary. This isn’t always true, but in the early days disk space was expensive, so compression and efficiency ruled. There also wasn’t much need to hide anything or complicate things, unless that was the intent. This makes me think of two things: polyglots and steganography.

Steganography is the art of hiding data inside other data, such as an image. With digital images you can hide another image within the main image, for example by storing the most significant bits of the hidden image in the least significant bits of the cover image. It is a fun use of technology, but not something you would normally find in regular desktop software.
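To make the bit trick concrete, here is a small Python sketch working on raw 8-bit samples rather than a real image file. The helper names and the choice of 4 bits are assumptions for illustration only; real tools vary in how many bits they borrow.

```python
def hide_sample(cover: int, secret: int, bits: int = 4) -> int:
    """Put the top `bits` bits of `secret` into the low `bits` bits of `cover`."""
    mask = (1 << bits) - 1
    return (cover & ~mask & 0xFF) | (secret >> (8 - bits))


def reveal_sample(stego: int, bits: int = 4) -> int:
    """Recover an approximation of the secret sample from the low bits."""
    mask = (1 << bits) - 1
    return (stego & mask) << (8 - bits)


def hide(cover: bytes, secret: bytes, bits: int = 4) -> bytes:
    """Hide one byte stream in another, sample by sample."""
    return bytes(hide_sample(c, s, bits) for c, s in zip(cover, secret))


def reveal(stego: bytes, bits: int = 4) -> bytes:
    """Extract the hidden samples (only the top bits survive)."""
    return bytes(reveal_sample(p, bits) for p in stego)
```

The cover keeps its most significant bits, so it still looks about right, while the hidden image comes back at reduced precision.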

Ange is the master at polyglots. If you haven’t watched his presentation on funky file formats, you are missing out.

.@Gynvael’s png/zip polyglot visualized (cf his article in the latest @pagedout_zine) pic.twitter.com/5BR6GLoB98

— Ange (@angealbertini) December 18, 2023

Imagine my surprise when I was researching the Picture It! software and the MIX file format, only to discover Microsoft decided to make their own polyglot of sorts with the PNG Plus format. PNG Plus replaced the MIX format, and both became obsolete when Digital Image was discontinued in 2007. PNG Plus was the native format for the Microsoft Picture It! and Digital Image software, often found with the Microsoft Works or Digital Imaging suites.

Save Menu from Digital Image Pro

According to the help within Digital Image:

The PNG Plus format uses the standard PNG extension but provides saving of layers and pages within the PNG format. Since the PNG format cannot do this natively, how did Microsoft accomplish this? Well, by throwing an OLE container into the middle of the file of course!

PNG Plus files are your regular PNG format and will identify as such. But they are just a low resolution thumbnail of the full image. Let’s take a look:

exiftool PictureIt7-s02.png
ExifTool Version Number : 12.70
File Name : PictureIt7-s02.png
File Size : 26 kB
File Modification Date/Time : 2023:12:26 22:01:58-07:00
File Access Date/Time : 2024:01:01 12:31:07-07:00
File Inode Change Date/Time : 2023:12:26 22:01:58-07:00
File Permissions : -rwx------
File Type : PNG
File Type Extension : png
MIME Type : image/png
Image Width : 500
Image Height : 333
Bit Depth : 8
Color Type : RGB with Alpha
Compression : Deflate/Inflate
Filter : Adaptive
Interlace : Noninterlaced
SRGB Rendering : Perceptual
Gamma : 2.2
White Point X : 0.3127
White Point Y : 0.329
Red X : 0.64
Red Y : 0.33
Green X : 0.3
Green Y : 0.6
Blue X : 0.15
Blue Y : 0.06
Warning : [minor] Text/EXIF chunk(s) found after PNG IDAT (may be ignored by some readers)
Title : PictureIt7-s02
Image Size : 500x333
Megapixels : 0.167

Looks like there is some additional data after the IDAT chunk.

hexdump -C PictureIt7-s02.png | head
00000000 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 |.PNG........IHDR|
00000010 00 00 01 f4 00 00 01 4d 08 06 00 00 00 f6 13 9d |.......M........|
00000020 37 00 00 00 01 73 52 47 42 00 ae ce 1c e9 00 00 |7....sRGB.......|
00000030 00 04 67 41 4d 41 00 00 b1 8f 0b fc 61 05 00 00 |..gAMA......a...|
00000040 00 20 63 48 52 4d 00 00 7a 26 00 00 80 84 00 00 |. cHRM..z&......|
00000050 fa 00 00 00 80 e8 00 00 75 30 00 00 ea 60 00 00 |........u0...`..|
00000060 3a 98 00 00 17 70 9c ba 51 3c 00 00 24 f4 49 44 |:....p..Q<..$.ID|
00000070 41 54 78 5e ed dd 4d a8 15 57 be 28 f0 1e 08 1e |ATx^..M..W.(....|
00000080 e3 47 8e 49 ab c7 d8 81 03 09 41 9c 28 38 e8 80 |.G.I......A.(8..|
00000090 d0 9c 0e 08 0e 1a 11 c2 15 07 5e 5a 07 4d c7 2b |..........^Z.M.+|

The header looks the same as any PNG file, so let’s look a little further:

00002560 ff 1f fa 5f 90 66 c9 e6 ad 88 00 00 00 00 63 6d |..._.f........cm|
00002570 4f 44 4e 88 09 c1 00 00 40 00 63 70 49 70 d0 cf |ODN.....@.cpIp..|
00002580 11 e0 a1 b1 1a e1 00 00 00 00 00 00 00 00 00 00 |................|
00002590 00 00 00 00 00 00 3e 00 03 00 fe ff 09 00 06 00 |......>.........|
000025a0 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 00 |................|
000025b0 00 00 00 00 00 00 00 10 00 00 02 00 00 00 01 00 |................|
*
00002970 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 52 00 |..............R.|
00002980 6f 00 6f 00 74 00 20 00 45 00 6e 00 74 00 72 00 |o.o.t. .E.n.t.r.|
00002990 79 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |y...............|
000029a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000029b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 |................|
000029c0 05 00 ff ff ff ff ff ff ff ff 01 00 00 00 7e 7f |..............~.|
000029d0 3f b5 a5 f6 86 43 a1 a1 a3 02 24 d2 88 ef 00 00 |?....C....$.....|
000029e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000029f0 00 00 03 00 00 00 40 12 00 00 00 00 00 00 44 00 |......@.......D.|
00002a00 61 00 74 00 61 00 53 00 74 00 6f 00 72 00 65 00 |a.t.a.S.t.o.r.e.|
00002a10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00003930 00 00 00 00 00 00 00 00 00 00 00 00 00 00 43 48 |..............CH|
00003940 4e 4b 49 4e 4b 20 04 00 07 00 0c 00 00 03 00 02 |NKINK ..........|
00003950 00 00 00 0a 00 00 f8 01 0c 00 ff ff ff ff 18 00 |................|
00003960 54 45 58 54 00 00 01 00 00 00 54 45 58 54 00 02 |TEXT......TEXT..|
00003970 00 00 22 00 00 00 18 00 46 44 50 50 00 00 43 00 |..".....FDPP..C.|
00003980 4f 00 4e 00 54 00 45 00 4e 00 54 00 53 00 00 00 |O.N.T.E.N.T.S...|
00003990 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000039f0 00 00 1f 00 00 00 00 0a 00 00 00 00 00 00 01 00 |................|
00003a00 43 00 6f 00 6d 00 70 00 4f 00 62 00 6a 00 00 00 |C.o.m.p.O.b.j...|
00003a10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00004530 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 |................|
00004540 fe ff 03 0a 00 00 ff ff ff ff 00 00 00 00 00 00 |................|
00004550 00 00 00 00 00 00 00 00 00 00 1a 00 00 00 51 75 |..............Qu|
00004560 69 6c 6c 39 36 20 53 74 6f 72 79 20 47 72 6f 75 |ill96 Story Grou|
00004570 70 20 43 6c 61 73 73 00 ff ff ff ff 01 00 00 00 |p Class.........|
00004580 00 00 00 00 f4 39 b2 71 00 00 00 00 00 00 00 00 |.....9.q........|
00004590 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00006570 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ba 84 |................|
00006580 43 51 00 00 00 18 69 54 58 74 54 69 74 6c 65 00 |CQ....iTXtTitle.|
00006590 00 00 00 00 50 69 63 74 75 72 65 49 74 37 2d 73 |....PictureIt7-s|
000065a0 30 32 3a 70 9c 00 00 00 00 14 74 45 58 74 54 69 |02:p......tEXtTi|
000065b0 74 6c 65 00 50 69 63 74 75 72 65 49 74 37 2d 73 |tle.PictureIt7-s|
000065c0 30 32 f2 8f d5 89 00 00 00 00 49 45 4e 44 ae 42 |02........IEND.B|
000065d0 60 82 |`.|

What do we have here? Near the end of the file, before the IEND chunk, is an OLE file with the very recognizable hex values of “D0CF11E0”. Let’s strip out the OLE file and take a look.
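One simple way to do the stripping is to cut the file at the first OLE header and keep everything after it. The sketch below does exactly that in Python; the function names are made up for this example, and it searches for the full eight-byte header seen in the dump (D0 CF 11 E0 A1 B1 1A E1). It is a blunt approach, since the same bytes could in theory turn up inside compressed image data.

```python
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE2 compound file header


def carve_ole(data: bytes) -> bytes:
    """Return everything from the first OLE header to the end of the input.

    Raises ValueError when no OLE header is present.
    """
    start = data.find(OLE_MAGIC)
    if start == -1:
        raise ValueError("no OLE compound file header found")
    return data[start:]


def carve_file(src: str, dst: str) -> int:
    """Carve `src` into `dst` and return the number of bytes written."""
    with open(src, "rb") as fh:
        payload = carve_ole(fh.read())
    with open(dst, "wb") as fh:
        fh.write(payload)
    return len(payload)
```

Because the cut runs to the end of the file, the trailing PNG chunks come along for the ride, which a tool reading the result may report as data after the end of the container.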

Path = PictureIt7-s02-ole
Type = Compound
WARNINGS: There are data after the end of archive
Physical Size = 8704
Tail Size = 7764
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2023-12-26 22:01:58 D.... DataStore
2023-12-26 22:01:58 D.... Text
..... 2560 2560 Text/CONTENTS
..... 86 128 Text/[1]CompObj
..... 96 128 DataStore/3
..... 4 64 DataStore/1
..... 121 128 DataStore/0
..... 57 64 DataStore/2
..... 98 128 DataStore/5
..... 4 64 DataStore/4
..... 1254 1280 DataStore/7
..... 4 64 DataStore/6
..... 4 64 DataStore/8
------------------- ----- ------------ ------------ ------------------------
2023-12-26 22:01:58 4288 4672 11 files, 2 folders

Interesting, I don’t think I have come across a standard format with a container embedded within. I have come across many OLE and ZIP containers which contain other common formats within, but this format is definitely unique. Others have added features in the IDAT chunk, such as a web shell. I am sure there are others out there. The CompObj file found within the Text directory is very similar to the Microsoft Works and Publisher format. Although trying to open the file in Publisher doesn’t work!

hexdump -C PictureIt7-s02-ole/Text/\[1\]CompObj | head
00000000 01 00 fe ff 03 0a 00 00 ff ff ff ff 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 1a 00 00 00 |................|
00000020 51 75 69 6c 6c 39 36 20 53 74 6f 72 79 20 47 72 |Quill96 Story Gr|
00000030 6f 75 70 20 43 6c 61 73 73 00 ff ff ff ff 01 00 |oup Class.......|
00000040 00 00 00 00 00 00 f4 39 b2 71 00 00 00 00 00 00 |.......9.q......|
00000050 00 00 00 00 00 00 |......|

PRONOM uses binary and container signatures to identify file formats. Even though this file format contains a valid OLE container, because it sits inside a regular binary file format, I don’t believe a container signature would work. The difficulty will be to clearly identify this new format without falsely identifying a regular PNG instead. The OLE file format header is not in a consistent location, so we cannot use a specific offset. Searching for the string at a variable location can cause some undue processing, so let’s look to see if there is anything else we can use to make a positive ID.

The PNG file format is based on chunks: you have to have an IHDR chunk, then an IDAT chunk, and finally the IEND chunk. If we take a look at a regular PNG file using the libpng tool pngcheck, we see this:

pngcheck -cvt rgb-8.png
File: rgb-8.png (759 bytes)
  chunk IHDR at offset 0x0000c, length 13
    256 x 256 image, 24-bit RGB, non-interlaced
  chunk tEXt at offset 0x00025, length 44, keyword: Copyright
    ? 2013,2015 John Cunningham Bowler
  chunk iTXt at offset 0x0005d, length 116, keyword: Licensing
    compressed, language tag = en
    no translated keyword, 101 bytes of UTF-8 text
  chunk IDAT at offset 0x000dd, length 518
    zlib: deflated, 32K window, maximum compression
  chunk IEND at offset 0x002ef, length 0
No errors detected in rgb-8.png (5 chunks, 99.6% compression).

The required chunks are there, plus a couple of extras, tEXt and iTXt, which hold textual metadata you can add. Now let’s look at a PNG Plus file:

pngcheck -cvt PictureIt7-s02.png
File: PictureIt7-s02.png (26066 bytes)
  chunk IHDR at offset 0x0000c, length 13
    500 x 333 image, 32-bit RGB+alpha, non-interlaced
  chunk sRGB at offset 0x00025, length 1
    rendering intent = perceptual
  chunk gAMA at offset 0x00032, length 4: 0.45455
  chunk cHRM at offset 0x00042, length 32
    White x = 0.3127 y = 0.329, Red x = 0.64 y = 0.33
    Green x = 0.3 y = 0.6, Blue x = 0.15 y = 0.06
  chunk IDAT at offset 0x0006e, length 9460
    zlib: deflated, 32K window, fast compression
  chunk cmOD at offset 0x0256e, length 0
    Microsoft Picture It private, ancillary, unsafe-to-copy chunk
  chunk cpIp at offset 0x0257a, length 16384
    Microsoft Picture It private, ancillary, safe-to-copy chunk
  chunk iTXt at offset 0x06586, length 24, keyword: Title
    uncompressed, no language tag
    no translated keyword, 15 bytes of UTF-8 text
  chunk tEXt at offset 0x065aa, length 20, keyword: Title
    PictureIt7-s02
  chunk IEND at offset 0x065ca, length 0
No errors detected in PictureIt7-s02.png (10 chunks, 96.1% compression).

It looks like we have the required chunks and some textual chunks, but also a couple of chunks which pngcheck describes as private and identifies as Microsoft Picture It chunks. The cpIp chunk is the one which contains the OLE container. This is the chunk we need to identify in a signature. The problem is the offset for the cpIp chunk is not the same each time. Here is one from Digital Image 10 Pro.
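Since the offset moves around, a program has to walk the chunk chain to find cpIp. Each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC, so a short Python sketch can hop from one to the next. The function names are my own; offsets are reported at the type field, which matches how pngcheck printed them above (IHDR at 0x0000c). CRCs are not verified here.

```python
import struct
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (chunk type, offset of the type field, data length) per chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8].decode("latin-1")
        yield ctype, pos + 4, length
        if ctype == "IEND":
            break
        pos += 12 + length  # length field + type + data + CRC


def find_chunk(data: bytes, wanted: str) -> int:
    """Return the offset of the first chunk named `wanted`, or -1."""
    for ctype, offset, _length in iter_chunks(data):
        if ctype == wanted:
            return offset
    return -1
```

With the file bytes loaded, `find_chunk(data, "cpIp")` gives the position regardless of how large the IDAT data before it happens to be.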

chunk cpIp at offset 0x737a7, length 245760 Microsoft Picture It private, ancillary, safe-to-copy chunk

Significantly further into the file than the other example. These samples currently identify as PNG 1.2 files, PRONOM fmt/13, so we can use that signature and add to it. It currently doesn’t look for IDAT, only the iTXt chunk, which is probably not optimal. For PNG Plus, let’s match the header which includes IHDR, then IDAT, then the cpIp chunk, then an end-of-file sequence for IEND. Take a look at my signature and samples. I am curious how many PNG Plus files are out there, hidden from the world.
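To show the shape of such a pattern outside of PRONOM, here is a hedged Python sketch using a byte regular expression. It is not my submitted signature; in particular, requiring the OLE header bytes right after the cpIp type is an assumption based on the one dump shown above, and the names are invented for this example.

```python
import re

# PNG magic, IHDR as the first chunk, an IDAT somewhere after it, a cpIp chunk
# whose payload starts with the OLE header, and IEND closing the file.
PNG_PLUS_PATTERN = re.compile(
    rb"\A\x89PNG\r\n\x1a\n"        # 8-byte PNG signature
    rb".{4}IHDR"                   # 4-byte length, then the IHDR type
    rb".*?IDAT"                    # image data chunk at a variable offset
    rb".*?cpIp\xd0\xcf\x11\xe0"    # private chunk holding the OLE container
    rb".*IEND\xae\x42\x60\x82\Z",  # IEND type plus its fixed CRC at end of file
    re.DOTALL,
)


def is_png_plus(data: bytes) -> bool:
    """True when the bytes match the PNG Plus pattern sketched above."""
    return PNG_PLUS_PATTERN.match(data) is not None
```

The wildcard gaps are exactly the variable offsets discussed earlier, so on large files that do not match, a pattern like this can do a fair amount of extra scanning.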

Turns out there is another PNG flavor which has been enhanced to allow for layers and pages. Adobe Fireworks uses a PNG format as their native format. They also use private chunks, but not within an OLE container. They use additional chunks, but before the IDAT chunk:

chunk prVW at offset 0x00092, length 1700
  Macromedia Fireworks preview chunk (private, ancillary, unsafe to copy)
chunk mkBF at offset 0x00742, length 72
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkTS at offset 0x00796, length 36716
  Macromedia Fireworks(?) private, ancillary, unsafe-to-copy chunk
chunk mkBS at offset 0x0970e, length 190
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x097d8, length 1251
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x09cc7, length 1358
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x0a221, length 1145
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x0a6a6, length 339
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x0a805, length 695
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x0aac8, length 3799
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x0b9ab, length 7733
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x0d7ec, length 2741
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x0e2ad, length 5153
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk
chunk mkBT at offset 0x0f6da, length 10775
  Macromedia Fireworks private, ancillary, unsafe-to-copy chunk

It’s hard to know what each of the chunks is for and whether they are all required for the Fireworks PNG format. From the book on PNG:

In addition to supporting PNG as an output format, Fireworks actually uses PNG as its native file format for day-to-day intermediate saves. This is possible thanks to PNG’s extensible “chunk-based” design, which allows programs to incorporate application-specific data in a well-defined way. Macromedia has embraced this capability, defining at least four custom chunk types that hold various things pertinent to the editor. Unfortunately, one of them (pRVW) violates the PNG naming rules by claiming to be an officially registered, public chunk type, but this was an oversight and should be fixed in version 2.0.
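The naming rules mentioned in that quote come from the PNG specification, which uses the upper or lower case of each of the four letters in a chunk name as a property flag. That is where pngcheck gets “private, ancillary, unsafe-to-copy” from. A small Python sketch of the decoding, with a function name and dictionary keys I picked for illustration:

```python
def chunk_properties(name: str) -> dict:
    """Decode the four property bits carried by the case of a chunk name."""
    if len(name) != 4 or not name.isascii() or not name.isalpha():
        raise ValueError("chunk names are four ASCII letters")
    return {
        "ancillary": name[0].islower(),     # uppercase = critical
        "private": name[1].islower(),       # uppercase = public (registered)
        "reserved_ok": name[2].isupper(),   # must be uppercase in current PNG
        "safe_to_copy": name[3].islower(),  # uppercase = unsafe to copy
    }
```

Run against the names in this post, cmOD and cpIp both come out as private and ancillary, differing only in the safe-to-copy flag. The pRVW spelling from the quote reads as public because of its uppercase R, while the prVW spelling in the pngcheck listing reads as private.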

Picture It!

Obsolete Thor - 29 December 2023 - 8:11am

Most everyone has heard of Microsoft Office, the suite of applications used by millions every day. Fewer people know about Microsoft Works, which was a lower cost alternative but was quite popular as a home office suite of applications. One tool which often came with the Works suite was a digital image tool called Picture It!

Picture It! was a photo editing tool first released by Microsoft in 1996 geared to making photo editing easy and affordable.

Picture It! used a wizard type interface which walked you through acquiring an image and adding to it. One of the key features of the software was the ability to “stack” objects like layers. Because of this feature a new file format was used to save this information to disk. Meet the Microsoft Image (Picture) Extension format, commonly known as the MIX file format. It is very similar to the FlashPix image format, which was supposed to be an image file format to solve many delivery issues, but didn’t seem to gain hold despite being created by Kodak, HP, and others. In fact many of the MIX files I found on Microsoft disks are actually FlashPix files.

The MIX extension was also used by another Microsoft program, PhotoDraw, which causes confusion as they were similar, but PhotoDraw has some added features which may not be compatible with Picture It!. Both formats are based on the Microsoft Compound Object (OLE) container, and have a similar structure. Let’s take a look at a MIX file from Picture It! version 1.

7z l PictureIt1-s02.mix
--
Path = PictureIt1-s02.mix
Type = Compound
Physical Size = 48128
Extension = compound
Cluster Size = 512
Sector Size = 64

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
..... 328 384 [5]Data Object 000001
..... 396 448 [5]Transform 000004
..... 872 896 [5]Operation 000001
..... 320 320 [1]CompObj
..... 292 320 [5]Global Info
..... 872 896 [5]Operation 000002
..... 144 192 [5]Operation 000003
..... 684 704 [5]Transform 000008
..... 1028 1088 [5]Transform 000009
..... 328 384 [5]Data Object 000009
..... 324 384 [5]Data Object 000005
2023-12-27 11:04:39 D.... Data Object Store 000001
..... 328 384 [5]Data Object 000010
..... 20932 20992 [5]SummaryInformation
..... 200 256 [5]Microsoft Embedding Info
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0001
..... 1400 1408 Data Object Store 000001/[5]Image Contents
..... 230 256 Data Object Store 000001/[1]CompObj
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0000
..... 28 64 Data Object Store 000001/Resolution 0000/Subimage 0000 Data
..... 80 128 Data Object Store 000001/Resolution 0000/Subimage 0000 Header
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0003
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0002
..... 28 64 Data Object Store 000001/Resolution 0002/Subimage 0000 Data
..... 208 256 Data Object Store 000001/Resolution 0002/Subimage 0000 Header
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0005
2023-12-27 11:04:39 D.... Data Object Store 000001/Resolution 0004
..... 28 64 Data Object Store 000001/Resolution 0004/Subimage 0000 Data
..... 1792 1792 Data Object Store 000001/Resolution 0004/Subimage 0000 Header
..... 124 128 Data Object Store 000001/[5]SummaryInformation
..... 28 64 Data Object Store 000001/Resolution 0005/Subimage 0000 Data
..... 6976 7168 Data Object Store 000001/Resolution 0005/Subimage 0000 Header
..... 28 64 Data Object Store 000001/Resolution 0003/Subimage 0000 Data
..... 544 576 Data Object Store 000001/Resolution 0003/Subimage 0000 Header
..... 28 64 Data Object Store 000001/Resolution 0001/Subimage 0000 Data
..... 128 128 Data Object Store 000001/Resolution 0001/Subimage 0000 Header
------------------- ----- ------------ ------------ ------------------------
2023-12-27 11:04:39 38698 39872 29 files, 7 folders

This is a simple MIX file with one line of text, but contains a lot of content inside the OLE container. If I try and use the PRONOM registry to identify the file, I get:

sf PictureIt1-s02.mix
---
siegfried : 1.11.0
scandate : 2023-12-27T11:06:32-07:00
signature : default.sig
created : 2023-12-17T15:54:41+01:00
identifiers :
  - name : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'PictureIt1-s02.mix'
filesize : 48128
modified : 2023-12-27T11:04:40-07:00
errors :
matches :
  - ns : 'pronom'
    id : 'fmt/111'
    format : 'OLE2 Compound Document Format'
    version :
    mime :
    class : 'Text (Structured)'
    basis : 'byte match at 0, 30'
    warning :

Hmm, we know it is an OLE compound document, but it should identify as a Picture It! file as PRONOM has defined a PUID for the format. fmt/936 has been defined as “Microsoft Picture It! Image File 1”. So I am not sure why this file from version 1 is not identifying correctly. Let’s take a look. The PRONOM container signature for fmt/936 is looking for this:

<ContainerSignature Id="17015" ContainerType="OLE2">
  <Description>Microsoft Picture It! Image File</Description>
  <Files>
    <File>
      <Path>CompObj</Path>
      <BinarySignatures>
        <InternalSignatureCollection>
          <InternalSignature ID="17015">
            <ByteSequence Reference="BOFoffset">
              <SubSequence Position="1" SubSeqMinOffset="32" SubSeqMaxOffset="32">
                <Sequence>'Microsoft Picture It! version 1 Picture'</Sequence>
              </SubSequence>
            </ByteSequence>
          </InternalSignature>
        </InternalSignatureCollection>
      </BinarySignatures>
    </File>
  </Files>
</ContainerSignature>

The container signature is looking into the OLE container for the “CompObj” file (which seems to be required), then looks for the string “Microsoft Picture It! version 1 Picture” starting at the 32nd byte. That is pretty specific. The sample file I am using as an example has the following string of bytes.

hexdump -C PictureIt1-s02/\[1\]CompObj
00000000 01 00 fe ff 03 0a 00 00 ff ff ff ff 00 68 61 56 |.............haV|
00000010 54 c1 ce 11 85 53 00 aa 00 a1 f9 5b 1e 00 00 00 |T....S.....[....|
00000020 4d 69 63 72 6f 73 6f 66 74 20 50 69 63 74 75 72 |Microsoft Pictur|
00000030 65 20 49 74 21 20 50 69 63 74 75 72 65 00 27 00 |e It! Picture.'.|
00000040 00 00 7b 35 36 36 31 36 38 30 30 2d 43 31 35 34 |..{56616800-C154|
00000050 2d 31 31 43 45 2d 38 35 35 33 2d 30 30 41 41 30 |-11CE-8553-00AA0|
00000060 30 41 31 46 39 35 42 7d 00 13 00 00 00 50 69 63 |0A1F95B}.....Pic|
00000070 74 75 72 65 49 74 21 2e 50 69 63 74 75 72 65 00 |tureIt!.Picture.|

Ok, so this sample has a similar string but is missing the “version 1” text. It seems the PRONOM signature was created from samples which included “version 1” in the CompObj header. Maybe when Microsoft learned they would be making a version 2, they decided a version number should be included going forward. Let’s take a look at a file from version 2 to compare:

hexdump -C PictureIt2-s01/\[1\]CompObj
00000000 01 00 fe ff 03 0a 00 00 ff ff ff ff 50 28 72 2d |............P(r-|
00000010 4b 8c d0 11 a9 6f 00 a0 c9 05 41 0d 28 00 00 00 |K....o....A.(...|
00000020 4d 69 63 72 6f 73 6f 66 74 20 50 69 63 74 75 72 |Microsoft Pictur|
00000030 65 20 49 74 21 20 76 65 72 73 69 6f 6e 20 32 20 |e It! version 2 |
00000040 50 69 63 74 75 72 65 00 27 00 00 00 7b 32 44 37 |Picture.'...{2D7|
00000050 32 32 38 35 30 2d 38 43 34 42 2d 31 31 44 30 2d |22850-8C4B-11D0-|
00000060 41 39 36 46 2d 30 30 41 30 43 39 30 35 34 31 30 |A96F-00A0C905410|
00000070 44 7d 00 f4 39 b2 71 50 00 00 00 4d 00 69 00 63 |D}..9.qP...M.i.c|

Ok, so it looks like they did update the version string for version 2. This file also does not identify correctly. A quick look at the Wikipedia page for Microsoft Picture It! tells us they continued to release the software until version 10. Is there a different string for each version?
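To compare samples quickly, the string can be read straight out of an extracted CompObj stream. In both dumps the four bytes at offset 28 hold a little-endian length (0x1e and 0x28) and the text starts at offset 32. The Python sketch below relies on that observation; the function names are mine, and `matches_fmt936` only imitates what the container signature checks rather than reproducing how DROID or Siegfried evaluate it.

```python
import struct

FMT936_LITERAL = b"Microsoft Picture It! version 1 Picture"


def user_type(compobj: bytes) -> str:
    """Return the length-prefixed string stored after the 28-byte header."""
    (length,) = struct.unpack_from("<I", compobj, 28)
    raw = compobj[32:32 + length]
    return raw.rstrip(b"\x00").decode("latin-1")


def matches_fmt936(compobj: bytes) -> bool:
    """Mimic the container signature: a fixed literal at offset 32."""
    return compobj[32:32 + len(FMT936_LITERAL)] == FMT936_LITERAL
```

For the version 1 sample above, `user_type` gives “Microsoft Picture It! Picture”, and `matches_fmt936` is false because the “version 1” text is missing.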

Diving into this and gathering many samples has brought a lot of variants to the surface. Let’s see if we can list all the CompObj header variants.

Version 1 samples:
Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Microsoft Picture It! version 1 Picture'{56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! Collage'{56616800-C154-11CE-8553-00AA00A1F95B}

Version 2 samples:
Microsoft Picture It! version 2 Picture'{2D722850-8C4B-11D0-A96F-00A0C905410D}

Version 3 samples:
Microsoft Picture It! version 3 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

Version 4 samples:
Microsoft Picture It! version 4 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 1 samples:
Microsoft PhotoDraw version 1 Picture'{18B8D020-B4FD-11D0-A97E-00A0C905410D}

PhotoDraw version 2 samples:
Microsoft PhotoDraw version 2 Picture'{18B8D021-B4FD-11D0-A97E-00A0C905410D}

FlashPix samples:
FlashPix Object({56616000-C154-11CE-8553-00AA00A1F95B}
FlashPix Object({56616800-C154-11CE-8553-00AA00A1F95B}
Picture It! FlashPix'{56616700-C154-11CE-8553-00AA00A1F95B}
LPI FlashPix'{56616700-c154-11ce-8553-00aa00a1f95b}
FlashPix_Object'{56616700-C154-11CE-8553-00AA00A1F95B}
'{56616700-C154-11CE-8553-00AA00A1F95B}
Picture It!'{56616700-c154-11ce-8553-00aa00a1f95b}
Flashpix Toolkit Application'{56616700-c154-11ce-0000-000000000000}

Ok, there is a lot to discuss here. First of all, it seems MIX was only used in Picture It! until version 5 (2001); after that the software used a new format, PNG Plus, to store the layered stacks. More on that in a future post! Some later versions still seem to be able to open the older MIX format. Version 4 of the MIX format seems to be the last, as the 2001 software had only version 4 files on it. It is probably safe to say only the four versions are needed for identification.

You may notice the additional unique identifier I included with each format. This is called a Class ID in the OLE format, which A LOT of formats use. Each “format” has a unique ID associated with it to help distinguish it from other formats. This unique ID could possibly be a better solution for identification. It does cross over with the PhotoDraw format, but the FlashPix format seems to have a unique ID. With all the variations in the version 1 strings, the ID remains the same. For versions 3 and 4 the ID is the same, which could mean they are interchangeable. It is also the same as PhotoDraw version 1, just to complicate things.
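In the two CompObj dumps above, the same ID also appears in binary form in bytes 12 to 27 of the stream, with the first three fields stored little-endian. Assuming that holds for other samples, Python’s `uuid` module can turn those bytes back into the familiar braces form. The function name is hypothetical.

```python
import uuid


def compobj_clsid(compobj: bytes) -> str:
    """Return the Class ID stored at bytes 12..27 of a CompObj stream."""
    if len(compobj) < 28:
        raise ValueError("CompObj stream is too short")
    # The first three fields are little-endian, which bytes_le accounts for.
    return "{" + str(uuid.UUID(bytes_le=compobj[12:28])).upper() + "}"
```

Reading the binary ID sidesteps the string variations entirely, though as noted it cannot separate formats that share an ID.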

So it seems in order to get proper identification of these similar formats we need to:

  • Clean up version 1 identification for fmt/936
  • Add a signature for 2, 3, and 4
  • Add a version 2 signature for the PhotoDraw format
  • Add some additional signature variations for the FlashPix format.

The Class IDs could be used to distinguish different versions and formats, but many of the IDs are identical, which could mean they are the same format. For now we can just add the additional variation strings and that should identify everything. The FlashPix format needs more research, as there are so many different variations and it is so close to the MIX format. Take a look at my GitHub submission, maybe you have some additional variations to add?

Digital Negatives

Obsolete Thor - 22 December 2023 - 8:06am

One of the important parts of Digital Preservation is gathering the significant properties of the digital files we hope to preserve. This allows us to base our risk assessments on more data than just an extension. For example, TIFF is a mighty good preservation format: well documented, adopted by the preservation community, and supported by hundreds if not thousands of software tools that render and make use of the format. But if a TIFF file uses compression like LZW, or if it happens to have multiple pages, those are good things to know about. Most formats have a stable set of properties, but some can have properties which add to the risk of the format becoming difficult to render or migrate.

A DNG, or Digital Negative, was developed by Adobe to solve the issues with proprietary RAW digital camera formats. Rendering a PhaseOne IIQ file often requires the full CaptureOne software, which can be expensive. Adobe spends quite a bit of resources adding support to its Camera RAW toolkit so that the majority of these RAW formats can be moved into a DNG. More and more camera manufacturers also capture directly to DNG as their native RAW format. This is the case for Apple’s ProRAW format, which uses the DNG specification.

Another manufacturer is Insta360. Their 360 cameras use two lenses to capture 180 degrees each and then stitch the results into a 360 photo or video. They can capture compressed images and videos, but also RAW. Because of the two lenses and sensors, their DNGs can get quite large. For this reason I recently asked PRONOM to adjust their signatures to allow for a bigger offset of DNG information in the larger RAW images.

exiftool IMG_20230913_141939_00_039.dng
ExifTool Version Number : 12.70
File Name : IMG_20230913_141939_00_039.dng
File Size : 143 MB
File Type : DNG
File Type Extension : dng
MIME Type : image/x-adobe-dng
Exif Byte Order : Little-endian (Intel, II)
Subfile Type : Full-resolution image
Image Width : 5984
Image Height : 11968
Bits Per Sample : 16
Compression : Uncompressed
Photometric Interpretation : Color Filter Array
Make : Arashi Vision
Camera Model Name : Insta360 X3

DNG files are actually based on the TIFF format, TIFF/EP to be precise, which means there is some good history behind the format and understanding of its structure. DNG does add many new tags and new features, so there is much more going on. Here is a TIFFInfo view of a DNG. Lots of new tags…..

tiffinfo IMG_20230913_141939_00_039.dng
TIFFReadDirectory: Warning, Unknown field with tag 33421 (0x828d) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 33422 (0x828e) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50937 (0xc6f9) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50938 (0xc6fa) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 50940 (0xc6fc) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 51009 (0xc741) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 51107 (0xc7a3) encountered.
=== TIFF directory 0 ===
TIFF Directory at offset 0x889946c (143234156)
  Subfile Type: (0 = 0x0)
  Image Width: 5984 Image Length: 11968
  Bits/Sample: 16
  Sample Format: unsigned integer
  Compression Scheme: None
  Photometric Interpretation: 32803 (0x8023)
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 11968
  Planar Configuration: single image plane
  Make: Arashi Vision
  Model: Insta360 X3
  Software: v1.0.69_build1
  DateTime: 2023:09:13 14:19:40
  Tag 33421: 2,2
  Tag 33422: 1,2,0,1
  EXIFIFDOffset: 0x8
  GPSIFDOffset: 0x3e6
  DNGVersion: 1,3,0,0
  DNGBackwardVersion: 1,3,0,0
  UniqueCameraModel: Insta360 X3

An IFD (Image File Directory) is the building block of a TIFF file, and a single TIFF file can contain several of them. An IFD does not have to describe the main image; it can also hold a thumbnail, metadata, or GPS info. DNG uses the IFD structure as well, but often the first IFD is a lower-resolution version of the full image, with the full-resolution image stored in a SubIFD.

<File:FileType>DNG</File:FileType>
<File:FileTypeExtension>dng</File:FileTypeExtension>
<File:MIMEType>image/x-adobe-dng</File:MIMEType>
<File:ExifByteOrder>Little-endian (Intel, II)</File:ExifByteOrder>
<IFD0:SubfileType>Reduced-resolution image</IFD0:SubfileType>
<IFD0:ImageWidth>256</IFD0:ImageWidth>
<IFD0:ImageHeight>171</IFD0:ImageHeight>
<IFD0:BitsPerSample>8 8 8</IFD0:BitsPerSample>
<IFD0:Compression>Uncompressed</IFD0:Compression>
<IFD0:PhotometricInterpretation>RGB</IFD0:PhotometricInterpretation>
<IFD0:Make>Canon</IFD0:Make>
<IFD0:Model>Canon EOS RP</IFD0:Model>
...
<SubIFD:SubfileType>Full-resolution image</SubIFD:SubfileType>
<SubIFD:ImageWidth>6384</SubIFD:ImageWidth>
<SubIFD:ImageHeight>4224</SubIFD:ImageHeight>
<SubIFD:BitsPerSample>16</SubIFD:BitsPerSample>
<SubIFD:Compression>JPEG</SubIFD:Compression>

But not always the same way.

<IFD0:SubfileType>Full-resolution image</IFD0:SubfileType>
<IFD0:ImageWidth>5984</IFD0:ImageWidth>
<IFD0:ImageHeight>11968</IFD0:ImageHeight>
<IFD0:BitsPerSample>16</IFD0:BitsPerSample>
<IFD0:Compression>Uncompressed</IFD0:Compression>
<IFD0:PhotometricInterpretation>Color Filter Array</IFD0:PhotometricInterpretation>
<IFD0:Make>Arashi Vision</IFD0:Make>
<IFD0:Model>Insta360 X3</IFD0:Model>

<IFD0:SubfileType>Reduced-resolution image</IFD0:SubfileType>
<IFD0:ImageWidth>4032</IFD0:ImageWidth>
<IFD0:ImageHeight>3024</IFD0:ImageHeight>
<IFD0:BitsPerSample>8 8 8</IFD0:BitsPerSample>
<IFD0:Compression>JPEG</IFD0:Compression>
<IFD0:PhotometricInterpretation>YCbCr</IFD0:PhotometricInterpretation>
<IFD0:Make>Apple</IFD0:Make>
<IFD0:Model>iPhone 13 Pro</IFD0:Model>
...
<SubIFD:SubfileType>Full-resolution image</SubIFD:SubfileType>
<SubIFD:ImageWidth>4032</SubIFD:ImageWidth>
<SubIFD:ImageHeight>3024</SubIFD:ImageHeight>
<SubIFD:BitsPerSample>12 12 12</SubIFD:BitsPerSample>
<SubIFD:Compression>JPEG</SubIFD:Compression>

It can get confusing, especially for the tools we use to extract metadata and significant properties from a DNG for preservation. Within Rosetta, the preservation system I use at work, there is no dedicated DNG extractor, so we use JHOVE, the same tool we use for our TIFF images. This presents a problem: the process only extracts properties from the first IFD, assuming it is the main image, so in many cases it reports pixel dimensions much smaller than the image actually has. More work is needed to improve extracting correct significant properties for DNG and other RAW image formats.
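To make the first-IFD problem concrete, here is a minimal Python sketch that walks IFD0, any SubIFDs (tag 330), and the next-IFD chain, then picks the largest image not flagged as reduced-resolution (bit 0 of tag 254). It is not how JHOVE or Rosetta work; the function names are made up for illustration, it handles only classic TIFF (not BigTIFF), and it reads only the few field types it needs.

```python
import struct

# TIFF field type -> (struct code, size in bytes); only the types needed here.
TYPES = {1: ("B", 1), 3: ("H", 2), 4: ("I", 4), 13: ("I", 4)}

def read_ifd(data, offset, endian):
    """Return ({tag: values}, next_ifd_offset) for the IFD at `offset`."""
    (count,) = struct.unpack_from(endian + "H", data, offset)
    tags = {}
    for i in range(count):
        pos = offset + 2 + i * 12
        tag, ftype, n = struct.unpack_from(endian + "HHI", data, pos)
        if ftype not in TYPES:
            continue  # ASCII, RATIONAL, etc. are not needed for this sketch
        code, size = TYPES[ftype]
        value_pos = pos + 8
        if size * n > 4:  # values that do not fit inline are stored elsewhere
            (value_pos,) = struct.unpack_from(endian + "I", data, value_pos)
        tags[tag] = struct.unpack_from(f"{endian}{n}{code}", data, value_pos)
    (next_ifd,) = struct.unpack_from(endian + "I", data, offset + 2 + count * 12)
    return tags, next_ifd

def list_images(data):
    """List every IFD that describes an image, including SubIFDs."""
    endian = {b"II": "<", b"MM": ">"}[bytes(data[:2])]
    magic, first = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF header")
    found, seen, queue = [], set(), [("IFD0", first)]
    while queue:
        name, offset = queue.pop(0)
        if offset == 0 or offset in seen:
            continue
        seen.add(offset)
        tags, next_ifd = read_ifd(data, offset, endian)
        if 256 in tags and 257 in tags:  # ImageWidth and ImageLength
            found.append({
                "ifd": name,
                "subfile_type": tags.get(254, (0,))[0],  # NewSubfileType
                "width": tags[256][0],
                "height": tags[257][0],
            })
        for i, sub_offset in enumerate(tags.get(330, ())):  # SubIFDs
            queue.append((f"{name}/SubIFD{i}", sub_offset))
        queue.append((f"{name}.next", next_ifd))
    return found

def main_image(data):
    """Pick the largest image that is not marked reduced-resolution."""
    images = list_images(data)
    if not images:
        return None
    full = [im for im in images if not im["subfile_type"] & 1]
    return max(full or images, key=lambda im: im["width"] * im["height"])
```

Run against the Canon example above, a walk like this would be expected to report the 6384 x 4224 SubIFD rather than the 256 x 171 preview in IFD0.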

Adobe released a new version of DNG this year: DNG version 1.7.0.0 was finalized in June. The new version brought a few new features, including JPEG XL compression and a new HDR colorimetric value. DNG version 1.7 is required in order to use JPEG XL compression. Here is how one looks in exiftool, created with Adobe DNG Converter 16.1.

exiftool _MG_9375_1.dng
ExifTool Version Number : 12.70
File Name : _MG_9375_1.dng
File Size : 5.4 MB
File Type : DNG
File Type Extension : dng
MIME Type : image/x-adobe-dng
Exif Byte Order : Little-endian (Intel, II)
Make : Canon
Camera Model Name : Canon EOS DIGITAL REBEL XT
Preview Image Start : 91884
Orientation : Rotate 270 CW
Rows Per Strip : 171
Preview Image Length : 10305
Software : Adobe DNG Converter 16.1 (Macintosh)
Modify Date : 2023:12:18 11:45:06
Artist : unknown
Image Width : 3516
Image Height : 2328
Bits Per Sample : 16
Compression : JPEG XL
DNG Version : 1.7.1.0
DNG Backward Version : 1.7.1.0

I had recently submitted a new signature for DNG 1.7 to PRONOM, but I found this new DNG version falls outside the signature I created. I had assumed all DNGs report a version ending in 0.0, so I created the signature to look for 1.7.0.0. That assumption is wrong now that I can see an example of version 1.7.1.0.

In order to fix the issue, I would need to change all the DNG signatures to remove the last two bytes so:

12C601000400000001070000 would change to 12C60100040000000107

This would allow for identification if some DNG files have a point version.
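The byte sequence above reads as a little-endian TIFF directory entry: 12C6 is tag 0xC612 (DNGVersion), 0100 is field type BYTE, 04000000 is a count of 4, and the last four bytes are the version itself. A minimal Python sketch of the shortened match follows. It only mirrors the idea; it is not PRONOM signature syntax, the function names are hypothetical, it covers little-endian files only, and a raw byte search like this could in principle also hit image data.

```python
# Tag 0xC612 (DNGVersion), type BYTE, count 4, all little-endian.
DNG_VERSION_ENTRY = bytes.fromhex("12C6010004000000")

def dng_version(data):
    """Return the DNG version as a 4-tuple, or None if no entry is found."""
    pos = data.find(DNG_VERSION_ENTRY)
    if pos < 0:
        return None
    value = data[pos + 8:pos + 12]
    if len(value) < 4:
        return None
    return tuple(value)

def is_dng_1_7(data):
    """Match on the first two version bytes only, ignoring point versions."""
    version = dng_version(data)
    return version is not None and version[:2] == (1, 7)
```

With this approach both 1.7.0.0 and 1.7.1.0 would be treated as DNG 1.7, which is the effect of dropping the last two bytes from the signature.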

The pace at which manufacturers are producing cameras with new features is much faster than the Digital Preservation community can keep up with. As new technologies get released, we play catch-up trying to identify new formats and variations of existing ones. I guess that is job security?

Final Cut Pro

Obsolete Thor - 15 december 2023 - 7:54am

When it comes to Digital Preservation, the easiest types of file formats to preserve are often single, self-contained formats with lots of documentation. There are plenty of formats which break this norm, but a file format like a simple TIFF file is well understood and can stand on its own. The hardest file formats to preserve, I have found, are the complex, under-documented formats which often show up when you don’t expect them. There is one type of file format which makes things especially difficult: the project format.

There are many software tools out there which generate a “Project”. The format is often proprietary and can only be used by the software which created it. Project files are also interdependent, meaning they require other files in known locations in order to be used. This interdependence usually takes the form of links to images, audio, video, fonts, and other multimedia. The file format itself is just a reference to all the project settings and the paths to the files included in the project. This makes it very difficult to preserve and maintain the complex structure required. Any renaming, removing, or moving of the files out of their original order can render the project useless. Many project formats are human readable in XML, or other human readable text, but others are not. I have made a recent attempt to document more Project formats on the File Format Wiki, including many Label and Optical disc project formats, along with updates to Adobe InDesign, QuarkXPress and other desktop publishing project formats. There is still plenty of work needed on other Video and Audio project formats.

Apple has, over the years, created some very powerful software for content creators, especially for video editing. iMovie was used by many home movie editors, and iDVD to burn those movies to DVD to share with family and friends, but Apple also sold a professional video editing suite which included Final Cut Pro.

Final Cut Pro started life as a Macromedia software tool called KeyGrip, which was never released and was later bought by Apple. Final Cut Pro was well used and loved by video editors and was given a major upgrade in 2011 to Final Cut Pro X, which was fully re-written to be 64-bit. This change included a change to the Project file format. So for version 1 through version 7, Final Cut Pro used a project format with the extension .FCP. Let’s take a closer look at this project format.

hexdump -C Swing.fcp | head
00000000 a2 4b 65 79 47 0a 0d 0a 00 00 00 00 20 fc c5 5b |.KeyG....... ..[|
00000010 00 de b3 11 d0 93 19 00 05 02 18 66 07 00 00 00 |...........f....|
00000020 03 00 00 00 00 00 00 00 00 01 00 00 00 00 01 00 |................|
00000030 00 00 11 07 73 75 62 74 79 70 65 00 00 00 01 01 |....subtype.....|
00000040 00 00 00 03 00 06 4e 4f 55 4e 44 4f 00 00 00 00 |......NOUNDO....|
00000050 01 01 00 00 00 00 00 00 00 00 00 00 00 07 52 55 |..............RU|
00000060 4e 54 49 4d 45 00 00 00 00 01 01 00 00 00 00 00 |NTIME...........|
00000070 00 00 00 01 07 76 69 65 77 65 72 73 00 00 00 00 |.....viewers....|
00000080 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 08 |................|
00000090 63 68 69 6c 64 72 65 6e 00 00 00 00 01 01 00 00 |children........|
*
00000e30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 8c |................|
00000e40 b3 2e 56 40 4d 6f 6f 56 54 56 4f 44 00 02 00 02 |..V@MooVTVOD....|
00000e50 00 00 00 11 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000e60 00 00 00 0b 44 61 6e 63 65 20 53 68 6f 74 73 00 |....Dance Shots.|
00000e70 00 01 00 08 00 00 07 8a 00 00 07 84 00 02 00 2f |.............../|
00000e80 41 54 54 4f 20 52 41 49 44 30 20 47 72 6f 75 70 |ATTO RAID0 Group|
00000e90 3a 54 55 54 4f 52 49 41 4c 3a 44 61 6e 63 65 20 |:TUTORIAL:Dance |
00000ea0 53 68 6f 74 73 3a 49 6e 74 72 6f 2e 6d 6f 76 00 |Shots:Intro.mov.|
00000eb0 00 09 00 a8 00 a8 61 66 70 6d 00 00 00 00 00 03 |......afpm......|
00000ec0 00 18 00 39 00 59 00 75 00 95 00 9e 07 49 4c 31 |...9.Y.u.....IL1|
00000ed0 20 33 72 64 00 00 00 00 00 00 00 00 00 00 00 00 | 3rd............|
00000ee0 00 00 00 00 00 00 00 00 00 00 00 00 00 0f 77 61 |..............wa|
00000ef0 6c 74 d5 73 20 43 6f 6d 70 75 74 65 72 00 00 00 |lt.s Computer...|
00000f00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 41 54 |..............AT|
00000f10 54 4f 20 52 41 49 44 30 20 47 72 6f 75 70 00 00 |TO RAID0 Group..|
00000f20 00 00 00 00 00 00 00 00 00 07 77 73 68 69 72 65 |..........wshire|
00000f30 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |s...............|
00000f40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000f50 00 00 00 00 00 00 00 00 00 00 00 00 ff ff 00 00 |................|
00000f60 00 00 00 00 00 00 00 10 41 54 54 4f 20 52 41 49 |........ATTO RAI|
00000f70 44 30 20 47 72 6f 75 70 00 00 00 00 00 00 00 2b |D0 Group.......+|
00000f80 00 00 00 01 00 00 00 03 00 00 00 03 54 55 54 4f |............TUTO|
00000f90 52 49 41 4c 00 44 61 6e 63 65 20 53 68 6f 74 73 |RIAL.Dance Shots|
00000fa0 00 49 6e 74 72 6f 2e 6d 6f 76 00 00 00 00 00 00 |.Intro.mov......|

From the header we can see a remnant of the original KeyGrip software, but later in the file we find some references to files in the Mac HFS path format, which uses a colon instead of a slash. These are the paths to each of the MOV files used in the Project. This file is from the tutorial disk of Final Cut Pro version 1.2, so let’s take a look at the last version released, version 7.

hexdump -C "Lesson 1 Project.fcp" | head
00000000 a2 4b 65 79 47 0a 0d 0a 01 de 00 00 00 20 08 92 |.KeyG........ ..|
00000010 66 c4 28 d7 11 8a e5 00 30 65 ec fe 98 03 00 00 |f.(.....0e......|
00000020 00 00 00 00 00 00 00 00 00 01 00 00 00 00 01 15 |................|
00000030 00 00 00 07 73 75 62 74 79 70 65 01 00 00 00 01 |....subtype.....|
00000040 03 00 00 00 00 06 4e 4f 55 4e 44 4f 00 00 00 00 |......NOUNDO....|
00000050 01 01 00 00 00 00 00 00 00 00 00 00 00 07 52 55 |..............RU|
00000060 4e 54 49 4d 45 00 00 00 00 01 01 00 00 00 00 00 |NTIME...........|
00000070 01 00 00 00 07 76 69 65 77 65 72 73 00 00 00 00 |.....viewers....|
00000080 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 08 |................|
00000090 63 68 69 6c 64 72 65 6e 00 00 00 00 01 01 01 00 |children........|

Almost identical to the first version, which is helpful for identification, but if we need to identify based on version, it might prove a little more difficult. All the samples I have, and those I have seen referenced, begin with the same 5 hex values, A24B657947, or 0xA2 followed by KeyG. It’s hard to know what other hex values might have something to do with versions of the file format. More samples could tell us, but from what I have, the 20 bytes starting from offset 12 seem to be consistent among the different version samples. For now, the 5 bytes at the beginning of the file should suffice for identification.
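A check on those 5 leading bytes is easy to sketch in Python. This is only an illustration of the idea, not the PRONOM signature itself, and the function names are made up.

```python
# The 5 bytes seen at the start of every .fcp sample: 0xA2 followed by "KeyG".
FCP_MAGIC = bytes.fromhex("A24B657947")

def looks_like_fcp(header):
    """True if the given leading bytes start with the KeyGrip marker."""
    return header.startswith(FCP_MAGIC)

def is_fcp_file(path):
    """Read just enough of the file to test the marker."""
    with open(path, "rb") as handle:
        return looks_like_fcp(handle.read(len(FCP_MAGIC)))
```

Both the version 1.2 and version 7 headers shown above pass this check, which is why it cannot tell the versions apart.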

When Final Cut Pro went through a complete re-write in 2011, the FCP format was abandoned. It was not only made obsolete but left completely unsupported: the new Final Cut Pro X software could not open the older format. The new format followed the pattern of many other Apple formats, using a folder identified through an extension as a single file. This is called a bundle format, and Final Cut Pro X used the extension .FCPBUNDLE. This bundle could include the media assets along with project settings, thumbnails, and clips. Because of this “bundle” format, identification would have to be done at the individual file level inside the bundle. This would include formats with extensions such as .flexolibrary and .fcpevent, which appear to be SQLite databases. This complex format makes preservation of this type of object difficult with current methods and practices.
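One way to check the SQLite hunch is to walk a bundle folder and test each file for the standard SQLite 3 header string. The sketch below does only that. The function name is hypothetical, and it says nothing about what the databases contain.

```python
import os

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"

def sqlite_files_in_bundle(bundle_path):
    """Return relative paths of files in the bundle that start with the SQLite header."""
    hits = []
    for folder, _dirs, files in os.walk(bundle_path):
        for name in files:
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    if handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC:
                        hits.append(os.path.relpath(path, bundle_path))
            except OSError:
                continue  # unreadable file: skip it rather than stop the walk
    return sorted(hits)
```

Treating the bundle as a plain directory like this matches how identification tools would see it once it leaves macOS.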

Luckily Apple didn’t leave Final Cut Pro users completely unable to migrate their content. Final Cut Pro could export the project as an XML file. This format is called Final Cut Pro XML Interchange Format and was well documented. The format was not made to bridge the gap from Final Cut Pro to Final Cut Pro X, but rather to make the project file more useful outside of Final Cut Pro. Final Cut Pro X actually can’t open these files either, which is why a third-party developer came in and developed 7toX (SendtoX) to allow projects to be converted to a newer XML format.

Let’s take a look at the basic Final Cut Pro XML Interchange Format, which has a standard XML extension:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="Sequence 1 ">...</sequence>
</xmeml>

Standard XML with a Doctype/root of xmeml. Clever. A little ways into the XML we also see:

<appspecificdata>
<appname>Final Cut Pro</appname>
<appmanufacturer>Apple Inc.</appmanufacturer>
<appversion>7.0</appversion>
</appspecificdata>

Final Cut Pro X also has an XML format, which is different from XMEML and uses the extension FCPXML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE fcpxml>
<fcpxml version="1.8">
<resources>
<format id="r1" name="FFVideoFormatDV720x480i5994" frameDuration="2002/60000s" fieldOrder="lower first" width="720" height="480" paspH="10" paspV="11" colorSpace="6-1-6 (Rec. 601 (NTSC))"/>
</resources>
<library location="file:///Untitled.fcpbundle/">...</library>
</fcpxml>

A different Doctype/root and structure but should be easy to identify.
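Since the two XML flavors differ in their root element, telling them apart can be sketched with Python's standard XML parser. This is an illustration only: the function name is made up, and a real signature would match on bytes near the start of the file rather than parse the whole document.

```python
import xml.etree.ElementTree as ET

def fcp_xml_flavor(xml_bytes):
    """Return (root_name, version) for xmeml or fcpxml documents, else None."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None  # not well-formed XML
    if root.tag in ("xmeml", "fcpxml"):
        return root.tag, root.get("version")
    return None
```

For the two samples above this would be expected to give xmeml with version 5 and fcpxml with version 1.8.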

The preservation of project files, according to some, is not necessary since they are not the finalized product. Preserving the finalized output would be preferable, as it can be managed more easily and represents the final render of a project. But identification of the Final Cut Pro project and all its assets gives the option to access a collection more accurately. I was able to create a signature for the FCP, XML, and FCPXML formats. Take a look on my GitHub for the signatures and some test files.

PianoSoft DOM-30

Obsolete Thor - 8 december 2023 - 8:05am

I often find myself at a thrift store looking through the well-used Compact Discs. I often see the same ones over and over, but occasionally find a gem. While looking through a set of discs, a few caught my eye. When I pulled one out to look at the cover I noticed it was not your typical CD. Opening the cover I was greeted with a 3.5 inch floppy inside the jewel case. That was a fun surprise.

The 3.5 inch floppy disk appears to be made specifically for Yamaha Disklavier pianos. The disk had the appearance of your typical double density floppy. Unfortunately, when I inserted the disk I was greeted with the error, “no mountable file systems”. I was however able to use ddrescue and make a disk image. Here is what the disk header looks like:

hexdump -C Yamaha_RazzleDazzle.img | head
00000000 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 |................|
*
00000200 f9 ff ff 03 40 00 05 60 00 07 80 00 47 00 00 0b |....@..`....G...|
00000210 c0 00 0d e0 00 0f f0 ff 11 20 01 13 40 01 15 f0 |......... ..@...|
00000220 ff 17 80 01 19 a0 01 1b c0 01 42 00 00 1f 00 02 |..........B.....|
00000230 21 20 02 23 40 02 25 60 02 27 80 02 29 f0 ff 2b |! .#@.%`.'..)..+|
00000240 c0 02 2d e0 02 2f f0 ff 31 20 03 33 40 03 35 60 |..-../..1 .3@.5`|
00000250 03 37 80 03 39 f0 ff 3b c0 03 3d e0 03 3f 00 04 |.7..9..;..=..?..|
00000260 41 f0 ff ff 4f 04 45 60 04 48 f0 ff 49 f0 ff 00 |A...O.E`.H..I...|
00000270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

The first 512 bytes of the image, where the boot sector would normally be, are all E5 hex values. Not a FAT12 file system an operating system will recognize, for sure. A little farther into the disk image I can see what appears to be a file listing.

00000e00 53 4f 4e 47 20 20 20 20 50 30 31 00 00 00 00 00 |SONG    P01.....|
00000e10 00 00 00 00 00 00 00 00 00 00 02 00 00 20 00 00 |............. ..|
00000e20 53 48 4f 52 54 20 20 20 50 30 32 20 00 00 00 00 |SHORT   P02 ....|
00000e30 00 00 00 00 00 00 fa ac a2 16 0a 00 00 18 00 00 |................|
00000e40 54 4f 59 20 20 20 20 20 50 30 33 20 00 00 00 00 |TOY     P03 ....|
00000e50 00 00 00 00 00 00 05 ad a2 16 10 00 00 18 00 00 |................|
00000e60 53 57 45 45 54 20 20 20 50 30 34 20 00 00 00 00 |SWEET   P04 ....|
00000e70 00 00 00 00 00 00 12 ad a2 16 16 00 00 20 00 00 |............. ..|
00000e80 56 49 4f 4c 49 4e 20 20 50 30 35 20 00 00 00 00 |VIOLIN  P05 ....|
00000e90 00 00 00 00 00 00 40 ad a2 16 1e 00 00 30 00 00 |......@......0..|
00000ea0 51 55 49 45 54 20 20 20 50 30 36 20 00 00 00 00 |QUIET   P06 ....|
00000eb0 00 00 00 00 00 00 50 ad a2 16 2a 00 00 18 00 00 |......P...*.....|
00000ec0 52 41 5a 5a 4c 45 20 20 50 30 37 20 00 00 00 00 |RAZZLE  P07 ....|
00000ed0 00 00 00 00 00 00 83 ad a2 16 30 00 00 28 00 00 |..........0..(..|
00000ee0 4c 45 54 53 20 20 20 20 50 30 38 20 00 00 00 00 |LETS    P08 ....|
00000ef0 00 00 00 00 00 00 9a ad a2 16 3a 00 00 20 00 00 |..........:.. ..|
00000f00 50 49 41 4e 4f 44 49 52 46 49 4c 20 00 00 00 00 |PIANODIRFIL ....|
00000f10 00 00 00 00 00 00 a1 ad a2 16 43 00 00 18 00 00 |..........C.....|
00000f20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001c00 fe 00 00 00 20 00 00 43 4f 4d 2d 45 53 45 51 51 |.... ..COM-ESEQQ|
00001c10 31 31 56 31 2e 30 30 80 00 00 00 d9 01 00 00 00 |11V1.00.........|
00001c20 20 00 00 01 58 00 00 20 20 20 20 20 20 20 20 20 | ...X..         |
00001c30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |                |
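The 32-byte records in the listing above resemble FAT-style directory entries: an 8-character name, a 3-character extension, and, in the last six bytes, a little-endian start cluster and file size. That reading is an assumption drawn from the dump, not from Yamaha documentation, and this is not how the Hackaday script works. Under that assumption, a minimal Python sketch for listing the entries could look like this (the function name, the 0xE00 start offset taken from the dump, and the 112-entry limit are illustrative choices):

```python
import struct

ENTRY_SIZE = 32  # assumed FAT-style directory entry size

def parse_directory(image, start=0xE00, max_entries=112):
    """List (filename, start_cluster, size) tuples from the directory area."""
    entries = []
    for index in range(max_entries):
        raw = image[start + index * ENTRY_SIZE:start + (index + 1) * ENTRY_SIZE]
        if len(raw) < ENTRY_SIZE or raw[0] == 0x00:
            break  # a zero first byte marks the end of the listing
        if raw[0] == 0xE5:
            continue  # in FAT, 0xE5 marks a deleted entry
        name = raw[0:8].decode("ascii", "replace").rstrip()
        ext = raw[8:11].decode("ascii", "replace").rstrip()
        # Bytes 26-27: start cluster, bytes 28-31: size, both little-endian.
        cluster, size = struct.unpack_from("<HI", raw, 26)
        entries.append((f"{name}.{ext}", cluster, size))
    return entries
```

Read this way, the last entry in the dump becomes PIANODIR.FIL with a size of 0x1800 bytes, or 6 Kbytes.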

This gave me a few clues. After a few Google searches I came across a project on Hackaday about “Hacking Yamaha Disklavier Floppies”. I was eager to test out the Python software and see if it could export the files on the disk image. To my disappointment, the Python script could not read my disk image. So I reached out to the author. After I shared a couple of disk images with him, he was able to enable support for this different type of Disklavier disk.

python3 disklav.py -t Yamaha_RazzleDazzle.img
Loading file...OK
Format: PianoSoft DOM-30
Disk: PPC 1919
Title: RAZZLE DAZZLE
--------------------------------------------------------------------
Track 01 - Song Without Words
Track 02 - Shortenin' Bread Boogie
Track 03 - Toy Bugle
Track 04 - Sweet Tooth
Track 05 - The Mysterious Violin
Track 06 - Quiet Moment
Track 07 - Razzle Dazzle
Track 08 - Let's Have A Party!

When I used the extract option with the software I was rewarded with eight files with the .FIL extension.

sf Yamaha_RazzleDazzle-track01.fil
---
siegfried : 1.10.1
scandate : 2023-12-04T22:38:04-07:00
signature : default.sig
created : 2023-12-04T22:37:35-07:00
identifiers :
  - name : 'pronom'
    details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'Yamaha_RazzleDazzle-track01.fil'
filesize : 6409
modified : 2023-12-04T22:28:34-07:00
errors :
matches :
  - ns : 'pronom'
    id : 'UNKNOWN'
    format :
    version :
    mime :
    class :
    basis :
    warning : 'no match'

No surprise the file was not known to PRONOM. I doubt these files have made a big appearance in many archives.

hexdump -C Yamaha_RazzleDazzle-track01.fil | head
00000000 fe 00 00 00 20 00 00 43 4f 4d 2d 45 53 45 51 51 |.... ..COM-ESEQQ|
00000010 31 31 56 31 2e 30 30 80 00 00 00 d9 01 00 00 00 |11V1.00.........|
00000020 20 00 00 01 58 00 00 20 20 20 20 20 20 20 20 20 | ...X..         |
00000030 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |                |
*
00000120 20 20 20 20 20 20 20 00 76 04 02 00 1e ff 00 00 |       .v.......|
00000130 ff ff ff ff ff 00 ff 00 00 00 00 ff ff 00 00 00 |................|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000150 00 00 00 00 00 02 02 02 02 02 02 02 02 01 01 01 |................|
00000160 01 01 01 01 01 01 01 01 01 01 01 01 01 00 00 00 |................|

The header of a .FIL file has an ASCII string “COM-ESEQ”. A little investigation shed some light on the format. It turns out there is a proprietary format Yamaha developed called “E-SEQ”. The E-SEQ format is compatible with all Disklaviers, Clavinova digital pianos, and a few other Yamaha products. I was curious if the format was something similar to a MIDI file, which was commonly used with early keyboard systems, but I was unable to find anything to suggest they are similar in any way. Yamaha does mention there are tools out there to convert an E-SEQ file to an SMF (Standard MIDI File), which was used on other systems.
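In the hexdump above the “COM-ESEQ” string sits at offset 7, after the bytes FE 00 00 00 20 00 00. A minimal Python sketch of a check on that string follows. It is based on this one sample only, the function name is made up, and whether the 7 leading bytes are constant across E-SEQ files is not something a single file can show.

```python
ESEQ_MARKER = b"COM-ESEQ"
ESEQ_OFFSET = 7  # position of the marker in the sample shown above

def looks_like_eseq(header):
    """True if the marker appears at the offset seen in the sample file."""
    return header[ESEQ_OFFSET:ESEQ_OFFSET + len(ESEQ_MARKER)] == ESEQ_MARKER
```

The same check also matches the bytes at 0x1C00 in the disk image, which is where the first track's data begins.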

There is another tool called PPFBU which can be used to extract a disk image, and the E-SEQ files, from a Disklavier floppy. A companion tool called MID2PianoCD claims to be able to convert an E-SEQ file to WAV or MP3, although I haven’t had much luck with it.

Another set of tools is available here; they allow for copying a disk and converting back and forth between E-SEQ and MIDI. A text file in DVUtils from the link has the following background about the disk format and the FIL files:

DISKLAVIER FILES AND DISCS

Yamaha Disklavier discs are always on Double Density (2DD) media, High Density (HD)discs, which are more common nowadays, will not work. Furthermore, they are formatted to 720 Kbytes not the default of 1.2 Mbytes.

The original discs are copy protected. This has been achieved by placing invalid data on the first sector. As DOS and Windows always refer to this sector to check out a floppy, they will report that the discs are bad. The Yamaha machinery ignores the first sector so it reads them normally.

The music files on a Disklavier disc have the extension .FIL . They are frequently identified with titles like PIANO001.FIL but sometimes they have names similar to DOS like MUSIC1.FIL.

In addition to the music files, there is an index file on the disc. This contains a list of the active music files on the disc, their titles, and pointers to their position on the disc. The index file is always called PIANODIR.FIL and always has a size of 6 Kbytes.

In order to set up a Disklavier disc to function on a Disklavier, you must first copy the music files onto it in Disklavier format (ESEQ) and then run the ESEQ EXPLORER program to build the index file.

Although there are many Disklavier pianos still out there, and they are quite expensive if you want to pick one up for yourself, the websites dedicated to the format are slowly disappearing. One archived website has plenty of sample files to download and write to a floppy for use if you happen to have a Disklavier.

You can check out the signature I put together for E-SEQ on my GitHub. It might be good to explore the disk image format more and add that to PRONOM for identification as well.
