Internet Archive

A blog from the team at archive.org

2020 Census + Internet Archive = Democracy

1 December 2020 - 8:24pm

2020 has been a wild year for all of us. I think we all looked forward to such an interesting-sounding year, and knew, at the onset of a new decade, additional importance would inherently be attached: a US presidential election, the Olympics, the sheer joy of having numerical continuity to lean on. For those who follow the machinations of the US government closely, we also knew there was a census scheduled to occur, the once-a-decade effort by the US Census Bureau to accurately quantify the population of the United States. This valiant effort is instrumental in providing data that drives important decisions, from the amount of federal funding districts get for schools, hospitals, and other services, to the number of seats each state will have in the House of Representatives.



It is critically important to have an accurate count to drive these decisions. At the Internet Archive, we understood that importance and wanted to help. Late in 2019, representatives from the US Census Bureau contacted us to ask whether they could use our headquarters to train workers for the census count (enumerators, as they’re called). We heartily responded with a yes. When our founder, Brewster Kahle, chose this building as our headquarters, it was in part to be able to support civic efforts such as this.



Of course, the COVID-19 pandemic threw a massive wrench into these plans. From our initial conversations, we had pegged late March 2020 as the time to commence the training, which unfortunately coincided with the global shutdown to quell the spread of the virus. So many conflicting emotions colored those early days of the pandemic, fear and uncertainty among them, but with respect to this commitment we’d made, we couldn’t help but feel for the organizers of the census. Many of them are volunteers, many are in higher-risk categories for this virus, and all of them now faced an immensely more difficult challenge: training workers and collecting data during this very difficult time, all with the goal of attaining an accurate count. So we kept the lines of communication open, checking in every week or so to see how their plans were developing, and continued to offer our support.

Eventually, as the calendar shifted from spring to summer, we settled on a plan and some dates: groups of eight enumerators would be trained in shifts over the course of the first week of August. For those unfamiliar with our headquarters in San Francisco, we are fortunate to have a 600-person auditorium we refer to as the Great Room, which was well suited to conducting the training safely. It has large windows and doors that can remain open, large fans circulating the air, and a separate entrance for folks to come and go.



All told, 40 people were trained to help conduct the census in and around San Francisco. We had hoped to help to a larger extent, but such is life in 2020. It’s hard to tell how much of an impact this had, but we hope it helped, and it remains important to us to support these often unseen but critical aspects of how a community and a society function.

Hopefully, we’ll all be around in 2030 the next time the census rolls around and we can further assist this effort. To learn more about the census, follow the link below, and be sure to thank a census worker if you run across one.

https://2020census.gov/en.html

The post 2020 Census + Internet Archive = Democracy appeared first on Internet Archive Blogs.

Contest: The Internet Archive is Looking For Creative Short Films Made By You!

1 December 2020 - 3:14am

We are looking for artists of all levels to create and upload a short film of 2-3 minutes to the Internet Archive to help us celebrate Public Domain Day on December 17th!

Public Domain Day is a celebration of all the rich content that will be newly available to the public free of copyright restrictions from the year 1925. We want artists to use this newly available content to create short films that contain content from the archive’s collection from 1925. The uploaded videos will be judged and prizes of up to $1500 awarded!! (Please see details below)

Winners will be announced and shown at the virtual Public Domain Day Celebration on December 17th at 3pm Pacific (registration opens soon), and we will introduce the artists. 

Here are a few examples of some of the rich content that is now available for you to use:

Possible themes include, but are not limited to:  

  • The Great Gatsby (going Public Domain January 1, 2021)
  • Gilded Age, Industrial Age

Guidelines

  • Make a 2-3 minute movie using Newly Public Domain Material from 1925 (If you have something to add to the Internet Archive from 1925, then please add it in and feel free to use it)
  • Mix and Mash content however you like
  • Add a personal touch, make it yours!
  • Keep the videos light hearted and fun (It is a celebration after all!)

Submission Deadline

All submissions must be in by Midnight, December 13th, 2020 (PST)

How to Submit

Prizes

  • 1st prize: $1500
  • 2nd prize: $1000
  • 3rd prize: $500

*All prizes sponsored by the Kahle/Austin Foundation

Judges

Judges will be looking for videos that are fun and interesting for showing at the Public Domain Day virtual party and that highlight the value of having cultural materials that can be reused, remixed, and re-contextualized for a new day. Winners’ pieces will be purchased with the prize money and then put into the public domain under a CC0 license.

  • Amir (Director of Special Arts Projects, Internet Archive)
  • TBA (Artist)
  • Brewster Kahle (Founder, Digital Librarian, Internet Archive)

The post Contest: The Internet Archive is Looking For Creative Short Films Made By You! appeared first on Internet Archive Blogs.

FOSS wins again: Free and Open Source Communities comes through on 19th Century Newspapers (and Books and Periodicals…)

23 November 2020 - 7:00am

I have never been more encouraged by, or more thankful to, Free and Open Source communities. Three months ago I posted a request for help with OCR’ing and processing 19th Century Newspapers and we got soooo many offers to help. Thank you, that was heartwarming and concretely helpful. Based on these suggestions, we are already changing over our OCR and PDF software completely to FOSS, making big improvements, and building partnerships with FOSS developers in companies, universities, and as individuals that will propel the Internet Archive to have much better digitized texts. I am so grateful, thank you. So encouraging.

I posted a plea for help on the Internet Archive blog: Can You Help us Make the 19th Century Searchable? We got many offers on social media and over 50 comments on the post, maybe a record response rate.

We are already changing over our OCR to Tesseract/OCRopus and leveraging many PDF libraries to create compressed, accessible, and archival PDFs.

Several people suggested the German government-led initiative called OCR-D, which has made production-level tools for OCR’ing and segmenting complex and old materials, such as newspapers in the old German script Fraktur, or blackletter. (The Internet Archive had never been able to process these, and now we are doing it at scale.) We are also able to OCR more Indian languages, which is fantastic. This government project is FOSS and has money for outreach to make sure others use the tools, which is a step beyond most research grants.

Tesseract has made a major step forward in the last few years. When we last evaluated its accuracy, it was not as good as the proprietary OCR, but that has changed: our evaluations show it is just as good, and it can get better for our application because of its new architecture.

Underlying the new Tesseract is an LSTM engine similar to the one developed for Ocropus2/ocropy, a project led by Tom Breuel (funded by Google, his former German university, and probably others; thank you!). He has continued working on this project even though he left academia. A machine-learning-based program is introducing us to GPU-based processing, which is an extra win. It can also be trained on corrected texts so it can get better.

Proprietary example from an Anti-Slavery newspaper from my blog post:

New one, based on free and open source software that is still faulty but better:

The compute time on our cluster is approximately the same, but if we add GPUs we should be able to speed up OCR and PDF creation, maybe 10 times, which would help a great deal since we are processing millions of pages a day.

PDF generation is a balancing act: we want small file sizes, quick rendering in browser implementations, useful functionality (text search, page numbers, cut-and-paste of text), and compliance with archival (PDF/A) and accessibility (PDF/UA) standards. At the heart of the new PDF generation is the “archive-pdf-tools” Python library, which performs Mixed Raster Content (MRC) compression, creates a hidden text layer using a modified Tesseract PDF renderer that can read hOCR files as input, and ensures the PDFs are compatible with archival standards (VeraPDF is used to verify every PDF that we generate against the archival PDF standards). The MRC compression decomposes each image into a background, foreground and foreground mask, heavily compressing (and sometimes downscaling) each layer separately. The mask is compressed losslessly, ensuring that the text and lines in an image do not suffer from compression artifacts and look clear. Using this method, we observe a 10x compression factor for most of our books.
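To make the background/foreground/mask idea concrete, here is a minimal sketch of an MRC-style decomposition in plain Python. It is not the archive-pdf-tools implementation: the function names, the fixed brightness threshold used to build the mask, and the mean-value fill are all assumptions made for illustration, and no layer is actually compressed here.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def _mean(values: List[int], default: int) -> int:
    """Integer mean of values, or default when the list is empty."""
    return sum(values) // len(values) if values else default


def mrc_decompose(image: Gray, threshold: int = 128) -> Tuple[Gray, Gray, Gray]:
    """Split a page into (mask, foreground, background) layers.

    Assumption: a pixel darker than `threshold` counts as text or line art.
    Pixels that do not belong to a layer are filled with that layer's mean,
    so each layer is smooth and would tolerate heavy lossy compression.
    """
    mask = [[1 if px < threshold else 0 for px in row] for row in image]
    pairs = [(px, m) for row, mrow in zip(image, mask) for px, m in zip(row, mrow)]
    fg_fill = _mean([px for px, m in pairs if m], 0)
    bg_fill = _mean([px for px, m in pairs if not m], 255)
    foreground = [[px if m else fg_fill for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    background = [[bg_fill if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return mask, foreground, background


def mrc_compose(mask: Gray, foreground: Gray, background: Gray) -> Gray:
    """Rebuild the page: take the foreground where the mask is set, else the background."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]
```

Because nothing is compressed in this sketch, recomposing the three layers returns the original page exactly. In a real pipeline the foreground and background would be compressed lossily and only the mask kept exact, which is why the edges of text stay sharp.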

The PDFs themselves are created using the high-performance MuPDF library and its PyMuPDF Python bindings; both projects were supportive and promptly fixed various bugs, which propelled our efforts forward.

And best of all, we have expanded our community to include people all over the world who are working together to make cultural materials more available. We now have a Slack channel for OCR researchers and implementers that you can join if you would like. We look to contribute software and data sets to these projects to help them improve (an effort led by Merlijn Wajer and Derek Fukumori).

Next steps to fulfill the dream of Vannevar Bush’s Memex, Ted Nelson’s Xanadu, Michael Hart’s Project Gutenberg, Tim Berners-Lee’s World Wide Web, and Raj Reddy’s call for Universal Access to All Knowledge (now the Internet Archive’s mission statement):

  • Find articles in periodicals, and get the titles/authors/footnotes
  • Link footnote citations to other documents
  • OCR Balinese palm leaf manuscripts based on 17,000 hand-entered pages
  • Improve Tesseract page handling to improve OCR and segmentation
  • Improve epub creation, including images from pages
  • Improve OCRopus by creating training datasets

Any help here would be most appreciated.

Thank you, Free and Open Source Communities!  We are glad to be part of such a sharing and open world.

The post FOSS wins again: Free and Open Source Communities comes through on 19th Century Newspapers (and Books and Periodicals…) appeared first on Internet Archive Blogs.

Flash Back! Further Thoughts on Flash at the Internet Archive

22 November 2020 - 5:37am

A little behind the scenes here at the Archive: this blog is the province of a wide range of sub-groups, from books and partnerships over to development and collaborators. There’s usually a little traffic jam as we schedule entries and make sure they don’t run over each other, so this “sequel” post is being written before we return you to other Archive news.

The big announcement last week about the Internet Archive hosting Flash animations/games and making them run in the browser thanks to the Emularity and Ruffle made a huge splash. If you haven’t read that entry, you should definitely read it first.

Here are some observations about Flash and the Internet ecosystem from the last three rambunctious days. Obviously, the story of us including Flash doesn’t end here – we’ll continue to update Ruffle as it improves, and both users and collaborators are adding new animations at a pretty stunning clip. Be sure to keep checking the Flash Collection at the Archive for new additions.

What have we learned so far?

The Idea of Playing Flash in the Browser Past The End of The Year Is Very Popular

It was assumed, and has proven out, that being able to play Flash items, be they animations, toys or games, is an extremely popular idea: Tens of thousands of people have been flooding into the Archive to try things out. The “death” of Flash as a default plugin for browsers and the removal of easy access to it definitely had many people sad and concerned.

That said, assuming that Adobe and any other vendors were not going to throw the significant resources behind security and maintenance that Flash plugins would require, removing default support for it made sense. Sometimes these choices are not great for the historical Web, but sideloading in significant attack surfaces just because people like old games is not ideal either.

Ruffle is not Flash. It is an emulator that takes .SWF files (which worked with Flash) and makes a very good attempt to display what the file means to do. It is written in an entirely different language by an entirely different team of programmers, and it works against a specification and history that are ossified. In that way, it is hoped that the security issues of Flash can be avoided while the works live on.

And are they living on!

Even in the very short time that this new feature has been announced, the news was picked up by Boing Boing, Engadget, The Verge, The Register, Gizmodo, PC Gamer, and dozens of other locations (and the top spot at Hacker News for a while). That increased the flood of visitors to our site and we’ve held up pretty well, due to the high compression rates and small file sizes of Flash.

People Have Very Strong Memories of Flash; For Some It Represents Childhood

Everyone has a different timeline with computers and the internet, but for countless people using their phones and connections today, Flash plays as critical a role in their childhood memories as a game console or television show. Students could sneak flash games into the computer labs, or trade USB sticks with Flash, or simply get around filters preventing “obvious” entertainment sites to find a single URL that gave them a racing or RPG game to while away an afternoon on.

And, most notably, not just as players, but as creators. There are, it turns out, a significant number of professional artists and coders who count Flash and related technologies as their very first “programming language”. Going through our collection, you can find ten-person studio productions side by side with a game made by a driven teenager at home, and the teenager’s game will have been the more popular. Intended for creative works, the Flash environments over the years provided the launchpad for thousands of careers and creative outlets.

The Role of Flash Wasn’t Obvious To a Lot of People

An interesting discovery as people come face to face with some of the animations in the Flash collection is that many never knew they were Flash.

Video sites, such as YouTube, are a mid-to-late-2000s addition to the Internet. Previously, with dial-up modems as the main connection to the Internet, streaming video was a distant and hazy dream that seemed impossible to provide beyond a small experimental or well-connected crowd. Filling that need was Flash, whose files could compress down incredibly small (a full song and a video to accompany it could be under five megabytes, or even one megabyte) and even had quality settings for less powerful computers. A Flash animation could “pre-load” the data it needed as it arrived over a modem, showing a progress update or offering a small game to play until the full “video” was downloaded. This has all been swept into the dustbin of memory in a world where 4k 60fps video is possible (if still not for everyone).

With the jump to video in the mid 2000s, many Flash animations were transcoded into MPEG files, or animated GIFs, or uploaded to YouTube as fully-realized video, even though Flash was the original medium. As the more well-crafted works gained attention in this new space, the old formats were forgotten.

Since Ruffle has a fullscreen option (right-click, soon to be a button to the right of the animation), a Flash animation that was drawn using vectors will scale up to 4k displays smoothly. Unlike old video, the original works will keep up with the newest technology very nicely, which adds to the appreciation of the effort put into the original piece.

Flooding All These Old Flash Works Has High and Low Moments

Because nearly anyone could create Flash animations and games, nearly anyone did. It also meant that filters on quality, profanity, or unusual subjects were gone.

Sometimes that worked out very nicely: Imagine trying to pitch an animated film like The Ultimate Showdown of Ultimate Destiny to a studio or backers to make for film festivals. A game like Castle Cat is bizarre and a collage of pop culture but plays as well as a professional game of the time. (It even got a sequel.)

Other times, the works are clunky, poorly programmed, and full of offensive jokes and material. They could literally be after-school projects or whipped up in a weekend to make fun of someone or something and then get trapped in amber to the present day. Wandering the stacks, with what will soon be thousands of items, can be daunting.

As a result, the Showcase was created to highlight the best of the best, the handful that really universally stand out as entertaining, well-made, and uplifting (or at least, thought-provoking).

By the way, if the towering piles of Flash works seem daunting now, imagine what it was like 20 years ago for people slowly moving through page after page, taking minutes to download a given animation, and clicking on it with no idea what they’d be seeing next.

Adding Your Own Flash Is Difficult But Rewarding

It is notably complicated to add new working Flash to our collection. This is a side effect of all the different components that need to be activated in the Internet Archive structure. By far, the best document to read about how to test, upload, and describe SWF files is this document by the Flashpoint project:

https://bluemaxima.org/flashpoint/datahub/Uploading_SWFs_for_the_Internet_Archive

(As a side note, the two most common mistakes are setting “emulator-ext” instead of “emulator_ext” (see the difference?) and not setting the item to be a “software” media type. A script has been written that checks new uploads to find common mistakes and will sometimes tweak the uploads to fix them.)
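The two mistakes mentioned above are easy to catch before uploading. The following is a minimal sketch of such a check, not the Archive's actual script: the function name and the dictionary-based representation of an item's metadata are assumptions for illustration.

```python
from typing import Dict, List


def find_common_mistakes(mediatype: str, metadata: Dict[str, str]) -> List[str]:
    """Return human-readable warnings for the two most common upload mistakes."""
    problems = []
    # Mistake 1: a hyphen instead of an underscore in the metadata key.
    if "emulator-ext" in metadata:
        problems.append('found "emulator-ext"; the key should be "emulator_ext"')
    elif "emulator_ext" not in metadata:
        problems.append('missing "emulator_ext"')
    # Mistake 2: the item is not marked as the "software" media type.
    if mediatype != "software":
        problems.append(f'mediatype is "{mediatype}"; it should be "software"')
    return problems
```

An empty list means neither mistake was found; it does not guarantee that the SWF itself will run in Ruffle.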

There’s Still A Long Way to Go To “Perfect” or Wayback Playback

We shoved this entire ecosystem into the Archive “hot”, with known gaps in support for Flash features, and with bugs still being ironed out. Most Flash animations used a rather small subset of the available scripting commands, and the Ruffle team has focused on those, so a lot of animations do just fine. But more than just a few times, a Flash item will go in and there will be a critical failure, be it the inability to hit buttons or missing video/audio. This reflects the continual improvement of the emulator, but also that entire swaths of Flash features still have a way to go before they are supported.

This also provides the answer to the question some are asking, which is how long before the Wayback Machine “just plays” old Flash items when you go to the page. Ruffle is still way too new to shove into the Wayback and the problems it would cause at this stage would be significant. Many improvements to Wayback and its reach have happened over the last year, with connections to Wikipedia, Cloudflare and Brave, but the day when you go to an old Flash-driven site and have it “just work” in Wayback is going to be a significant time in the future.

Which brings up another tangent:

Flash Interfaces to the Web Were The Worst Idea

With the benefit of hindsight, it’s clear that the fad of making Flash boot up and be the “menu” or selections for a website was unusually cruel to anyone in need of portability or accessibility. What’s thought of as “Web 1.0” (HTML files and simple flat files provided to servers) was extremely good for screen readers and keyboard shortcuts, providing important access to blind or disabled users, as well as expanding the number of devices and systems that could use the Web. Flash took a lot of that away in the name of... well, Flashiness. As this small burst of interest in Flash has occurred, a not-insignificant number of people dependent on accessibility have said “Good Riddance to Flash”, and they’re entirely right. Captured inside little boxes on the Internet Archive like displays in a museum, Flash works are fine. But the Web should never have depended on Flash for navigation.

When Flash Is At Its Best, There’s Nothing Like It On The Internet

As people have been sharing the Flash animations they’ve found on the site, as well as providing their own additions, jewels have been coming to the forefront. Most inspiring have been artists and creators who did work 15 or 20 years ago and have been rifling through floppies and stored ZIP files to upload to our collection.

Watching this as they come in, it strikes us anew how much effort, artistic and otherwise, went into a good Flash animation. Crafting custom artwork, adding little touches and flair, and truly bringing something new into the world… this was the promise of Flash and every time someone in the modern age stumbles on a classic for the first time, all the effort is worth it.

Long Live Flash!

The post Flash Back! Further Thoughts on Flash at the Internet Archive appeared first on Internet Archive Blogs.

Flash Animations Live Forever at the Internet Archive

19 November 2020 - 9:39pm

Great news for everyone concerned about the Flash end of life planned for end of 2020: The Internet Archive is now emulating Flash animations, games and toys in our software collection.

Utilizing an in-development Flash emulator called Ruffle, we have added Flash support to the Internet Archive’s Emularity system, letting a subset of Flash items play in the browser as if you had a Flash plugin installed. While Ruffle’s compatibility with Flash is less than 100%, it will play a very large portion of historical Flash animation in the browser, at both a smooth and accurate rate.

We have a showcase of the hand-picked best or representative Flash items in this collection. If you want to try your best at combing through a collection of over 1,000 Flash items uploaded so far, here is the link.

You will not need to have a Flash plugin installed, and the system works in all browsers that support WebAssembly.

For many people: See you later! Enjoy the Flash stuff!

Others might get this far down and ask “And what exactly is Flash?” or even “I haven’t thought about Flash in a very long time.” For both of these groups, let’s talk about Flash and what it represented in the 1990s and 2000s.

A Short History of the Rise of Flash

In the early 1990s, web browsers were incredibly powerful compared to what came before – with simple files written in HTML that could generate documents mixing images and text, as well as providing links to other websites, it felt like nothing on computers had ever offered this level of ease and flexibility. It really did change everything.

But people didn’t stay in a state of wonder.

It quickly became a request, then a demand, then a mission to allow animation, sound, and greater audio/video flexibility into webpages. A huge range of companies were on a mission to make this happen. While looking back it might seem like one or two tried, it was actually a bunch of companies, but out of the wreckage of experimentation and effort came a couple big winners: Shockwave and Flash.

Flash began life in 1993 as SmartSketch, a drawing program from FutureWave Software that was later reworked into the animation tool FutureSplash Animator. It was actually a challenger to Shockwave until it was purchased by Macromedia, who handled creation software and playback software for both products.

Flash had many things going for it – its ability to compress down significantly gave it a big advantage in the dial-up web era. It could also shift playback quality to adjust to a wide variety of machines. Finally, it was incredibly easy to use – creation software allowed a beginner or novice to make surprisingly complicated and flexible graphic and sound shows that ran beautifully on web browsers without requiring deep knowledge of individual operating systems and programming languages.

From roughly 2000 to 2005, Flash was the top of the heap for a generation of creative artists, animators and small studios. Literally thousands and thousands of individual works were released on the web. Flash could also be used to make engaging menu and navigation systems for webpages, and this was used by many major and minor players on the Web to bring another layer of experience to their users. (There were, of course, detractors and critics of use of Flash this way – accessibility was a major issue and the locked-in nature of Flash as a menu system meant it was extremely brittle and prone to errors on systems as time went on.)

This period was the height of Flash. Nearly every browser could be expected to have a “Flash Plugin” to make it work, thousands of people were experimenting with Flash to make art and entertainment, and an audience of millions, especially young ones, looked forward to each new release.

However, cracks appeared on the horizon.

The Downfall of Flash

Macromedia was acquired by Adobe in 2005, which renamed Flash to Adobe Flash and began extensive upgrades and changes to the Flash environment. Flash became a near operating system in itself. But these upgrades brought significant headaches and security problems. Backwards compatibility became an issue, as did waning interest from novice creators. Social networks and platforms became notably hostile to user-created artworks being loaded in their walled gardens.

It all came to a head in 2010, when Apple CEO Steve Jobs released an open letter called “Thoughts on Flash”. The letter was criticized and received strong condemnation from Adobe, and Apple ultimately backed off their plan (although work was done to support alternate tools).

The call-out, even if not initially successful, ended the party.

In November of 2011, Adobe announced it was ending support of Flash for mobile web browsers, and in 2017, announced it was discontinuing Flash altogether for 2020.

Flash’s final death-blow was the introduction of HTML 5 in 2014. With its ground-up acknowledgement of audio and video items being as important as text and images, HTML 5 had significant support for animation, sound and video at the browser level. This meant increased speed, better compatibility, and less concern about a specific plugin being installed and from what source – audio/video items just worked and Flash, while still used in some quarters and certainly needed to view older works, stopped being the go-to approach for web designers.

What Are We Losing When We Lose Flash?

Like any container, Flash itself is not as much of a loss as all the art and creativity it held. Without a Flash player, Flash animations don’t work. It’s not like an image or sound file, where a more modern player could still make the content accessible in the modern era. If there’s no Flash Player, there’s nothing like Flash, which is a tragedy.

As you’ll see in the collection at the Archive, Flash provided a gateway for many young creators to fashion near-professional-level games and animation, giving them the first steps to a later career. Companies created all sorts of unique works that became catchphrases and memes for many, and memories they can still recall. Flash also led to unusual side-paths like “advergames”, banners that played full games to entice you to buy a product. Clones of classic arcade games abounded, as well as truly twisted and unique experiences unfettered by needing a budget or committee to come to reality. A single person working in their home could hack together a convincing program, upload it to a huge clearinghouse like Newgrounds, and get feedback on their work. Some creators even made entire series of games, each improving on the last, until they became full professional releases on consoles and PCs.

Why We Emulate Flash

The Internet Archive has moved aggressively in making a whole range of older software run in the browser over the past decade. We’ve done this project, The Emularity, because one of our fundamental tenets is Access Drives Preservation; being able to immediately experience a version of the software in your browser, while not perfect or universal, makes it many times more likely that support will arrive to preserve these items.

Flash is in true danger of sinking beneath the sea, because it depends on a specific, proprietary player being available. As Adobe Flash is discontinued, many operating systems will automatically strip the player out of the browser and system. (As of this writing, this is already happening, a month before the end-of-life deadline.) More than just dropping support, the loss of the player means the ability of anyone to experience Flash is dropping as well. Supporting Ruffle is our line in the sand from oblivion’s gaze.

Credit Where Due

This project is by no means an Internet Archive-only production, although assistance from Dan Brooks, James Baicoianu, Tracey Jacquith, Samuel Stoller and Hank Bromley played a huge part.

The Ruffle Team has been working on their emulator for months and improving it daily. (Ruffle welcomes new contributors for the project at ruffle.rs.)

The BlueMaxima Flashpoint Project has been working for years to provide a desktop solution to playable web animation and multimedia, including Flash. Clocking in at nearly 500 gigabytes of data and growing, the project is located here: https://bluemaxima.org/flashpoint/

A shout-out to Guy Sowden, who first drafted the inclusion of Ruffle in the Emularity before it was refined elsewhere; your efforts set the ball rolling.

And finally, a huge thanks to the community of Flash creators whose creative and wonderful projects over the years led to inspiration in its preservation. We hope you’ll like your new, permanent home.

Bonus Section: Adding Your Own Flash Animations to the Archive!

For the creators, artists and collectors who have .swf files from the era of Flash and would like to see them uploaded to the Archive and working like our collection, here are some simple instructions to do so.

Please note: Ruffle is a developing emulator, and compatibility with SWF files is continually improving but is not perfect. They have provided a test environment here to see if your SWF file will work. Please take the time to test before uploading to the Archive.

The Archive looks for one mediatype setting (software) and two metadata pairs set (emulator and emulator_ext) to know whether an item can be run in the Ruffle emulator. Here are those two settings:

emulator set to ruffle-swf
emulator_ext set to swf

The emulator only works with a single SWF file at the moment, and its filename should contain no spaces. With all these conditions in place, the SWF item should be offered up to play and the emulator should work.
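As a rough illustration of the conditions above, the sketch below assembles the three settings for a single SWF file and rejects filenames that would not work. The function name and the returned dictionary layout are hypothetical; how the settings are actually submitted depends on the upload tool you use.

```python
from typing import Dict


def ruffle_item_settings(swf_filename: str) -> Dict[str, str]:
    """Build the mediatype and metadata pairs for one SWF file.

    Raises ValueError when the filename breaks the stated conditions.
    """
    if " " in swf_filename:
        raise ValueError("the SWF filename should contain no spaces")
    if not swf_filename.lower().endswith(".swf"):
        raise ValueError("expected a single .swf file")
    return {
        "mediatype": "software",   # the one mediatype setting
        "emulator": "ruffle-swf",  # first metadata pair
        "emulator_ext": "swf",     # second metadata pair (underscore, not hyphen)
    }
```

For example, `ruffle_item_settings("loituma.swf")` returns the three settings, while `ruffle_item_settings("my game.swf")` raises a `ValueError` because of the space.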

When uploading to the Archive, an accurate and complete description, title, and creation date are all optional but strongly encouraged to provide context for users. Additionally, if you create an image file (jpg, png or gif) and name it itemname_screenshot.ext, like itemname_screenshot.png, it will become the official screenshot and thumbnail for the item. Notice how we named things here:

https://archive.org/download/flash_loituma
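The screenshot naming rule can be sketched in a few lines of Python. This assumes that "itemname" means the item's identifier (for example `flash_loituma`), which the post does not state outright; the helper name `screenshot_name` is hypothetical.

```python
from pathlib import PurePosixPath

# Image types the post lists as usable for the screenshot.
ALLOWED_EXTENSIONS = {".jpg", ".png", ".gif"}


def screenshot_name(item_name, image_filename):
    """Build the itemname_screenshot.ext filename for an existing image file."""
    extension = PurePosixPath(image_filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported screenshot type: {extension!r}")
    return f"{item_name}_screenshot{extension}"
```

For instance, `screenshot_name("flash_loituma", "capture.png")` gives `flash_loituma_screenshot.png`.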

We’re here to help you if you run into any snags or issues. There’s no other location on the internet that does things quite this way, so if you do run into problems, feel free to mail Jason Scott for tech support; whatever assistance can be given will be provided.

The post Flash Animations Live Forever at the Internet Archive appeared first on Internet Archive Blogs.

Where Your Donation Goes

16 November 2020 - 5:00pm

As an independent nonprofit library, the Internet Archive is powered by donations from individual users, and every little bit helps. But have you ever wondered how your donations are used? Or what impact your giving has on our work? The contributions we receive are crucial to continuing our mission—here are a few ways they help!

Infrastructure

The Internet Archive builds and maintains all of its own infrastructure, rather than contracting it out. Right now we’re holding more than 70 petabytes of data, including millions of books, hundreds of millions of webpages, and thousands of collections focused on everything from video gaming to opera music. That’s a lot of storage space!

The donations we receive help us purchase servers, provide bandwidth, and pay the electricity bills, so that anyone, anywhere, can access our resources. This year our systems have seen more use than ever before, and we were able to make some upgrades thanks to the generosity of our patrons. Your donations allow us to serve more than 1.5 million visitors every day!

Staff

All those servers need people to build and maintain them. The website needs programmers to develop it, the collections need archivists to organize them, and our patrons need librarians to answer their questions. We employ 150 people around the world to scan books, build software, maintain data centers, acquire new materials, and find ways to make the archive better for our users. That’s a small staff for one of the world’s top 300 websites—and in 2020, they’ve stretched even farther by working remotely to keep the archive online. Most of our employees could make more at a profit-driven company, but they’ve chosen instead to work at a nonprofit where every dollar counts and the mission comes first.

Our Projects

Most importantly, the generosity of our users is used to fund our work! These projects include the Wayback Machine, a crucial tool for preserving the history of the web. In an era of disinformation and misinformation, having documentation of what’s being said and who’s saying it is absolutely critical—and your donations help us keep the record straight.

We also use patron contributions to run the Open Library, a free, digital lending library of over 4 million eBooks that can be read in a browser or downloaded for reading off-line. It costs us just $20 to acquire, digitize, and preserve a book forever, making it available to readers around the world—and thanks to the contributions from our patrons, we’re always adding to the stacks!

Other projects that your donations fund include the Decentralized Web initiative, the TV News Archive, and our preservation of open access journals. We also use donations to help acquire, transport, and digitize special collections—such as ephemera from the Tytell Typewriter Company, the Marygrove College Library, or a dizzying array of 78 rpm records.

How to Help

If you’d like to make a donation to the Internet Archive, we’d greatly appreciate your support! Your contribution helps us survive, thrive, and keep growing. In addition to our online donations portal, there are several other options for how you can give. If you would like to make a securities donation or receive information about estate planning, email joy@archive.org. You can even donate using cryptocurrency!

If you’re unable to donate at the moment—or if you’ve already given—there are still ways you can lend a hand. Using Amazon Smile and setting the Internet Archive as your preferred charity will mean that we get a small donation every time you make a purchase. If your employer matches charitable contributions, you can easily double your impact—check your company here! And if you’re looking for more small ways you can help out, check out this blog post on how to make a difference right now without leaving the house.

We’re so grateful for each and every person who chooses to contribute to us. Thanks for your support, and enjoy the archive!

The post Where Your Donation Goes appeared first on Internet Archive Blogs.

The Rutgers University Poster Project

13 November 2020 - 5:45pm

Rutgers University and Internet Archive have collaborated to create a limited edition series of risograph posters. Facilitated by Amir Esfahani, Director of Special Art Projects at the Internet Archive, and Mindy Seu, Assistant Professor of Design in the Mason Gross School of the Arts, 14 students in the course Design Practicum gathered unique collections on the Internet Archive and then adapted their findings into an 11×17 graphic. These were printed on a risograph by the Brooklyn-based studio TXT Books. 

The first 40 people to sign up will receive a packet of these tabloid-size posters. Please sign up here! https://forms.gle/72sX8F8vM8sCBDwo6 (Please note: We can only ship to people in the United States.)

http://fall2020-practicum.designforthe.net/

https://mindyseu.com/

www.saberesfahani.com

http://www.txtbooks.us/

Jeepneys – 1950s to Present by Pauline Yanes

Portfolio: https://paulineyanes.smvi.co/

Collection: https://archive.org/details/jeepneys-1950s-to-Present

After World War II, many military Jeeps were left in the Philippines by U.S. troops. These Jeeps were decorated and modified to hold more passengers. Since then, Jeepneys have become the most popular form of transportation in the country. This collection showcases Jeepneys in the Philippines starting from the 1950s, exploring a visual history of this symbol of Filipino culture.

Chinese Calligraphers of the Tang Dynasty 618CE—907CE by Zhongxuan Lin

Collection: https://archive.org/details/chinese-caligraphers-of-the-tang-dynasty-618-907

This collection includes the works of eight famous Chinese calligraphers born in the Tang Dynasty. All of the images are photographs of the artwork written on paper or etched on monuments.

Wartime Utility Furniture by Xinyi Huang

Collection: https://archive.org/details/war-time-utility-furniture

Utility furniture was first produced by the United Kingdom’s government during World War II due to the shortage of materials and usage rations.

Nintendo Box Art — USA vs. Japan by Derek Li

Portfolio: https://artfiles.rutgers.edu/~lid@art.rutgers.edu/projects/Arizona/index.html

Collection: https://archive.org/details/nintendo-game-box-art-usa-vs-japan

This collection compares the USA’s and Japan’s box art for specific Nintendo games. It shows that the advent of global releases has removed many of the differences in box artwork, with newer releases having nearly identical covers in their American and Japanese versions.

Souvenir Spoons Collected by The Fajardo-Reyes Family by Alexa Reyes

Portfolio: https://alexafreyes.github.io/

Social Media: https://www.instagram.com/alexareyesart/

Collection: https://archive.org/details/souvenir-objects-collected-by-the-fajardo-reyes-family

A growing collection of spoons gathered over several years by a first generation Filipino-American family from New Jersey. Each souvenir utensil has its own story, its own memory, and its own journey, whether from across the country or across the ocean.

Qing Dynasty Wealth Gap by Yuchao Wang

Collection: https://archive.org/details/qing-dynasty-wealth-gap

These photos display the extreme wealth gap between the Qing Dynasty’s upper class and civilians, revealing an invisible piece of history typically unseen in textbooks.

Fictional Languages (in video games) by Sarah Poon

Collection: https://archive.org/details/constructed-language

Video games develop fictional languages that cannot be used anywhere else in reality. Some languages are only audio-based instead of having a traditional visual alphabet. 

Rap Album Design 1993-2020 by Sebastian Lijo

Social Media: https://www.instagram.com/lijo.seb/ 

Collection: https://archive.org/details/rap-album-design-1993-2020

This collection was made to highlight the progression of graphic design on rap album covers. It begins in 1993, right in the middle of the golden era of rap, and extends to our current day. Two covers per year are shown in order of their appearance on the highest first-week sales charts.

Double Bass Archives by Yogini Borgaonkar

Collection: https://archive.org/details/double-bass-archives

The Double Bass Archives includes performances of classical compositions and each piece’s correlating sheet music. This collection acts as a resource, providing a deep dive into the sound, documentation, and physicality of the Double Bass.

Steven Universe Monopoly by Nicholas Plyler

Portfolio: art.rutgers.edu/~plyler

Collection: https://archive.org/details/steven-universe-monopoly    

This collection is meant to archive every single unique piece that comes from the Steven Universe Monopoly board game. These unique pieces can be used to traverse and visit iconic Steven Universe locations.

Horror Movie Posters of Dario Argento by Steve Tomori

Social Media: https://www.instagram.com/stevetomori_design   

Collection: https://archive.org/details/italian-horror-covers-by-director-dario-argento

This collection consists of horror movie posters from the director Dario Argento. It features Italian and American posters as well as some alternate versions. These movies were directed, and some even produced, by Dario Argento and span decades.

Strobridge Lithographing Company’s Circus Posters — 1890s–1950s by Marinelle Manansala

Portfolio: marinellem.com    

Collection: https://archive.org/details/strobridge-lithographing-company-circus-posters-1890s-1950s

Circus posters were created by Cincinnati’s Strobridge Lithographing Company, printed in the 1890s through the 1950s. These posters focus on attracting the audience by depicting the unusual main acts in a dynamic composition. By 1900, they were known as the “Tiffany of Printers” since they had become one of the largest and most popular printing companies in the United States.

Transparencies by Anna Pittas

Portfolio: annapittas.com

Social Media: instagram.com/annapittasphotography

Collection: https://archive.org/details/kodachrome-mounted-color-transparencies

A collection of Kodachrome mounted color transparencies taken between 1950 and 1970 by members of the Clarke family. The photos are mostly family photos, capturing fun memories.

Covid-19 Street Art by Catie Esposito 

Social Media: instagram.com/artbycatie

Collection: https://archive.org/details/covid-19-street-art  

A living collection of street art in the U.S.A. focused around the Coronavirus Pandemic. These works of art are often temporary, so I am attempting to document these murals as I see them, either in person or online. This is an ongoing project until the ‘pandemic’ is finally over.

The post The Rutgers University Poster Project appeared first on Internet Archive Blogs.

Internet Archive Participates in DOAJ-Led Collaboration to Improve the Preservation of OA Journals

5 November 2020 - 5:05pm

Since 2017, Internet Archive has pursued dedicated technical and partnership work to help preserve and provide perpetual access to open access scholarly literature and other outputs. See our original announcement related to this work and a recent update on progress. The below official press release announces an exciting new multi-institutional collaboration in this area.

The Directory of Open Access Journals (DOAJ), the CLOCKSS Archive, Internet Archive, Keepers Registry/ISSN International Centre and Public Knowledge Project (PKP) have agreed to partner to provide an alternative pathway for the preservation of small-scale, APC-free, Open Access journals.

The recent study authored by M. Laakso, L. Matthias, and N. Jahn has revived academia’s concern over the disappearance of the scholarly record disseminated in Open Access (OA) journals.

Their research identifies OA journals as being at risk of vanishing, “especially small-scale and APC-free journals […] with limited financial resources” that often “opt for lightweight technical solutions” and “cannot afford to enroll in preservation schemes.” The authors used data available in the Directory of Open Access Journals to conclude that just under half of the journals indexed in DOAJ participate in preservation schemes. Their findings “suggest that current approaches to digital preservation are successful in archiving content from larger journals and established publishing houses but leave behind those that are more at risk.” They call for new preservation initiatives “to develop alternative pathways […] better suited for smaller journals that operate without the support of large, professional publishers.”

Answering that call, the joint initiative proposed by the five organisations aims at offering an affordable archiving option to OA journals with no author fees (“diamond” OA) registered with DOAJ, as well as raising awareness among the editors and publishers of these journals about the importance of enrolling with a preservation solution. DOAJ will act as a single interface with CLOCKSS, PKP and Internet Archive and facilitate a connection to these services for interested journals. Lars Bjørnshauge, DOAJ Managing Editor, said: “That this group of organisations are coming together to find a solution to the problem of ‘vanishing’ journals is exciting. It comes as no surprise that journals with little to no funding are prone to disappearing. I am confident that we can make a real difference here.”

Reports regarding the effective preservation of the journals’ content will be aggregated by the ISSN International Centre (ISSN IC) and published in the Keepers Registry. Gaëlle Béquet, ISSN IC Director, commented: “As the operator of the Keepers Registry service, the ISSN International Centre receives inquiries from journal publishers looking for archiving solutions. This project is a new step in the development of our service to meet this need in a transparent and diverse way involving all our partners.”

About 50% of the journals identified by DOAJ as having no archiving solution in place use Open Journal Systems (OJS). Therefore, the initiative will also identify and encourage journals on PKP’s OJS platform to preserve their content in the PKP Preservation Network (PKP PN), or to use another supported solution if the OJS instance isn’t new enough to be compatible with the PN integration (OJS 3.1.2+).

The partners will then follow up by assessing the success and viability of the initiative with an aim to open it up to new archiving agencies and other groups of journals indexed in DOAJ to consolidate preservation actions and ensure service diversity.

DOAJ will act as the central hub where publishers will indicate that they want to participate. Archiving services, provided by CLOCKSS, Internet Archive and PKP will expand their existing capacities. These agencies will report their metadata to the Keepers Registry to provide an overview of the archiving efforts. 

Project partners are currently exploring business and financial sustainability models and outlining areas for technical collaboration.

DOAJ is a community-curated list of peer-reviewed, open access journals and aims to be the starting point for all information searches for quality, peer reviewed open access material. DOAJ’s mission is to increase the visibility, accessibility, reputation, usage and impact of quality, peer-reviewed, open access scholarly research journals globally, regardless of discipline, geography or language. DOAJ will work with editors, publishers and journal owners to help them understand the value of best practice publishing and standards and apply those to their own operations. DOAJ is committed to being 100% independent and maintaining all of its services and metadata as free to use or reuse for everyone.

CLOCKSS is a not-for-profit joint venture among the world’s leading academic publishers and research libraries whose mission is to build a sustainable, international, and geographically distributed dark archive with which to ensure the long-term survival of Web-based scholarly publications for the benefit of the greater global research community. https://www.clockss.org.

Internet Archive is a non-profit digital library, top 200 website at https://archive.org/, and archive of over 60PB of millions of free books, movies, software, music, websites, and more. The Internet Archive partners with over 800 libraries, universities, governments, non-profits, scholarly communications, and open knowledge organizations around the world to advance the shared goal of “Universal Access to All Knowledge.” Since 2017, Internet Archive has pursued partnerships and technical work with a focus on preserving all publicly accessible research outputs, especially at-risk, open access journal literature and data, and providing mission-aligned, non-commercial open infrastructure for the preservation of scholarly knowledge.

Keepers Registry hosted by the ISSN International Centre, an intergovernmental organisation under the auspices of UNESCO, is a global service that monitors the archiving arrangements for continuing resources including e-serials. A dozen archiving agencies all around the world currently report to Keepers Registry. The Registry has three main purposes: 1/ to enable librarians, publishers and policy makers to find out who is looking after what e-content, how, and with what terms of access; 2/ to highlight e-journals which are still “at risk of loss” and need to be archived; 3/ to showcase the archiving organizations around the world, i.e. the Keepers, which provide the digital shelves for access to content over the long term.

PKP is a multi-university and long-standing research project that develops (free) open source software to improve the quality and reach of scholarly publishing. For more than twenty years, PKP has played an important role in championing open access. Open Journal Systems (OJS) was released in 2002 to help reduce cost as a barrier to creating and consuming scholarship online. Today, it is the world’s most widely used open source platform for journal publishing: approximately 42% of the journals in the DOAJ identify OJS as their platform/host/aggregator. In 2014, PKP launched its own Private LOCKSS Network (now the PKP PN) to offer OJS journals unable to invest in digital preservation a free, open, and trustworthy service. 

For more information, contact: 

DOAJ: Dom Mitchell, dom@doaj.org

CLOCKSS: Craig Van Dyck, cvandyck@clockss.org

Internet Archive: Jefferson Bailey, jefferson@archive.org

Keepers Registry: Gaëlle Béquet, gaelle.bequet@issn.org

PKP: James MacGregor, jbm9@sfu.ca

The post Internet Archive Participates in DOAJ-Led Collaboration to Improve the Preservation of OA Journals appeared first on Internet Archive Blogs.

Fact Checks and Context for Wayback Machine Pages

30 October 2020 - 4:54pm

Fact checking organizations and origin websites sometimes have information about pages archived in the Wayback Machine. The Internet Archive has started to surface some of these annotations for Wayback Machine users. We are attempting to preserve our digital history but recognize the issues around providing access to false and misleading information coming from different sources. By providing convenient links to contextual information we hope that our patrons will better understand what they are reading in the Wayback Machine.

As an example, Politifact has investigated a claim included in a webpage that we archived. Our.news has matched this URL to the Politifact review which allowed us to provide a yellow context banner for Wayback Machine patrons. 

In a different case, we surfaced the finding that a webpage is part of a disinformation campaign, according to the researchers at Graphika, and linked to their research report.

As a last example, the Internet Archive archived a Medium post that was subsequently removed based on a violation of their Covid-19 Content Policy.

As a library, our intention is to provide access to source material that might otherwise disappear, but to do so with context prominently displayed.

We would like to acknowledge the hard work of the organizations we are building upon in order to provide context for archived web pages: FactCheck.org, Check Your Fact, Lead Stories, Politifact, Washington Post Fact-Checker, AP News Fact Check, USA Today Fact Check, Graphika, Stanford Internet Observatory, and Our.news.

We welcome feedback and suggestions about how to make the Wayback Machine better. 

The post Fact Checks and Context for Wayback Machine Pages appeared first on Internet Archive Blogs.

Library Leaders Forum: Digital Library Practices For a More Equal Society

26 October 2020 - 1:00pm

The Library Leaders Forum is an annual opportunity for the library community to come together and discuss the 21st-century library. This year’s virtual Forum ended last week with an inspiring session showcasing the impact of controlled digital lending. Let’s look back over some of the key moments from the session and the conference as a whole.

During the final session, we were honored to present Michelle Wu with our Hero Award for her foundational work on controlled digital lending. COVID-19 demonstrated more than ever the power of this key practice in helping libraries reach vulnerable communities. As the election approaches, the emphasis was also on the role of digital access in supporting democracy. “Reliable access to information is the great equalizer,” Wu said in her acceptance speech. 

The power of digital tools was demonstrated further during the session with the grand reopening of Marygrove College Library. Despite the closure of the college, the library’s valuable collection of social justice scholarship has started a new life online. The materials are now freely available on our website, showcasing the power of digitization for preserving knowledge and expanding access. If you missed the session, you can watch the recording or read a full recap.

The conference was packed with insight into the impact of controlled digital lending on libraries and the communities they serve. In our policy session, experts discussed how to build a healthy information ecosystem for the 21st Century. Our community session gave a platform to librarians, educators, and technologists who are developing next-generation library tools. 

The discussions showed a library community deeply committed to digital innovation and its potential for creating a more equal society. A key theme was how COVID-19 lockdowns have made librarians more aware of the necessity of digital lending. The practice, always useful in reaching communities who cannot access physical books, has been shown to be a powerful tool in emergency response. Practitioners also placed emphasis on the key role of digitization in archiving knowledge for future generations.

However, it was clear that this is no time for complacency. Librarians face threats that would damage their ability to make knowledge accessible and preserve it for cultural posterity. A new lawsuit challenges their right to digitize collections and make them available to the public. Combined with an increasing lack of shelf space and spates of library closures, this could mean that many valuable collections end up in landfill. 

The community is determined to make sure that libraries stay “open” to all. To this end, we have launched the #EmpoweringLibraries campaign, which defends the right of libraries to own and lend digital books. Although the Forum has ended, the community will stay united through campaign activities. 

We’d like to say a huge thank you to everyone who took part and helped make the Library Leaders Forum a great success. Find out how you can stay connected and protect the key role of libraries in a democratic society here.

The post Library Leaders Forum: Digital Library Practices For a More Equal Society appeared first on Internet Archive Blogs.

RSVP to the Open Library 2020 Community Celebration

26 October 2020 - 3:20am

2020 has been a year of difficulties for all of us. Many schools, libraries, and families have had to adapt to unexpected closures and new norms.

At the Internet Archive, volunteers from the OpenLibrary.org community have been stepping up to meet the challenges of this new normal, to ensure that educators, parents, students, and researchers may continue to safely access the educational materials they rely on.

This Tuesday, October 27, at 11:30 am PDT, we invite you to tune in and join us as we celebrate this year’s efforts, overcoming unprecedented challenges and growing as an open community.

RSVP: https://forms.gle/dNzLDPtZHsrhudUc7

During this online event, you’ll hear from members of the community as we:
* Announce our latest developments and their impacts
* Raise awareness about opportunities to participate
* Show a sneak-peek into our future: 2021

For more updates, consider following us on Twitter: @openlibrary

The post RSVP to the Open Library 2020 Community Celebration appeared first on Internet Archive Blogs.

What Information Should we be Preserving in Filecoin?

22 October 2020 - 9:48pm

The folks at Protocol Labs love their rockets. And outer space. And exploration.

So when Filecoin, their cryptocurrency-fueled decentralized storage network, launched recently, it was no surprise they called it Filecoin Liftoff. In the payload of that Filecoin rocket are treasures from the Internet Archive:

For 15 years, LibriVox has harnessed a global army of volunteers, creating 14,200 free public domain audiobook projects in 100 different languages. Where else can you listen to Jules Verne’s 20,000 Leagues Under the Sea in French, Spanish, English, German or Dutch…for free? Now, phrases of Shakespeare, Poe, Joyce and Dante will be stored across the Filecoin mainnet, broken into packets to be reconstituted when needed—perhaps in a new century.

The same destiny awaits the home movies, stock footage, educational and amateur films in the public domain, lovingly curated by the Prelinger Archives founder, Rick Prelinger. He encourages creatives to download and reuse these videos, creating countless new works like this one by musician Jordan Paul:

Now filmmakers and connoisseurs can sleep easier, knowing that a new, distributed copy of those films lives in the Filecoin network (along with the main copy and multiple backups in the Internet Archive’s repositories).

So what’s next, Filecoin explorers?

Today, Protocol Labs and the Internet Archive are happy to announce the Filecoin Archives, a new community project to curate, disseminate and preserve important open access information often at risk of being lost. You can get involved in so many ways: by nominating information to be stored, uploading it to the Internet Archive, or preserving the data as a Filecoin node while earning Filecoin for sharing your storage capacity.

What information should we be preserving? Please tell us!

How about 166,000 public domain books (60 terabytes) from the Library of Congress? Including 2100 texts about Abraham Lincoln and slavery?

Or Open Access Journal articles? (The Internet Archive has collected 9.1 million of them.)

It takes a host of global voices with diverse viewpoints to ensure that humanity’s most precious knowledge is represented online and preserved. So we need to hear from you. What open access information or datasets are you interested in preserving?

Between now and November 5, please send us your ideas and vote on the others. We will gather your suggestions, add our own, and publish the list from which we will select information to preserve across a global network of Filecoin nodes.

How to send us your suggestions 

Look for the tweet from @JuanBenet and reply to it with:

  • The Name of the Dataset.
  • The size in GB or TB.
  • An HTTP or @IPFS link to the data.
  • Why it matters.
  • #FilecoinArchives

Bonus points if the data is already stored in the Internet Archive or if you upload it there. Vote for ideas by retweeting them and please help us spread the word!

Juan Benet presents his early vision at the 2016 Decentralized Web Summit at the Internet Archive in San Francisco.

In 2015, a young developer named Juan Benet wandered into the Internet Archive headquarters. He painted a picture of a decentralized stack, something he now calls Web3, where the storage, transport and other layers would be distributed across many machines. Together with the DWeb community, we have imagined a web with our values written into the code: values such as privacy, security, reliability, and control over one’s own identity. With the launch of Filecoin’s mainnet, a piece of that new web is perhaps within reach.

Now it’s up to us to make sure the payload includes humanity’s most important knowledge.

The post What Information Should we be Preserving in Filecoin? appeared first on Internet Archive Blogs.

Library Leaders Forum Explores Impact of Controlled Digital Lending

22 October 2020 - 2:00pm

The third and final session of the 2020 Library Leaders Forum wrapped up Tuesday with a focus on how Controlled Digital Lending (CDL) gives communities broader access to knowledge. A full recording of the session is now available online.

Michelle Wu was honored with the Internet Archive Hero Award for her vision in developing the legal concept behind CDL. In her remarks, the attorney and law librarian shared her thoughts on the development and future of the lending practice. Wu does not see the theory that she designed 20 years ago as revolutionary, but rather as a logical application of copyright law that allows libraries to fulfill their mission.

Despite current legal challenges, Wu predicts CDL can continue if libraries make themselves and their users heard.

“We must make sure that the public interests served are fully described, visible and clear to lawmakers and courts at the time they make their decisions,” Wu said. “If we do that, I believe the public interest will prevail and CDL will survive.”

The pandemic has underscored the need for digital access to materials and changed attitudes about CDL among libraries that had previously been risk averse to the practice, Wu said.  

“The closing of our libraries due to COVID has changed that mindset permanently,” Wu said. “It showed how the desire to avoid risk resulted in the actual and widespread harm to populations, depriving them of content at a time when access was more important than ever.”

Because of the pandemic, libraries are now empowered to try innovative practices to serve their patrons.

“With this new heightened awareness, I think the future of access is brighter,” Wu said. “Not only do I think CDL will flourish, but there seems to be very real chance that libraries will more aggressively fight to regain some of the public interest benefits of copyright that they’ve lost over the years.”

Wu maintained that, in the future, CDL can ensure a balance for full and equal access to knowledge for every person.

“Reliable access to information is the great equalizer,” Wu said. “Information shapes each of us, and lack of it is part of what increases our divide.”

(A complete profile of Wu’s work can be found here.)

The event also included the virtual ribbon cutting ceremony announcing the reopening of the Marygrove College Library. The Internet Archive now houses the college’s 70,000-volume library online, and has preserved the physical copies, after the institution closed the campus in 2019 and donated its entire collection for digitization. The move preserves books that reflect the college’s rich history of social justice and education programs that largely served women, African Americans and low-income students in Detroit.

“The knowledge that [the books] would still be available and still be utilized just keeps us going as we wrap up the college,” said Marygrove President Elizabeth Burns at the Forum. “It’s a sad, sad time, but it is also a time where we know the impact of the college will continue…It’s a very tangible measure of Marygrove for the future.”

Chris Freeland, director of Open Libraries at the Internet Archive, moderated a panel with Marygrove librarian Mary Kickham-Samy, Mike Hawthorne, a librarian at nearby Wayne State University, and Brenda Bryant, dean and director of Marygrove’s social justice program, to talk about the transformation of the library into a digital format.

“It’s exciting! I’m thrilled that it won’t be in just one small corner,” said Bryant of the library’s move online and value to scholars. Bryant built the nation’s first Master of Arts program in social justice at Marygrove and considered the library one of the best kept secrets on campus. “Like my activist friend Elena Herrada [said], the collection was important because in Detroit, reading is an act of resistance.” 

For more about Marygrove’s story, read our online profile.

The post Library Leaders Forum Explores Impact of Controlled Digital Lending appeared first on Internet Archive Blogs.

Want Some Terabytes from the Internet Archive to Play With?

21 October 2020 - 9:58pm

There are many computer science, decentralized storage, and digital humanities projects looking for data to play with. You came to the right place: the Internet Archive offers cultural information to web users and data miners alike.

While many of our collections have rights issues and so require agreements and conversation, many others are openly available for public, bulk downloading.

Here are three collections: one of movies, another of audiobooks, and a third of scanned public domain books from the Library of Congress. If you have a Macintosh or Linux machine, you can run these command lines on it. If you run each for only a little while, you will get just a few of the items (so you do not need to download terabytes).

These items are also available via bittorrent, but we find the Internet Archive command line tool is really helpful for this kind of thing:

$ curl -LOs https://archive.org/download/ia-pex/ia
$ chmod +x ia
$ ./ia download --search="collection:prelinger" #17TB of public domain movies
$ ./ia download --search="collection:librivoxaudio" #20TB of public domain audiobooks
$ ./ia download --search="collection:library_of_congress" #166,000 public domain books from the Library of Congress (60TB)

Here is a way to figure out how much data is in each:

apt-get install jq > /dev/null
./ia search "collection:library_of_congress" -f item_size | jq -r .item_size | paste -sd+ - | bc | numfmt --grouping
./ia search "collection:librivoxaudio" -f item_size | jq -r .item_size | paste -sd+ - | bc | numfmt --grouping
./ia search "collection:prelinger" -f item_size | jq -r .item_size | paste -sd+ - | bc | numfmt --grouping
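
If jq, bc, or numfmt are not available on your machine, the same total can be sketched in a few lines of Python. This is only an illustration, not part of the ia tool: it assumes the search command prints one JSON object per line with an item_size field in bytes, and the function names and sample records below are made up for the example.

```python
import json


def total_item_size(lines):
    """Sum the item_size field over an iterable of JSON lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: item_size is a byte count; records without it count as 0.
        total += int(record.get("item_size") or 0)
    return total


def with_grouping(number):
    """Format an integer with comma thousands separators."""
    return f"{number:,}"


# Hypothetical sample records standing in for real search output.
sample = [
    '{"identifier": "example-item-1", "item_size": 1500000}',
    '{"identifier": "example-item-2", "item_size": 2500000}',
    '{"identifier": "example-item-3"}',
]
print(with_grouping(total_item_size(sample)))
```

To try it on real data, one option is to pass sys.stdin to total_item_size in place of the sample list and pipe the output of one of the ./ia search commands above into the script.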

Sorry to say we do not yet have a support group for people using these tools or finding out what data is available, so for the time being you are pretty much on your own.

The post Want Some Terabytes from the Internet Archive to Play With? appeared first on Internet Archive Blogs.

Michelle Wu Receives Internet Archive Hero Award for Establishing the Legal Basis for Controlled Digital Lending

20 October 2020 - 8:55pm
Michelle Wu, Internet Archive Hero Award 2020 recipient

Michelle Wu is leading libraries to think and act in new ways to fulfill their missions.

For nearly two decades, she has advocated for preserving and expanding access to materials by responsibly digitizing collections. Using her expertise as an attorney, law librarian and professor, Wu crafted the legal theory behind Controlled Digital Lending (CDL) and has dedicated much of her career to showing libraries how to put the concept into practice.

To honor her innovative and tireless work, Wu has been named the recipient of the 2020 Internet Archive Hero Award. The annual award recognizes those who have exhibited leadership in making information available for digital learners all over the world. Past recipients have included Phillips Academy, the Biodiversity Heritage Library, and the Grateful Dead. Michelle received the award during the Library Leaders Forum final session on October 20.

“Michelle Wu was ahead of her time in understanding the transition to the digital era and brought library lending into our new landscape,” said Brewster Kahle, founder of the Internet Archive.

“Not only did Michelle see a problem coming, she did something about it,” Kahle says. “It’s a combination of being both a visionary on how the world could work and then making concrete steps to get us there.”

With library buildings closed now for safety, the demand for digital materials has grown. The pandemic magnifies the importance of using CDL as a strategy to expand services to the public, says Pamela Samuelson, a distinguished professor of law and information management at the University of California, Berkeley, who admires Wu’s insights as a scholar and librarian.

“She set the example and made people feel comfortable with a concept that was initially a little bit questionable,” says Samuelson. In her copyright classes, Samuelson now draws on Wu’s work to inform her students.

“Michelle’s articles explaining the concept have been very useful for students to have not just the reader’s perspective, or law student’s perspective, but how librarians are really taking the challenge of the digital age,” Samuelson says. “They are making good things happen to carry on the grand tradition of libraries to facilitate as much access as lawfully possible to the public they serve.”

Looking back on her career, Wu says she sort of fell into law. She abandoned plans for medical school after helping her roommate at the University of California San Diego study for the Law School Admission Test. Fascinated with the logic puzzles, she took the LSAT on a whim and did well enough to get a scholarship.

“I found I loved the theory of the law, looking at issues from all sorts of angles and finding a path through,” says Wu, who enrolled at the California Western School of Law and worked part-time at the San Diego County Law Library. She soon realized that the adversarial nature of the legal process didn’t suit how she viewed the law. Law librarianship was a better match, one grounded in collaboration and a commitment to using legal knowledge to educate and assist users in finding meaningful solutions to their legal problems. A year after earning her J.D., Wu got her master’s degree in librarianship with a certificate in law librarianship at the University of Washington.

She landed her first job at George Washington University Law School Library. In 2001, she was hired by the University of Houston School of Law. It was there, following the massive destruction of the school’s library due to Tropical Storm Allison, that Wu focused on the need to protect materials through digitization.

“Is there a better way for libraries to prepare society for a world in which there are a growing number of natural disasters?” Wu recalls wondering. “There are so many risks to our collections, and society depends on long-term access for this information,” she says.

Wu developed the theory for a digitization program designed with copyright in mind. What came to be known as CDL, she says, strikes a balance between the interests of users and copyright owners. A library can lend out only the number of copies that it has legitimately acquired, though the copy can be in any format. The flexibility in format facilitates more effective access for a wide variety of users, including those who live remotely or have trouble physically coming to a library building, while also ensuring the preservation of content in situations like natural disasters.

After Houston, Wu worked at the Hofstra School of Law and Georgetown University Law Center. As both a library director and law professor, Wu says she has been well-positioned to advocate for CDL and reason with the skeptics.

“I haven’t heard a lot of substantive objections. I have heard fear, which is common and understandable anytime you are changing the status quo, but it is something that must be overcome for advancement,” says Wu. “In talking with others about CDL, I focus on what CDL is and what it is intended to accomplish, which pushes people to engage deeply instead of rejecting the idea out of fear. From my perspective, CDL is the purest form of balance in copyright that you are going to find in a world of technology, and that balance is difficult to deny when you examine CDL in detail.”

Kyle K. Courtney, the copyright advisor and program manager at the Harvard Library Office for Scholarly Communication, says from the first time he met Wu, he was inspired by her ideas and willingness to challenge norms. Her research was a major influence on Courtney’s work and career. Together, they co-authored a position statement on CDL.

“It is great to meet your heroes sometimes — and even better to be able to work with them side by side,” says Courtney. “She is not a theoretical scholar. This is what’s awesome: She puts the cutting-edge CDL copyright system to work. That’s why she’s a trailblazer in both words and action, putting libraries at the forefront in our field.”

Wu’s leadership has helped advance the collaborative work of libraries and enabled more transparency in sharing information, says Courtney. He and Wu have presented on CDL at several conferences and discussed the concept with Congressional staff on Capitol Hill last year.

“She is one of the hardest working members of the library field I know,” Courtney says. “She’s oriented toward practical results and addresses 21st century challenges in multiple environments – public, private and academic. She is a person of remarkable integrity.”

Courtney says Wu’s recognition showcases what leaders in librarianship should aspire to: a successful record of progressive scholarship, influence on the next generation of librarians and a legacy of hard work that reflects an enthusiasm for libraries.

Sharing the story of CDL on Capitol Hill, Lila Bailey, policy counsel for the Internet Archive, says she was struck by Wu’s ability to connect with staffers. “Michelle explains things in such a clear, intuitive, practical way,” says Bailey, who also has collaborated with Wu on research. “She’s so competent and conscientious.”

Wu has been committed to spreading her knowledge of both the academic and practical aspects of CDL to librarians and policymakers across the country. “She is somebody who came up with a legal theory and spent her career creating a proof of concept for why this is important,” Bailey says. “The Internet Archive sets this very ambitious vision of universal access to all knowledge, then it tries to live up to the vision. Michelle embodies this ethos of the Internet Archive to be the change you want to see in the world.”

In June, Wu retired from academia, but she continues to research and mentor emerging librarians. Too often, academia (outside of the sciences) gives more weight to the risk in innovation than to the opportunities that creative problem-solving can provide, and Wu says that attitude doesn’t serve the public well.

“We can’t sit back and expect everyone automatically to understand the importance of libraries long term. We have to stand up for what we believe, advocate for it, and find solutions that better serve society in an ever-changing world,” Wu says.

The post Michelle Wu Receives Internet Archive Hero Award for Establishing the Legal Basis for Controlled Digital Lending appeared first on Internet Archive Blogs.

Digitization Saves Marygrove College Library After Closure

20 October 2020 - 6:45pm

When Marygrove College in Detroit decided to close its doors in 2019 due to financial pressures, the first question on the minds of many community members was: what about the library? Today, the entire Marygrove College community is celebrating the reopening of the Marygrove College Library in partnership with the Internet Archive.

Valerie Deering, Marygrove College Class of 1972, in the closed Marygrove College Library stacks.

Marygrove College’s roots go back to 1905 when it was started by the Sisters, Servants of the Immaculate Heart of Mary, a progressive Catholic order known for its commitment to social justice. Founded as a women’s institution, it became co-ed and predominantly African American over time, changing with the demographics of its neighborhood in northwest Detroit.

The liberal arts college, which typically had an enrollment of fewer than 1,000, attracted students interested in teacher education and social work programs, as well as English, history, philosophy and religious studies. The college offered graduate programs and some alumni went on to become physicians, lawyers and scientists.

True to its mission, Marygrove often served students from marginalized communities with limited means. Changes in access to federal Pell grants hurt the institution’s finances, and enrollment dwindled in recent years.

“The college was deeply in debt. Like many small colleges, institutional scholarships don’t pay the bills. The school was borrowing to make payroll. It was not a good picture,” says Marygrove President Elizabeth Burns. “With great sorrow, the board voted in summer 2017 to close undergraduate programs.”

The institution tried to survive by offering only graduate programs – many online. But that model proved to be unsustainable. In December of 2019, Marygrove closed its doors for good.

“It was very difficult,” says Frank Rashid, who taught English at the college for 37 years and lives within a mile of the campus. “It was a great place to teach. Despite our size and obscurity, we had a strong faculty and great students.”

As the college emptied its buildings, the fate of Marygrove’s beloved library was up in the air. No other library was able to house the entire collection, which included more than 70,000 books and 3,000 journals, in addition to microfilm, maps, visual media, and more. The college explored selling the books, but buyers were only interested in portions of the collection. Even disposing of the library content would cost thousands of dollars that the college couldn’t afford.

Marygrove’s solution: Donate the entire library to the Internet Archive for digitization and preservation.

“We were able to preserve the entire collection that we had built over the decades and make it available to everyone,” Burns says.

The board and alumni, while sad to see the college close, were supportive of the decision.

“There was a sense that all was not lost,” Burns says. “The legacy of the collection will be available for ongoing education. That really helped ease the pain of the transition.”

The library had a rich collection of books in history (particularly primary sources on local Detroit studies and Michigan), English, philosophy, religious studies, social work, political science, economics, psychology, business and social justice.

“The library was the best kept secret at Marygrove,” says Brenda Bryant, who started the nation’s first master’s degree program in social justice at the college 20 years ago. While the closure of the building was heartbreaking, she says having the collection digitized provides access to its great array of nonfiction and fiction books (such as The God of Small Things by Arundhati Roy), as well as films about social justice movements.

The God of Small Things by Arundhati Roy, showing the Marygrove College Library stamp on the title page.

Bryant says the college was ahead of its time in recognizing the importance of studying these issues. With racial equity, immigration and other social justice issues so relevant today, she hopes people will take the opportunity to read about the history of prior movements.

The value of the collection extends well beyond the Marygrove community. Librarians from Wayne State University, also located in Detroit, share an admiration for Marygrove’s collection and decision to digitize.

“Marygrove has been fundamental for Detroit in educating first-generation, low-income college students and providing high quality education to the community,” says Alexandra Sarkcozy, a liaison librarian for history at Wayne State. “The librarians built a robust academic collection and took beautiful care of it. I think it’s wonderful that it was able to be preserved.”

As Wayne State thinks about how to lend out its own digital materials, it may consider Controlled Digital Lending, which is how the Marygrove collections are being made available to users, as a model, adds Sarkcozy.

Marygrove College Library materials packed for shipping, digitization and preservation by Internet Archive.

Using Controlled Digital Lending practices with the Marygrove collections—lending out a digital copy one at a time—felt like a responsible way to continue to provide access, says Burns. And rare materials that aren’t traditionally prioritized are not lost to history.

Rashid says he was initially reluctant to let go of the print materials, but realized that digital lending opened up the possibility of access around the globe. “We are trying to share resources with scholars and students elsewhere,” says Rashid, noting the added convenience of researchers being able to look up information from home.

The Archive hired local help to pack up the Marygrove books, load them onto trucks, and transport them to centers for storage and scanning. The empty library was repurposed as a lecture hall, sports facility and cafeteria for a new high school that now operates on the campus.

Mary Kickham-Samy served as the director of the library at Marygrove from 2017 until its closure in December 2019. She was glad to see the collection donated intact and thinks alumni, in particular, will enjoy browsing through the library. “It’s beautiful the way Internet Archive has captured the materials…It’s just a win-win situation,” said Kickham-Samy, who is grateful that community members and researchers everywhere will now have access to the collection.

Valerie Deering using the Marygrove College Library collection at Internet Archive in the former physical library.

“When I heard Marygrove was going to be closing, it broke my heart,” said Valerie Deering, a poet and 1972 graduate of Marygrove. Deering didn’t fully realize what it would mean to digitize the library until she started browsing the collection online. “Actually seeing it now—this was a stroke of genius. This Internet library stuff is a pretty good idea.”

The post Digitization Saves Marygrove College Library After Closure appeared first on Internet Archive Blogs.

Advertising powers the Web. What if it just doesn’t work?

19 October 2020 - 7:20pm
On October 14, the Internet Archive presented a book talk with author Tim Hwang, New York Times technology reporter Kashmir Hill, and technologist Desigan Chinniah, discussing Hwang’s new book, “Subprime Attention Crisis.” Is the ad-tech model powering the Internet really just the next financial bubble?

That is the question at the heart of a significant new book by Internet researcher Tim Hwang, Subprime Attention Crisis: Advertising and the Time Bomb at the Heart of the Internet. If you don’t already know Tim, he’s a polymath: former Google AI policy wonk, lawyer, polemicist. In other words, just the kind of thinker we think you should know. Watch the video of a virtual book event with Tim here:

Subprime Attention Crisis makes the case that the core advertising model driving Google, Facebook, and many of the most powerful companies on the internet is—at its heart—a multibillion dollar financial bubble. Drawing parallels to the 2008 subprime mortgage crisis, Tim shines a spotlight on the lack of transparency, flawed incentives, and outright fraud that keep this machine running.

On October 14, the Internet Archive hosted a talk with the author and New York Times technology reporter Kashmir Hill. Their discussion tackled:

  • Why data-driven, online advertising may be much, much less effective than it looks
  • The long-term impact of the COVID-19 recession on the media and online ads
  • Whether or not the giants of Big Tech are already “too big to fail”

This discussion focused not only on the problems of advertising, but also on the future, and how we might be able to transition to a better, more financially robust internet. Joining the discussion was Desigan Chinniah, who co-leads Grant for the Web—a $100 million fund launched by Coil, Mozilla, and Creative Commons to spur open standards and new economic models for the web beyond advertising.

NOTE: We urge you to purchase a copy of Tim’s new book, Subprime Attention Crisis, via our local bookseller, The Booksmith. The first 50 purchasers will receive an autographed copy.

The post Advertising powers the Web. What if it just doesn’t work? appeared first on Internet Archive Blogs.

Library Leaders Forum: how to empower communities affected by COVID-19

19 October 2020 - 2:00pm

This year’s virtual Library Leaders Forum closes on Tuesday, following three weeks of inspiring discussion about the future of libraries in the digital age. The final session will focus on the impact of controlled digital lending on communities, particularly those affected by COVID-19. 

In last week’s session, we heard from librarians on the frontline of the COVID-19 response. Panelists shared how controlled digital lending has empowered libraries to get vital resources to those in need, despite lockdowns. “We were aware of [controlled digital lending] beforehand, but this pandemic has made us acutely aware of the need and opportunity,” said Stanford University’s chief technology strategist Tom Cramer. If you missed it, you can read a detailed recap of the session or watch the full recording.

The session demonstrated the power of digital tools for reaching marginalized communities in lockdown and beyond. We were therefore pleased to announce that Internet Archive is joining Project ReShare, a group of organizations developing an open-source resource sharing platform for libraries. Resource sharing, like controlled digital lending, has the power to break down the access barriers associated with commercial platforms. 

The next session will focus on the impact that controlled digital lending is having on libraries and the communities they serve. Internet Archive founder and digital librarian Brewster Kahle will present the Internet Archive Hero Award to Michelle Wu, the visionary behind the practice. We’ll learn what inspired Michelle and how her work has empowered libraries during the current pandemic. There’s still time to register for free.

We also have a very special event taking place during the session to which everyone is invited. Join us for the grand reopening of Marygrove College Library and find out how digitization saved a valuable archive from being split up and lost. The event will help place the Forum’s discussions in a real-world context by showing the impact of controlled digital lending on one African American community. It will also explore the power of digitization for preserving key elements of our cultural heritage. Registration is free for this special event.

The Library Leaders Forum may be drawing to a close, but the library community can stay connected through the #EmpoweringLibraries campaign. The campaign builds on the work of the Forum by raising awareness of the positive impact of controlled digital lending. We hope the community will unite to protect this key library practice and make knowledge accessible for all.

The post Library Leaders Forum: how to empower communities affected by COVID-19 appeared first on Internet Archive Blogs.

Internet Archive to Celebrate the Grand Reopening of the Marygrove College Library

19 October 2020 - 4:03am

Join us this Tuesday, October 20, at the final session of the Library Leaders Forum for a celebration of the reopening of the Marygrove College Library. Find out how digitization saved a valuable archive and preserved a community’s cultural heritage. RSVP here.

The post Internet Archive to Celebrate the Grand Reopening of the Marygrove College Library appeared first on Internet Archive Blogs.

Library Leaders Forum Highlights Community of Practice Supporting Controlled Digital Lending

15 October 2020 - 2:00pm

For many librarians, the global pandemic has pushed Controlled Digital Lending from sounding like a promising idea to becoming an important way of serving their patrons. Unable to physically check out books, a growing number of institutions have embraced CDL as a safe way to connect their readers with needed materials.

Librarians, educators, and technologists discussed the value and challenge of CDL for their communities at the second session of the 2020 Library Leaders Forum held online October 13. Video of the full session is now available.

“The Internet Archive has been operating a Controlled Digital Lending environment for more than nine years and we now have more than 80 libraries along with us,” said Chris Freeland, director of Open Libraries, who moderated the panel. “We are really thrilled. There is strength in community.”

With limited access to their collections during COVID-19, librarians on the frontlines shared their frustration in getting digital materials to remote learners. Many publishers were willing to give free access to their content in the spring, but that didn’t last, said Tucker Taylor, head of circulation at University of South Carolina.

“We have a large textbook collection that we spent a lot of student tuition money and our tax dollars on. We wanted to continue to provide access to that,” said Taylor, noting that vendors refused to sell the library ebooks.

The library began to build its own ebook platform and ended up partnering with HathiTrust, a membership-based digital library. It provided emergency access to books the library owned in print — one book in, one book out, said Taylor.

“We are team players. We wanted to do the right thing,” said Taylor of the controlled lending practice — a less than perfect solution, but a way to get the content the library owns in the hands of students in need. “I’m a librarian. I want to check things in and out. So, it feels reasonable to me that I should be able to do online checkout.”

At Stanford University, Tom Cramer, chief technology strategist, assistant university librarian and director of digital library system and services, said the campus closure in March left students and researchers without access to millions of books in the library stacks.

“That’s why we are in Controlled Digital Lending. We were aware of it beforehand, but this pandemic has made us acutely aware of the need and opportunity,” said Cramer, who suggested libraries have been too conservative about copyright law and the exceptions it provides libraries to better serve their patrons.

The panelists also mentioned how CDL can allow libraries to offer out-of-print books to the public, provide access for readers with disabilities, and make available fragile collections that cannot circulate. It’s also an environmentally friendly practice that keeps items from having to be physically shipped for interlibrary loans.

With the current system, the needs of patrons are not being met and libraries should share resources to develop scalable solutions, said Jill Morris, executive director of Pennsylvania Academic Library Consortium, Inc. She heads the steering committee for Project ReShare (which the Internet Archive recently joined) that is working on innovative open source tools for libraries.  

Another member of the project, Sebastian Hammer, co-founder and president of Index Data, spoke on the panel about the promise of technology in helping libraries improve services to patrons. Cramer of Stanford suggested interoperability was a high priority in creating a robust system for the future and the group agreed that authors and publishers should also be part of the conversation.

Collaboration is key, said Lisa Petrides, founder and chief executive officer of the Institute for the Study of Knowledge Management in Education.

“We are trying to change how a system works,” she said, which involves working across all stakeholder groups and changing policy. “It’s about access and equity at its core.”

The post Library Leaders Forum Highlights Community of Practice Supporting Controlled Digital Lending appeared first on Internet Archive Blogs.
