

TV news fact-checked: health care & more + this press briefing will not be televised

Internet Archive - 23 June 2017 - 3:13pm

by Katie Dahl and Nancy Watzman

This week the Senate released its version of the health care bill, so to mark the occasion we offer a trio of recent health care fact checks from The Washington Post’s Fact Checker. Other fact-checking highlights include a claim that Saudi Arabia has been spending money on Trump hotels (true, says PolitiFact) and Ivanka Trump’s assertion that American workers have a skills gap (also true, reports PolitiFact).

But before we present these fact-checks, we pause for a moment to present this commentary from CNN’s Jim Acosta on the White House’s refusal to allow cameras in a growing number of press briefings: “That wouldn’t be tolerated in city council meetings, or at a governor’s press conference,” he noted. “And here we have the representative of the president of the United States saying no you can’t cover it that way….it’s like we’re not even covering a White House anymore…it’s like we’re just covering bad reality television, is what it feels like now.”

Claim: 1.8 million jobs will be lost as a result of the AHCA (two Pinocchios)

Earlier this month, Rep. Nancy Pelosi, D., Calif., said, “Americans will lose their health coverage because of his proposal. And it is a job loser. Estimated to be 1.8 million jobs lost. Donald Trump is a job loser.”

Glenn Kessler reported for the Washington Post’s Fact Checker: “We often warn readers to be wary of job claims made by politicians based on think-tank studies. This is a case in point. Pelosi was careful to say ‘estimated,’ but two groups of researchers, using apparently the same economic model, came up with different estimates of jobs losses under the AHCA by 2022 – 1.8 million and 413,000.”

Claim: the reconciliation process will be used for the AHCA (upside down Pinocchio or flip-flop)

At a recent press briefing, Sen. Mitch McConnell, R., Ky., described the upcoming legislative process for the American Health Care Act, “Unfortunately, it will have to be a Republicans-only exercise. But we’re working hard to get there.”

Kessler responded that “McConnell’s position has changed, even though he will not acknowledge it. He was against the reconciliation process for health care in 2010; he has embraced it now. He was against secrecy and closed-door dealmaking before; he now oversees the most secretive health-care bill process ever. And he was against voting on a bill that was broadly unpopular — and now he is pushing for a bill even more unpopular than the ACA in 2010.”

Claim: insurers are leaving the health care exchanges because of Obamacare (three Pinocchios)

President Donald Trump talked with Republican senators about health care, saying among other claims, “Insurers are fleeing the market. Last week it was announced that one of the largest insurers is pulling out of Ohio — the great state of Ohio.”

 Kessler wrote that Trump “ignores that many say they are exiting the business because of uncertainty created by the Trump administration, in particular whether it will continue to pay ‘cost-sharing reductions’ to insurance companies. These payments help reduce co-pays and deductibles for low-income patients on the exchanges. Without those subsidies, insurance companies have to foot more of the bill.” 

Claim: Saudi Arabia is spending big on Trump Hotels (mostly true)

The attorney general for the District of Columbia, Karl Racine, said at a recent press conference that “The Kingdom of Saudi Arabia, whose government has important business and policy before the president of the United States, has already spent hundreds of thousands of dollars at the Trump International Hotel.”

Smitha Rajan reported for PolitiFact, “The Foreign Agent Registration Act report mentions at least one filing which clearly shows that the Saudi government spent $270,000 at the Trump International Hotel for lodging and boarding expenses between October 2016 and March 2017. It’s not clear whether the entire expenses were paid before or after Trump became president. Our research showed it was some of  both.”

Claim: there are 6 million job openings but workers don’t have the skills needed (true)

During a recent interview on Fox & Friends, Ivanka Trump, assistant to the president and daughter of the president, said “There are 6 million available American jobs, so we’re constantly hearing from CEO’s that they have job openings, but they don’t have workers with the skill set they need to fill those jobs.”

For PolitiFact, Louis Jacobson rated her claim “true,” reporting “The number she cites is correct, and she’s right to say that the skills gap plays a role. Economists warn against overestimating the role played by the skills gap in all 6 million job openings, both because other factors play a role (such as the image gap) and because the skills barriers posed are often more modest than having to earn an academic degree or to obtain specialized training.”

Connect with Internet Archive at ALA 2017—Chicago

Internet Archive - 19 June 2017 - 12:34am

Come meet Internet Archive Founder Brewster Kahle and Director of Partnerships Wendy Hanamura at ALA Annual 2017 in Chicago.

Saturday, June 24

10:30-11:30 a.m.  Making your library a digital library by 2020

  • Where:  McCormick Place W194b
  • Who:     Brewster Kahle, Founder & Digital Librarian and Wendy Hanamura, Director of Partnerships

Description:  Come hear the Internet Archive team discuss OpenLibraries—a project that will enable every US library to become a more digital library. Working with library partners and organizations serving the print disabled, the Internet Archive proposes bringing 4 million books online, through purchase or digitization, starting with the century of books missing from our digital shelves. Our plan includes at-scale circulation of these e-books, enabling libraries owning the physical works to lend digital copies to their patrons. This will enable thousands of libraries to unlock their analog collections for a new generation of learners, enabling free, long-term, public access to knowledge.

Semifinalist in MacArthur Foundation’s 100&Change: This Internet Archive project has been selected as one of the eight semifinalists in the 100&Change MacArthur Foundation Challenge, which will provide $100 million over five years to an organization trying to solve one of the world’s toughest problems. In our case: providing free access to the best knowledge available. Brewster and Wendy will describe the current state of project planning and listen to your feedback to ensure this project has a transformative impact on the communities you serve.

Monday, June 26

1:30-2:15  Conversation Starter:  The Library of 2020 – Building A Collaborative Digital Collection of 4 Million Books

  • Where:  McCormick Place W194b
  • Who:     Wendy Hanamura, Dir. of Partnerships & Brewster Kahle, Digital Librarian

Description:  Even in this digital age, millions of books are not accessible to online learners and the print disabled. We in the library community haven’t been able to keep up with this digital demand, stymied by costs, eBook restrictions, and missing infrastructure. By making millions of books digitally available, we can unlock them for communities with severely limited or no access to those books. Because of distance, cost, time-constraints, or disability, people in many communities are too often unable to access physical books. Digital content is instantly available to people at a distance, at all hours, and with widely ranging physical abilities. Together with library and accessibility partners, the Internet Archive proposes bringing 4 million books online, through purchase or digitization. Our plan includes at-scale circulation of these eBooks, enabling libraries owning the physical works to lend digital copies. As 1 of 8 semifinalists for MacArthur’s 100&Change award, we seek your feedback. The goal: bringing libraries and learners 4 million eBooks, enabling the free, long-term, public access to knowledge.

NOTE:  To be live streamed via Facebook Live at https://www.facebook.com/internetnetarchive/

All times are in Central Daylight Time. For full schedule, visit https://www.eventscribe.com/2017/ALA-Annual/.  

TV news fact-checked: Donald and Ivanka Trump

Internet Archive - 14 June 2017 - 10:23pm

By Katie Dahl

Our fact-checking partners spent time on the Trumps this week, covering Ivanka Trump’s claim about women in STEM occupations and the President’s claims about James Comey and Michael Flynn, record-setting nominations delays, how long it actually took to build the Hoover Dam and the Golden Gate Bridge, and his involvement in a new coal mine opening.

Claim: Trump said “let this go” referencing the FBI investigation of Michael Flynn (contradicted by Trump)

In his written testimony submitted to the Senate Intelligence Committee on June 7, former FBI director James Comey wrote that President Trump said, “I hope you can see your way clear to letting this go, to letting Flynn go. He is a good guy. I hope you can let this go.”

In a PolitiFact article reporting on conflicting claims between Comey and the White House, Lauren Carroll wrote that when asked about this allegation in a May 18 press conference, the President said, “No. No. Next question.”

Claim: Trump nominees faced ‘record-setting long’ delays (true)

In a comment at a cabinet meeting on June 12, President Trump said, “This is our first Cabinet meeting with the entire Cabinet present. The confirmation process has been record-setting long — and I mean record-setting long — with some of the finest people in our country being delayed and delayed and delayed.”

The Washington Post’s Fact Checker Glenn Kessler reported that Trump “faced unusually sustained opposition for a new president, including cloture votes demanded for 14 of his choices,” and gave the President their “Geppetto Checkmark” for correct statements.

Claim: women make up 47% of workforce and just 23% of STEM occupations (mostly true)

In an interview this week, Ivanka Trump said, “Women… represent 47 percent of the overall work force, we only make up 23 percent of STEM-related [science, technology, engineering, and mathematics] occupations.”

That is “not far off the mark,” according to PolitiFact’s Louis Jacobson. He went on to report, “Trump was correct about the percentage of the overall workforce that is female,” and “The report [2016 National Science Board and the National Science Foundation] found that in 2013, women represented 29 percent of individuals in science and engineering occupations. That’s higher than Trump’s 23 percent, although it supports her broader point — that women are underrepresented in STEM fields.”

Claim: Americans ‘built the Golden Gate Bridge in four years and the Hoover Dam in five’ (misleading)

In his weekly address on June 9, President Trump said, “we are the nation that built the Golden Gate Bridge in four years and the Hoover Dam in five. Now, it takes as much as a decade just to plan a major permit or a major infrastructure or anything even remotely major in our country, and that’s ridiculous and it’s going to change.”

Michelle Ye Hee Lee gave Trump “three Pinocchios” for this claim. She reported for the Washington Post’s Fact Checker, “Trump describes the construction of the Golden Gate Bridge and Hoover Dam as projects that were constructed over four or five years, unbound by the years of permitting and regulatory restrictions that current-day projects face. But Trump only focuses on the literal construction of the projects, and overlooks the many years of bureaucratic negotiating and regulating that took place leading up to the construction.”

Claim: Trump is putting miners back to work with the opening of a new coal mine (hard to believe)

In a speech in Cincinnati, Ohio on June 7, President Trump said, “Next week we’re opening a big coal mine. You know about that. One in Pennsylvania. It’s actually a new mine. That hadn’t happened in a long time, folks. But we’re putting the people and we’re putting the miners back to work.”

“Trump did not name the Pennsylvania mine,” reported Robert Farley for FactCheck.org, “and the White House did not respond to us. But these kinds of events are rare enough that it is clear he is referring to the June 8 grand opening of the Corsa Coal Company’s Acosta Deep Mine more than 60 miles southeast of Pittsburgh.

What did Trump’s presidency have to do with its opening? Nothing. Development of the Acosta mine began in September, two months before the presidential election.”

To receive the TV News Archive’s email newsletter, subscribe here.

AMA about OpenLibraries–our proposal for MacArthur’s 100&Change

Internet Archive - 12 June 2017 - 1:49am

Live Chat on YouTube Live, Thursday, June 15 from 10-11:30 a.m. PT


Brewster Kahle, Founder and Digital Librarian
Wendy Hanamura, Director of Partnerships
John Gonzalez, Director of Engineering

What would it mean if you had easy online access to 4 million modern books–the equivalent of a great public or university library?  What would that mean for the print disabled and those unable to reach their public libraries? How would that change innovation and scholarship? In an era of misinformation, how can we tie information to the published works of humankind?

Those are some of the questions we’ve been asking ourselves at the Internet Archive as we hone our plans for Open Libraries–our proposal to the MacArthur Foundation’s 100&Change competition to tackle one of the world’s toughest problems. We are now one of eight semifinalists vying for a $100 million grant to carry out our goal: democratizing access to knowledge by providing free, long-term access to a digital library of 4 million modern books. We call our project Open Libraries because we want to help every library in the nation to provide its members with digital access to its rich collections.

The Internet Archive, working with library and accessibility partners, has a plan to bring 4 million books online, through purchase or digitization, starting with the 20th century books missing from our digital shelves. Our plan includes at-scale circulation of these e-books, enabling libraries owning the physical works to lend digital copies to their patrons. Working with our accessibility partners, we will also make this collection available to the print disabled around the world.  And our team of curators will help make sure we create an inclusive, diverse collection of 20th century texts.

We now have the technology and legal frameworks to transform our library system by 2023 to provide more democratic access to knowledge–for library patrons, scholars, students and the print disabled.

We want to hear what you think.  Help us hone our plans, test our hypotheses, and dream big!

Ask us a question or post an idea in the comments below. We will answer them during our YouTube Live.  Or tweet us using #OpenLibrariesAMA.

TV news fact-checked: Trump, Pruitt, Gore, and Handel

Internet Archive - 8 June 2017 - 9:35pm

By Katie Dahl

In this week’s roundup of fact-checked TV news, the term “travel ban” gets a final word from the president, new coal mining jobs numbers are questioned, Gore and Pruitt give competing claims about Paris Agreement target requirements, Trump supporters are polled for their approval of the Paris Agreement, two elements of the Iran Deal are clarified, and one of Trump’s arguments for privatizing the FAA gets a context check.

Claim: executive order is a “travel ban” (the president says it is)

In a tweet on June 5, President Donald Trump wrote: “People, the lawyers and the courts can call it whatever they want, but I am calling it what we need and what it is, a TRAVEL BAN!”

According to Miriam Valverde at PolitiFact, this statement ran counter to what “his spokesman, administration officials, lawyers, courts and others call it.” Among many examples collected by Valverde are three instances (1, 2, 3) of Secretary of Homeland Security John Kelly calling it a “travel pause” and twice saying the executive order “is not a travel ban.” The most recent interview was May 28. PolitiFact’s conclusion: “It’s a travel ban.”

Claim: we’ve added 50,000 coal mining jobs since last quarter, 7,000 since May (misleading spin)

In three TV interviews (1, 2, 3) with major networks on Sunday, June 4, Environmental Protection Agency Administrator Scott Pruitt made this claim: “We’ve had over 50,000 jobs since last quarter, coal jobs, mining jobs, created in this country. We had almost 7,000 mining and coal jobs created in the month of May alone.”

The Washington Post’s Fact Checker, Glenn Kessler, gave Pruitt a “four Pinocchio” rating for this claim, writing “the biggest problem with Pruitt’s statistic is that most of the gain in ‘mining’ jobs has nothing to do with coal. Most of the new jobs were in a subcategory called ‘support activities for mining,’ which accounted for more than 40,000 of the new jobs since October and more than 30,000 of the jobs since January.” For FactCheck.org, Eugene Kiely reported the same information and further that “BLS [Bureau of Labor Statistics] could not tell us how many of those jobs were related to coal mining, as opposed to gas, oil, metal ores and nonmetallic minerals. We do know, however, that most of those jobs support the gas and oil industries.”

Claim: US could change emissions targets under Paris Agreement without pulling out of it (Gore was right)

In dueling Sunday political talk show interviews, former Vice President Al Gore said of the Paris Agreement, “the requirements were voluntary. He [Trump] could have changed the requirements.” On another show, when CNN’s Jake Tapper said, “You can change those targets,” EPA Administrator Pruitt replied, “No, no, no. No, not under the agreement. Not under the agreement… You’re wrong.” Pruitt went on to claim that the targets “can only be ratcheted up.”

For their SciCheck project, FactCheck.org’s Vanessa Schipani reported, “The Paris Agreement is voluntary. Countries aren’t penalized for failing to adhere to their proposed emissions cuts. So President Donald Trump could have ignored or changed the U.S. pledged emissions targets without withdrawing from the agreement.”

Claim: most Trump supporters wanted the US to stay in the Paris Agreement (mostly false)

In another interview on Sunday, Gore said, “A majority of President Trump’s supporters and voters wanted to stay in” the Paris Agreement.

For PolitiFact, John Kruzel reported that on “Gore’s central point, the poll [Yale-George Mason poll] found that among Trump voters, 47 percent wanted to participate in the Paris Agreement, compared to 28 percent who supported opting out, with a quarter expressing no opinion.

So, 47 percent support among Trump voters amounts to a plurality — not a majority, as Gore said.”

Claim: US flew $2 billion to Iran and Obama administration said it was used for terrorism (half true)

In a debate between Jon Ossoff and Karen Handel leading up to a special election for Georgia’s sixth congressional district later this month (the election resulted from Tom Price being tapped by the Trump administration to lead the Department of Health and Human Services), Handel made this claim: “Nearly $2 billion in cash was flown over to Iran, money that the Obama administration has admitted is being used for terrorists and to support further activities there.”

According to Jon Greenberg of PolitiFact, “The Iran deal focused on reducing Iran’s stockpiles of nuclear-grade material, but a key provision unlocked Iranian assets that had been frozen for decades. How much money was there is a matter of debate.” He went on to report that John Kerry, then Secretary of State in the Obama administration, appeared on TV and said “‘I think that some of it will end up in the hands of the IRGC (Islamic Revolutionary Guards Corps) or of other entities, some of which are labeled terrorists,’ Kerry said. ‘To some degree, I’m not going to sit here and tell you that every component of that can be prevented.'”

Claim: the Obama administration spent over $7 billion on the aviation system and failed (two Pinocchios)

In comments announcing a plan to privatize part of the federal air traffic control apparatus, President Trump said “the previous administration spent over $7 billion trying to upgrade the system and totally failed. Honestly, they didn’t know what the hell they were doing.”

Michelle Ye Hee Lee reported for the Washington Post’s Fact Checker that “Trump characterizes this program as an Obama-era error, but the planning for the massive overhaul began in 2000. Congress authorized the FAA to tackle these changes in 2003, and the Department of Transportation launched the NextGen program in January 2004… There have been delays and changes in the project, but high-priority projects have made progress.” She gave this claim “two Pinocchios.”

To receive the TV News Archive’s email newsletter, subscribe here.

Help Us Defend Net Neutrality!

Internet Archive - 7 June 2017 - 1:15am

Please stand with the Internet Archive and over 50 allies in the effort to protect free speech by adding your name in support of net neutrality and writing to your Congressperson today. We have only 40 days left to stop a current FCC proposal that could upend the government’s prior commitment to net neutrality and seriously threaten free speech online. If you represent an organization, please consider participating in the movement on July 12 to inform the public on how to take action.

The end of net neutrality could be devastating to the Internet community at large in a multitude of ways. For example, relatively small organizations like the Archive don’t have the resources to negotiate special deals with ISPs, let alone pay new tolls in order to reach users who rely on our service to broadcast their voice. Many like us could be relegated to an Internet “slow lane” while bigger sites and closer partners of ISPs enjoy faster speeds.

We have fought against threats to a free and open Internet before and won. On September 10, 2014, we came together as hundreds of organizations and over 4 million people to support net neutrality and speak out against SOPA. In 2015, the FCC heard our voices and issued a landmark ruling called the “Open Internet Order” that declared the Internet a neutral public utility which couldn’t be censored or manipulated by corporate interests.

Fast forward to 2017. Only 18 months after the FCC reached a decision to protect our free speech, they have switched stances and are doing everything in their power to pass Docket 17-108, which would reverse the 2015 decision we worked so hard to achieve. The effort is led by a commissioner who was formerly a lawyer for one of the ISPs that has lobbied hardest against net neutrality and stands to directly benefit from this policy reversal. At our expense.

We need your help. Protect digital free speech and universal access to all knowledge. Please make your voice heard: send comments to the FCC, contact your Congressperson, connect your organization with the July 12 campaign, and share these links with your friends.



Comcast’s Blocking and Un-Blocking of Archive.org – What We Know So Far

Internet Archive - 5 June 2017 - 8:54pm

Comcast Internet users found themselves unable to access archive.org starting late Thursday afternoon due to Comcast blocking access to our site. The earliest time Comcast users reported problems was around 4:30 PM PST and access was restored around 6:15 AM the next day (a span of about 13 hrs 45 min).

Comcast informed us that the block was put into place due to detection of an apparent Xfinity-branded phishing page posted to archive.org by an uploader. According to Comcast, we had taken that page down promptly, but Comcast’s block was nevertheless implemented without notice late Thursday afternoon. Hours after we reported the block to friends at Comcast, they diagnosed the issue, removed the block, and restored their customers’ access to archive.org.

In addition to a significant number of archive.org users, some of our employees use Comcast for access and were unable to do some of their work during the block.  This was also reported on in Vice’s Motherboard.

We searched our communications for any reports from Comcast preceding the block and found only one email sent to us Thursday morning reporting a phishing page, which we took down promptly. Sent by an outside security company working for Comcast, the email did not mention any possibility of a block. The email and our removal of the item preceded the first known instance of the block by about eight hours.

This is the gist of what we know at this point. We continue to gather information and take this incident very seriously.

Dreaming of Semantic Audio Restoration at a Massive Scale

Internet Archive - 3 June 2017 - 8:20pm

I believe we can do a fabulous job of bringing the music from the 78rpm era back to vibrant life if we really understand wear and if we could model the instruments and voices.

In other words, I believe we could reconstruct a performance by semantically modeling the noise and distortion we want to get rid of, as well as modeling the performer’s instruments.

To follow this reasoning—what if we knew we were examining a piano piece and knew what notes were being played on what kind of piano and exactly when and how hard for each note—we could take that information to make a reconstruction by playing it again and recording that version. This would be similar to what optical character recognition (OCR) does with images of pages with text—it knows the language and it figures out the words on the page and then makes a new page in a perfect font. In fact, with the OCR’ed text, you can change the font, make it bigger, and reflow the page to fit on a different device.

What if we OCR’ed the music? This might work well for the instrumental accompaniment, because then we would handle a voice, if any, differently. We could have a model of the singer’s voice based on not only this recording and other recordings of this song, but also all other recordings of that singer. With those models we could reconstruct the voice without any noise or distortion at all.

We would balance the reconstructed and the raw signals to maintain the subtle variations that make great performances. Some of the original character could also be kept for context, much as digital filmmakers sometimes add scratched-film effects.

So, there can be a wide variety of restoration tools if we make the jump into semantics and big data analysis.

The Great 78 Project will make over 400,000 digitized 78rpm recordings publicly available, creating a rich data set for large-scale analysis. These transfers are being done with four different stylus shapes and sizes at the same time, all recorded as 96 kHz/24-bit lossless samples, and in stereo (even though the records are mono, this provides more information about the contours of the groove). Four styli times two stereo channels means each groove has 8 different high-resolution representations, sampled about every 11 microns. Furthermore, there are often multiple copies of the same recording that would have been stamped and used differently. So, modeling the wear on the record and using that to reconstruct what would have been on the master may be possible.

Many important records from the 20th century, such as jazz, blues, and ragtime, have only a few performers on each, so modeling those performers, instruments, and performances is quite possible. Analyzing whole corpora is now easier with modern computers, and it can provide insights beyond restoration, including into playing techniques that are not commonly understood.

If we build full semantic models of instruments, performers, and pieces of music, we could even create virtual performances that never existed.  Imagine a jazz performer virtually playing a song that had not been written in their lifetime. We could have different musician combinations, or singers performing with different cadences. Areas for experimentation abound once we cross the threshold of full corpus analysis and semantic modeling.

We hope the technical work done on this project will have a far-reaching effect on a full media type, since the Great 78 Project will digitize and hold a large percentage of all 78rpm records ever produced from 1908 to 1950. Therefore, any techniques that are built upon these recordings can be used to restore many, many records.

Please dive in and have fun with a great era of music and sound.


(We get a sample every 11 microns when digitizing the outer rim of a 78rpm record at 96 kHz. Given that we now have 8 different readings of that, each with 24-bit resolution, we can hopefully get a good idea of the groove. There are optical techniques that are very cool, but those have their own issues, I am told.

10″ × 3.14 = 31.4″ circumference ≈ 80 cm/revolution

At 78rpm: 60 seconds/minute ÷ 78 revolutions/minute = 0.77 seconds/revolution

80 cm/revolution ÷ 0.77 seconds/revolution = 104 cm/second

104 cm/second ÷ 96,000 samples/second ≈ 11 microns/sample)
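
For anyone who wants to play with the numbers, the back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and constants are made up for this example, and it assumes a 10-inch disc measured at its very outer edge, as in the note above. Closer to the label the groove moves more slowly under the stylus, so the spacing between samples would be smaller there.

```python
import math

# Assumed inputs, taken from the note above: a 10-inch disc measured at its
# outer rim, spinning at 78 rpm, digitized at 96,000 samples per second.
DIAMETER_INCHES = 10.0
RPM = 78.0
SAMPLE_RATE_HZ = 96_000.0
CM_PER_INCH = 2.54


def microns_per_sample(diameter_inches: float, rpm: float, sample_rate_hz: float) -> float:
    """Return the groove distance, in microns, covered between two samples."""
    circumference_cm = math.pi * diameter_inches * CM_PER_INCH
    seconds_per_rev = 60.0 / rpm
    groove_speed_cm_per_sec = circumference_cm / seconds_per_rev
    cm_per_sample = groove_speed_cm_per_sec / sample_rate_hz
    return cm_per_sample * 10_000.0  # 1 cm = 10,000 microns


spacing = microns_per_sample(DIAMETER_INCHES, RPM, SAMPLE_RATE_HZ)

# Four styli, each captured in stereo, give 8 readings of each stretch of groove.
readings_per_position = 4 * 2

print(f"{spacing:.1f} microns per sample, {readings_per_position} readings each")
```

With these inputs the spacing comes out just under 11 microns, which matches the rounded figure in the hand calculation. Passing a smaller diameter shows how the spacing tightens toward the inner grooves.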


TV news fact-checked: climate change edition

Internet Archive - 2 June 2017 - 9:16pm

by Katie Dahl & Nancy Watzman

With President Donald Trump’s announcement on Thursday that the U.S. would pull out of the international Paris climate agreement dominating TV news screens, we devote this roundup to the issue of climate change.

Global climate agreement news trending

As of Friday morning, reports on Trump’s decision to withdraw the U.S. from the Paris climate agreement were trending across TV news channels, driving out reports on investigations of Russian meddling in the 2016 elections and possible Trump campaign involvement. The one exception was MSNBC, where “Russia” was a top trending topic, while “Paris” was at the top of the list for other cable stations, according to the Television Explorer tool created by Kalev Leetaru, which draws on closed captioning from the TV News Archive to allow users to search news coverage. (The tool now incorporates recent TV news broadcasts, so general trends can be seen as the data rolls in, although for definitive results it is best to wait 24 hours to search.)

“Paris” was trending everywhere but MSNBC, where “Russia” was leading. Source: Television Explorer, TV News Archive

Claim: Paris Agreement would cause $3 trillion drop in US GDP (flawed study) 

Fact-checkers quickly analyzed Trump’s Rose Garden speech (full video available here) where he laid out his reasons for withdrawing from the agreement.  Among them: he said the “cost to the economy at this time would be close to $3 trillion in lost GDP.”

A team of reporters at FactCheck.org provided context. “That figure is for the year 2040 and for one scenario in a report that found a smaller impact under a different scenario. Another analysis estimated the potential economic impact of meeting the Paris Agreement emissions targets would be ‘modest’ and the cost of delaying action would be ‘high.'”

Similarly, PolitiFact’s Jon Greenberg wrote: “Take these statistics with a grain of salt… Yale professor Kenneth Gillingham said the NERA model tends to result in higher costs than other economic models. The study assumes certain hypothetical regulations, but ‘one could easily model other actions with much lower costs.'”

The Washington Post’s Fact Checkers, Glenn Kessler and Michelle Ye Hee Lee, reported his statistics are from a “study that was funded by the U.S. Chamber of Commerce and the American Council for Capital Formation, foes of the Paris Accord. So the figures must be viewed with a jaundiced eye.”

Of course Trump and his surrogates have made many claims in the past on TV news shows, which were fact-checked. Also worth a look: this compilation Mother Jones created last December of Trump’s statements over the years on different media (including TV news) about global warming.

Claim: the Paris Agreement is one-sided (needs context)

In April 2017, President Donald Trump decried the Paris agreement on climate as “one-sided… where the United States pays billions of dollars while China, Russia and India have contributed and will contribute nothing.”

Reporter Vanessa Schipani from FactCheck.org wrote that the “U.S. has promised to contribute $3 billion to this fund [Green Climate Fund]” and “China and India haven’t contributed to the Green Climate Fund… Russia hasn’t contributed any funds either, but it also hasn’t ratified the Paris Agreement or submitted an outline of what actions it will take…” She also reported “that, per capita, the U.S. emitted more greenhouse gases than China and India combined in 2015.”

Claim: China and India have no obligations under agreement until 2030 (four Pinocchios)

In a related statement on April 13, Environmental Protection Agency (EPA) Administrator Scott Pruitt said “China and India had no obligations under the agreement until 2030.”

The Washington Post’s Fact Checker, Glenn Kessler, reported “China, in its submission, said that, compared to 2005 levels, it would seek to cut its carbon emissions by 60 to 65 percent per unit of GDP by 2030. India said it would reduce its emissions per unit of economic output by 33 to 35 percent below 2005 by 2030… Note that both countries pledge to reach these goals by 2030, meaning they are taking steps now to meet their commitments.”

Claim: human activity, or carbon dioxide emissions, is not the primary contributor to global warming (science says, wrong)

In an interview on CNBC in March, EPA administrator Pruitt said “I would not agree that it’s [human activity or CO2] a primary contributor to the, to the global warming that we see.”

For FactCheck.org, Vanessa Schipani reported that “[S]cience says he’s wrong.” She wrote that “[a]ccording to the U.N.’s Intergovernmental Panel on Climate Change’s fifth assessment report, it is ‘extremely likely’ (at least 95 percent probable) that more than half of the observed temperature increase since the mid-20th century is due to human, or anthropogenic, activities.”

Claim: scientists cannot precisely measure climate change (they can with different levels of certainty)

In a lengthy article for their SciCheck project, FactCheck.org’s Vanessa Schipani reviewed statements by several Trump administration officials on this question of whether we can measure climate change with precision and whether we can measure the human impact. Among those who have made this claim are EPA’s Scott Pruitt, Attorney General Jeff Sessions, Secretary of State Rex Tillerson, Interior Secretary Ryan Zinke, and Health and Human Services Secretary Tom Price. Schipani reported “scientists can measure that impact with varying levels of certainty and precision” by going through the science for the greenhouse effect, global warming to climate change, and measuring and predicting extreme weather.

To receive the TV News Archive’s email newsletter, subscribe here.

Your Content Needs a Metadata Strategy

Story Needle - 31 mei 2017 - 12:27pm

What’s your metadata strategy?  So few web publishers have an articulated metadata strategy that a skeptic may think I’ve made up the concept, and coined a new buzzword.  Yet almost a decade ago, Kristina Halvorson explicitly cited metadata strategy as one of “a number of content-related disciplines that deserve their own definition” in her seminal  A List Apart article, “The Discipline of Content Strategy”.   She also cites metadata strategy in her widely read book on content strategy.  It’s been nearly a decade since Kristina’s article, but the discipline of content strategy still hasn’t given metadata strategy the attention it deserves.

A content strategy, to have a sustained impact, needs a metadata strategy to back it up.  Without metadata strategy, content strategy can get stuck in a firefighting mode.  Many organizations keep making the same mistakes with their content, because they ask overwhelmed staff to track too many variables.  Metadata can liberate staff from checklists, by allowing IT systems to handle low level details that are important, but exhausting to deal with.  Staff may come and go, and their enthusiasm can wax and wane.  But metadata, like the Energizer bunny, keeps performing: it can keep the larger strategy on track. Metadata can deliver consistency to content operations, and can enhance how content is delivered to audiences.

A metadata strategy is a plan for how a publisher can leverage metadata to accomplish specific content goals.  It articulates what metadata publishers need for their content, how they will create that metadata, and most importantly, how both the publisher and audiences can utilize the metadata.  When metadata is an afterthought, publishers end up with content strategies that can’t be implemented, or are implemented poorly.

The Vaporware Problem: When you can’t implement your Plan

A content strategy may include many big ideas, but translating those ideas into practice can be the hardest part.  A strategy will be difficult to execute when its documentation and details are too much for operational teams to absorb and follow.  The group designing the content strategy may have done a thorough analysis of what’s needed.  They identified goals and metrics, modeled how content needs to fit together, and considered workflows and the editorial lifecycle.  But large content teams, especially when geographically distributed, can face difficulties implementing the strategy.  Documentation, emails and committees are unreliable ways to coordinate content on a large scale.  Instead, key decisions should be embedded into the tools the team uses wherever possible.  When their tools have encoded relevant decisions, teams can focus on accomplishing their goals, instead of following rules and checklists.

In the software industry, vaporware is a product concept that’s been announced, but not built. Plans that can’t be implemented are vaporware. Content strategies are sometimes conceived with limited consideration of how to implement them consistently.  When executing a content strategy, metadata is where the rubber hits the road.  It’s a key ingredient for turning plans into reality.  But first, publishers need to have the right metadata in place before they can use it to support their broader goals.

Effective large-scale content governance is impossible without effective metadata, especially administrative metadata. Without a metadata strategy, publishers tend to rely on what their existing content systems offer them, instead of asking first what they want from their systems. Your existing system may provide only some of the key metadata attributes you need to coordinate and manage your content. That metadata may be in a proprietary format, meaning it can’t be used by other systems. The default settings offered by your vendors’ products are unlikely to provide the coordination and flexibility required.

Consider all the important information about your content that needs to be supported with metadata.  You need to know details about the history of the content (when it was created, last revised, reused from elsewhere, or scheduled for removal), where the content came from (author, approvers, licensing rights for photos, or location information for video recordings), and goals for the content (intended audiences, themes, or channels).  Those are just some of the metadata attributes content systems can use to manage routine reporting, tracking, and routing tasks, so web teams can focus on tasks of higher value.
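
As a minimal sketch of how such attributes might let a system handle routine tracking, the Python below models a few of them on a content record and lists the items that are due for removal. The field names (`created`, `last_revised`, `remove_after`, `audiences`) and the sample records are illustrative assumptions, not fields from any particular CMS.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ContentRecord:
    """Hypothetical administrative metadata for one content item."""
    title: str
    author: str
    created: date
    last_revised: date
    remove_after: Optional[date] = None   # scheduled removal date, if any
    audiences: List[str] = field(default_factory=list)


def due_for_removal(records: List[ContentRecord], today: date) -> List[str]:
    """Return titles whose scheduled removal date has arrived or passed."""
    return [
        r.title
        for r in records
        if r.remove_after is not None and r.remove_after <= today
    ]


# Invented sample records, for illustration only.
records = [
    ContentRecord("Spring promotion", "A. Editor", date(2017, 3, 1),
                  date(2017, 3, 15), remove_after=date(2017, 6, 1),
                  audiences=["customers"]),
    ContentRecord("About us", "B. Writer", date(2015, 1, 10),
                  date(2017, 2, 2), audiences=["general"]),
]

expired = due_for_removal(records, today=date(2017, 6, 23))
print(expired)
```

Because the removal date lives in the metadata, nobody has to remember to check it: the same query can run on a schedule and feed a report or a workflow queue.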

If you have grander visions for your content, such as making your content “intelligent”, then having a metadata strategy becomes even more important. Countless vendors are hawking products that claim to add AI to content. Just remember: metadata is what makes content intelligent, ready for applications (user decisions), algorithms (machine decisions) and analytics (assessment). Don’t buy new products without first having your own metadata strategy in place. Otherwise you’ll likely be stuck with the vendor’s proprietary vision and roadmap, instead of your own.

Lack of Strategy creates Stovepipe Systems

A different problem arises when a publisher tries to do many things with its content, but does so in a piecemeal manner.  Perhaps a big bold vision for a content strategy, embodied in a PowerPoint deck, gets tossed over to the IT department.  Various IT members consider what systems are needed to support different functionality.  Unless there is a metadata strategy in place, each system is likely to operate according to its own rules:

  • Content structuring relies on proprietary templates
  • Content management relies on proprietary CMS data fields
  • SEO relies on meta tags
  • Recommendations rely on page views and tags
  • Analytics rely on page titles and URLs
  • Digital assets rely on proprietary tags
  • Internal search uses keywords and not metadata
  • Navigation uses a CMS-defined custom taxonomy or folder structure
  • Screen interaction relies on custom JSON
  • Backend data relies on a custom data model.

Sadly such uncoordinated labeling of content is quite common.

Without a metadata strategy, each area of functionality is considered as a separate system.  IT staff then focus on systems integration: trying to get different systems to talk to each other.  In reality, they have a collection of stovepipe systems, where metadata descriptions aren’t shared across systems.  That’s because various systems use proprietary or custom metadata, instead of using common, standards-based metadata.  Stovepipe systems lack a shared language that allows interoperability.  Attributes that are defined by your CMS or other vendor system are hostage to that system.

Proprietary metadata is far less valuable than standards-based metadata.  Proprietary metadata can’t be shared easily with other systems and is hard or impossible to migrate if you change systems.  Proprietary metadata is a sunk cost that’s expensive to maintain, rather than being an investment that will have value for years to come. Unlike standards-based metadata, proprietary metadata is brittle — new requirements can mess up an existing integration configuration.
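
To illustrate the difference, here is a rough sketch in Python. Two systems describe the same article with their own field names; a mapping to shared property names (the schema.org properties `headline`, `dateModified` and `author` are used here) lets one piece of code read both. The system names and the proprietary field names are invented for the example.

```python
# Hypothetical proprietary field names used by two different systems.
FIELD_MAPS = {
    "cms": {"post_title": "headline", "mod_ts": "dateModified",
            "byline": "author"},
    "dam": {"AssetName": "headline", "LastEdit": "dateModified",
            "Creator": "author"},
}


def to_shared_vocabulary(system: str, record: dict) -> dict:
    """Rename a system's proprietary fields to shared property names.

    Fields with no mapping are dropped, which mirrors how unmapped
    proprietary metadata is lost when content moves between systems.
    """
    mapping = FIELD_MAPS[system]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


cms_record = {"post_title": "Summer menu", "mod_ts": "2017-05-30",
              "byline": "J. Smith", "layout_id": 7}
dam_record = {"AssetName": "Summer menu", "LastEdit": "2017-05-30",
              "Creator": "J. Smith"}

shared_from_cms = to_shared_vocabulary("cms", cms_record)
shared_from_dam = to_shared_vocabulary("dam", dam_record)
print(shared_from_cms == shared_from_dam)
```

Every new system needs another hand-built mapping like these, and each mapping has to be maintained. Systems that use the shared names natively need no mapping at all.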

Metadata standards are like an operating system for your content.  They allow content to be used, managed and tracked across different applications.  Metadata standards create an ecosystem for content.  Metadata strategy asks: What kind of ecosystem do you want, and how are you going to develop it, so that your content is ready for any task?

Who is doing Metadata Strategy right?

Let’s look at how two well-known organizations are doing metadata strategy. One example is current and newsworthy, while the other has a long backstory.

eBay

eBay decided that the proprietary metadata they used in their content wasn’t working, as it was preventing them from leveraging metadata to deliver better experiences for their customers. They embarked on a major program called the “Structured Data Initiative”, migrating their content to metadata based on schema.org, the web metadata vocabulary developed through a W3C community group. Wall Street analysts have been following eBay’s metadata strategy closely over the past year, as it is expected to improve the profitability of the ecommerce giant. The adoption of metadata standards has allowed for a “more personal and discovery-based buying experience with highly tailored choices and unique selection”, according to eBay. eBay is leveraging the metadata to work with new AI technologies to deliver a personalized homepage to each of its customers. It is also leveraging the metadata in its conversational commerce product, the eBay ShopBot, which connects with Facebook Messenger. eBay’s experience shows that a company shouldn’t try to adopt AI without first having a metadata strategy.

eBay’s strategy for structured data (metadata). Screenshot via eBay

Significantly, eBay’s metadata strategy adopts the schema.org standard for their internal content management, in addition to using it for search engine consumers such as Google and Bing. Plenty of publishers use schema.org for search engine purposes, but few have taken the next step like eBay to use it as the basis of their content operations. eBay is also well positioned to take advantage of any new third party services that can consume their metadata.

Australian Government

From the earliest days of online content, the Australian government has been concerned with how metadata can improve online content availability. The Australian government isn’t a single publisher, but comprises a federation of many government websites run by different government organizations.  The governance challenges are enormous.  Fortunately, metadata standards can help coordinate diverse activity.  The AGLS metadata standard has been in use nearly 20 years to classify services provided by different organizations within the Australian government.

The AGLS metadata strategy is unique in a couple of ways.  First, it adopts an existing standard and builds upon it.  The government identified areas where existing standards didn’t offer attributes that were needed.  The government adopted the widely used Dublin Core metadata standard, but added some additional elements that were specific to their needs (for example, indicating the “jurisdiction” that the content relates to).  Starting from an existing standard, they extended it and got the W3C to recognize their extension.
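
The pattern of extending a base standard can be sketched roughly as follows. The Python below emits HTML `meta` elements from a dictionary, using a `DCTERMS.` prefix for Dublin Core terms and an `AGLSTERMS.` prefix for the added jurisdiction element. The prefixes follow a common convention for embedding these terms in HTML, but treat the exact names and the sample values as assumptions and check them against the AGLS documentation before relying on them.

```python
from html import escape

# Sample description: three Dublin Core terms plus one AGLS extension.
# Prefixes and values are illustrative assumptions.
description = {
    "DCTERMS.title": "Apply for a fishing licence",
    "DCTERMS.creator": "Example Department",
    "DCTERMS.modified": "2017-05-01",
    "AGLSTERMS.jurisdiction": "Victoria",
}


def to_meta_tags(metadata: dict) -> str:
    """Render each name/value pair as an HTML <meta> element."""
    return "\n".join(
        '<meta name="{}" content="{}">'.format(escape(name), escape(value))
        for name, value in metadata.items()
    )


tags = to_meta_tags(description)
print(tags)
```

A consumer that only understands Dublin Core can still read the first three elements and ignore the fourth, which is what makes extending an existing standard less risky than inventing a new one.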

Second, the AGLS strategy addresses implementation at different levels in different ways. The metadata standard allows different publishers to describe their content consistently. It ensures all published content is interoperable. Individual publishers, such as the state government of Victoria, have their own government website principles and requirements, but these mandate the use of the AGLS metadata standard. The common standard has also promoted the availability of tools to implement the standard. For example, Drupal, which is widely used for government websites in Australia, has a plugin that provides support for adding the metadata to content. Currently, over 700 sites use the plugin. But significantly, because AGLS is an open standard, it can work with any CMS, not just Drupal. I’ve also seen a plugin for Joomla.

Australia’s example shows how content metadata isn’t an afterthought, but is a core part of content publishing.  A well-considered metadata strategy can provide benefits for many years.  Given its long history, AGLS is sure to continue to evolve to address new requirements.

Strategy focuses on the Value Metadata can offer

Occasionally, I encounter someone who warns of the “dangers” of “too much” metadata.  When I try to uncover the source of the perceived concern, I learn that the person thinks about metadata as a labor-intensive activity. They imagine they need to hand-create the metadata serially.  They think that metadata exists so they can hunt and search for specific documents. This sort of thinking is dated but still quite common.  It reflects how librarians and database administrators approached metadata in the past, as a tedious form of record keeping.  The purpose of metadata has evolved far beyond record keeping.  Metadata no longer is primarily about “findability,” powered by clicking labels and typing within form fields. It is now more about “discovery” — revealing relevant information through automation.  Leveraging metadata depends on understanding the range of uses for it.

When someone complains about too much metadata, it also signals to me that a metadata strategy is missing.  In many organizations, metadata is relegated to being an electronic checklist, instead of positioned as a valuable tool.   When that’s the case, metadata can seem overwhelming.  Organizations can have too much metadata when:

  • Too much of their metadata is incompatible, because different systems define content in different ways
  • Too much metadata is used for a single purpose, instead of serving multiple purposes.

Siloed thinking about metadata results in stovepipe systems. New metadata fields are created to address narrow needs, such as tracking or locating items for specific purposes. Fields proliferate across various systems. And everyone is confused about how anything relates to anything else.

Strategic thinking about metadata considers how metadata can serve all the needs of the publisher, not just the needs of an individual team member or role. When teams work together to develop requirements, they can discuss what metadata is useful for different purposes. They can identify how a single metadata item can be used in different contexts. If the metadata describes when an item was last updated, the team might consider how that metadata might be used in different contexts. How might it be used by content creators, by the analytics team, by the UX design team, and by the product manager?
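
Here is a small, hedged sketch of that idea in Python: a single “last updated” date feeding three different uses. The function names, the thresholds and the sample dates are all assumptions chosen for illustration.

```python
from datetime import date


def reader_label(last_updated: date) -> str:
    """Text a UX team might show to readers."""
    return "Last updated {}".format(last_updated.isoformat())


def needs_review(last_updated: date, today: date, max_age_days: int = 365) -> bool:
    """Flag a content creator might use to find stale items.

    The 365-day threshold is an arbitrary example value.
    """
    return (today - last_updated).days > max_age_days


def age_bucket(last_updated: date, today: date) -> str:
    """Coarse grouping an analytics team might report on."""
    days = (today - last_updated).days
    if days <= 30:
        return "fresh"
    if days <= 365:
        return "recent"
    return "old"


last_updated = date(2016, 1, 15)
today = date(2017, 5, 31)

print(reader_label(last_updated))
print(needs_review(last_updated, today))
print(age_bucket(last_updated, today))
```

None of these uses requires a new field. The value comes from agreeing on one well-defined attribute and letting each team build on it.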

Publishers should ask themselves how they can do more for their customers by using metadata.  They need to think about the productivity of their metadata: making specific metadata descriptions do more things that can add value to the content.  And they need a strategy to make that happen.

— Michael Andrews

The post Your Content Needs a Metadata Strategy appeared first on Story Needle.

MIT Press Classics Available Soon at Archive.org

Internet Archive - 30 mei 2017 - 7:55am

For more than eighty years, MIT Press has been publishing acclaimed titles in science, technology, art and architecture.  Now, thanks to a new partnership between the Internet Archive  and MIT Press, readers will be able to borrow these classics online for the first time. With generous support from Arcadia, this partnership represents an important advance in providing free, long-term public access to knowledge.

“These books represent some of the finest scholarship ever produced, but right now they are very hard to find,” said Brewster Kahle, founder and Digital Librarian of the Internet Archive. “Together with MIT Press, we will enable the patrons of every library that owns one of these books to borrow it online–one copy at a time.”

This joint initiative is a crucial early step in Internet Archive’s ambitious plans to digitize, preserve and provide public access to four million books, by partnering widely with university presses and other publishers, authors, and libraries.  The Internet Archive is one of eight groups named semi-finalists in 100&Change, a global competition for a single $100 million grant from the John D. and Catherine T. MacArthur Foundation. The competition seeks bold solutions to critical problems of our time. 

MIT Press’ Kelly McDougall (l) and Editor, Amy Brand, holding one of the publisher’s classic books.

MIT Press Director, Amy Brand said, “One of my top ambitions for the MIT Press has been to ensure that our entire legacy of publications is digitized, accessible, searchable, discoverable now and in perpetuity. Partnering with Internet Archive to achieve this objective is a dream come true not only for me and my colleagues at the Press, but also for many of our authors whose earlier works are completely unavailable or not easily accessible.”  

“Lending online permits libraries to fulfill their mission in the digital age, allowing anyone  to borrow through the ether copies of works they own,” said Professor Peter Baldwin, co-founder of Arcadia.  “The IA-MIT collaboration is a big step in the direction of realizing a universal library, accessible to anyone, anywhere.”

One of the hundreds of titles coming soon to archive.org

We will be scanning an initial group of 1,500 MIT Press titles at Internet Archive’s Boston Public Library facility, including Cyril Stanley Smith’s 1980 book, From Art to Science: Seventy-Two Objects Illustrating the Nature of Discovery, and Frederick Law Olmsted and Theodora Kimball’s Forty Years of Landscape Architecture: Central Park, which was published in 1973. The oldest title in the group is Arthur C. Hardy’s 1936 Handbook of Colorimetry.

John Palfrey, Head of School at Phillips Academy Andover and well-known public access advocate, described the partnership as “a truly ground-breaking development in open scholarship that I hope will inspire other university presses to follow suit, since so many excellent and important books are effectively out of circulation by virtue of being analog-only in a digital world.”

The Internet Archive has already begun digitizing MIT Press’ backlist, and the first titles will be available at archive.org soon. The entire MIT Press backlist should be available by the end of 2017.

The Future of Content is Multimodal

Story Needle - 28 mei 2017 - 7:45am

We’re entering a new era of digital transformation: every product and service will become connected, coordinated, and measured. How can publishers prepare content that’s ready for anything?  The stock answer over the past decade has been to structure content.  This advice — structuring content — turns out to be inadequate.  Disruptive changes underway have overtaken current best practices for making content future-ready.  The future of content is no longer about different formats and channels.  The future of content is about different modes of interaction.  To address this emerging reality, content strategy needs a new set of best practices centered on the strategic use of metadata.  Metadata enables content to be multimodal.

What does the Future of Content look like?

For many years, content strategists have discussed how people need their content in terms of making it available in any format, at any time, through any channel that the user wanted.  For a while, the format-shifting, time-shifting, and channel-shifting seemed like it could be managed.  Thoughtful experts advocated ideas such as single-sourcing and COPE (create once, publish everywhere) which seemed to provide a solution to the proliferation of devices.  And it did, for a while.  But what these approaches didn’t anticipate was a new paradigm.  Single-sourcing and COPE assume all content will be delivered to a screen (or its physical facsimile, paper).  Single-sourcing and COPE didn’t anticipate screenless content.

Let’s imagine how people will use content in the very near future — perhaps two or three years from now.  I’ll use the classic example of managed content: a recipe.  Recipes are structured content, and provide opportunities to search according to different dimensions.  But nearly everyone still imagines recipes as content that people need to read.  That assumption no longer is valid.

Cake made by Meredith via Flickr (CC BY-SA 2.0)

In the future, you may want to bake a cake, but you might approach the task a bit differently.  Cake baking has always been a mixture of high-touch craft and low-touch processes.  Some aspects of cake baking require the human touch to deliver the best results, while other steps can be turned over to machines.

Your future kitchen is not much different, except that you have a speaker/screen device similar to the new Amazon Echo Show, and also a smart oven that’s connected to  the Internet of Things in the cloud.

You ask the voice assistant to find an appropriate cake recipe based on wishes you express. The assistant provides a recipe, which offers a choice of how to prepare the cake. You have a dialog with the voice assistant about your preferences. You can either use a mixer, or hand mix the batter. You prefer hand mixing, since this ensures you don’t over-beat the eggs, and keeps the cake light. The recipe is read aloud, and the voice assistant asks if you’d like to view a video about how to hand-beat the batter. You can ask clarifying questions. As the interaction progresses, the recipe sends a message to the smart oven to tell it to preheat, and provides the appropriate temperature. There is no need for the cook to worry about when to start preheating the oven and what temperature to set: the recipe can provide that information directly to the oven. The cake batter is placed in the ready oven, and is cooked until the oven alerts you that the cake is ready. The readiness is not simply a function of elapsed time, but is based on sensors detecting moisture and heat. When the cake is baked, it’s time to return to giving it the human touch. You get instructions from the voice/screen device on how to decorate it. You can ask questions to get more ideas, and tips on how to execute the perfect finishing touches. Voila.

Baking a cake provides a perfect example of what is known in human-computer interaction as a multimodal activity.  People seamlessly move between different digital and physical devices.  Some of these are connected to the cloud, and some things are ordinary physical objects.  The essential feature of multimodal interaction is that people aren’t tied to a specific screen, even if it is a highly mobile and portable one.  Content flows to where it is needed, when it is needed.

The Three Interfaces

Our cake baking example illustrates three different interfaces (modes) for exchanging content:

  1. The screen interface, which SHOWS content and relies on the EYES
  2. The conversational interface, which TELLS and LISTENS, and relies on the EARS and VOICE
  3. The machine interface, which processes INSTRUCTIONS and ALERTS, and relies on CODE.
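
The three interfaces above can be sketched as three renderings of one piece of recipe metadata. This Python is only an illustration: the record layout, the field names and the command format sent to the oven are hypothetical, not part of any real appliance API.

```python
# One hypothetical metadata record for a single recipe step.
step = {
    "action": "preheat",
    "appliance": "oven",
    "temperature_c": 180,
}


def for_screen(s: dict) -> str:
    """Screen interface: text to SHOW."""
    return "{} the {} to {} degrees C".format(
        s["action"].capitalize(), s["appliance"], s["temperature_c"])


def for_voice(s: dict) -> str:
    """Conversational interface: a sentence to TELL the user."""
    return "Next, {} your {} to {} degrees Celsius.".format(
        s["action"], s["appliance"], s["temperature_c"])


def for_machine(s: dict) -> dict:
    """Machine interface: an INSTRUCTION as discrete fields."""
    return {"target": s["appliance"], "command": s["action"],
            "setpoint_c": s["temperature_c"]}


print(for_screen(step))
print(for_voice(step))
print(for_machine(step))
```

The point of the sketch is that none of the three outputs is stored as written content. Each is generated from the same description at the moment it is needed.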

The scenario presented is almost certain to materialize.  There are no technical or cost impediments. Both voice interaction and smart, cloud-connected appliances are moving into the mainstream. Every major player in the world of technology is racing to provide this future to consumers. Conversational UX is an emerging discipline, as is ambient computing that embeds human-machine interactions in the physical world. The only uncertainty is whether content will be ready to support these scenarios.

The Inadequacy of Screen-based Paradigms

These are not the only modes that could become important in the future: gestures, projection-based augmented reality (layering digital content over physical items), and sensor-based interactions could become more common.  Screen reading and viewing will no longer be the only way people use content.  And machines of all kinds will need access to the content as well.

Publishers, anchored in a screen-based paradigm, are unprepared for the tsunami ahead.  Modularizing content is not enough.  Publishers can’t simply write once, and publish everywhere.  Modular content isn’t format-free.  That’s because different modes require content in different ways.  Modes aren’t just another channel.  They are fundamentally different.

Simply creating chunks or modules of content doesn’t work when providing content to platforms that aren’t screens:

  • Pre-written chunks of content are not suited to conversational dialogs that are spontaneous and need to adapt.  Natural language processing technology is needed.
  • Written chunks of content aren’t suited to machine-to-machine communication, such as having a recipe tell an oven when to start.  Machines need more discrete information, and more explicit instructions.

Screen-based paradigms presume that chunks of content would be pushed to audiences.  In the screen world, clicking and tapping are annoyances, so the strategy has been to assemble the right content at delivery.  Structured content based on chunks or modules was never designed for rapid iterations of give and take.

Metadata Provides the Solution for Multimodal Content

Instead of chunks of content, platforms need metadata that explains the essence of the content.  The metadata allows each platform to understand what it needs to know, and utilize the essential information to interact with the user and other devices.  Machines listen to metadata in the content.  The metadata allows the voice interface and oven to communicate with the user.

These are early days for multimodal content, but the outlines of standards are already in evidence (see my book, Metadata Basics for Web Content, for a discussion of standards). To return to our example, recipes published on the web are already well described with metadata. The earliest web standard for metadata, microformats, provided a schema for recipes, and schema.org, today’s popular metadata standard, provides a robust set of properties to express recipes. Already millions of online recipes are described with metadata standards, so the basic content is already in place.
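
For readers who haven’t seen recipe metadata, here is a minimal sketch of what it can look like, built in Python and serialized as JSON-LD. The property names (`recipeIngredient`, `recipeInstructions`, `prepTime`, `cookTime`, `recipeYield`) come from schema.org’s Recipe type; the recipe itself is invented for the example.

```python
import json

# A small recipe description using schema.org's Recipe type.
# The recipe itself is invented for illustration.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Simple sponge cake",
    "recipeYield": "8 servings",
    "prepTime": "PT20M",   # ISO 8601 duration: 20 minutes
    "cookTime": "PT35M",   # ISO 8601 duration: 35 minutes
    "recipeIngredient": [
        "200 g flour",
        "200 g sugar",
        "4 eggs",
    ],
    "recipeInstructions": [
        "Beat the eggs and sugar by hand until pale.",
        "Fold in the flour.",
        "Bake until a skewer comes out clean.",
    ],
}

json_ld = json.dumps(recipe, indent=2)
print(json_ld)
```

Embedded in a page inside a `<script type="application/ld+json">` element, a description like this can be read by any consumer that understands schema.org, whether that is a search engine or a voice assistant.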

The extra bits needed to allow machines to act on recipe metadata are now emerging.  Schema.org provides a basic set of actions that could be extended to accommodate IoT actions (such as Bake).  And schema.org is also establishing a HowTo entity that can specify more specific instructions relating to a recipe, that would allow appliances to act on the instructions.

Metadata doesn’t eliminate the need for written text or video content.  Metadata makes such content more easily discoverable.  One can ask Alexa, Siri, or Google to find a recipe for a dish, and have them read aloud or play the recipe.  But what’s needed is the ability to transform traditional stand-alone content such as articles or videos into content that’s connected and digitally native.  Metadata can liberate the content from being a one-way form of communication, and transform it into being a genuine interaction.  Content needs to accommodate dialog.  People and machines need to be able to talk back to the content, and the content needs to provide an answer that makes sense for the context.  When the oven says the cake is ready, the recipe needs to tell the cook what to do next.  Metadata allows that seamless interaction between oven, voice assistant and user to happen.

Future-ready content needs to be agnostic about how it will be used.  Metadata makes that future possible.  It’s time for content strategists to develop comprehensive metadata requirements for their content, and have a metadata strategy that can support their content strategy in the future. Digital transformation is coming to web content. Be prepared.

— Michael Andrews

The post The Future of Content is Multimodal appeared first on Story Needle.

TV news fact-checked: Gianforte, Gingrich, Pelosi & more

Internet Archive - 26 mei 2017 - 5:31pm

By Nancy Watzman and Katie Dahl

In this week’s round-up from the TV News Archive,  our fact-checking partners declare that Greg Gianforte, now Montana’s U.S. House representative-elect, was the aggressor in a conflict with a reporter; Newt Gingrich spread a conspiracy theory; House Minority Leader Nancy Pelosi stretched claims about how veterans could be hurt under the House GOP health care bill; and White House budget director Mick Mulvaney double-counted money.

Claim: Guardian reporter’s aggression, not Gianforte’s, caused altercation (flip that)

On May 24 a campaign spokesperson for Greg Gianforte, who has since won the Montana U.S. House race, said, “Tonight, as Greg was giving a separate interview in a private office, The Guardian‘s Ben Jacobs entered the office without permission, aggressively shoved a recorder in Greg’s face, and began asking badgering questions. Jacobs was asked to leave. After asking Jacobs to lower the recorder, Jacobs declined. Greg then attempted to grab the phone that was pushed in his face. Jacobs grabbed Greg’s wrist, and spun away from Greg, pushing them both to the ground. It’s unfortunate that this aggressive behavior from a liberal journalist created this scene at our campaign volunteer BBQ.”

As reported by John Kruzel and Smitha Rajan for PolitiFact, a Fox News reporter was in the room at the time and gave this account. “…Gianforte grabbed Jacobs by the neck with both hands and slammed him into the ground behind him. Faith, Keith and I watched in disbelief as Gianforte then began punching the reporter.” Gianforte has since apologized.

Claim: DNC staffer assassinated after giving emails to WikiLeaks (unsupported)

Newt Gingrich, a former Republican House Speaker, said in a TV interview, “we have this very strange story now of this young man who worked for the Democratic National Committee, who apparently was assassinated at 4 in the morning, having given WikiLeaks something like 23,000. I’m sorry, 53,000 emails and 17,000 attachments.”

“Gingrich Spreads Conspiracy Theory,” read a headline from FactCheck.org. Eugene Kiely reported “there’s no evidence for his claim.” PunditFact, a project of PolitiFact, gave Gingrich its worst fact-check rating, Pants on Fire.  Lauren Carroll reported, “Hours after Fox published its report, (Rod) Wheeler recanted. He told CNN that he hadn’t seen the evidence himself, and his knowledge of Rich’s alleged email contact with WikiLeaks came from the national Fox News reporter, not his own investigative work.”

(Note: Kiely also made use of the Wayback Machine in his piece, linking to a now-deleted Fox News story now saved at the Internet Archive. Washington Post reporters Kristine Phillips and Peter Holley published similar links in their story on how Fox News retracted its story on Seth Rich.)

Claim: seven million veterans will lose tax credit for their families in health care bill (three Pinocchios)

During a speech at a conference hosted by the Center for American Progress, Rep. Nancy Pelosi, D., Calif., said of the House-passed GOP health care reform bill, “Seven million veterans will lose their tax credit for their families in this bill.”

Michelle Ye Hee Lee reported for The Washington Post’s Fact Checker that “veterans ‘could’ — not ‘will,’ as Pelosi says — lose tax credits if the current protections don’t carry over under a new health law… Would it affect 7 million veterans and their families? Not necessarily.”

Claim: economic growth will pay for both eliminating the deficit and tax cuts (wait a minute)

In a press conference about President Trump’s proposed 2018 fiscal budget, White House budget director Mick Mulvaney said “we get to an actual balance on this budget within the 10-year window,” because “we will bring back 3% economic growth to this country and those numbers are assumed in this budget. By the way if we don’t the budget will never balance. You will never see a balanced budget again. We refuse to accept that the new normal in this country. Three percent was the old normal. Three percent will be the new normal again under the Trump administration and that is part and parcel with the foundation of this budget.” Treasury Secretary Steve Mnuchin also claimed economic growth would pay for the proposed revenue-neutral tax plan, “This will pay for itself with growth and with reduced — reduction of different deductions and closing loopholes.”

“Wait a minute, say tax and budget experts, that’s double-counting the same money,” reported Robert Farley of FactCheck.org. Roberton Williams of the Tax Policy Center told FactCheck.org that you can’t assume growth will both balance the budget and offset tax cuts. “Both of those are not plausible,” he said.

Claim: Manafort and others visited Moscow during the campaign (mostly false)

In a TV interview, Rep. Maxine Waters, D., Calif., said “I really do believe that much of what you saw coming out of Trump’s mouth was a play from Putin’s playbook… I think you can see visits, you know, to Moscow made during the campaign by (Paul) Manafort and others.”

“From what’s on the public record, Manafort didn’t go at all, and (Carter) Page went once… Waters made it sound like this was a regular occurrence. We rate this claim Mostly False,” Jon Greenberg reported for PolitiFact.

Claim: Wisconsin high-risk pool had 8 or 9 plans, people could go to any doctor, and premiums and copays were cheaper than Obamacare (half true)

In response to criticism from Democrats for the House-passed health care proposal, Rep. Paul Ryan, R., Wis., said “In Wisconsin, we had a really successful high-risk pool. Ten percent of the people in the individual market in Wisconsin were in the state high-risk pool. They had eight or nine plans to choose from. They could go to any doctor or any hospital they wanted. And their premiums and copays were cheaper than they are under Obamacare today.”

For PolitiFact, Tom Kertscher reported “He’s essentially on target on the first two parts, but not on the third… it can’t be flatly stated that the high-risk pool plans were cheaper than Obamacare plans for comparable coverage.”

To receive the TV News Archive’s email newsletter, subscribe here.

A Visual Approach to Learning Schema.org Metadata

Story Needle - 24 mei 2017 - 8:10am

Everyone involved with publishing web content, whether a writer, designer, or developer, should understand how  metadata can describe content. Unfortunately, web metadata has a reputation, not entirely undeserved, for being a beast to understand. My book, Metadata Basics for Web Content, explains the core concepts of metadata. This post is for those ready to take the next step: to understand how a metadata standard relates to their specific content.

Visualizing Metadata

How can web teams make sense of voluminous and complex metadata documentation?  Documentation about web metadata is generally written from a developer perspective, and can be hard for non-techies to comprehend. When relying on detailed documentation, it can be difficult for the entire web team to have a shared understanding of what metadata is available.  Without such a shared understanding, teams can’t have a meaningful discussion of what metadata to use in their content, and how to take advantage of it to support their content goals.

The good news is that metadata can be visualized.  I want to show how anyone can do this, with specific reference to schema.org, the most important web metadata standard today. The technique can be useful not only for content and design team members who lack a technical background, but also for developers.

Everyone who works with a complex metadata standard such as schema.org faces common challenges:

  1. A large and growing volume of entities and properties to be aware of
  2. Cases where entities and properties sometimes have overlapping roles that may not be immediately apparent
  3. Terminology that can be misunderstood unless the context is comprehended correctly
  4. The prevalence of many horizontal linkages between entities and properties, making navigation through documentation a pogo-like experience.

First, team members need to understand what kinds of things associated with their content can be described by a metadata standard.  Things mentioned in content are called entities.  Entities have properties.  Properties describe values, or  they express the relationship of one entity to another.
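To make the vocabulary concrete, here is a minimal sketch in Python. The property names (name, color, manufacturer) are real schema.org terms, but the tractor, the company name, and the helper function `split_properties` are made up for illustration. The sketch separates properties that hold plain values from properties that point to another entity.

```python
import json

# A hypothetical product described with real schema.org property names.
product = {
    "@context": "https://schema.org",
    "@type": "Product",                 # the entity type
    "name": "Example Tractor",          # property holding a literal value
    "color": "green",                   # property holding a literal value
    "manufacturer": {                   # property pointing to another entity
        "@type": "Organization",
        "name": "Example Manufacturing Co.",
    },
}

def split_properties(entity):
    """Separate properties with literal values from those that link to entities."""
    literals, relations = {}, {}
    for key, value in entity.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords such as @type are not properties
        if isinstance(value, dict) and "@type" in value:
            relations[key] = value["@type"]
        else:
            literals[key] = value
    return literals, relations

literals, relations = split_properties(product)
print(json.dumps(product, indent=2))
print("values:", literals)
print("relationships:", relations)
```

The nested dictionary is the point: a property such as manufacturer does not hold text, it holds another entity with properties of its own.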

Entities are classified according to types, which range from general to specific. Entity types form a hierarchy that can be expressed as a tree. All entities derive from the parent entity, called Thing. Currently, schema.org has over 600 entity types. Dan Brickley, an engineer at Google who is instrumental in the development of schema.org, has helpfully developed an interactive visualization in D3 (a JavaScript library for data visualization), presented as a radial tree, which shows the distribution of entity types within schema.org. The tool is a helpful way to explore the scope of entities addressed, and the different levels of granularity available.

Screenshot of entity tree, available at http://bl.ocks.org/danbri/raw/1c121ea8bd2189cf411c/
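For readers who prefer text to pictures, the same tree idea can be sketched in a few lines of Python. The table below is a tiny, hand-copied slice of the schema.org hierarchy, not the full set of types, and the function name `ancestry` is my own. It walks from a specific type up to Thing.

```python
# Parent of each type: a small hand-copied slice of the schema.org hierarchy.
PARENT = {
    "Thing": None,
    "Product": "Thing",
    "Organization": "Thing",
    "Person": "Thing",
    "Intangible": "Thing",
    "Offer": "Intangible",
    "Rating": "Intangible",
    "AggregateRating": "Rating",
}

def ancestry(entity_type):
    """Return the chain from a type up to the root type, Thing."""
    path = [entity_type]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path

# Print from general to specific, the way the radial tree reads outward.
print(" > ".join(reversed(ancestry("AggregateRating"))))
# prints: Thing > Intangible > Rating > AggregateRating
```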

D3 is a great visualization library, but it requires both knowledge and time to code. For our second kind of visualization, we’ll rely on a much simpler tool.

Graphs of Linked Data

Web metadata can connect or link different items of information together, forming a graph of knowledge.  Graphs are ideal to visualize.  By visualizing this structure, content teams can see how entities have properties that relate to other entities, or that have different kinds of values.  This kind of visualization is known as a concept map.

Let’s visualize a common topic for web content: product information. Many things can be said about a product: who it is from, what it is like, and how much it costs. I’ve created the graph below using an affordable and easy-to-use concept mapping app called Conceptorium (though other graphic tools can be used). Working from the schema.org documentation for products, I’ve identified some common properties and relationships for products. Entities (things described with metadata) are in green boxes, while literal values (data you might see about them) are in salmon-colored boxes. Properties (attributes or qualities of things) are represented by lines with arrows, with the name of the property next to the line.

Concept map of schema.org entities and properties related to products

The graph illustrates some key issues in schema.org that web teams need to understand:

  • The boundary between different entity types that address similar properties
  • The difference between different instances of the same entity type
  • The directional relationships of properties.

Entity Boundaries

Concept maps help us see the boundaries between related entity types.  A product, shown in the center of our graph, has various properties, such as a name, a color, and an average user rating (AggregateRating).  But when the product is offered for sale, properties associated with the conditions of sale need to be expressed through the Offer entity.  So in schema.org, we can see that products don’t have prices or warranties; offers have prices or warranties.  Schema.org allows publishers to express an offer without providing granular details about a product.  Publishers can note the name and product code (referred to as gtin14) in the offer together with the price, and not need to use the Product entity type at all.  The Offer and Product entity types both use the name and product code (gtin14) properties.   So when discussing a product, the team needs to decide if the content is mostly about the terms of sale (the Offer), or about the features of the product (the Product), or both.
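The boundary can be sketched as two alternative descriptions, written here as Python dictionaries in JSON-LD shape. The property names are real schema.org terms. The tractor, the price, the rating figures, and the product code are invented placeholders, and `shared_properties` is a hypothetical helper.

```python
# Option 1: describe only the terms of sale, with no Product entity at all.
offer_only = {
    "@context": "https://schema.org",
    "@type": "Offer",
    "name": "Example Tractor",
    "gtin14": "00012345678905",      # placeholder product code
    "price": "24999.00",
    "priceCurrency": "USD",
}

# Option 2: describe the product, and nest the terms of sale in an Offer.
product_with_offer = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Tractor",
    "gtin14": "00012345678905",
    "color": "green",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.4",
        "reviewCount": "89",
    },
    "offers": {
        "@type": "Offer",
        "price": "24999.00",         # price belongs to the Offer, not the Product
        "priceCurrency": "USD",
        "warranty": {"@type": "WarrantyPromise"},
    },
}

def shared_properties(a, b):
    """Property names used by both descriptions, ignoring JSON-LD keywords."""
    return {k for k in a if k in b and not k.startswith("@")}

print(sorted(shared_properties(offer_only, product_with_offer)))
# prints: ['gtin14', 'name']
```

Note that price never appears at the top level of the Product description; it only shows up once an Offer is involved.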

Instances and Entity Types

Concept maps help us distinguish different instances of entities, as well as cases where instances are performing different roles. From the graph, we can see that a product can be related to other products. This can be hard to grasp in the documentation, where an entity type is presented as both the subject and the object of various properties. Graphs can show how there can be different product instances that may have different values for the same properties (e.g., all products have a name, but each product has a different name). In our example, we can see that one product at the bottom right is a competitive product to the product in the center. We can compare the average rating of the competitor product with the average rating of the main product. We can also see another related product, which is an accessory for the main product. This relationship can help identify products to display as complements.

An entity type provides a list of properties available to describe something.  Web content may discuss numerous, related things that all belong to the same entity type.  In our example, we see several instances of the Organization entity type.  In one case, an organization owns a product (perhaps a tractor).  In another case, the Organization is a seller.  In a third case, the Organization is a manufacturer of the product. Organizations can have different roles relating to an entity.

Content teams need to identify in their metadata which Organizations are responsible for which role.  Is the seller the manufacturer of the product, or are two different Organizations involved?  Our example illustrates how a single Person can be both an owner and a seller of a Product.
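One way to keep roles straight is to give each Organization an identifier and then check which identifier fills which role. The sketch below assumes made-up `@id` values on example.com and a hypothetical helper, `roles_by_organization`; manufacturer and seller are real schema.org properties.

```python
# Hypothetical product with two different Organizations in two roles.
product = {
    "@type": "Product",
    "name": "Example Tractor",
    "manufacturer": {"@type": "Organization", "@id": "https://example.com/#maker"},
    "offers": {
        "@type": "Offer",
        "seller": {"@type": "Organization", "@id": "https://example.com/#dealer"},
    },
}

def roles_by_organization(product):
    """Map each Organization identifier to the roles it plays for this product."""
    roles = {}
    maker = product.get("manufacturer")
    if maker:
        roles.setdefault(maker["@id"], []).append("manufacturer")
    seller = product.get("offers", {}).get("seller")
    if seller:
        roles.setdefault(seller["@id"], []).append("seller")
    return roles

for org_id, org_roles in roles_by_organization(product).items():
    print(org_id, "->", ", ".join(org_roles))
```

If the manufacturer also sold the product directly, both roles would appear under a single identifier.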

What Properties Mean

Concept maps can help web teams see what properties really represent.  Each line with an arrow has a label, which is the name of the property associated with an entity type.  Properties have a direction, indicated by the arrow.  The names of properties don’t always directly translate into an English verb, even when they at first appear to.  For example, in English, Product > manufacturer > Organization doesn’t make much sense. The product doesn’t make the organization, but rather the organization manufactures the product.  It’s important to pay attention to the direction of a property: what entity type is expected — especially when these relationships seem inverted to how we think about them normally.

Many properties are adjectives or even nouns, and need helper verbs such as “has” to make sense.  If the property describes another entity, then that entity can involve many more properties to describe additional dimensions of that entity.  So we might say that “a Product has a manufacturer which is an Organization (having a name, address, etc.)”  That’s not very elegant in English, but the diagram keeps the focus on the nature of the relationships described.
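The “has” reading can be applied mechanically. In this small sketch each arrow from the concept map is written as a (subject, property, object) triple; the triples use real schema.org names, while the helper functions `article` and `read_aloud` are my own invention.

```python
# Each arrow in the concept map, written as (subject type, property, object type).
TRIPLES = [
    ("Product", "manufacturer", "Organization"),
    ("Product", "aggregateRating", "AggregateRating"),
    ("Offer", "seller", "Organization"),
    ("Offer", "itemOffered", "Product"),
]

def article(word):
    """Pick 'a' or 'an' based on the first letter (a rough rule, fine for this list)."""
    return "an" if word[0].lower() in "aeiou" else "a"

def read_aloud(subject, prop, obj):
    """Read a triple in the direction of the arrow, using the helper verb 'has'."""
    return (f"{article(subject).capitalize()} {subject} has "
            f"{article(prop)} {prop} which is {article(obj)} {obj}.")

for triple in TRIPLES:
    print(read_aloud(*triple))
# first line printed: A Product has a manufacturer which is an Organization.
```

Reading every arrow this way is a quick test of direction: if the sentence sounds wrong, the subject and object are probably swapped.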

Broader Benefits of Concept Mapping for Content Strategy

So far, we’ve discussed how concept maps can help web teams understand what the metadata means, and how they need to organize their metadata descriptions.  Concept maps can also help web teams plan their content.  Teams can use maps to decide what content to present to audiences, and even what content to create that audiences may be interested in.

Content Planning

Jarno van Driel, a Dutch SEO expert, notes that many publishers treat schema.org as “an afterthought.”  Instead, Jarno argues, publishers should consult the properties available in schema.org to plan their content.  Schema.org is a collective project, where different contributors identify properties relating to entities they would like to mention that they feel would be of interest to audiences.  Schema.org can be thought of as a blueprint for information you can provide audiences about different things you publish.  While our example concept map for product properties is simplified to conserve space, a more complete map would show many more properties, some of which you might decide to address in your content.  For example, audiences might want to know about the material, the width, or the weight of the product — properties available in schema.org that publishers may not have considered including in their content.

Content Design and Interaction Design

Concept maps can also reveal relationships between different levels of information that publishers can present.  Consider how this information is displayed on the screen.  Audiences may want to compare different values. They may want to know all the values for a specific property (such as all the colors available), or they want to compare the values for a property of two different instances (average rating of two different products).

Concept maps can reveal qualifications about the content (e.g., an Offer may be qualified by an area served).  Values (shown in salmon) can be sorted and ranked.  Concept maps also help web teams decide on the right level of detail to present.  Do they want to show average ratings for a specific product, or a brand overall?  By consulting the map, they can consider what data is available, and what data would be most useful to audiences.
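Once values are captured as metadata, comparing and ranking them takes very little code. The sketch below uses three invented products with made-up ratings; `values_for` and `rank_by_rating` are hypothetical helper names.

```python
# Three hypothetical product instances with invented values.
catalog = [
    {"name": "Main product", "color": "green",
     "aggregateRating": {"ratingValue": 4.4}},
    {"name": "Competitor product", "color": "red",
     "aggregateRating": {"ratingValue": 4.1}},
    {"name": "Accessory", "color": "green",
     "aggregateRating": {"ratingValue": 4.7}},
]

def values_for(items, prop):
    """All distinct values of one property, e.g. every color available."""
    return sorted({item[prop] for item in items if prop in item})

def rank_by_rating(items):
    """Product names ordered from highest to lowest average rating."""
    ordered = sorted(items,
                     key=lambda item: item["aggregateRating"]["ratingValue"],
                     reverse=True)
    return [item["name"] for item in ordered]

print(values_for(catalog, "color"))   # prints: ['green', 'red']
print(rank_by_rating(catalog))
# prints: ['Accessory', 'Main product', 'Competitor product']
```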

Concept map app shows columns of entities and values, which allow exploration of relationships

Conclusion

Creating a concept map requires effort, but is rewarding.  It requires you to compare the specification of the standard with your representation of it, to check that relationships are known and understood correctly.  It allows you to see some characteristics, such as properties used by more than one entity. It can help content teams see the bigger picture of what’s available in schema.org to describe their content, so that the team can collectively agree to metadata requirements relating to their web content.  If you want to understand schema.org more completely, to know how it relates to the content you publish, creating a concept map is a good place to start.

— Michael Andrews

The post A Visual Approach to Learning Schema.org Metadata appeared first on Story Needle.

Internet (Film) Archive – A Screening: Monday June 5 at 7 pm

Internet Archive - 22 mei 2017 - 6:18pm

Join us for an evening of fun, nostalgia and learning with a screening of the rarest, corniest and weirdest films from the Internet Archive’s collection of Educational Media. This curated screening of digitized and 16mm films will also include favorites as voted by IA users and staff.

RSVP at eventbrite.com

Browse the collection at archive.org/details/educationalfilms.

Nominate your favorite films at https://www.surveymonkey.com/r/WZFS2MD

Re: User account breach

Internet Archive - 19 mei 2017 - 8:45pm

The FBI helpfully told us that they found a copy of the Archive’s user database, dated prior to 2012, during one of their investigations. This database did not have much information that is not on the website, but it had lightly encrypted passwords of the users at the time. We have since upped the encryption level.

We have not noticed any uptick in compromised account activity at the Archive, so we’d bet against past malicious use. We will be emailing all Archive patrons who held accounts prior to 2012 with a message containing much of the same information you see here.

We are sorry for this inconvenience.

TV news fact-checked: Comey, Schumer, McMaster, Mueller

Internet Archive - 19 mei 2017 - 5:36pm

It was yet another extraordinary week in U.S. politics, with a series of explosive news reports centering on President Donald Trump. The TV News Archive is saving history as it happens, as well as linking relevant fact-checks by FactCheck.org, PolitiFact, and The Washington Post‘s Fact Checker to statements by public officials.

On Sunday shows, Schumer demands release of tapes–if they exist

Senate Minority Leader Charles “Chuck” Schumer, D., N.Y., made the rounds of Sunday news talk shows, appearing on “Meet the Press” and “State of the Union,” calling for a special prosecutor to investigate possible connections between the Trump campaign and Russia among other matters. In this clip, Schumer says Trump should turn over tapes, if they exist, of the president’s conversations with now former FBI director James Comey; Trump raised the possibility of such tapes in a tweet on May 12.

In this piece titled “Trump vs. Comey,” FactCheck.org reporters Eugene Kiely and Robert Farley trace the history of statements by the president and Comey about their discussions. They note, “White House Press Secretary Sean Spicer has repeatedly refused to answer whether Trump has such recordings. In his interview with Jeanine Pirro, Trump said, ‘Well, that I can’t talk about. I won’t talk about that.’”

McMaster reacts to report that Trump shared intelligence with Russians

After The Washington Post reported, on May 15, that Trump had revealed “highly classified information” to Russian envoys visiting the White House last week, national security adviser H.R. McMaster defended the president that day and at a press conference the following day. Among his assertions: “The story that came out tonight as reported is false.”

“The key phrase is ‘as reported,’” wrote Glenn Kessler, for The Washington Post’s Fact Checker, in a piece that dissects McMaster’s statements before the press. “With this language, McMaster in theory could dispute any element, no matter how small, as false. He notably did not say the story was false.” John Kruzel, writing for PolitiFact, traced the “shifting” explanations from the White House on what happened at the meeting with the Russians, including McMaster’s statements.

Former FBI director Robert Mueller appointed special counsel

Wednesday, May 17 brought the news that the U.S. Department of Justice appointed Robert Mueller as a special counsel to investigate possible connections between Trump’s 2016 campaign and Russia. Here PolitiFact reporter Lauren Carroll gives the basics on Mueller’s background and experience.

The TV News Archive contains numerous historical clips of Mueller, who served as FBI director under  Presidents George W. Bush and Barack Obama, including this brief farewell interview he gave to ABC in 2013, where he talks about terrorism.

Mueller and Comey have an earlier association at a high-drama moment in U.S. history. In 2014, Comey told “60 Minutes” about the day that he and Mueller visited a bedridden John Ashcroft, then attorney general, to tell him they would resign rather than reauthorize a controversial domestic surveillance program under pressure from the White House. Ashcroft deferred to Comey, and, as recounted by The Los Angeles Times, “It was only when President George W. Bush agreed to listen to Comey and Mueller and restructure the program did resignation plans go away.”

To receive the TV News Archive’s email newsletter, subscribe here.

“And the Webby Award for Lifetime Achievement Goes to….”

Internet Archive - 16 mei 2017 - 11:32pm

“The Internet Archive…is building a home for Universal Access to All Knowledge, open to everyone, everywhere, to use as they like. Open to all societies of the future that care to build on our triumphs and learn from our mistakes.”

– Lawrence Lessig

Last night in New York City, we put on our best duds and donned our fanciest archivist hats for a once-in-a-lifetime event. The Internet Archive was honored with a Lifetime Achievement Award at the 21st annual Webbys, hailed by the New York Times as “one of the Internet’s highest honors.” The Webby Awards lauded the Internet Archive for being “the web’s most knowledgeable historian.”

Three of our veteran staff members accepted the award: Tracey Jacquith, TV Archive Architect; Brewster Kahle, Internet Archive founder and Digital Librarian; and Alexis Rossi, Director of Media and Access. Kahle delivered the five-word acceptance speech with panache: “Universal Access to All Knowledge.”

Perhaps the greatest honor of the evening came in the form of a video narrated by Open Knowledge champion, Lawrence Lessig.  He said, “Creativity and innovation built on the past.  The Internet Archive is the foundation preserving that past, so that perhaps, one can at least hope that our children and their children can shape a future that knows our joys and learns from our many mistakes.”

The award was presented by Nancy Lublin, CEO of the Crisis Text Line and DoSomething.org, who pointed out that in this chaotic political year, the Internet Archive has saved “200 terabytes of government data that could have otherwise been lost in the transition from blue light saber to red light saber.”

The award reads:

Webby Lifetime Achievement: Archive.org for its commitment to making the world’s knowledge available online and preserving the history of the Internet itself. With a vast collection of digitized materials and tools like the Wayback Machine, Archive.org has become a vital resource not only to catalogue an ever-changing medium, but to safeguard a free and open Internet for everyone.

The complete list of Webby Award winners is available here.

TV news fact-checked: Comey edition

Internet Archive - 12 mei 2017 - 7:12pm

We devote this week’s edition of the TV News Archive roundup to the controversy that’s erupted surrounding President Donald Trump’s sudden announcement on Tuesday, May 9, that he was firing FBI director James Comey. The TV News Archive provides a wealth of material for exploring media coverage of this major moment in U.S. history.

Comey fame tied to Clinton and Trump

Comey may still not quite be a household name, but mention of “Comey” spiked higher than ever on TV newscasts this week after he was fired. Comey has enjoyed notoriety in the past, his biggest moments tied closely to the fates of 2016 presidential rivals Hillary Clinton and Trump.

The most recent spike before this week was on March 20, when he testified before Congress, confirming that the FBI was investigating possible ties between Trump’s campaign and Russia. Another major spike occurred in November 2016, days before the election, when Comey announced the FBI was reopening an investigation into then-Democratic candidate Hillary Clinton’s use of a private email server for official business while serving as secretary of state. Comey also garnered attention in July 2016, when he announced that the FBI would not be pursuing charges against Clinton.

The visual below, showing mentions of “Comey,” was created with Television Explorer, an online tool fueled by TV News Archive data and created by Kalev Leetaru. This tool can be used to find patterns in words and phrases captured by closed captioning and contained in the TV News Archive.

Source: Television Explorer, Kalev Leetaru

Trump’s letter to Comey fact-checked

In the hours following the firing, one major point of focus for fact-checkers and other media was the portion of the letter to Comey where Trump stated, “While I greatly appreciate you informing me, on three separate occasions, that I am not under investigation, I nevertheless concur with the judgment of the Department of Justice that you are not able to effectively lead the Bureau.”

Below is a CNN broadcast, as captured in the TV News Archive, where the CNN newscaster Dave Briggs reads the letter on the air.

PolitiFact, The Washington Post‘s Fact Checker, and FactCheck.org have all weighed in on the president’s assertion, noting that too much remains unknown to confirm it. “With Comey out, it’s unclear whether the public will ever learn if the FBI was investigating Trump personally, rather than just his associates — or anything else about the investigation, for that matter,” wrote PolitiFact’s Lauren Carroll on May 11. (See fact-checks connected to televised statements by public officials here.)

Meanwhile, the story continues to unfold. On May 11, Sarah Huckabee Sanders, deputy White House press secretary, told ABC’s George Stephanopoulos that the president had confirmed this assertion with her directly. Trump himself told NBC News’ Lester Holt that the assurances came during a private dinner and twice over the phone. And on Friday morning, Trump tweeted that Comey “better hope there are no ‘tapes’ of our conversations before he starts leaking to the press!”

Some Watergate history, please

Many commentators this week have noted parallels between Trump’s firing of Comey and the Saturday Night Massacre of 1973, when President Richard Nixon ordered independent special prosecutor Archibald Cox fired during the Watergate investigation. Attorney General Elliot Richardson, who was Cox’s boss, and Deputy Attorney General William Ruckelshaus both refused to fire Cox and resigned in protest. The acting head of the Department of Justice, Robert Bork, carried out the order to fire Cox. (Note: the Richard Nixon Library playfully, but accurately, fired off a tweet noting that Nixon had never fired an FBI director, and then later was criticized for doing so by the National Archives and Records Administration, the federal agency that administers presidential libraries.)

While the TV News Archive’s collection of 1.3 million TV news shows dates back to 2009, long after the Nixon era, some footage from that time is available from later airings. Here, for example, is footage of Cox’s press conference right before he was dismissed.

And here is a quick explainer of the Saturday Night Massacre, as broadcast by MSNBC in 2013.

Searching Trump Archive for past Trump statements about Comey

The largely hand-curated Trump Archive, a collection of Trump statements and appearances on TV news broadcasts, makes it easier to find past instances of Trump talking about Comey. The TV News Archive is working on ways to make the creation of such collections less labor intensive, by using machine learning tools to identify instances of public officials speaking within the collection of 1.3 million TV news shows.

A search of closed captions on the terms “Trump” and “Comey” would yield both instances when Trump is speaking about Comey and newscasters who are reporting on the two men. But searching within the Trump Archive quickly yields Trump statements about Comey.

Here is some of what we found:

April 28, 2016: Trump says “I think if [Comey’s]  straight up she’s not going to be able to run.”


June 13, 2016: Trump talking about FBI investigation of Orlando nightclub shooting, “I’m a big fan of the FBI, there’s no bigger fan than me, but look they’ve seen better days. Let’s face it.”

October 13, 2016:  Trump speaking about Comey, “The great men and women who work for the FBI are embarrassed and ashamed of what he has done to one of our truly great institutions, the FBI itself.”

October 20, 2016: Trump at Al Smith Dinner, joking at an annual fundraiser for Catholic charities:  “I’d like to address an important religious matter, the issue of going to confession. Or, as Hillary calls it, the Fourth of July weekend with FBI director Comey.”

October 29, 2016: Following Director Comey’s letter to congressional leaders about newly discovered Clinton emails, Trump says, “I have to tell you, I respect the fact that Director Comey was able to come back after what he did. I respect that very much.”

November 14, 2016: Trump won’t say if he will ask Comey to resign.  “I think that I would rather not comment on that yet. I don’t– I haven’t made up my mind. I respect him a lot. I respect the FBI a lot.”

To receive the TV News Archive’s email newsletter, subscribe here.

Listening to the 78rpm Disc Collection

Internet Archive - 11 mei 2017 - 10:10pm

by Jessica Thompson, Coast Mastering

A few times a year, I join B. George in the Internet Archive’s warehouses to help sort and pack 78rpm discs to ship to George Blood L.P. for digitization. As a music fan and a professional mastering and restoration engineer, I get a thrill from handling the heavy, grooved discs, admiring the fonts and graphic designs on the labels, and chuckling at amusing song titles. Now digitized, these recordings offer a wealth of musicological, discographic and technical information, documenting and contextualizing music and recording history in the first half of the 20th century.

The sheer scale of this digitization project is unprecedented. At over 15,000 recordings and counting, the value strictly in terms of preservation is clear, especially given the Internet Archive’s focus on digitizing music less commonly available to researchers. Music fans can take a deep dive into early blues, Hawaiian, hillbilly, comedy and bluegrass. I even found several early Novachord synthesizer recordings from 1941.

For me, as a researcher and audio restoration engineer, the real goldmine is the aggregation of discographic and technical metadata accompanying these recordings. Historians can search for and cross-reference recordings based on label, artist, song title, year of release, personnel, genre, and, importantly, collection. (The Internet Archive documents the provenance of the 78rpm discs so that donated collections remain digitally intact and maintain their contextual significance.) General users can submit reviews with notes to amend or add to metadata, and the content of those reviews is searchable, so metadata collection is active. No doubt it will continue to improve as dedicated and educated users fill in the blanks.

Access to the technical metadata offers a valuable teaching tool to those of us who practice audio preservation. For audio professionals new to 78s and curious about how much difference a few tenths of a millimeter of stylus can make, the Internet Archive offers 15,000+ examples of this. Play through the different styli options, and it quickly becomes apparent that particular labels, years and even discs do respond better to specific styli sizes and shapes. This is something audio preservationists are taught, but rarely are we presented with comprehensive audio examples. To be able to listen to and analyze the sonic and technical differences in these versions marries the hard science with the aesthetic.

Playback speeds were not standardized until the late 1920s or early 1930s, and most discs were originally cut at speeds ranging from 76-80rpm (and some well beyond). The discs in the /georgeblood/ collection were all digitized at a playback speed of 78rpm. Preservationists and collectors debate extensively about the “correct” speed at which discs ought to be played back, and whether one ought to pitch discs individually. However, performance, recording and manufacturing practices varied so widely that even if a base speed could generally be agreed upon, there will always be exceptions. (For more on this, please check out George Blood’s forthcoming paper “Stylus Size And Speed Selection In Pre-1923 Acoustic Recordings,” in Sustainable audiovisual collections through collaboration: Proceedings of the 2016 Joint Technical Symposium. Bloomington, IN: Indiana University Press.)

Every step of making a recording involves so many aesthetic decisions – choices of instrumentation, methods of sound amplification, microphone placement, the materials used in the disc itself, deliberate pitching of the instruments and slowing or speeding of the recording – that playback speed simply becomes one of many aesthetic choices in the chain. As preservationists, we are preserving the disc as an historic record, not attempting to restore or recreate a performance. (Furthermore, speed correction is possible in the digital realm, should anyone want to modify these digital files for their own personal enjoyment.)
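For anyone curious how much a speed mismatch actually matters to the ear, the arithmetic is simple: pitch shifts by 12 times the base-2 logarithm of the speed ratio, measured in semitones. The sketch below is only an illustration of that relationship, not part of the digitization workflow, and the function names are my own. By this formula, a disc cut at 76rpm and played at 78rpm sounds a little under half a semitone sharp.

```python
import math


def pitch_offset_semitones(cut_rpm: float, playback_rpm: float = 78.0) -> float:
    """Pitch shift (in semitones) heard when a disc cut at cut_rpm
    is played back at playback_rpm. Positive means sharp, negative flat."""
    return 12.0 * math.log2(playback_rpm / cut_rpm)


def correction_ratio(cut_rpm: float, playback_rpm: float = 78.0) -> float:
    """Factor by which to scale the playback rate of the digital file
    so that it runs at the speed the disc was originally cut at."""
    return cut_rpm / playback_rpm


# Hypothetical cutting speeds within the 76-80rpm range mentioned above.
for rpm in (76.0, 78.0, 80.0):
    print(
        f"cut at {rpm:.0f}rpm, played at 78rpm: "
        f"{pitch_offset_semitones(rpm):+.2f} semitones, "
        f"correct by a factor of {correction_ratio(rpm):.4f}"
    )
```

The correction factor would be applied by resampling in an audio editor; the hard part, as the debate among collectors suggests, is knowing the original cutting speed in the first place.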

How do they sound? Each 78rpm disc has an inherent noise fingerprint based on the frequency and dynamic range the format can replicate (limited, compared to contemporary digital playback formats) and the addition of surface noise from dust, dirt and stylus wear in the grooves. As expected, the sound quality in this collection varies. Some of these discs were professionally recorded, minimally played, stored well, and play back with a tolerable, even ignorable level of surface noise relative to the musical content. Others were recorded under less professional circumstances, and/or were much loved, frequently played, stored without sleeves in basements and attics, and therefore suffer from significant surface noise that can interfere with enjoyment (and study) of the music.

Yet, a compelling recording can cut through noise. Take this 1944 recording of Josh White performing St. James Infirmary, Asch 358-2A. This side has been released commercially several times, so if you look it up on a streaming service like Spotify, you can listen to different versions sourced from the same recording (though almost certainly not from the same 78rpm disc). They play at different speeds, some barely perceptibly faster or slower but at least one nearly a half-step faster than the preservation copy digitized by George Blood L.P. They also have a range of noise reduction and remastering aesthetics, some subtle and some downright ugly and riddled with digital artifacts. The version on the Internet Archive offers a benchmark. This is what the recording sounded like on the original 78rpm disc. Listen to the bend in the opening guitar notes. That technique cuts through the surface noise and should be preserved and highlighted in any restored version (which is another way of saying that any noise reduction should absolutely not interfere with the attack and decay of those luscious guitar notes).

McGill University professor of Culture and Technology Jonathan Sterne wrote a book, The Audible Past: Cultural Origins of Sound Reproduction, that is worth reading for anyone interested in a cultural history of early recording formats, including 78s. As Sterne says, sound fidelity is “ultimately about deciding the values of competing and contending sounds.” So, in listening to digital versions of 78s on the Internet Archive, music fans, researchers, and audio professionals alike engage in a process of renegotiating concepts of acceptable thresholds of noise and what that noise communicates about the circumstances of the recording and its life on a physical disc.

Fortunately, our brains are very good at calibrating to accept different ratios of signal to noise, and, I found, the more I listened to 78rpm recordings on the Internet Archive, the less I was bothered by the inherent noise. Those of us who grew up on CDs or digitally recorded and distributed music are not used to the intrusions of surface noise. However, when listening to historic recordings, we are able to adjust our expectations and process a level of noise that would be ridiculous in contemporary music formats. (Imagine this week’s Billboard Hot 100 chart topper, Bruno Mars’s “That’s What I Like,” with the high and low end rolled off, covered in a sheen of crackles and pops.) The fact that these 78rpm recordings sound, to us, like they were made in the 1920s, 1930s and 1940s lets them get away with a different scale of fidelity. The very nature of their historicity gets them off the hook.

In analog form, crackles and pops can be mesmerizing, almost like the sound of a crackling fire. However, once digitized, those previously random pops become fixed in time. What may have been enjoyable in analog form becomes a permanent annoyance in digital form. The threshold of acceptable noise levels moves again.

This means that noise associated with recording carriers such as 78rpm discs is almost always preferable to noises introduced in the digital realm through the process of attempted noise reduction. Sound restorationists understand that their job is to follow a sonic Hippocratic oath: do no harm. Though noise reduction tools are widely available, they range in quality (and accordingly in cost), and are merely tools to be used with a light or heavy touch, by experienced or amateur restorationists.

The question of whether noise reduction of the Internet Archive’s 78rpm recordings could be partially automated makes my heart palpitate. Though I know from experience that, for example, auto-declickers exist that could theoretically remove a layer of noise from these recordings with minimal interference with the musical signal, I don’t believe the results would be uniformly satisfactory. It is so easy to destroy the aura of a recording with overzealous, heavy-handed, cheap, or simply unnecessary noise reduction. Even a gentle touch of an auto-declicker or de-crackler will have widely varying results on different recordings.

I tried this with a sampling of selections from the /georgeblood/ collection. I chose eleven songs from different genres and years and ran two different, high-quality auto-declickers (the iZotope RX6 Advanced multiband declicker and CEDAR Audio’s declick) on the 24-bit FLAC files. The results were uneven. Some of the objectively noisier songs, such as Blind Blake’s Tampa Bound, Paramount 12442-B, benefited from having the most egregious surface noises gently scrubbed.

Tampa Bound flat transfer vs. Tampa Bound declicked, dehissed and denoised. That’s a lot of noise!

However, a song with a strong musical presence and mild surface noise such as Trio Schmeed’s Yodel Cha Cha, ABC-Paramount 9660, actually suffered more from light auto-declicking because the content of the horns and percussive elements registered to the auto-declicker as aberrations from the meat of the signal and were dulled. A pop presents as an aberration across all frequencies. Mapped visually across frequency, time and intensity, it looks like a spike cutting through the waveform. A snare hit looks similar and is therefore likely to be misinterpreted by an auto-declicker unless the threshold at which the declicker deploys is set very carefully. This difference is why good restorationists earn their pay.

Yodel Cha Cha flat transfer and denoised. Notice the “clicks and pops” have been scrubbed, but so has wanted high end content in the music.
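To make the threshold problem concrete, here is a deliberately naive sketch. It is not how the iZotope or CEDAR tools work; it simply flags any sample that jumps too far from its neighbor, and the test signal, sample rate and threshold values are all invented for illustration. The synthetic signal contains a steady tone, a single pop, and a short decaying burst standing in for a snare hit.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed for this toy example)


def make_test_signal(n=5000):
    """Steady 220 Hz tone, plus a one-sample pop at index 1000 and a
    decaying 5 kHz burst (a crude stand-in for a snare hit) from index 3000."""
    signal = [0.3 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE) for i in range(n)]
    signal[1000] += 0.8  # surface-noise pop
    for k in range(1000):
        burst = 0.5 * math.exp(-k / 200) * math.sin(2 * math.pi * 5000 * k / SAMPLE_RATE)
        signal[3000 + k] += burst
    return signal


def flag_clicks(signal, threshold):
    """Return indices where the sample-to-sample jump exceeds the threshold."""
    return [
        i for i in range(1, len(signal))
        if abs(signal[i] - signal[i - 1]) > threshold
    ]


signal = make_test_signal()
strict = flag_clicks(signal, threshold=0.6)  # conservative setting
loose = flag_clicks(signal, threshold=0.2)   # aggressive setting

print("conservative threshold flags samples:", strict)
print(
    "aggressive threshold flags", len(loose), "samples,",
    sum(1 for i in loose if i >= 3000), "of them inside the drum hit",
)
```

With the conservative setting only the pop is flagged; with the aggressive one the detector also marks the onset of the drum hit as damage. A real declicker is far more sophisticated, but the same trade-off has to be set by ear for each recording.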

I am approaching this collection as a listener and music fan, as a researcher, and as an audio professional, three very different modes of listening and interacting with music. In all cases, the Internet Archive 78rpm collection offers massive amounts of music and data to be explored, discovered, enjoyed, studied and utilized. Whether you want to listen to early Bill Monroe tunes, crackles, pops and all, or explore hundreds of recordings of pre-war polkas, or analyze the effects of stylus size on 1930s Victor discs, the Internet Archive provides the raw materials in digital form and, not to be underestimated, preserves the original discs too.

