
Should Information be Data-Rich or Content-Rich?

Story Needle - 14 August 2017 - 8:10am

One of the most challenging issues in online publishing is how to strike the right balance between content and data.  Publishers of online information, as a matter of habit, tend to favor either a content-centric or a data-centric approach.  Publishers may hold deep-seated beliefs about what form of information is most valuable.  Some believe that compelling stories will wow audiences. Others expect that new artificial intelligence agents, providing instantaneous answers, will delight them. This emphasis on information delivery can overshadow consideration of what audiences really need to know and do. How information is delivered can get in the way of what the audience needs. Instead of delight, audiences experience apathy and frustration. The information fails to deliver the right balance between facts and explanation.

The Cultural Divide

Information on the web can take different forms. Perhaps the most fundamental difference is whether online information provides a data-rich or content-rich experience. Each form of experience has its champions, who promote the virtues of data (or content).  Some go further, and dismiss the value of the approach they don’t favor, arguing that content (or data) actually gets in the way of what users want to know.

  • A (arguing for data-richness): Customers don’t want to read all that text!  They just want the facts.  
  • B (arguing for content-richness): Just showing facts and figures will lull customers to sleep!

Which is more important, offering content or data?  Do users want explanations and interpretations, or do they just want the cold hard facts?  Perhaps it depends on the situation, you think.  Think of a situation where people need information.  Do they want to read an explanation and get advice, or do they want a quick unambiguous answer that doesn’t involve reading (or listening to a talking head)?  The scenario you have in mind, and how you imagine people’s needs in that scenario, probably reveals something about your own preferences and values.  Do you like to compare data when making decisions, or do you like to consider commentary?  Do your own PowerPoint slides show words and images, or do they show numbers and graphs? Did you study a content-centric discipline such as the humanities in university, or did you study a data-centric one such as commerce or engineering? What are your own definitions of what’s helpful or boring?

Our attitudes toward content and data reflect how we value different forms of information.  Some people favor more generalized and interpreted information, and others prefer specific and concrete information.  Different people structure information in different ways, through stories for example, or by using clearly defined criteria to evaluate and categorize information.  These differences may exist within your target audience, just as they may show up within the web team trying to deliver the right information to that audience.  People vary in their preferences. Individuals may shift their personal  preferences depending on topic or situation.  What form of information audiences will find most helpful can elude simple explanations.

Content and data have an awkward relationship. Each seems to involve a distinct mode of understanding.  Each can seem to interrupt the message of the other. When relying on a single mode of information, publishers risk either over-communicating, or under-communicating.

Content and Data in Silhouette

To keep things simple (and avoid conceptual hairsplitting), let’s think about data as any values that are described with an attribute.  We can consider data as facts about something.  Data can be any kind of fact about a thing; it doesn’t need to be a number. Whether text or numeric, data are values that can be counted.

Content can involve many distinct types, but for simplicity, we’ll consider content as articles and videos — containers  where words and images combine to express ideas, stories, instructions, and arguments.

Both data and content can inform.  Content has the power to persuade, and sometimes data possesses that power as well.  So what is the essential difference between them?  Each has distinct limitations.

The Limits of Content

In certain situations content can get in the way of solving user problems.  Many times people are in a hurry, and want to get a fact as quickly as possible.  Presenting data directly to audiences doesn’t always mean people get their questions answered instantly, of course.  Some databases are lousy at answering questions for ordinary people who don’t use databases often.  But a growing range of applications now provide “instant answers” to user queries by relying on data and computational power.  Whereas content is a linear experience, requiring time to read, view or listen, data promises an instant experience that can gratify immediately.  After all, who wants to waste their customer’s time?  Content strategy has long advocated solving audience problems as quickly as possible.  Can data obviate the need for linear content?

“When you think about something and don’t really know much about it, you will automatically get information.  Eventually you’ll have an implant where if you think about a fact, it will just tell you the answer.”  Google’s Larry Page, in Steven Levy’s  “In the Plex”.

The argument that users don’t need websites (and their content) is advanced by SEO expert Aaron Bradley in his article “Zero Blue Links: Search After Traffic”.   Aaron asks us to “imagine a world in which there was still an internet, but no websites. A world in which you could still look for and find information, but not by clicking from web page to web page in a browser.”

Aaron notes that within Google search results, increasingly it is “data that’s being provided, rather than a document summary.”  Audiences can see a list of product specs, rather than a few sentences that discuss those specs. He sees this as the future of how audiences will access information on different devices.  “Users of search engines will increasingly be the owners of smart phones and smart watches and smart automobiles and smart TVs, and will come to expect seamless, connected, data-rich internet experiences that have nothing whatsoever to do with making website visits.”

In Aaron’s view, we are seeing a movement from “documents to data” on the web, with search results gradually supplanting document references with data.  No need to read a document: search results will answer the question.  It’s an appealing notion, and one that is becoming more commonplace.  Content isn’t always necessary if clear, unambiguous data is available that can answer the question.

Google, or any search engine, is just a channel — an important one for sure, but not the be-all and end-all.  Search engines locate information created by others, but unless they have rights to that information, they are limited in what they can do with it. Yet the principles here can apply to other kinds of interactive apps, channels and platforms that let users get information instantly, without wading through articles or videos.  So is content now obsolete?

There is an important limitation to considering SEO search results as data.  Even though the SEO community refers to search metadata as “structured data”, the use of this term is highly misleading.  The values described by the metadata aren’t true data that can be counted.  They are values to display, or links to other values.  The problem with structured data as currently practiced is that it doesn’t enforce how the values need to be described.  The structured data values are never validated, so computers can’t be sure whether two prices appearing on two random websites are quoting the same currency, even if both mention dollars.  SEO structured data rarely requires controlled vocabularies for text values, and most of its values don’t include or mandate the data typing that computers would need to aggregate and compare different values.  Publishers are free to use almost any kind of text value they like in many situations.  The reality of SEO structured data is less glamorous than its image: much of the information described by SEO structured data is display content for humans to read, rather than data for machines to transform.  The customers who scan Google’s search results are people, not machines.  People still need to evaluate the information, and decide its credibility and relevance.  The values aren’t precise and reliable enough for computers to make such judgments.
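To make the problem concrete, here is a minimal sketch in Python of why unvalidated values defeat machine comparison. The markup mimics schema.org JSON-LD, but the offers, prices, and sites are invented for illustration, not drawn from any real pages.

```python
# Two hypothetical offers marked up with schema.org-style properties. Nothing
# validates the values, so a machine cannot safely compare them.

offer_site_a = {
    "@context": "https://schema.org",
    "@type": "Offer",
    "price": "19.99",           # plain text; no data type is enforced
    "priceCurrency": "dollars"  # free text: US dollars? Canadian? Australian?
}

offer_site_b = {
    "@context": "https://schema.org",
    "@type": "Offer",
    "price": "$24",             # another publisher formats the value differently
    "priceCurrency": "USD"
}

def cheaper(a: dict, b: dict):
    """Return the cheaper offer, or None when the markup is too ambiguous to compare."""
    if a["priceCurrency"] != b["priceCurrency"]:
        return None  # "dollars" vs "USD": the markup alone can't reconcile these
    try:
        return a if float(a["price"]) <= float(b["price"]) else b
    except ValueError:
        return None  # "$24" isn't a parseable number without extra cleanup

print(cheaper(offer_site_a, offer_site_b))  # None: display values, not countable data
```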

When an individual wants to know what time a shop closes, it’s a no-brainer to provide exactly that information, and no more. The strongest case for presenting data directly is when the user already knows exactly what they want to know, and will understand the meaning and significance of the data shown.  These are the “known unknowns” (or “knowns but forgotten”) use cases.  Plenty of such cases exist.  But while the lure of instant gratification is strong, people aren’t always in a rush to get answers, and in many cases they shouldn’t be in a rush, because the question is bigger than a single answer can address.

The Limits of Data

Data in various circumstances can get in the way of what interests audiences.  At a time when the corporate world increasingly extols the virtues of data, it’s important to recognize when data is useless, because it doesn’t answer the questions that audiences have.  Publishers should recognize when data is oversold as always being what audiences want.  Unless data reflects audience priorities, the data is junk as far as audiences are concerned.

Data can bring credibility to content, though it has the potential to confuse and mislead as well.  Audiences can be blinded by data when it is hard to comprehend, or too voluminous. Audiences need to be interested in the data for it to provide them with value.  Much of the initial enthusiasm for data journalism, the idea of writing stories based on the detailed analysis of facts and statistics, has receded.  Some stories have been of high quality, but many weren’t intrinsically interesting to large numbers of viewers.  Audiences didn’t necessarily see themselves in the minutiae, or feel compelled to interact with the raw material being offered to them.  Data journalism stories are different from commercially oriented information, which has well-defined use cases specifying how people will interact with data.  Data journalism can presume people will be interested in topics simply because public data on those topics is available.  But this data may have been collected for a different purpose, often for technical specialists.  Presenting it doesn’t transform it into something interesting to audiences.

The experience of data journalism shows that not all data is intrinsically interesting or useful to audiences.  But some technologists believe that making endless volumes of data available is intrinsically worthwhile, because machines have the power to unlock value from the data that can’t be anticipated.

The notion that “data is God” has fueled the development of the semantic web approach, which has subsequently been  rebranded as “linked data”.  The semantic web has promised many things, including giving audiences direct access to information without the extraneous baggage of content.  It even promised to make audiences irrelevant in many cases, by handing over data to machines to act on, so that audiences don’t even need to view that data.  In its extreme articulation, the semantic web/linked data vision considers content as irrelevant, and even audiences as irrelevant.

These ideas, while still alive and championed by their supporters, have largely failed to live up to expectations.  There are many reasons for this failure, but a key one has been that proponents of linked data have failed to articulate its value to publishers and audiences. The goal of linked data always seems to be to feed more data to the machine.  Linked data discussions get trapped in the mechanics of what’s best for machines (dereferenceable URIs, machine values that mean nothing to humans), instead of what’s useful for people.

The emergence of schema.org (the structured data standard used in SEO) represents a step back from such machine-centric thinking, to accommodate at least some of the needs of human metadata creators by allowing text values. But schema.org still doesn’t offer much in the way of controlled vocabularies for values, which would be both machine-reliable and human-friendly.  It only offers a narrow list of specialized “enumerations”, some of which are not easy-to-read text values.

Schema.org has lots of potential, but its current capabilities get over-hyped by some in the SEO community.  Just as schema.org metadata should not be considered structured data, it is not really the semantic web either.  It’s unable to make inferences, which was a key promise of the semantic web.  Its limitations show why content remains important. Google’s answer to the problem of how to make structured data relevant to people was the rich snippet.  Rich snippets displayed in Google search results are essentially a vanity statement. Sometimes these snippets answer the question, but other times they simply tease the user with related information.  Publishers and audiences alike may enjoy seeing an extract of content in search results, and certainly rich snippets are a positive development in search. But displaying extracts of information does not represent an achievement of the power of data.  A list of answers supplied by rich snippets is far less definitive than a list of answers supplied by a conventional structured query database — an approach that has been around for over three decades.

The value of data comes from its capacity to aggregate, manipulate and compare information relating to many items.  Data can be impactful when arranged and processed in ways that change an audience’s perception and understanding of a topic. Genuine data provides values that can be counted and transformed, something that schema.org doesn’t support very robustly, as previously mentioned.  Google’s snippets, when parsing metadata values from articles, simply display fragments  from individual items of content.  A list of snippets doesn’t really federate information from multiple sources into a unified, consolidated answer.  If you ask Google what store sells the cheapest milk in your city, Google can’t directly answer that question, because that information is not available as data that can be compared.  Information retrieval (locating information) is not the same as data processing (consolidating information).
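A toy sketch can illustrate that distinction. The shops and prices below are invented: retrieval can only locate documents that mention milk, while answering the “cheapest milk” question requires typed, comparable values that can be consolidated.

```python
# Information retrieval locates documents about a topic; data processing
# consolidates comparable values into a single answer. All data is invented.

documents = [
    "Corner Dairy has great prices on milk this week.",
    "MegaMart's weekly flyer: milk, bread, and eggs on sale.",
]

# Retrieval: find documents that mention "milk". A person still has to read them.
mentions = [doc for doc in documents if "milk" in doc.lower()]

# Processing: only possible if validated, typed data exists (same unit, same currency).
milk_prices = {"Corner Dairy": 1.89, "MegaMart": 1.75}   # price per litre
cheapest = min(milk_prices, key=milk_prices.get)

print(mentions)   # two documents to read and evaluate
print(cheapest)   # 'MegaMart' -- a consolidated answer, which snippets can't supply
```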

“What is the point of all that data? A large data set is a product like any other. It must be maintained and updated, given attention. What are we to make of it?”  Paul Ford in “Usable Data”.

But let’s assume that we do have solid data that machines can process without difficulty.  Can that data provide audiences with what they need?  Is content unnecessary when the data is machine quality?  Some evidence suggests that even the highest quality linked data isn’t sufficient to interest audiences.

The museum sector has been interested in linked data for many years.  Unlike most web publishers, museums haven’t been guided by schema.org and Google; they’ve been developing their own metadata standards.  Yet this project has had its problems.  The data lead of a well-known art museum complained recently of the “fetishization of Linked Open Data (LOD)”.  Many museums approached data as something intrinsically valuable, without thinking through who would use the data, and why.  Museums reasoned that they have lots of great content (their collections), that they needed to provide information about their collections online to everyone, and that linked data was the way to do that.  But the author notes: ‘“I can’t wait to see what people do with our data” is not a clear ROI.’  When data is considered the goal, instead of a means to a goal, audiences get left out of the picture.  This situation is common to many linked data projects, where getting data into a linked data structure becomes an all-consuming end, without anchoring the project in audience and business needs.  For linked data to be useful, it needs to address specific use cases for the people relying on the data.

Much magical thinking about linked data involves two assumptions: that the data will answer burning questions audiences have, and these answers will be sufficient to make explanatory content unnecessary.  When combined, these assumptions become one: everything you could possibly want to know is now available as a knowledge graph.

The promise that data can answer any question is animating the development of knowledge graphs and “intelligent assistants” by nearly every big tech company: Google, Bing, LinkedIn, Apple, Facebook, etc.  This latest wave of data enthusiasm again raises the question of whether content is becoming less relevant.

Knowledge graphs are a special form of linked data.  Instead of the data living in many places, hosted by many different publishers, the data is consolidated into a single source curated by one firm, for example, Bing. A knowledge graph combines millions of facts about all kinds of things into a single data set. A knowledge graph creator generally relies on other publishers’ linked data, but it assumes responsibility for validating that data itself when incorporating the information into its knowledge graph.  In principle, the information is more reliable, both factually and technically.

Knowledge graphs work best for persistent data (the birth year of a celebrity) but less well for high-velocity data that changes frequently (the humidity right now).  Knowledge graphs can be incredibly powerful.  They can allow people to find connections between pieces of data that might not seem related, but are.  Sometimes these connections are simply fun trivia (two famous people born in the same hospital on the same day). Other times these connections are significant, actionable information.  Because knowledge graphs hold so much potential, it is often difficult to know how they can be used effectively.  Many knowledge graph use cases relate to open-ended exploration, instead of specific tasks that solve well-defined user problems.  Few people can offer a succinct, universally relevant reply to the question: “What problem does a knowledge graph solve?” Most of the success I’ve seen for knowledge graphs has been in specialized vertical applications aimed at researchers, such as biomedical research or financial fraud investigations.  To be useful to general audiences, knowledge graphs require editorial decisions that queue up on-topic questions and return information relevant to audience needs and interests.  Knowledge graphs are less useful when they simply provide a dump of information that’s related to a topic.
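To show the kind of connection-finding described above, here is a toy knowledge graph expressed as subject-predicate-object triples, queried for two people born in the same hospital on the same day. All names and facts are invented; real knowledge graphs hold millions of such triples.

```python
# A toy knowledge graph stored as (subject, predicate, object) triples,
# queried for the "fun trivia" connection mentioned above. Data is invented.

triples = [
    ("Alice Author", "bornIn", "St. Mary's Hospital"),
    ("Alice Author", "bornOn", "1960-03-14"),
    ("Bob Baker",    "bornIn", "St. Mary's Hospital"),
    ("Bob Baker",    "bornOn", "1960-03-14"),
    ("Carol Chef",   "bornIn", "City General"),
    ("Carol Chef",   "bornOn", "1960-03-14"),
]

def attribute(subject: str, predicate: str):
    """Look up a single attribute value for a subject, or None if absent."""
    return next((o for s, p, o in triples if s == subject and p == predicate), None)

people = {s for s, _, _ in triples}
connections = [
    (a, b)
    for a in people
    for b in people
    if a < b
    and attribute(a, "bornIn") == attribute(b, "bornIn")
    and attribute(a, "bornOn") == attribute(b, "bornOn")
]

print(connections)  # [('Alice Author', 'Bob Baker')]
```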

Knowledge graphs combine aspects of Wikipedia (the crowdsourcing of data) with aspects of a proprietary gatekeeping platform such as Facebook (the centralized control of access to and prioritization of information).  No one party can be expected to develop all the data needed in a knowledge graph, yet one party needs to own the graph to make it work consistently — something that doesn’t always happen with linked data.   The host of the knowledge graph enjoys a privileged position: others must supply data, but have no guarantee of what they receive in return.

Under this arrangement, suppliers of data to a knowledge graph can’t calculate their ROI. Publishers are back in the situation where they must take a leap of faith that they’ll benefit from their effort.  They are asked to supply data to a service on the basis of a vague promise that the service will provide their customers with helpful answers.  Exactly how the service will use the data is often not transparent: knowledge graphs don’t reveal what data gets used, and when.  Publishers also know that their rivals are supplying data to the same graph.  The faith-based approach to developing data, in hopes that it will be used, has a poor track record.

The context of data retrieved from a knowledge graph may not be clear.  Google, Siri, Cortana, or Alexa may provide an answer.  But on what basis do they make that judgment?  The need for context to understand the meaning of data leads us back to content.   What a fact means may not be self-evident. Even facts that seem straightforward can depend on qualified definitions.

“A dataset precise enough for one purpose may not be sufficiently precise for another. Data on the Web may be wrong, or wrong in some context—with or without intent.” Bernstein, Hendler & Noy

The interaction between content and data is becoming even more consequential as the tech industry promotes services incorporating artificial intelligence.  In his book Free Speech, Timothy Garton Ash shared his experience using WolframAlpha, a semantic AI platform that competes with IBM Watson and boldly claims to make the “world’s knowledge computable.”  When Ash asked WolframAlpha “How free should speech be?”, it replied: “WolframAlpha doesn’t understand your query.”  This kind of result is entirely expected, but it is worth exploring why something billed as smart fails to understand.  Conversational interfaces, after all, are promising to answer our questions.  Data needs to exist for questions to get answers.  For data to operate independently of content, an answer must be expressible as data. But many answers can’t be reduced to one or two values.  Sometimes they involve many values.  Sometimes answers can’t be expressed as data values at all. This means that content will always be necessary for some answers.

Data as a Bridge to Content

Data and content have different temperaments.  The role of content is often to lead the audience to reveal what’s interesting.  The role of data is frequently to follow the audience as they indicate their interests. Content and data play complementary roles.  Each can be incomplete without the other.

Content, whether articles, video or audio, is typically linear.  Content is meant to be consumed in a prescribed order.   Stories have beginnings and ends, and procedures normally have fixed sequences of steps.  Hyperlinking content provides a partial solution to making a content experience less linear, when that is desired.  Linear experiences can be helpful when audiences need orientation, but they are constraining when such orientation isn’t necessary.

Data, to be seen, must first be selected. Publishers must select what data to highlight, or they must delegate that task to the audience. Data is non-linear: it can be approached in any order.  It can be highly interactive, providing audiences with the ability to navigate and explore the information in any order, and change the focus of the information.  With that freedom comes the possibility that audiences get lost, unable to identify information of value.  What data means is highly dependent on the audience’s previous understanding.  Data can be explained with other data, but even these explanations require prior  knowledge.

From an audience perspective, data plays various roles.  Sometimes data is an answer, and the end of a task.  Sometimes data is the start of a larger activity.  Data is sometimes a signal that a topic should be looked at more closely.  Few people decide to see a movie based on an average rating alone.  A high rating might prompt someone to read about the film.  Or the person may already be interested in reading about the film, and consults the average rating simply to confirm their own expectation of whether they’d like it.  Data can be an entryway into a topic, and a point of comparison for audiences.

Writers can undervalue data because they want to start with the story they wish to tell, rather than the question or fact that prompts initial interest from the audience.   Audiences often begin exploration by seeking out a fact. But what that fact may be will be different according to each individual.  Content needs facts to be discovered.

Data evangelists can undervalue content because they focus on the simple use cases, and ignore the messier ones.  Data can answer questions only in some situations.  In an ideal world, questions and answers get paired together as data: just match the right data with the right question.  But audiences may find it difficult to articulate the right question, or they may not know what question to ask. Audiences may find they need to ask many specific questions to develop a broad understanding, and find the process of asking them exhausting.  Search engines and intelligent agents aren’t going to Socratically enlighten us about new or unfamiliar topics.  Content is needed.

Ultimately, whether data or content is most important depends on how much communication is needed to support the audience.  Data supplies answers, but doesn’t communicate ideas.  Content communicates ideas, but can fail to answer if it lacks specific details (data) that audiences expect.

No bold line divides data from content.  Even basic information, such as expressing how to do something, can be approached either episodically as content, or atomically as data.  Publishers can present the minimal facts necessary to perform a task (the must do’s), or they can provide a story about possibilities of tasks to do (the may do’s).  How should they make that decision?

In my experience, publishers rarely create two radically alternative versions of online information, a data-centric and content-centric version, and test these against each other to see which better meets audience needs.  Such an approach could help publishers understand what the balance between content and data needs to be.  It could help them understand how much communication is required, so the information they provide is never in the way of the audience’s goals.

— Michael Andrews


TV News Record: North Korea plus Vox on Fox

Internet Archive - 11 August 2017 - 4:42pm

A weekly round up on what’s happening and what we’re seeing at the TV News Archive by Katie Dahl and Nancy Watzman. Additional research by Robin Chin.

This week we look at how different cable networks explained newly inflamed U.S.-North Korea tensions. Which channel seemed to repeat a particular phrase, like “fire and fury,” the most in the last few days?  What did fact-checking partners have to report on President Donald Trump’s tweeted threat against North Korea? Plus: a Vox analysis of Fox based on TV News Archive closed captioning data.

“Fire and fury” popular on CNN

Over a 72-hour period, CNN mentioned President Donald Trump’s “fire and fury” threat against North Korea more than other major cable networks, according to a search on the Television Explorer, a tool created by data scientist Kalev Leetaru and powered by TV News Archive data.

[Screenshot: “fire and fury” search on Television Explorer, 8:19am MST, 8.11.17]


Morning show reactions day after “fire and fury” statement

While Fox & Friends host Brian Kilmeade said President Trump was “right on target” with his threat against North Korea, a Fox Business Network morning show hosted the Center for the National Interest’s Harry Kazianis, who blamed former President Barack Obama for the current U.S.-North Korea tensions. Meanwhile, host Lauren Simonetti showed viewers a map of potential trajectories of missiles from North Korea to the continental U.S., saying “you can see they have the ability to strike major cities, including New York City and Washington, D.C.”

On a BBC morning show, the PC Agency CEO Paul Charles said President Trump “is talking like a dictator himself to some extent,” and offered his opinion on the geopolitical context, saying “it’s in their [China’s] own interest to try and find some territorial gain in the region, so I’m not convinced China can be the answer.”

C-SPAN aired footage of an interview with Secretary of State Rex Tillerson in which he said “I do not believe there is any imminent threat” and that though he was on his way to Guam which North Korea said it was targeting, he “never considered rerouting.”

A CNN morning show had a panel of guests from all over the world, giving them an opportunity to share perspectives from those locations, including CNN international correspondent Will Ripley reporting from Beijing that there is “increasing concern that an accidental war could break out on the Korean Peninsula,” CNN international correspondent Alexandria Fields reporting that people in South Korea “know that a war of words can lead to a mistake and that’s the fear; that’s the fear and that’s what can cause conflict… You’ve got more than 20 million people in the wider Seoul metropolitan area.” CNN military and diplomatic analyst Rear Admiral John Kirby offered his perspective that “when the president reacts the way he does, he reinforces Kim’s propaganda that it is about the United States and regime change. He’s actually working to isolate us rather than North Korea from the international community.”



Vox on Fox: TV News Archive data used to reveal shift in “Fox & Friends”

Vox reporter Alvin Chang used closed captioning data of “Fox & Friends” from the TV News Archive for his analysis showing that “the program is in something of a feedback loop with the president.” He spoke about his work on CNN, saying hosts of the Fox show “seem to know that the president is listening” and “instruct or advise the president, and they’ve done it increasingly more since his election.”



Fact-check: US nuclear arsenal now stronger than ever before because of the president’s actions (false)

On Wednesday, President Trump tweeted, “My first order as president was to renovate and modernize our nuclear arsenal. It is now far stronger and more powerful than ever before.”

“False,” reported PolitiFact’s Louis Jacobson, writing, “[T]his wasn’t Trump’s first order as president” and his executive order was “not unusual.” He quoted Harvard nuclear-policy expert Matthew Bunn: “There is a total of nothing that has changed substantially about the U.S. nuclear arsenal over the few months that Trump has been in office. We have the same missiles and bombers, with the same nuclear weapons, that we had before.”

Over at FactCheck.org, Eugene Kiely quoted Hans M. Kristensen, director of the Nuclear Information Project at the Federation of American Scientists: “The renovation and modernization of the arsenal that is going on now is all the result of decisions that were made by the Obama administration.”

Glenn Kessler reported for the Washington Post’s Fact Checker that the president’s tweet was “misleading Americans” and gave him “four Pinocchios.”

Fact-check: American workers were left behind after “buy American steel” bill failed (spins the facts)

In the Democratic weekly address, Sen. Tammy Baldwin, D., Wis., said, “My Buy America reform passed the Senate with bipartisan support. But when it got to the House, the foreign steel companies bought Washington lobbyists to kill it. Paul Ryan and Mitch McConnell gave them what they wanted, and American workers were left behind again.”

“Baldwin’s bill would have required U.S. steel to be used on projects funded by the Drinking Water State Revolving Fund. It didn’t pass, but a separate provision in a water infrastructure bill that became law last year does exactly that for fiscal 2017. In fact, Congress has imposed the same buy American provision for drinking water projects every year since fiscal 2014,” reported Eugene Kiely for FactCheck.org.

Fact-checkers have been busy checking recent Trump comments, including those from rallies in West Virginia and Youngstown, Ohio, and the speech he gave to the Boy Scouts.

To receive the TV News Archive’s email newsletter, subscribe here.

HyperCard On The Archive (Celebrating 30 Years of HyperCard)

Internet Archive - 11 August 2017 - 2:00am

On August 11, 1987, Bill Atkinson announced a new product from Apple for the Macintosh: a multimedia, easily programmed system called HyperCard. HyperCard brought into one sharp package the ability for a Macintosh to do interactive documents with calculation, sound, music and graphics. It was a popular package, and thousands of HyperCard “stacks” were created using the software.

Additionally, commercial products with HyperCard at their heart came to great prominence, including the original Myst program.

Flourishing for roughly the next ten years, HyperCard slowly fell by the wayside to the growing World Wide Web, and was officially discontinued as a product by Apple in 2004. It left behind a massive but quickly disappearing legacy of creative works that became harder and harder to experience.

To celebrate the 30th anniversary of HyperCard, we’re bringing it back.

After our addition of in-browser early Macintosh emulation earlier this year, the Internet Archive now has a lot of emulated HyperCard stacks available for perusal, and we encourage you to upload your own, easily and quickly.

If you have HyperCard stacks in .sit, .bin.hqx, or other formats, visit this contribution site to have your stack added quickly and easily to the Archive: http://hypercardonline.tk

This site, maintained by volunteer Andrew Ferguson, will do a mostly-automatic addition of your stack into the Archive, including adding your description and creating an automatic screenshot. Your cards shall live again!

Along with access to the original HyperCard software in the browser, the Archive’s goal of “Access to ALL Knowledge” means there are many other items related to the HyperCard programs themselves, and depending on how far you want to dig, there’s a lot to discover.

There are entire books written about HyperCard, of course – for example, The Complete HyperCard Handbook (1988) and the HyperCard Developers’ Guide (1988), which walk through the context and goals of HyperCard, and then the efforts to program in it.

If you prefer to watch video about HyperCard, the Archive has you covered as well. Here’s an entire episode about HyperCard. As the description indicates: “Guests include Apple Fellow and HyperCard creator Bill Atkinson, HyperCard senior engineer Dan Winkler, author of “The Complete HyperCard Handbook” Danny Goodman, and Robert Stein, Publisher of Voyager Company. Demonstrations include HyperCard 1.0, Complete Car Cost Guide, Focal Point, Laserstacks, and National Gallery of Art.”

Our goal to bring historic software back to a living part of the landscape continues, so feel free to dig in, bring your stacks to life, and enjoy the often-forgotten stacks of yore.

Statement and Questions Regarding an Indian Court’s Order to Block archive.org

Internet Archive - 9 August 2017 - 10:50pm

After multiple attempts to contact the relevant authorities in the Indian government regarding the recent blocking of archive.org in that country, the Internet Archive received a response early this morning indicating that two court orders (here and here) were the source of the block. The orders identify a list of thousands of websites to be blocked for allegedly making available two separate films, “Lipstick Under my Burkha” and “Jab Harry Met Sejal”. Both orders come from the same judge of the High Court of Madras (civil jurisdiction). According to many reports, http://archive.org is blocked, but https is not.

Even beyond the fundamental and major problems with preemptively blocking a site for users’ submissions and content of which it is unaware, there are serious issues with these orders. We would like to know the following:

1. Is the Court aware of and did it consider the fact that the Internet Archive has a well-established and standard procedure for rights holders to submit takedown requests, and that it processes them expeditiously? We find several instances of takedown requests submitted for one of the plaintiffs, Red Chillies Entertainments, throughout the past year, each of which was processed and responded to promptly.

2. After a preliminary review, we find no instance of our having been contacted by anyone at all about these films. Is there a specific claim that someone posted these films to archive.org? If so, we’d be eager to address it directly with the claimant.

3. Archive.org is included along with thousands of other websites in a list (entitled “Non-Compliant Sites”) to block for allegedly making available the two films of concern. The only URL the list identifies pertaining to us is “https://archive.org”. There are no specific URLs for alleged locations of films on the site, only the full domain. Was there any attempt to exercise any level of review or specificity beyond “archive.org”?

All in all, this is a very worrying development and is part of a harmful pattern of governments increasingly taking web content (and in many cases entire sites) offline in unpredictable and excessive ways. We have seen reports from scholars and academics that this block has disrupted their work. We hope full access to archive.org will be restored quickly.

McConnell, Schumer, Ryan, Pelosi fact-checked clips featured in new TV News Archive collections

Internet Archive - 3 August 2017 - 5:32pm

Today the Internet Archive’s TV News Archive unveils growing TV news collections focused on congressional leadership and top Trump administration officials, expanding our experimental Trump Archive to other newsworthy government officials. Together, all of the collections include links to more than 1,200 fact-checked clips–and counting–by our national fact-checking partners, FactCheck.org, PolitiFact, and The Washington Post‘s Fact Checker.

These experimental video clip collections, which contain more than 3,500 hours of video, include archives focused on Senate Majority Leader Mitch McConnell, R., Ky.; Sen. Minority Leader Charles (“Chuck”) Schumer, D., N.Y.; House Speaker Paul Ryan, R., Wis.; and House Minority Leader, Nancy Pelosi, D., Calif., as well as top Trump officials past and present such as Secretary of State Rex Tillerson and former White House Press Secretary Sean Spicer.

Download a csv of fact-checked video statements or see all the fact-checked clips.

Visit the U.S. Congress archive.

Visit the Executive Branch archive.

Visit the Trump Archive.

We created these largely hand-curated collections as part of our experimentation in demonstrating how Artificial Intelligence (AI) algorithms could be harnessed to create useful, ethical, public resources for journalists and researchers in the months and years ahead. Other experiments include:

  • the Political TV Ad Archive, which tracked airings of political ads in the 2016 elections by using the Duplitron, an open source audio fingerprinting tool;
  • the Trump Archive, launched in January;
  • Face-O-Matic, an experimental Slack app created in partnership with Matroid that uses facial detection to find congressional leaders’ faces on TV news. Face-O-Matic has quickly proved its mettle by helping our researchers find clips suitable for inclusion in the U.S. Congress Archive; future plans include making data available in CSV and JSON formats.
  • in the works: TV Architect Tracey Jaquith is experimenting with detection of text in the chyrons that run on the bottom third of cable TV news channels. Stay tuned.

Red check mark shows there’s a fact-check in this footage featuring House Minority Leader Nancy Pelosi, D., Calif. Follow the link below the clip to see the fact-check, in this case by The Washington Post’s Fact Checker.

At present, our vast collection of TV news – 1.4 million shows collected since 2009 – is searchable via closed captioning. But closed captions, while helpful, can’t help a user find clips of a particular person speaking; instead, a search on a name such as “Charles Schumer” returns a mix of news stories about the senator, as well as clips where he speaks at news conferences, on the Senate floor, or in other venues.

We are working towards a future in which AI enrichment of video metadata will more precisely identify for fact-checkers and researchers when a public official is actually speaking, or some other televised record of that official making an assertion of fact. This could include, for example, camera footage of tweets.

Such clips become a part of the historical record, with online links that don’t rot, a central part of the Internet Archive’s mission to preserve knowledge. And they can help fact-checkers decide where to concentrate their efforts, by finding on-the-record assertions of fact by public officials. Finally, these collections could prove useful for teachers, documentary makers, or anybody interested in exploring on-the-record statements by public officials.

For example, here are two dueling views of the minimum wage, brought to the public by McConnell and Schumer.

In this interview on Fox News in January 2014, McConnell says, “The minimum wage is mostly an entry-level wage for young people.” PolitiFact’s Steve Contorno rated this claim as “mostly true.” While government statistics do show that half of the people making the minimum wage are young, 20 percent are in their late 20s or early 30s and another 30 percent are 35 or older. Contorno also points out that it’s a stretch to call these jobs “entry-level,” but rather are “in the food or retail businesses or similar industries with little hope for career advancement.”

Schumer presents a different assertion on the minimum wage, saying on “Morning Joe” in May 2014 that with a rate of $10.10/hour “you get out of poverty.” PolitiFact’s Louis Jacobson rated this claim as “half true”: “Since the households helped by the $10.10 wage account for 46 percent of all impoverished households, Schumer is right slightly less than half the time.”

These new collections reflect the hard work of many at the Internet Archive, including Robin Chin, Katie Dahl, Tracey Jaquith, Roger MacDonald, Dan Schultz, and Nancy Watzman.

As we move forward, we would love to hear from you. Contact us with questions, ideas, and concerns at tvnews@archive.org. And to keep up-to-date with our experiments, sign up for our weekly TV News Archive newsletter.


Canadian Library Consortia OCUL and COPPUL Join Forces with Archive-It to Expand Web Archiving in Canada

Internet Archive - 2 August 2017 - 8:20pm

The Council of Prairie and Pacific University Libraries (COPPUL) and the Ontario Council of University Libraries (OCUL) have joined forces in a multi-consortial offering of Archive-It, the web archiving service of the Internet Archive. Working together, COPPUL and OCUL are considering ways that they can significantly expand web archiving in Canada.

A coordinated subscription to Archive-It builds on the efforts of Canadian universities that have developed web archiving programs over the years, and the past work of Archive-It with both COPPUL and OCUL members.  With 12 COPPUL members and 12 OCUL members (more than half the total membership) now subscribing to Archive-It, there is an opportunity to build a foundation for further collaboration supporting research services and other digital library initiatives. In addition, participation by so many libraries helps lower the barrier of entry for additional member institutions to join in web archiving efforts across Canada.

“OCUL is very pleased to be able to offer Archive-It to our members,” said Ken Hernden, University Librarian at Algoma University and OCUL Chair. “Preservation of information and research is an important aspect of what libraries do to benefit scholars and communities. Preserving information for the future was challenging in a paper-and-print environment. It has become even more so in the digital information environment. We hope that enabling access to this tool will help build capacity for web archiving across Ontario, and beyond.”

“Tools like Archive-It enable libraries and archives of all sizes to build new kinds of collections to support their communities in an environment where more and more of our cultural memory has moved online. We’re absolutely thrilled to be working with our OCUL colleagues in this critically important area,” said Corey Davis, COPPUL Digital Preservation Network Coordinator.

“Archive-It is excited to ramp up its support for web archiving in Canada. The joint subscription is a strategic and cost-effective way to expand web archiving among Canadian universities and to encourage participation from smaller universities who may not have felt they had the institutional resources to develop a web archiving program without the support of the consortiums,” said Lori Donovan, Senior Program Manager for Archive-It.

OCUL is a consortium of Ontario’s 21 university libraries. OCUL provides a range of services to its members, including collection purchasing and a shared digital information infrastructure, in order to support high-quality education and research in Ontario’s universities. In 2017, OCUL commemorates its 50th anniversary.

Working together, COPPUL members leverage their collective expertise, resources, and influence, increasing capacity and infrastructure, to enhance learning, teaching, student experiences and research at our institutions. The consortium comprises 22 university libraries located in Manitoba, Saskatchewan, Alberta and British Columbia, as well as 15 affiliate members across Canada. First deployed in 2006, Archive-It is a subscription web archiving service from the Internet Archive that helps organizations to harvest, build, and preserve collections of web-published digital content.

Additionally, the recently created Canadian Web Archiving Coalition (CWAC) will help build a community of practice for Canadian organizations engaging in web archiving and create a network for collaboration, support, and knowledge sharing. Under the auspices of the Canadian Association of Research Libraries (CARL) and in collaboration with Library and Archives Canada (LAC), the CWAC plans to hold an inaugural meeting in conjunction with the Internet Preservation Coalition General Assembly this September at LAC’s Preservation Centre in Gatineau, QC.  For more information about the CWAC, including how to join, please contact corey@coppul.ca.

For more information on the consortial subscription, contact carol@coppul.ca or jacqueline.cato@ocul.on.ca or lori@archive.org.

Using Kakadu JPEG2000 Compression to Meet FADGI Standards

Internet Archive - 31 July 2017 - 11:00pm

The Internet Archive is grateful to the folks at Kakadu Software for contributing to Universal Access to Knowledge by providing the world’s leading implementation of the JPEG2000 standard, used in the Archive’s image processing systems.

Here at the Archive, we digitize over a thousand books a day. JPEG2000, an image coding system that uses compression techniques based on wavelet technology, is a preferred file format for storing these images efficiently, while also providing advantages for presentation quality and metadata richness. The Library of Congress has documented its adoption of the JPEG2000 file format for a number of digitization projects, including its text collections on archive.org.
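For readers curious what JPEG2000 encoding looks like in practice, here is a minimal sketch using the open-source Pillow/OpenJPEG path rather than the Kakadu SDK used in our production systems; the file names and quality settings below are illustrative assumptions, not our actual pipeline.

```python
# A minimal sketch of saving a scanned page as lossy JPEG2000 with Pillow
# (requires Pillow built with OpenJPEG support). Filenames and settings are
# hypothetical examples only.

from PIL import Image

page = Image.open("scanned_page.tif")          # hypothetical scanned page image
page.save(
    "scanned_page.jp2",                        # .jp2 extension selects JPEG2000
    quality_mode="rates",                      # compression expressed as ratios
    quality_layers=[40, 20, 10],               # progressive quality layers, coarse to fine
    irreversible=True,                         # lossy 9/7 wavelet transform
)
```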

Recently we started using their SDK to apply some color corrections to the images coming from our cameras. This has helped us achieve FADGI standards in our work with the Library of Congress.

Thank you, Kakadu, for helping make it possible for millions of books to be digitized, stored, and made available with high quality on archive.org!

If you are interested in finding out more about Kakadu Software’s powerful software toolkit for JPEG2000 developers, visit kakadusoftware.com or email info@kakadusoftware.com.

TV News Record: McCain returns to vote, Spicer departs

Internet Archive - 28 July 2017 - 7:53pm

A weekly round up on what’s happening and what we’re seeing at the TV News Archive by Katie Dahl and Nancy Watzman. Additional research by Robin Chin.

Last week, Sean Spicer left his White House post and Anthony Scaramucci, the new communications director, made his mark; Sen. John McCain, R., Ariz., returned to the Senate floor to debate–and cast a deciding vote on–health care reform; and fact-checkers examined claims about Trump’s off-the-record meeting with Russian President Vladimir Putin, and more.

McCain shows up in D.C. – and on Face-O-Matic

Last week, after we launched Face-O-Matic, an experimental Slack app that recognizes the faces of top public officials when they appear on TV news, we received a request from an Arizona-based journalism organization to track Sen. John McCain, R., Ariz. Soon after we added the senator’s visage to Face-O-Matic, we started getting the alerts.

News anchors talked about how McCain’s possible absence because of his brain cancer diagnosis could affect upcoming debates and votes on health care.

Reporters gave background on how the Senate has dealt with absences due to illness in the past.

Pundits discussed McCain’s character, and his daughter provided a “loving portrait.” Then coverage shifted to report the senator’s return to Washington, and late last night his key no vote on the “skinny” health care repeal.



White House: Spicer out, Scaramucci in 

After Sean Spicer resigned as White House press secretary, Fox News and MSNBC offered reviews of his time at the podium.

On Fox News, Howard Kurtz introduced Spicer as someone “long known to reporters as an affable spokesman; he became the president’s pit bull,” and went on to give a run-down of his controversial relationship with the press. The conclusion: “He lasted exactly six months.”

MSNBC offered a mashup of some of Spicer’s most famous statements. These include: “This was the largest audience to ever witness an inauguration, period, both in person and around the globe,” and “But you had a – you know, someone who is as despicable as Hitler who didn’t even sink to using chemical weapons.”

Late this week, Ryan Lizza published an article in The New Yorker based on a phone call he received from the new White House communications director, Anthony Scaramucci, in which Scaramucci used profanity to describe other members of the White House staff whom he accused of leaking information. That article soon became fodder for cable TV.



Schumer, Ryan weigh in on Mueller

As Special Counsel Robert Mueller widens his investigation into Russian interference in U.S. elections, speculation is running high on TV news that President Donald Trump might fire him.

Fox News ran a clip of Senate Minority Leader Chuck Schumer, D., N.Y., saying, “I think it would cause a cataclysm in Washington.”

MSNBC ran a radio clip from House Speaker Paul Ryan, R., Wis.:  “I don’t think many people are saying Bob Mueller is a person who is a biased partisan. We have an investigation in the House, an investigation in the Senate, and a special counsel which sort of depoliticizes this stuff and gets it out of the political theater.”



Fact-check: Transgender people in the military would lead to tremendous medical costs and disruption (lacks context)

In a series of tweets this week, President Trump wrote, “After consultation with my Generals and military experts, please be advised that the United States Government will not accept or allow… Transgender individuals to serve in any capacity in the U.S. Military. Our military must be focused on decisive and overwhelming… victory and cannot be burdened with the tremendous medical costs and disruption that transgender in the military would entail. Thank you.”

For FactCheck.org, Eugene Kiely reported, “Although Trump described the cost as ‘tremendous,’ RAND estimated that providing transition-related health care would increase the military’s health care costs for active-duty members ‘by between $2.4 million and $8.4 million annually.’ That represents an increase of no more than 0.13 percent of the $6.27 billion spent on the health of active-duty members in fiscal 2014.”



Fact-check: Nixon held meetings with heads of state without an American interpreter (true)

Speaking on “The Rachel Maddow Show,” Ian Bremmer, president of the Eurasia Group, said:  “Apparently, President Nixon used to do it because he felt, didn’t really trust the State Department, at that point, providing the translators and didn’t necessarily want information getting out, leaking, that he would want to keep private.”

“True,” wrote Joshua Gillan for PolitiFact: “Presidential historians, historical accounts and Nixon’s own memoir show this was the case. But it’s notable that even in the example most comparable to Trump’s meeting with Putin, when Nixon used only a Soviet translator during two meetings with Brezhnev, official records of the meeting exist.”



Fact-check: Allowing insurers to sell plans across state lines will mean premiums go down 60-70% (no evidence)

Not long before the Senate took up health care reform, President Donald Trump said “We’re putting it [allowing insurers to sell plans across state lines] in a popular bill, and that will come. And that will come, and your premiums will be down 60 and 70 percent.”

FactCheck.org’s Lori Robertson reported the “National Association of Insurance Commissioners — a support organization established by the country’s state insurance regulators — said the idea that cross-state sales would bring about lower premiums was a ‘myth.’”



Fact-check: When the price for oil goes up, it goes up, and never goes down (false)

In an interview Sunday about the new Democratic Party national agenda, Senate Minority Leader Chuck Schumer, D., N.Y., said, “We have these huge companies buying up other big companies. It hurts workers and it hurts prices. The old Adam Smith idea of competition, it’s gone. So people hate it when their cable bills go up, their airline fees. They know that gas prices are sticky. You know … when the price for oil goes up on the markets, it goes right up, but it never goes down.”

For PolitiFact, Louis Jacobson reported, “This comment takes a well-known phenomenon and exaggerates it beyond recognition. While experts agree that prices tend to go up quickly after a market shock but usually come down more slowly once the shock is resolved, this phenomenon only occurs on a short-term basis – a couple of weeks in most cases.”

To receive the TV News Archive’s email newsletter, subscribe here.

You’re Invited to a Community Screening of PBS series, AMERICAN EPIC: Sunday July 30 & Aug 6

Internet Archive - 27 July 2017 - 2:24am

In celebration of the launch of the “Great 78 Project” the Internet Archive is sponsoring a Community Screening of the PBS documentary series “American Epic”, an inside look at one of the greatest-ever untold stories: how the ordinary people of America were given the opportunity to make 78 records for the first time.

“Without the recording lathe, Willie Nelson would have never heard the Carter Family sing. Neither would Merle Haggard or Johnny Cash. These portable machines toured the country in the 1920s, visiting rural communities like Poor Valley, West Virginia, and introducing musicians like the Carter Family to new audiences. This remarkable technology forever changed how people discover and share music, yet it was almost lost to history until music legend T Bone Burnett and a few friends decided to bring it back.” Charlie Locke – WIRED

The program will be introduced by Brewster Kahle of the Internet Archive.

Please RSVP on our free Eventbrite page.

Date: Sunday July 30th  – “The Big Bang” (:54 min) & “Blood and Soil” (:54 min)

Date: Sunday August 6th  – “Out of the Many, the One” (1:24min)

Time: Doors Open at 6:30 pm – Screening(s) at 7:00 pm

Cost: FREE and open to the public

Where: Internet Archive Headquarters 300 Funston Avenue, San Francisco, CA

“American Epic” Teaser: https://youtu.be/jcbATyomETw

Content & Decisions: A Unified Framework

Story Needle - 25 July 2017 - 8:36am

Many organizations face a chasm between what they say they want to do, and what they are doing in practice.  Many say they want to transition toward digital strategy.  In practice, most still rely on measuring the performance of individual web pages, using the same basic approach that’s been around for donkey’s years. They have trouble linking the performance of their digital operations to their high level goals. They are missing a unified framework that would let them evaluate the relationship between content and decisions.

Why is a Unified Framework important?

Organizations, when tracking how well they are doing, tend to focus on web pages: abandonment rates, clicks, conversions, email open rates, likes, views, and so on. Such granular measurements don’t reveal the bigger picture of how content is performing for the publishing organization. Even multi-page measurements such as funnels are little more than an arbitrary linking of discrete web pages.

Tracking the performance of specific web pages is necessary, but not sufficient. Because each page is potentially unique, summary metrics of different pages don’t explain variations in performance.  Page-level metrics tell how specific pages perform, but they don’t address important variables that transcend individual pages, such as which content themes are popular, or which design features are being adopted.

Explaining how content fits into digital business strategy is a bit like trying to describe an elephant without being able to see the entire animal. Various people within an organization focus on different digital metrics. How all these metrics interact gets murky.  Operational staff commonly track lower level variables about specific elements or items. Executives track metrics that represent higher level activities and events, which have resource and revenue implications that don’t correspond to specific web pages.

Metadata can play an important role in connecting information about various activities and events, and in transcending the limitations of page-level metrics.  But first, organizations need a unified framework to see the bigger picture of how their digital strategy relates to their customers.
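
To make this concrete, here is a minimal sketch (not from the original post) of how descriptive metadata could let an analyst roll page-level metrics up to the level of content themes; the page records, theme tags, and figures are all hypothetical.

    from collections import defaultdict

    # Hypothetical page-level analytics; each record carries a descriptive
    # metadata tag ("theme") assigned by the publisher.
    pages = [
        {"url": "/plans/basic",   "theme": "pricing",        "views": 5400, "conversions": 120},
        {"url": "/plans/premium", "theme": "pricing",        "views": 3100, "conversions": 95},
        {"url": "/stories/maria", "theme": "customer-story", "views": 2200, "conversions": 18},
    ]

    # Roll the metrics up by theme, so performance can be discussed at the level
    # of strategy (which themes work) rather than page by page.
    totals = defaultdict(lambda: {"views": 0, "conversions": 0})
    for page in pages:
        totals[page["theme"]]["views"] += page["views"]
        totals[page["theme"]]["conversions"] += page["conversions"]

    for theme, t in totals.items():
        print(f"{theme}: {t['views']} views, conversion rate {t['conversions'] / t['views']:.1%}")

The same roll-up could be done along any other metadata dimension an organization cares about, such as audience segment or product line.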

Layers of Activities and Decisions

To reveal how content relates to other decisions, we need to examine content at different layers. Think of these layers as a stack. One layer consists of the organization publishing content.  Another layer comprises the customers of the organization, the users of the organization’s content and products.  At the center is the digital interface, where organizations interact with their users.

We also need to identify how content interacts with other kinds of decisions within each layer.  Content always plays a supporting role.  The challenge is to measure how good a job it is doing supporting the goals of various actors.

Diagram showing relationships between organizations, their digital assets, and users/customers, and the interaction between content and platforms.

First let’s consider what’s happening within the organization that is publishing content.  The organization makes business decisions that define what the business sells to its customers, and how it services its customers.  Content needs to support these decisions.  The content strategy needs to support the business strategy.  As a practical matter, this means that the overall publishing activity (initiatives, goals, resources) needs to reflect the important business decisions that executives have made about what to emphasize and accomplish.  For example, publishing activity would reflect marketing priorities, or branding goals.  Conversely, an outsider could view the totality of an organization’s content, by viewing their website, and should get a sense of what’s important to that organization.  Publishing activity reveals an organization’s brand and priorities.

The middle layer is composed of assets that the organization has created for their customers to use.  This layer has two sides: the stock of content that’s available, and digital platforms customers access.  The stock of content reflects the organization’s publishing activity.  The digital platforms reflect the organization’s business decisions.  Digital platforms are increasingly an extension of the products and services the organization offers.  Customers need to access the digital platforms to buy the product or service, to use the product or service, and to resolve any problems after purchase.  Content provides the communications that customers need to access the platform.  Because of this relationship, the creation of content assets and the designs for digital platforms are commonly coordinated during their implementation.

Within the user layer, the customer accesses content and platforms.  They choose what content to view, and make decisions about how to buy, use, and maintain various products and services.  The relationship between content activity and user decisions is vital, and will be discussed shortly.  But its importance should not overshadow the influence of the other layers.  The user layer should not be considered in isolation from other decisions and activities that an organization has made.

Feedback Loops Between and Within Layers

Let’s consider how the layers interact.  Each layer has a content dimension, and a platform dimension, at opposite ends.  Content dimensions interact with each other within feedback loops, as do platform dimensions.  The content and platform dimensions ultimately directly interact with each other in a feedback loop within the user layer.

On the content side, the first feedback loop, the publishing operations loop, relates to how publishing activity affects the stock of content.  The organization decides the broad direction of its publishing. For many organizations, this direction is notional, but more sophisticated organizations will use structured planning to align their stock of content with the comprehensive goals they’ve set for the content overall.  This planning involves not only the creation of new content, but the revision of the existing stock of content to reflect changes in branding, marketing, or service themes.   The stock of content evolves as the direction of overall publishing activity changes.  At the same time, the stock of content reflects back on the orientation of publishing activity.  Some content is created or adjusted outside of a formal plan.  Such organic changes may be triggered in response to signals indicating how customers are using existing content. Publishers can compare their plans, goals, and activities, with the inventory of content that’s available.

The second content feedback loop, the content utilization loop, concerns how audiences are using content.  Given a stock of content available, publishers must decide what content to prioritize.  They make choices concerning how to promote content (such as where to position links to items), and how to deliver content (such as which platforms to make available for customers to access information).  At the same time, audiences are making their own choices about what content to consume.  These choices collectively suggest preferences of certain kinds of content that are available within the stock of content.

When organizations consider the interaction between the two loops of feedback, they can see the connection between overall publishing activity, and content usage activity.  Is the content the organization wants to publish the content that audiences want to view?

Two feedback loops are at work on the platform side as well.  The first, the business operations loop, concerns how organizations define and measure goals for their digital platforms.  Product managers will have specific goals, reflecting larger business priorities, and these goals get embodied in digital platforms for customers to access.  Product metrics on how customers access the platform provide feedback for adjusting goals, and inform the architectural design of platforms to realize those goals.

The second platform loop, the design optimization loop, concerns how the details of platform designs are adjusted.  For example, designs may be composed of different reusable web components, which could be tied to specific business goals.  Design might, as an example, feature a chatbot that provides a cost savings or new revenue opportunity. The design optimization loop might look at how to improve the utilization of that chatbot functionality.  How users adopt that functionality will influence the optimization (iterative evolution) of its design. The architectural decision to introduce a chatbot, in contrast, would have happened within the business operations loop.

As with the content side, the two feedback loops on the platform side can be linked, so that the relationship between business decisions and user decisions is clearer.  User decisions may prompt minor changes within the design optimization loop, or if significant, potentially larger changes within the business operations loop.  Like content, a digital platform is an asset that requires continual refinement to satisfy both user and business goals.

The two parallel sides, content and design, meet at the user layer.  User decisions are shaped both by the design of the platforms they are accessing and by the content they are consuming while on those platforms.  Users need to know what they can do, and want to do it.  Designs need to support users’ access to the content they need when making a decision. That content needs to provide users with the knowledge and confidence for their decision.

The relationship between content and design can sometimes seem obvious when looking at a web page.  But in cases where content and design don’t support each other, web pages aren’t necessarily the right structure to fix problems.  User experiences can span time and devices.  Some pages will be more about content, and other pages more about functionality. Relevant content and functionality won’t always appear together.  Both content and designs are frequently composed from reusable components.  Many web pages may suffer from common problems stemming from faulty components, or the wrong mix of components. The assets (content and functionality) available to customers may be determined by upstream decisions that can’t be fixed on a page level. Organizations need ways to understand larger patterns of user behavior, to see how content and designs support each other, or fail to.

Better Feedback

Content and design interact across many layers of activities and decisions. Organizations must first decide what digital assets to create and offer customers, and then must refine these so that they work well for users.  Organizations need more precise and comprehensible feedback on how their customers access information and services.  The content and designs that customers access are often composed from reusable components that appear in different contexts. In such cases, page-level metrics are not sufficient to provide situational insights.  Organizations need usage feedback that can be considered at the strategic layer.  They need the ability to evaluate global patterns of use to identify broad areas to change.

In a future post, I will draw on this framework to return to the topic of how descriptive, structural, technical and administrative metadata can help organizations develop deeper insights into the performance of both their content and their designs.  If you are not already familiar with these types of metadata, I invite you to learn about them in my recent book, Metadata Basics for Web Content, available on Amazon.

— Michael Andrews

The post Content & Decisions: A Unified Framework appeared first on Story Needle.

Internet Archive Artist in Residence Exhibition — August 5–26

Internet Archive - 21 juli 2017 - 7:16pm

By Amir Esfahani

Ever Gold [Projects] is pleased to present The Internet Archive’s 2017 Artist in Residence Exhibition, an exhibition conceived in collaboration with the Internet Archive presenting the culmination of the first year of the Internet Archive’s visual arts residency program, featuring work by artists Laura Hyunjhee Kim, Jeremiah Jenkins, and Jenny Odell.

The Internet Archive visual arts residency is organized by Amir Saber Esfahani, and is designed to connect emerging and mid-career artists with the archive’s collections and to show what is possible when open access to information meets the arts. The residency is one year in length, during which time each artist will develop a body of work that culminates in an exhibition utilizing the resources of the archive’s collections in their own practice.

During the residency Kim, Jenkins, and Odell worked with specific aspects of the Internet Archive, both at its Bay Area facilities and remotely in their studios, producing multi-media responses that employ various new media as well as more traditional materials and practices.

Public Programming: Saturday, August 5th, 4-5pm. Brewster Kahle, Founder & Digital Librarian, Internet Archive, in conversation with Laura Hyunjhee Kim and Jeremiah Jenkins. Moderated by Andrew McClintock, Owner/Director of Ever Gold [Projects].
Opening Reception: Saturday, August 5th, 5-8pm
Location: Ever Gold [Projects] 1275 Minnesota St
Exhibit Dates: Aug 5–26, 2017

Jenny Odell: “For my projects, I’m extracting “specimens” from 1980s Byte magazines and animation demo reels—specimens being objects or scenes that are intentionally or unintentionally surreal. These collected and isolated images inadvertently speak volumes about some of the stranger and more sinister aspects that technology has come to embody.”

Jeremiah Jenkins: “Browser History is a project about preserving the Internet for the very distant future. I will be transferring webpages from the Internet Archive and elsewhere onto clay tablets by creating stamps with the text and images, then pressing them into wet clay. After being fired, the slabs will be hidden in caves, buried strategically, and submerged in the sea to await discovery in the distant future. The oldest known clay tablet is a little over 4,000 years old. The cave paintings in Lascaux are around 14,000 years old. The oldest known petroglyphs are near 46,000 years old. It’s conceivable that these fired clay tablets could last for 50,000 years or more. The tablets will be pages from websites that document trade, lifestyle, art, government, and other aspects of our society that are similar to the kinds of information we have about ancient civilizations.”

Laura Hyunjhee Kim: “The Hyper Future Wave Machine is a project that positions the years 2017 and beyond as a speculative future based on audiovisual ephemera published in the years 1987 to 1991. Born in the late ’80s, I wanted to explore the technological advancements and innovations that were popularized during the nascent years of the World Wide Web. Utilizing the Internet Archive as a time machine, I searched through the archived commercial and educational media representations of networked technology, personalized computers, and information systems. Often hyperbolic with a heightened emphasis on speed, power, and the future, slogans from those past years are still relevant and surface aspirations that continue to introduce the “next big thing” to the present generation: “REALIZE THE FUTURE, YOU ALREADY LIVE IN.” As the title of the project suggests, the work revolves around an imaginary media access system, namely the Hyper Future Wave Machine (HFWM). Described as a three-way-cross-hybrid existing/nonexistent/and-yet-to-exist metaphysical machine, the concept came from contemplating data portability and the trajectory of human-machine interface technology that seamlessly minimizes physical interaction. From buttons to touchscreens to speech, would the next ubiquitously applied interface operate using some sort of nonverbal neural command?”

TV News Record: adventures with Face-O-Matic

Internet Archive - 21 juli 2017 - 3:31pm

A weekly round up on what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman

This week we bring you adventures with Face-O-Matic; fact-checks on President Donald Trump’s legislative record and on health care reform; and we follow the TV on use of the term “lies” and “lying.”

Here’s a face, there’s a face, everywhere a face…

Face-O-Matic, our new experimental Slack app that finds faces of political leaders on major national cable networks, has given us a whole new perspective on how imagery is used in news production. Sure, Face-O-Matic picks up clips of President Donald Trump, Senate Majority Leader Mitch McConnell, R., Ky., and others speaking, whether on the floor, at press conferences, or at a luncheon.

However, often these elected officials’ faces are used to illustrate a point a news anchor is making, or in footage without audio, sometimes as a floating head somewhere on the screen, or as part of a tweet. Face-O-Matic even picks up faces in a crowd.

Face-O-Matic can help find frequently re-aired clips.

How is BBC covering the latest news in US health care legislation? Face-O-Matic shows they’re using this clip a lot. https://t.co/xdPM6Bgc8R

— TV News Archive (@TVNewsArchive) July 19, 2017

Face-O-Matic can find even images that are only briefly displayed on the screen.

Face-O-Matic catches one second of Trump with Putin https://t.co/PyWYb9cSZF via @internetarchive

— TV News Archive (@TVNewsArchive) July 19, 2017

Face-O-Matic finds both still and video of Trump in a single clip.

As a photo of Trump appears and then video of him follows, Face-O-Matic produces a single clip of the two formats https://t.co/hzNTWBKMKQ

— TV News Archive (@TVNewsArchive) July 19, 2017

Please take Face-O-Matic on a spin and share your feedback with us, tvnews@archive.org. This blog post explains how it fits into our overall plan to turn TV news into data. To install,  for now you’ll need to ask your Slack team administrator or owner to set it up. The administrator can click on the button below to get started. Visit Slack to learn how to set up or join a Slack team. Questions? Contact Dan Schultz, dan.schultz@archive.org.

Fact-check: Trump has signed more bills than any president ever (wrong)

News cameras captured Trump saying, “We’ve signed more bills — and I’m talking about through the legislature — than any president ever.” (A moment later, he commented that he doesn’t “like Pinocchios,” referring to The Washington Post’s Fact Checker rating system.)

Glenn Kessler, reporting for that same fact-checking site, explained why Trump is not, in fact, besting his predecessors in the White House when it comes to bill signing. But he refrains from stating how many Pinocchios the president earned: “Tempted as we are to give the president Pinocchios for his statement, he seemed to be speaking off the cuff and was operating on outdated information from his first 100 days. We don’t play gotcha here at The Fact Checker, and we appreciate that he added a caveat. He certainly appeared to pause for a moment and wonder if he was right. For Trump, that’s a step in the right direction…But he’s way off the mark and actually falling behind in legislative output.”



Fact-check: “bushel” of Pence claims on health care reform (range from “twists the facts” to “false”)

Also writing for The Washington Post’s Fact Checker, Michelle Ye Hee Lee checked a number of statements Vice President Mike Pence made about the Senate health care reform bill during an appearance at the National Governors Association.

These included, for example, the claim, “I know Governor Kasich isn’t with us, but I suspect that he’s very troubled to know that in Ohio alone, nearly 60,000 disabled citizens are stuck on waiting lists, leaving them without the care they need for months or even years.”

Lee wrote that this claim is false: “[T]here’s no evidence the wait lists are tied to Medicaid expansion. We previously gave four Pinocchios to a similar claim….The expansion and wait list populations are separate, and expansion doesn’t necessarily affect the wait list population….Whether people move off the wait list depends on many factors, such as how urgent their needs are, how long they’ll need services and whether the states have money to pay for them. Many times, a slot opens up only if someone receiving services moves out of the state or dies.”



Follow the TV

There’s been much controversy in news gathering circles about when, whether, and how to invoke the word “lie” when reporting on public officials. One of our archivists, Robin Chin, has noticed a number of prominent uses of the term by commentators in recent TV news coverage.

For example, here’s Shepard Smith on Fox News on July 14 saying, “Jared Kushner filled out his form. I think it’s an F-86 saying who he met with and what he had done… He went back and added 100 names and places. None of these people made it… Why is it lie after lie after lie? … My grandmother used to say when first we practice to — oh, what a tangled web we weave when first we practice to deceive. The deception, Chris, is mind boggling.”

And here’s Tom Brokaw on July 16 on NBC’s “Meet the Press,” saying: “Certainly there are atmospherics here that call to mind Watergate, the kind of denial of the obvious and the petty lying that is going on. But at the same time, Watergate, I like to think, was there by itself and this president is entangling himself in that kind of discussion that we’re having here today when it’s not in the interest of anyone, most of all this country, when we have so many issues before us. It’s got to get cleaned up.”

On July 17, on “CNN: Tonight With Don Lemon,” here is David Gergen saying: “Other presidents succeed at this by just being straightforward about the facts. And it’s gone on for so long and so duplicitous and so much double speak that you begin to wonder, this is quite intentional. This may be quite intentional. You create a fog bank of lies and uncertainties and vagueness and create so many different details that people just sort of say, the hell with that, I don’t want to watch this… My sense is that a lot of Americans are starting to tune out…”

Search captions for terms you are interested in at the TV News Archive. For trends, try the Television Explorer, built by data scientist Kalev Leetaru, and powered by TV News Archive data, which can provide quick visualizations of terms broken down by network.

To receive the TV News Archive’s email newsletter, subscribe here.

TV News Lab: Introducing Face-O-Matic, experimental Slack alert system tracking Trump & congressional leaders on TV news

Internet Archive - 19 juli 2017 - 3:37pm

Working with Matroid, a California-based start up specializing in identifying people and objects in images and video, the TV News Archive today releases Face-O-Matic, an experimental public service that alerts users via a Slack app whenever the faces of President Donald Trump and congressional leaders appear on major TV news cable channels: CNN, Fox News, MSNBC, and the BBC. The alerts include hyperlinks to the actual TV news footage on the TV News Archive website, where the viewer can see the appearances in context of the entire broadcast, what comes before and what after.

The new public Slack app, which can be installed on any Slack account by the team’s administrator, marks a milestone in our experiments using machine learning to create prototypes of ways to turn our public, free, searchable library of 1.3 million+ TV news broadcasts into data that will be useful for journalists, researchers, and the public in understanding the messages that bombard all of us day-to-day and even minute-to-minute on TV news broadcasts. This information could provide a way to quantify “face time”–literally–on TV news broadcasts. Researchers could use it to show how TV material is recycled online and on social media, and how editorial decisions by networks help set the terms of public debate.

If you want Face-O-Matic to post to a channel on your team’s Slack, ask an administrator or owner to set it up. The administrator can click on the button below to get started. Visit Slack to learn how to set up or join a Slack team. Questions? Contact Dan Schultz, dan.schultz@archive.org.

Add to Slack

To begin, Dan Schultz, senior creative technologist for the TV News Archive, trained Matroid’s facial detection system to recognize the president; Senate Majority Leader Mitch McConnell, R., Ky., and Senate Minority Leader Charles Schumer, D., N.Y.; and House Speaker Paul Ryan, R., Wis., and House Minority Leader Nancy Pelosi, D., Calif. All are high-ranking elected officials who make news and appear often on TV screens. The alerts appear in a constantly updating stream as soon as the TV shows appear in the TV News Archive.

For example, on July 15, 2017 Face-O-Matic detected all five elected officials in an airing of MSNBC Live.

As can be seen, the detections in this case last as little as a second – for example, this flash of Schumer’s and McConnell’s faces alongside each other is a match for both politicians. The moment is from a promotion for “Morning Joe,” the MSNBC show that made headlines in late June when co-hosts Mika Brzezinski and Joe Scarborough were the targets of angry tweets from the president.  

The longest detected segment in this example is 24 seconds featuring Trump, saying “we are very very close to ending this health care nightmare. We are so close. It’s a common sense approach that restores the sacred doctor-patient relationship. And you’re going to have great health care at a lower price.”

Why detect faces of public officials?

First, our concentration on public officials is purposeful; in experimenting with this technology, we strive to respect individual privacy and harvest only information for which there is a compelling public interest, such as the role of elected officials in public life. The TV News Archive is committed to these principles developed by leading artificial intelligence researchers, ethicists, and others at a January 2017 conference organized by the Future of Life Institute.

Second, developing the technology to recognize faces of public officials contained within the TV News Archive and turning it into data opens a whole new dimension for journalists and researchers to explore for patterns and trends in how news is reported.  

For example, it will eventually be possible to trace the origin of specific video clips found online; to determine how often the president’s face appears on TV networks and programs compared to other public officials; to see how often certain video clips are repeated over time; to determine the gender ratio of people appearing on TV news; and more. It will become useful not just in explaining how media messages travel, but also as a way to counter misinformation, by providing a path to verify source material that appears on TV news.

This capability adds to the toolbox we’ve already begun with the Duplitron, the open source audio fingerprinting tool developed by Schultz that the TV News Archive used to track political ads and debate coverage in the 2016 elections for the Political TV Ad Archive. The Duplitron is also the basis for The Glorious ContextuBot, which was recently awarded a Knight Prototype Fund grant.

All of these lines of exploration should help journalists and researchers who currently can only conduct such analyses by watching thousands of hours of television and hand coding it or by using an expensive private service. Because we are a public library, we make such information available free of charge.

What’s next?

The TV News Archive will continue to work with partners such as Matroid to develop methods of extracting metadata from the TV News Archive and make it available to the public. We will develop ways to deliver such experimental data in structured formats (such as JSON, csv, etc.) to augment Face-O-Matic’s Slack alert stream. Such data could help researchers conduct analyses of the different amounts of “face-time” public officials enjoy on TV news.

Schultz also hopes to develop ways to augment the facial detection data with closed captioning, with for example OpenedCaptions, another open source tool he created that provides a constant stream of data from TV for any service set up to listen. This will make it simpler to search such data sets to find a particular moment that a researcher is looking for. (Accurate captioning presents its own technological challenges: see this post on Hyper.Audio’s work.)

Beyond this experimental facial detection, we have big plans for the future.  We are planning to make more than a million hours of TV news available to researchers from both private and public institutions via a digital public library branch of the Internet Archive’s TV News Archive. These branches would be housed in computing environments, where networked computers provide the processing power needed to analyze large amounts of data.

Researchers will be able to conduct their own experiments using machine learning to extract metadata from TV news. Such metadata could include, for example, speaker identification–a way to identify not just when a speaker appears on a screen, but when she or he is talking. Researchers could create ways to do complex topic analysis, making it possible to trace how certain themes and talking points travel across the TV news universe and perhaps beyond. Metadata generated through these experiments would then be used to enrich the TV News Archive, so that any member of the public could do increasingly sophisticated searches.

Feedback! We want it 

We are eager to hear from people using the Face-O-Matic Slack app and get your feedback.

  • Is the Face-O-Matic Slack app useful? What would make it more useful?
  • Would a structured data stream delivered via JSON, csv, and/or other means be helpful? What sort of information would you like to be included in such a data set?
  • Who is it important for us to track?
  • What else?

Please reach us by email at: tvnews@archive.org, or via twitter @tvnewsarchive. Also please consider signing up for our weekly TV News Archive newsletter. Or, comment or make contributions on GitHub.com/slifty, where Schultz is documenting his progress; all the code developed is open source. (One observer already provided images for a training set to track Mario, the cartoon character.)

The weeds

The TV News Archive, our collection of 1.3 million+ TV news broadcasts dating back to 2009, is already searchable through closed captions.

But captions don’t always get you everything you want. If you search, for example, on the words “Donald Trump” you get back a hodge-podge of clips in which Trump is speaking and clips where reporters are talking about Trump. His image may not appear on the screen at all. The same is true for “Barack Obama,” “Mitch McConnell,” “Chuck Schumer,” or any name.


Search “Barack Obama” and the result is a hodge-podge of clips.

Developing the ability to search the TV News Archive by recognizing the faces of public officials requires applying algorithms such as those developed by Matroid. In the future we hope to work with a variety of firms and researchers; for example, Schultz is also working on a separate facial detection experiment with the firm Datmo.

Facial detection requires a number of related steps: first, training the system to recognize where a face appears on a TV screen; second, extracting that image so it can be analyzed; and third, comparing that face to a set known to be a particular person to discover matches.

In general, facial recognition algorithms tend to rely on the work of FaceNet, described in this 2015 paper, in which researchers describe creating a way of “mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.” In other words, it’s a way of turning a face into a pattern of data, and it’s sophisticated enough to describe faces from various vantage points – straight ahead, three-quarter view, side view, etc. To develop Face-o-Matic, TV News Archive staff collected public images of elected officials from different vantage points to use as training sets for the algorithm.
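
To make the matching step concrete, here is a minimal sketch that assumes embeddings have already been produced by a FaceNet-style model; the vectors and the distance threshold are invented for illustration and are not values used by Matroid or Face-O-Matic.

    import numpy as np

    # Hypothetical, pre-computed face embeddings. A real FaceNet-style model outputs
    # longer vectors (128 values or more); these short ones are only illustrative.
    references = {
        "Trump":  np.array([0.11, 0.82, 0.33, 0.47]),
        "Pelosi": np.array([0.71, 0.15, 0.64, 0.22]),
    }
    detected_face = np.array([0.13, 0.80, 0.35, 0.45])  # embedding of a face cropped from a frame

    def closest_match(candidate, refs, threshold=0.25):
        """Return the best-matching label if its Euclidean distance falls under the
        (illustrative) threshold; smaller distance means greater facial similarity."""
        label, dist = min(
            ((name, np.linalg.norm(candidate - ref)) for name, ref in refs.items()),
            key=lambda pair: pair[1],
        )
        return label if dist < threshold else None

    print(closest_match(detected_face, references))  # prints "Trump" for these made-up vectors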

The Face-O-Matic Slack app is meant to be a demonstration project that lets the TV News Archive experiment in two ways: first, by creating pipelines that run the TV News Archive video streams through artificial intelligence models to explore whether the resulting information is useful; second, by using a new way to distribute TV News Archive information through the popular Slack service, used widely in journalistic and academic settings.

We know some ways it can be improved, but we also want to hear from you, the user, with your ideas. In the words of Thomas the Tank Engine, we aspire to be a “really useful engine.”

Face-O-Matic on GitHub

Follow TV News Archive progress in recognizing faces on TV on the following GitHub pages:

Tvarchive-faceomatic. The Face-o-Matic 2000 finds known faces on TV.

Tvarchive-ai_suite. A suite of tools for exploring AI research against video.


IMLS Grant to Advance Web Archiving in Public Libraries

Internet Archive - 18 juli 2017 - 8:21pm

We are excited to announce that the Institute of Museum and Library Services (IMLS) has recently awarded our Archive-It service a Laura Bush 21st Century Librarian grant from its Continuing Education in Curating Collections program for the project Community Webs: Empowering Public Librarians to Create Community History Web Archives.

Working with partners from Queens Public Library, Cleveland Public Library, and San Francisco Public Library, and with OCLC’s WebJunction, which offers education and training to public libraries nationwide, the “Community Webs” project will provide training, cohort support, and services for a group of librarians at 15 different public libraries to develop expertise in creating collections of historically valuable web materials documenting their local communities. Project outputs will include over 30 terabytes of community history web archives and a suite of open educational resources, from guides to videos, for use by any librarian, archivist, or heritage professional working to preserve collections of local history comprised of online materials.

We are now accepting applications from public libraries to participate in the program! Please help us spread the word about this opportunity to the entire public library community. You can also visit the program’s webpage for more information and the project’s grant materials are available through the IMLS award page.

Curating web archives documenting the lives of their patrons offers public librarians a unique opportunity to position themselves as the natural stewards of web-published local history and solidifies their role as information custodians and community anchors in the era of the web. We owe a debt of thanks to IMLS for supporting innovative tools and training for librarians and look forward to working with our public library friends and colleagues to advance web archiving within their profession and for the benefit of their local communities.

Film Screening: Lost Landscapes of LA on August 7

Internet Archive - 18 juli 2017 - 12:01am

By Rick Prelinger

Lost Landscapes of Los Angeles (2016, 83 minutes) is an experimental documentary tracing the changing city of Los Angeles (1920s-1960s), showing how its landscape expresses an almost infinite collection of mythologies. Made from home movies and studio-produced “process plates” — background images of the city shot by studio cinematographers for rear projection in feature films — Lost Landscapes depicts places, people, work and daily life during a period of rapid urban development. While audience  members are encouraged to comment, discuss and ask questions during the screening of this silent film, it is also a contemplative film that shows the life and growth of the U.S.’s preeminent Western metropolis as the sum of countless individual acts.

Lost Landscapes of Los Angeles is the latest of Rick Prelinger’s “urban history film events,” featuring rediscovered and largely-unseen archival film footage arranged into feature-length programs. Unlike most screenings, the audience makes the soundtrack — viewers are encouraged to identify places, people and events; ask questions; and engage with fellow audience members. While the films show Los Angeles as it was, the event encourages viewers to think about (and share) their ideas for the city’s future. What kind of a city do we want to live in?

Rick Prelinger is an archivist, filmmaker, and educator. He teaches at UC Santa Cruz and is a board member of Internet Archive. His films made from archival material have played at festivals, museums, theaters, and educational institutions around the world. Lost Landscapes of San Francisco (11 episodes, 2006-2016) plays every autumn in San Francisco. He has also made urban history films in Oakland and Detroit, and is currently producing a New York film for an autumn premiere. He thanks Internet Archive and its staff for making this film possible.

Get Tickets Here

Monday, August 7th, 2017
6:30 pm Reception
7:30 pm Interactive Film Program

Internet Archive
300 Funston Ave.
San Francisco, CA 94118

Landscape of Content Variation

Story Needle - 17 juli 2017 - 1:39pm

Publishers understandably want to leverage what they’ve already produced when creating new content.  They need to decide how to best manage and deliver new content that’s related to — but different from — existing content. To create different versions of content, they have three options, which I will refer to as the template-based, compositional, and elastic approaches.

To understand how the three approaches differ, it is useful to consider a critical distinction: how content is expressed, as distinct from the details the content addresses.

When creating new content, publishers face a choice of what existing material to use again, and what to change.  Should they change the expression of existing content, or the details of that content?  The answer will depend on whether they are seeking to amplify an existing core message, or to extend the message to cover additional material.  That core message straddles expression (how something is said) and details (the specifics), which is one reason both these aspects, the style and the substance, get lumped together into a generic idea of “content”.  Telling an author to simply “change the content” does not indicate whether to change the connotation or denotation of the content.  They need more clarity on the goal of the change.

Content variation results from the interaction of the two dimensions:

  1. The content expression (the form it takes, whether written prose or other manifestations such as video)
  2. The details (facts and concrete information).

Both expression and details can vary.  Publishers can change both the expression and the details of content, or they can focus on just one of the dimensions.

The interplay of content expression and details can explain a broad range of content variation.  Content management professionals commonly explain content variation by referring to a more limited concept: content structure —  the inclusion and arrangement of chunk-size components or sections.  Content structure does influence content variation in many cases, but not in all cases. Expressive variation can result when content is made up of different structural components.  Variation in detail can take place within a common structural component.   But rearranging content structure is not the only, or even necessarily the preferred, way to manage content variation.  Much content lacks formal structure, even though the content follows distinguishable variations that are planned and managed.

The expression of content (for example, the wording used) can be either fixed (static, consistent or definitive) or fluid (changeable or adaptable).  A fixed expression is present when all content sounds alike, even if the particulars of the content are different.  As an example, a “form” email is a fixed expression, where the only variation is whether the email is addressed to Jack or to Jill.  When the expression of content is fluid,  in contrast, the same basic content can exist in many forms.  For example, an anecdote could be expressed as a written short story, as a dramatized video clip, or as a comic book.

Details in content can also be either fixed, or they can vary.  Some details are fixed, such as when all webpages include the same contact details.  Other content is entirely about the variation of the details.  For example, tables often look similar (their expression is fixed), though their details vary considerably.

Diagram showing how both expression and details in content can vary

Now let’s look at three approaches for varying content.  Only one relies on leveraging structures within content, while the other two exist without using structure.

Template-based content has a fixed expression.  Think of a form letter, where details are merged into a fixed body of text.  With template-based content, the details vary, and are frequently what’s most significant about the content.   Template-based content resembles a “mad libs” style of writing, where the basic sentence structure is already in place, and only certain blanks get filled in with information.  Much of the automated writing referred to as robo-journalism relies on templates.  The Associated Press will, for example, feed variables into a template to generate thousands of canned sports and financial earnings reports.  Needless to say, the rigid, fixed expression of template-based writing rates low on the creativity scale.  On the other hand, fixed expression is valuable when even subtle changes in wording might cause problems, such as in legal disclaimers.
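
As a rough sketch of the template-based approach (the template wording and figures below are invented, not drawn from the Associated Press system):

    # A fill-in-the-blank template in the "mad libs" spirit: the expression is fixed,
    # and only the details vary from report to report.
    EARNINGS_TEMPLATE = (
        "{company} reported {quarter} earnings of {eps} per share, "
        "{direction} the analyst consensus of {expected_eps}."
    )

    report = EARNINGS_TEMPLATE.format(
        company="Acme Corp",        # all values here are invented example data
        quarter="second-quarter",
        eps="$1.12",
        direction="beating",
        expected_eps="$1.05",
    )
    print(report)

Because the wording never changes, the same template can safely generate thousands of reports, which is also why it rates low on creativity.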

Compositional content relies on structural components.  It is composed of different components that are fixed, relying on a process known as transclusion.  These components may include informational variables, but most often do not.  The expression of the content will vary according to which components are selected and included in the delivered content.  Compositional content allows some degree of customization, to reflect variations in interests and detail desired.  Content composed from different components can offer both expressive variation and consistency to some degree, though there is ultimately an intrinsic tradeoff between those goals.  Generally the biggest limitation of compositional content is that its range of variation is limited.  Compositional variation also increases complexity, which pushes teams to prioritize consistency over variation.  Compositional content can’t generate novel variation, since it must rely on existing structures to create new variants.
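
Here is a minimal sketch of the compositional approach, assuming a hypothetical library of fixed components that get transcluded into a delivered document according to the audience:

    # Hypothetical reusable components: each one is fixed, but which components are
    # selected, and in what order, varies with the audience.
    components = {
        "intro":        "Thank you for choosing our service.",
        "setup_home":   "To get started at home, connect the router to your modem.",
        "setup_office": "To get started at the office, contact your IT administrator.",
        "legal":        "Terms and conditions apply.",
    }

    def compose(audience):
        """Assemble a document by transcluding fixed components for one audience."""
        order = [
            "intro",
            "setup_office" if audience == "business" else "setup_home",
            "legal",
        ]
        return "\n".join(components[name] for name in order)

    print(compose("business"))

Variation here comes only from recombining what already exists, which is the tradeoff described above.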

Elastic content is content that can be expressed in a multitude of ways.  With elastic content, the core informational details stay constant, but how these details are expressed will change. None of the content is fixed, except for the details.  In fact, so much variation in expression is possible that publishers may not notice how they can reuse existing informational details in new contexts.  Elastic content can even morph in form, by changing media.

Authors tend to repeat facts in content they create.  They may want to keep mentioning the performance characteristic of a product, or an award that it has won. Such proof points may appeal to the rational mind, but don’t by themselves stimulate  much interest.  To engage the reader’s imagination, the author creates various stories and narratives that can illustrate or reinforce facts they want to convey.  Each narrative is a different expression, but the core facts stay constant.  Authors rely on this tactic frequently, but sometimes unconsciously.  They don’t track how many separate narratives draw on the same facts. They can’t tell if a story failed to engage audiences because its expression was dull, or because the factual premise accompanying the narrative had become tired, and needs changing.  When authors track these informational details with metadata, they can monitor which stories mention which facts, and are in a better position to understand the relationships between content details and expression.

Machines can generate elastic content as well.   When information details are defined by metadata, machines can use the metadata to express the details in various ways.  Consider content indicating the location of a store or an event.  The same information, captured as a geo-coordinate value in metadata, can be expressed multiple ways.  It can be expressed as a text address, or as a map.  The information can also be augmented, by showing a photo of the location, or with a list of related venues that are close by.  The metadata allows the content to become versatile.
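
A minimal sketch of elastic content: one metadata record whose details stay constant while the expression changes (the venue, address, and coordinates are invented for illustration).

    # One location record described with metadata; the details are fixed,
    # but they can be expressed as a sentence, a postal address, or a map link.
    location = {
        "venue": "Example Hall",       # invented example data
        "street": "123 Example St",
        "city": "San Francisco",
        "lat": 37.7793,
        "lon": -122.4193,
    }

    def as_sentence(loc):
        return f"The event takes place at {loc['venue']}, {loc['street']}, {loc['city']}."

    def as_address_block(loc):
        return f"{loc['venue']}\n{loc['street']}\n{loc['city']}"

    def as_map_link(loc):
        # Any map service would work; OpenStreetMap is used here only as an example.
        return f"https://www.openstreetmap.org/?mlat={loc['lat']}&mlon={loc['lon']}"

    print(as_sentence(location))
    print(as_map_link(location))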

As real-time information becomes more important in the workplace, individuals are discovering they want that information in different ways.  Some people want spreadsheet-like tools they can use to process and refine the raw alphanumeric values.  Others want data summarized in graphic dashboards.  And a growing number want the numbers and facts translated into narrative reports that highlight, in sentences, what is significant about the information.  Companies are now offering software that assesses information, contextualizes it, and writes narratives discussing the information.  In contrast to the fill-in-the-blank feeding of values into a template, this content is not fixed.  The content relies on metadata (rather than a blind feed as used in templates); the description changes according to the information involved.  The details of the information influence how the software creates the narrative.  By capturing key information as metadata, publishers have the ability to amplify how they express that information in content.  Readers can get a choice of which medium to use to access the information.

The next frontier in elastic content will be conversational interfaces, where natural language generation software will use informational details described with metadata, to generate a range of expressive statements on topics.  The success of conversational interfaces will depend on the ability of machines to break free from robotic, canned, template-based speech, and toward more spontaneous and natural sounding language that adapts to the context.

Weighing Options

How can publishers leverage existing content, so they don’t have to start from scratch?  They need to understand which dimensions of their content might change.  They also need to be realistic about what future needs can be anticipated and planned for.  Sometimes publishers over-estimate how much of their content will stay consistent, because they don’t anticipate the circumstantial need for variation.

Information details that don’t change often, or may be needed in the future, should be characterized with metadata.  In contrast, frequently changing and ephemeral details could be handled by a feed.

Standardized communications lend themselves to templates, while communications that require customization lend themselves to compositional approaches using different structural components.  Any approach that relies on a fixed expression of content can be rendered ineffective when the essence of the communication needs to change.

The most flexible and responsive content, with the greatest creative possibilities, is elastic content that draws on a well-described body of facts.  Publishers will want to consider how they can reuse information and facts to compose new content that will engage audiences.

— Michael Andrews

The post Landscape of Content Variation appeared first on Story Needle.

TV News Record: Donald Trump Jr makes “email” popular on TV again

Internet Archive - 14 juli 2017 - 12:55am

This week the term “email” took on a new meaning in the annals of political controversy, President Donald Trump traveled to Poland, and the Senate continued to struggle with health care reform.

Email back on TV following Trump Jr.’s release of email exchange

Email as a technology may be on the way out (or just evolving), but its place in political history, already assured, got an even bigger boost this week when Donald Trump Jr. on Tuesday released a June 2016 email chain in which he exclaimed “I love it” to the prospect of receiving damaging information about Hillary Clinton through Russian intermediaries.

The term “email” is spiking again on TV news broadcasts, though it has not yet climbed to the levels seen in the lead-up to the November 2016 elections. In those months, Fox News in particular hammered on storylines of both hacked Democratic National Committee (DNC) emails and Hillary Clinton’s use of a private email server to do official business while serving as secretary of state.

However, with congressional and federal investigations of possible Russian tampering with the elections underway, we are early in the life cycle of this story. Stay tuned, and remember that searching terms on TV news is just a few clicks away on Television Explorer, which is fueled by TV News Archive data.

Search of term “emails” on Television Explorer, fueled by TV News Archive data. (Click on image to see larger.)

Following the TV 

The Watergate movie “All the President’s Men” made the term “follow the money” an inspiration for journalists everywhere; thanks to the TV News Archive, enterprising reporters and researchers can “follow the TV” – find and link to past statements of public officials relevant to a current story.

With this week’s news putting Russia’s involvement in the election back in the headlines, past statements by members of the Trump camp become interesting watching. For example, here’s former Trump campaign chairman, Paul Manafort, in July 2016, saying “that’s absurd” to the allegation of a Putin-Trump connection.  Here’s Donald Trump Jr. in July 2016 saying it was “disgusting” to say the DNC email hack was perpetrated by the Russian government to support Trump. And here is advisor Kellyanne Conway in December 2016 saying “absolutely not” to a question about whether the Trump campaign was in contact with Russians trying to influence the election.

Factcheck: Obama knew about Russian interference in election and did nothing about it (mostly false)

At a joint press conference with Polish President Andrzej Duda last week, President Trump said “Barack Obama when he was president found out about this, in terms of if it were Russia, found out about it in August. Now the election was in November. That is a lot of time he did nothing about it.”

According to Lauren Carroll reporting for Politifact, the Obama administration took several steps after learning of the interference. Among them: “Obama personally confronted Russian President Vladimir Putin and told him to back off… On Oct. 7, the Obama administration publicly identified Russia for the first time as being behind election-related hacks, issuing a joint statement from Homeland Security and the Director of National Intelligence… Also, throughout August and up through the election, Homeland Security Secretary Jeh Johnson encouraged state-level election officials, through official statements and phone calls, to protect voting-related systems from cyber intrusions…However, the Obama administration took its most significant actions against Russia after Nov. 8. In late December, Obama ordered 35 Russian diplomats and suspected intelligence agents to leave the United States, and he also imposed narrow sanctions on some Russian individuals and organizations.”

Factcheck:  Billions are pouring into NATO because of the Trump administration (four Pinocchios)

During a speech in Poland last week, President Donald Trump said about his calls for increased defense spending by other countries for NATO, “As a result of this insistence, billions of dollars more have begun to pour into NATO.”

“These budget decisions were made during the 2016 calendar year, before Trump became president,” reported Michelle Ye Hee Lee for The Washington Post’s Fact Checker. She quoted Alexander Vershbow, former deputy secretary general of NATO, who said: “‘Who deserves the most credit? Vladimir Putin. It was the invasion of Crimea, the launching of insurgency backed by Russia in Eastern Ukraine, that was the wake-up call for the majority of the allies.’”

Factcheck: hundreds of thousands will die if the Senate health care bill passes (can’t say)

With the Senate debating health care reform, FactCheck.org checked a recent statement by House Minority Leader Nancy Pelosi, D., Calif, where she said, “We do know that… hundreds of thousands of people will die if this bill (Senate health care bill) passes.”

Lori Robertson and Robert Farley wrote, “the research uses terms like ‘could’ and ‘suggests’ and ‘cannot definitively demonstrate a causal relationship,’ not the definitive ‘will’ favored by opponents of the bill. We can’t say whether any specific projection is a correct or valid number.”

To receive the TV News Archive’s email newsletter, subscribe here.

Net Neutrality Day of Action is Tomorrow, July 12!

Internet Archive - 11 juli 2017 - 9:56pm

Tomorrow, the Internet Archive will join with a huge list of Internet companies and organizations to protest the FCC Chair’s stated intentions to do away with net neutrality protections established by prior Commissioners.

Among other actions, the Archive will be displaying a pop-up message tomorrow on our sites to demonstrate the severity of this threat and simulate an Internet in which ISPs are given free rein to provide selective access to the web.

Organizations and site owners can learn more about joining in the Day of Action here.

Private individuals can help by 1. sending a letter to the FCC to voice your support for net neutrality (the deadline for comments is coming up fast!) and 2. spreading the message on social media.

The people beat back PIPA and SOPA before, and we’re ready to stand up again!

How to play and play with 78rpm record transfers

Internet Archive - 10 juli 2017 - 9:08pm


There are over 50,000 recordings transferred from 78s and uploaded by users, and there are now tens of thousands of high-bitrate, unrestored transfers of 78s that are part of the Great 78 Project.

With this many, it gets hard to find things you want to explore.  Here are some techniques I use:

Again, I recommend the “play items” link, as it plays continuously, the way YouTube does.

To download for research and preservation purposes:
  • Use the download box on the right-hand side of a “details” page; you can click to see the whole list of files.
  • There is a “best” stylus version (chosen by an audio engineer at George Blood Co.) that is renamed to be iTunes-compatible, but all the styli recordings are there in flat and equalized versions, and each of those in FLAC and MP3 formats.
To download many records for research and preservation purposes (requires Linux or Mac and command-line skills):
  • Install the Internet Archive command line interface
  • To download metadata in JSON from our 78 transfers, in bash:
    • for item in `./ia search "collection:georgeblood" --itemlist`; do curl -Ls https://archive.org/metadata/$item/metadata ; echo ""; done
    • I installed GNU parallel to speed things up (I use "brew install parallel" on a Mac)
    • ./ia search "collection:georgeblood" --itemlist | parallel -j10 'curl -Ls https://archive.org/metadata/{}/metadata' > 78s.json
  • To download all files of the high-bitrate transfers (repeatable, so you can recover from failures or pick up new additions):
    • ./ia download --search="collection:georgeblood"      (14TB at this point)
  • To download only the metadata and FLACs:
    • ./ia download 78_--and-mimi_frankie-carle-and-his-orchestra-gregg-lawrence-kennedy-simon_gbia0006176a --format="24bit Flac" --format="Metadata"
  • To download only the metadata and MP3s of all ragtime recordings:
    • ./ia download --search="collection:georgeblood AND ragtime" --format="VBR MP3" --format="Metadata"

If you want to do more with downloading specific sets, I suggest the documentation or joining the Slack channel.

How You Can Help: Please help find dates for these 78s

We are finding some dates automatically by matching against 78discography.com and discogs.com, but many are done by hand, by finding entries in Billboard magazine, on DAHR, and the like.  But many still need dates.

If you would like to help, then please do research and post your findings in the review of a 78rpm record, citing your sources.  Then someone with privileges will change the metadata in the item.

The complete collection has date facets on the left reflecting the dates we have found.  But we only have dates for about half, and there are thousands of 78rpm sides posted each month so we need help!

To find what others have done, you can list them in the order of the most recent reviews.

The most recent ones that do not have a date nor a review are here.  This is a good starting place.

But again, these need dates.  If you tried and could not find anything online, then please post a review to that effect so others do not spend time on the same one.   

If you find or know other information about the performer, performance, or piece, please add it, along with links to YouTube, Wikipedia, and old magazines like Cashbox and Billboard.

For those who get into it, we invite you to join the Slack channel (a great tool if you have not used it already); that is where some of the discussion happens.  Caitlin@archive.org can set you up.

Oh, and I have gotten a bit obsessed: there is a Twitter feed that posts a digital transfer every 10 minutes, which I visit more often than I probably should.

Restoration techniques:

I have been using DartPro MT – I like it because it has a “Filter builder.” The program doesn’t like 24 bit, but I transfer at 24/96,000, resample to 16/96,000, then decrackle starting with a setting of 50, repeating the process and increasing the setting by another 10 each time, i.e. 50, 60, 70, 80 (the maximum). If more noise is still there, I run repeatedly at 80 until the reported interventions get down to about 4,000. I can then manually remove any clicks that are left.

I don’t use the denoise or dehiss, preferring to use declick at very low settings (78s don’t have hiss – what you hear as hiss is the combination of many little clicks).

I start with a setting of 2, then 4, then 6. I leave the settings the same and just find whether 1, 2, or 3 passes will polish the higher noise away. Decrackle doesn’t affect the high frequencies, but declick does. This process takes a little time, but I almost never find that any distortion is introduced to the sound. I hate getting to the end of the record where the really growly trumpet sound is a distorted mess, and this workflow prevents that. – Mickey Clark
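For readers without DartPro MT, the sketch below is a rough, hypothetical illustration of the same idea (repeated, progressively stronger click-repair passes) using a simple median-filter approach in Python, assuming soundfile and scipy are installed. It is not the workflow described above, the filenames and thresholds are made up, and real restoration tools do far more.

    import numpy as np
    import soundfile as sf
    from scipy.signal import medfilt

    def declick_pass(samples, kernel=9, threshold=6.0):
        """Replace samples that jump far from the local median (likely clicks)."""
        smoothed = medfilt(samples, kernel_size=kernel)
        residual = samples - smoothed
        mad = np.median(np.abs(residual)) + 1e-12        # robust noise estimate
        clicks = np.abs(residual) > threshold * mad
        repaired = np.where(clicks, smoothed, samples)
        return repaired, int(clicks.sum())

    audio, rate = sf.read("transfer_24_96.flac")         # hypothetical filename
    mono = audio.mean(axis=1) if audio.ndim > 1 else audio

    # Run successive passes, each a bit more aggressive, stopping once the
    # number of repaired samples drops to a few thousand, echoing the
    # "increase the setting, watch the reported interventions" approach.
    for threshold in (10.0, 8.0, 6.0):
        mono, repairs = declick_pass(mono, threshold=threshold)
        print(f"threshold {threshold}: repaired {repairs} samples")
        if repairs < 4000:
            break

    sf.write("transfer_declicked_16_96.flac", mono, rate, subtype="PCM_16")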

My “go to” software for restoration work is iZotope. However, I also have Adobe Audition, Diamond Cut, Pro Tools, Samplitude, and Sound Forge available for specific situations. As Ted Kendall points out: Hearing and Judgment play the major roles in both transfer and restoration. To that I would add Experience. And, as always, start with the best possible source.

Get Involved

Please write to the Internet Archive’s music curator, bgeorge@archive.org, or more generally to info@archive.org.

Please join this project to:

  • Share knowledge. Help improve the metadata, curate the collection, contact collectors, do research on the corpus, etc.
  • Include your digitized collection. If you have already digitized 78s or related books or media, we’d like to include your work in the collection.
  • Digitize your collection.  We’ve worked hard to make digitization safe, fast and affordable, so if you’d like to digitize your collection we can help.
  • Donate 78s.  We have 200,000 78s, but we are always looking for more.  We will digitize your collection and preserve the physical discs for the long term.

If you are in the Bay Area of California, we can also use help packing 78s for digitization; please come over for lunch on a Friday.

TV News Record: Focus on North Korea

Internet Archive - 7 July 2017 - 5:52pm

By Katie Dahl and Nancy Watzman

Following the U.S. government’s confirmation that North Korea had successfully fired an intercontinental ballistic missile, we focus on statements by public officials and pundits on the nuclear threat from the Korean Peninsula, including some past fact-checked segments.

What top-rated cable shows aired the day after

On Fox News, “Tucker Carlson Tonight” focused its report on the missile launch by interviewing Michael Malice, a New York-based ghostwriter and author of Dear Reader: The Unauthorized Autobiography of Kim Jong Il, along with George Friedman, a founder of Geopolitical Futures. Malice said the launch amounted to a commercial for the country’s product: “It’s a great sales pitch to show they have weapons they could sell and make a lot of money off of.” Friedman emphasized that the “Chinese have no reason to solve this,” and also said he didn’t think North Korea has a “capable” nuclear missile at this point.

Over on MSNBC, Rachel Maddow interviewed NBC’s national security reporter, Courtney Kube, who said that North Korea hadn’t demonstrated its capability to deliver a nuclear warhead yet, although “I don’t know if you would find anyone in the U.S. military at the highest levels who would say with confidence or certainty that they don’t absolutely have that capability. I think that they’re hopeful they do not, since they haven’t demonstrated or tested it.”

In the first hour of Anderson Cooper 360 on CNN, John Berman, sitting in for Cooper, placed North Korea’s launch in a global context with President Donald Trump’s trip to Europe, interviewing a panel of former public officials: David Gergen, who advised Republican and Democratic presidents; John Kirby, who was a spokesperson for the State Department under the Obama administration; and Shamila Chaudhary, who served on the National Security Council under the Obama administration.


What Congressional leaders have said about North Korea

Senate Majority Leader Mitch McConnell, R., Ky., in April 2017, mentioned North Korea in the context of the U.S. missile strike on Syria in response to chemical attacks on civilians, describing the strike as “a message to Iran and North Korea and the Russians that America intends to lead again.”

House Speaker Paul Ryan, R., Wis., when talking about a bill to strengthen sanctions back in 2016, said “[Obama’s] strategy of strategic patience with North Korea, it’s just not working.”

House Minority Leader Nancy Pelosi, D., Calif., in April 2017, said “The president is playing with fire when he’s talking about North Korea. We have to exhaust every diplomatic remedy.”

Senate Minority Leader Chuck Schumer, D., N.Y.,  in April 2017, said “The only way to really stop North Korea from doing what it’s doing short of war is to get China to fully cooperate, because they control all the trade. They control the entire economy, really, of North Korea. My view is to get the Chinese to do something real, you have to be tough with them on trade. Trade is their mother’s milk.”

And now for some past fact-checked segments on North Korea.

Trump never said that more countries should acquire nuclear weapons (False)

In November 2016, not long after he won the election, then-President-elect Donald Trump tweeted:

The @nytimes states today that DJT believes “more countries should acquire nuclear weapons.” How dishonest are they. I never said this!

— Donald J. Trump (@realDonaldTrump) November 13, 2016

Lauren Carroll, reporting for PolitiFact, rated this claim “false,” citing several examples from the campaign trail where Trump had said just that. For example, in April 2016, Fox News’ Chris Wallace asked, “You want to have a nuclear arms race on the Korean peninsula?” Later in the broadcast, Trump said about Japan and South Korea, “Maybe they would be better off — including with nukes, yes, including with nukes.”

China has “total control” over North Korea (Mostly False)

During a Republican primary debate in January 2016, Trump said that China has “total control just about” over North Korea. Reporting for PolitiFact, Louis Jacobson rated this claim as “mostly false.” “He has a point that China holds significant leverage over North Korea if it wishes to exercise it, since China provides the vast majority of North Korea’s international trade, including food and fuel imports. But Trump’s assertion, even slightly hedged as it is, overlooks some significant limits to that leverage, notably the North Korean government’s willingness to follow its own drummer even if that means its people suffer. The fact that North Korea recently conducted a nuclear test over the strenuous objections of China suggests that Beijing lacks anything approaching ‘total control’ over North Korea.”

China accounts for 90 percent of North Korea’s trade (True)

In April 2017, Secretary of State Rex Tillerson told the U.N. Security Council, “But China, accounting for 90 percent of North Korean trade, China alone has economic leverage over Pyongyang that is unique, and its role is therefore particularly important.”

PolitiFact’s John Kruzel rated this claim as “true.” “China’s role as an outsize trade partner of North Korea is a relatively new development. Since 2000, trade with the rest of the world has dropped off, as Chinese trade has risen. While the ratio is subject to change based on political factors, China now accounts for around 90 percent of North Korean trade.”

To receive the TV News Archive’s email newsletter, subscribe here.
