In the introduction to the leaked and much-cited New York Times Innovation Report, the authors break down what is required to run a successful journalism-based business. The first requirement is producing great journalism: “deep, broad, smart and engaging”. The second is getting that journalism to readers. Simple enough. Yet while we are producing much more content than before, distribution as a means of creating meaningful engagement has yet to make fundamental progress.

The majority of digital publications are a direct adaptation of the newspaper. Content is served from static databases that hold the articles. Those databases are fed through a content management system, maintained by writers, editors and moderators.

That data is then broken down into buckets, the structure of which is designed through a careful process of information architecture (IA). When executed well, IA makes navigation through the buckets intuitive and usable. Content is then rendered for the reader in a usable form (UX) and a visually pleasing presentation (UI / visual design).

Can machine learning offer a sufficient alternative to the traditional media model?

MVC, short for Model View Controller, is a software architecture pattern for holding data efficiently and rendering it to a view as needed.
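To make the three roles concrete, here is a minimal sketch of the split in Python. Everything in it is hypothetical and chosen for illustration only: the Article fields, the in-memory list standing in for a database, and the class names are assumptions, not how any real publication's system is built.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    headline: str
    section: str  # the pre-assigned "bucket" the article is filed under


@dataclass
class ArticleModel:
    """Model: holds the application data (an in-memory list stands in for a database)."""
    articles: list[Article] = field(default_factory=list)

    def by_section(self, section: str) -> list[Article]:
        return [a for a in self.articles if a.section == section]


class SectionView:
    """View: turns the data it is handed into something a reader can see."""

    def render(self, section: str, articles: list[Article]) -> str:
        lines = [section.upper()] + [f"- {a.headline}" for a in articles]
        return "\n".join(lines)


class SectionController:
    """Controller: takes the reader's request, queries the Model, feeds the View."""

    def __init__(self, model: ArticleModel, view: SectionView) -> None:
        self.model = model
        self.view = view

    def show(self, section: str) -> str:
        return self.view.render(section, self.model.by_section(section))
```

Note that the reader can only ever ask for a section that an editor defined in advance, which is the static quality the rest of this piece questions.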

The Model handles the logic for the application data. This often means the database where the information is held. The View handles the way data is rendered, taking data from the Model (database) and presenting it to the end user. The Controller connects the View and the Model, fetching data for the user and capturing any user input.

The authors of The Times Innovation Report articulate the process of gaining readers’ loyalty and the conditions required for it:


Improving technology to build more and better tools to get content in front of readers. Moving beyond the simplistic digitization of content, experimenting with more effective ways to repackage evergreen content, and pushing relevant content to readers.


Better advocacy of the work. Creating structures to maximize the reach of the most important content. Promoting best practices and sharing them company-wide.


Stronger connection with the most loyal readers, online and offline. The current climate is more demanding, and readers expect more two-way communication.

In this paradigm, Discovery is the main beneficiary of a change in classic (static) MVC models. The Times produces over 300 URLs every day. Having all of that content reach the audience it deserves is quite simply an impossible task, especially on an ongoing basis. It would require an aggressive social media push and a lot more screen real estate than The Times has.

Generative vs static: recent developments in machine learning reveal an alternative way of thinking about and designing content experiences. Where traditionally every outcome of a piece of software had to be programmed, new machine learning structures can generate views and functions based on a query.

Once the query has been fulfilled, those structures can disperse.

Stories (at this stage) are not self-generated, but their packaging is. Stories will still be written by professional staff. However, manual content bucketing and the subsequent placement into views will no longer be required.

Classic data architecture holds information at very specific, indexable points. Metaphorically speaking, a piece of data (let’s say a tweet) sits on a shelf, in a room, on the second floor of a house, on a street with a name. When that information is required, a client navigates to that point, fetches it, and then serves it to the user.

In the new model there are no shelves, nor is there a house. The structure builds itself when a question is asked, and disappears when it is answered. The building blocks exist, but without the rigid, static structure.
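As a rough illustration of a structure that exists only for the length of a question, the sketch below builds a view at query time instead of reading it from a pre-assigned bucket. The function name build_view, the article dictionaries, and the word-overlap scoring are all assumptions made for this example. The scoring is a deliberately naive stand-in for whatever a trained model would do.

```python
def build_view(query: str, articles: list[dict], limit: int = 3) -> list[str]:
    """Assemble a one-off list of headlines for this query; nothing is stored."""
    terms = set(query.lower().split())
    scored = []
    for article in articles:
        words = set(article["text"].lower().split())
        overlap = len(terms & words)  # naive relevance: count of shared words
        if overlap:
            scored.append((overlap, article["headline"]))
    # Most relevant first; ties are broken alphabetically so the result is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [headline for _, headline in scored[:limit]]


# Hypothetical building blocks: the articles exist, the buckets do not.
articles = [
    {"headline": "Climate talks stall", "text": "climate policy talks stall in Paris"},
    {"headline": "Transfer window opens", "text": "football transfer window opens"},
    {"headline": "New tax policy", "text": "new tax policy announced"},
]

view = build_view("climate policy", articles)
```

The view is an ordinary return value: once the caller is done with it, it is gone, and the next question produces a different arrangement of the same building blocks.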

A good way of understanding this concept is through a video game called No Man’s Sky. In the game the player explores an endless universe, with an unlimited number of stars. As the developer of the game describes it, the way the universe, and all elements within it, are created is procedural.

It’s not random, but procedural

Everything follows a mathematical formula and is generated on the fly by the computer, producing a result that looks correct and feels natural in the game. That is in itself a huge leap from what is currently common in video games, which are hugely time-consuming to produce because every detail is drawn by hand and manipulated to seem believable.

A purely random render engine could produce unexpected and unbelievable results, for example planets that are completely red, or all blue. Instead, when a planet is created, an algorithm can determine its distance from the sun. Within a certain proximity the planet will have water. If it has water it might have rivers and oceans. Those will then create moisture in the atmosphere, which will mean that the sky is blue.
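The chain of rules described above can be sketched in a few lines of Python. This is not the game's actual algorithm: the seed, the distance range, the habitable band and the 70% chance of oceans are all invented values for illustration. The point is only that each property follows from the previous one rather than being picked independently at random.

```python
import random
from dataclasses import dataclass

# Assumed band (in astronomical units) within which a planet gets water.
HABITABLE_BAND_AU = (0.8, 1.5)


@dataclass(frozen=True)
class Planet:
    distance_au: float
    has_water: bool
    has_oceans: bool
    sky_colour: str


def generate_planet(seed: int) -> Planet:
    """Derive a planet from a seed: same seed in, same planet out."""
    rng = random.Random(seed)  # private generator, so results depend only on the seed
    distance = round(rng.uniform(0.2, 5.0), 2)
    # Rule 1: water depends on distance from the sun.
    has_water = HABITABLE_BAND_AU[0] <= distance <= HABITABLE_BAND_AU[1]
    # Rule 2: only a planet with water might have rivers and oceans.
    has_oceans = has_water and rng.random() < 0.7
    # Rule 3: oceans put moisture in the atmosphere, which makes the sky blue.
    sky_colour = "blue" if has_oceans else "grey"
    return Planet(distance, has_water, has_oceans, sky_colour)
```

Because the planet is a pure function of its seed, it never has to be stored: calling generate_planet with the same seed rebuilds exactly the same world whenever someone arrives.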

This then raises the question: What happens when the user has left that planet?

The presenter answers that through the classic question: if a tree falls in the forest and there is no one there to hear it, does it still make a sound?

In the real world, the answer is of course yes, mainly due to our (and our planet’s) physicality. In No Man’s Sky’s generative universe the answer is no.

If there is no one around, there is no tree, there is no forest, there is no sound. When a player lands on a planet with their ship, it is created right there and then.

Although that point is slightly too literal for the case of generative content models, it demonstrates the shift we need to make in the way we think about data.

If the content itself is our currency, what can we do to maximize its monetization? Discovery is the real weak point in today’s media market. What can machine learning and other generative models do to maximize reach, conversion and reader loyalty?

June 30, 2015