Problems With Breaking News With The Knowledge Graph

The world awoke on Tuesday to a new Microsoft CEO. Satya Nadella, the former Head of Cloud Computing, had been promoted to replace incumbent CEO Steve Ballmer in a fairly uneventful affair.

The media picked up the news exceptionally quickly, and the story spread across the web like wildfire, with news outlets, bloggers and social media all discussing what the appointment meant for the business and what changes Nadella was likely to make.

There was one place, however, that didn’t notice anything had happened: Google’s Knowledge Graph.

Google’s news search served an up-to-date story on the change of chief executive, of course, but the first visible result came from the Knowledge Graph. Despite being a database of encyclopedia entries on about 570m concepts, relationships, facts and figures, it was quickly left out of date by the Microsoft move, a fact I’m sure wasn’t lost on Nadella, who has himself worked on Internet search.

I was alerted to this anomaly by Samuel Gibbs of the Guardian, who wrote about the lag in the system, so I jumped at the chance to examine the issue in real time.

The Knowledge Graph is fueled by a number of knowledge bases that supply facts for its information panels. However, as search patents unearthed by Bill Slawski show, these facts need to be verifiable: essentially, Google needs two sources of information to verify against before it will insert data into a panel. This seemed like an ideal area to investigate further.
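To make the two-source rule concrete, here is a minimal Python sketch of a filter that only publishes a fact once at least two independent sources agree on it. The source names, the (entity, property, value) fact format and the threshold of two are illustrative assumptions drawn from the patent description above; this is not Google’s actual pipeline, which isn’t public.

```python
from collections import defaultdict

# Hypothetical fact format: (entity, property, value).
Fact = tuple[str, str, str]


def corroborated_facts(sources: dict[str, set[Fact]], min_sources: int = 2) -> set[Fact]:
    """Return only the facts asserted by at least `min_sources` distinct sources.

    A simplified model of the two-source verification idea; the threshold
    and data shapes are assumptions for illustration.
    """
    supporters: dict[Fact, set[str]] = defaultdict(set)
    for source_name, facts in sources.items():
        for fact in facts:
            supporters[fact].add(source_name)  # record which sources back each fact
    return {fact for fact, names in supporters.items() if len(names) >= min_sources}


# Hypothetical snapshot before the Freebase edit: the sources disagree.
before_edit = {
    "dbpedia": {("Satya Nadella", "role", "CEO of Microsoft")},
    "freebase": {("Steve Ballmer", "role", "CEO of Microsoft")},
}
# Hypothetical snapshot after the edit: both sources agree.
after_edit = {
    "dbpedia": {("Satya Nadella", "role", "CEO of Microsoft")},
    "freebase": {("Satya Nadella", "role", "CEO of Microsoft")},
}

print(corroborated_facts(before_edit))  # set()
print(corroborated_facts(after_edit))   # {('Satya Nadella', 'role', 'CEO of Microsoft')}
```

Under this model a single updated source isn’t enough to change the panel; the new fact only passes once a second source corroborates it.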

As Wikipedia is seen as an important source of information for the Knowledge Graph, I began scanning through DBpedia, a structured dataset extracted from Wikipedia and used by many semantic web applications. Diving into the RDF output for Nadella’s and Ballmer’s entries turned up nothing alarming: both had already been updated to reflect their new employment status.

Updating the Knowledge Graph

With one definite source of data now on the web, I turned to Freebase and edited both men’s profiles to see whether the Knowledge Graph could be “kicked into gear”.

After a few minutes I had entered Nadella’s new CEO status at Microsoft and updated Ballmer’s new employment details. Now I had to wait.

Using a tool called Page Monitor, I tracked the RDF output of Freebase to see if there was a correlation between the time of publication and the moment the Knowledge Graph updated with the new information.
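Page Monitor is a browser extension, but the underlying idea (poll a URL, notice when its content changes, record when) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `fetch` is a hypothetical stand-in for an HTTP request to whichever RDF URL you are watching (in practice it might wrap `urllib.request.urlopen`), and `sleep`/`clock` are injectable so the example runs offline.

```python
import hashlib
import time
from typing import Callable, Optional


def fingerprint(content: bytes) -> str:
    """Hash a snapshot so changes can be detected without keeping full copies."""
    return hashlib.sha256(content).hexdigest()


def watch_for_change(
    fetch: Callable[[], bytes],
    interval_seconds: float = 600.0,  # assumed 10-minute polling interval
    max_checks: int = 144,            # 144 checks x 10 minutes = 24 hours
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> Optional[float]:
    """Poll `fetch` until its content differs from the first snapshot.

    Returns the time (from `clock`) at which the change was first seen,
    or None if nothing changed within `max_checks` polls.
    """
    baseline = fingerprint(fetch())
    for _ in range(max_checks):
        sleep(interval_seconds)
        if fingerprint(fetch()) != baseline:
            return clock()
    return None


# Simulated RDF output that changes on the third poll (hypothetical data).
snapshots = iter([b"<old rdf>", b"<old rdf>", b"<old rdf>", b"<new rdf>"])
fake_now = iter(range(1000, 2000))
changed_at = watch_for_change(
    lambda: next(snapshots), sleep=lambda s: None, clock=lambda: next(fake_now)
)
print(changed_at)  # 1000
```

Running one watcher against the RDF output and noting when the search result changes gives the two timestamps needed to compare the lags.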

Sure enough, just a few hours after editing, the RDF dump had updated, followed quickly by an updated entry in the SERPs.

So what does this tell us about the Knowledge Graph?

Verified sources
We have long understood that the Knowledge Graph needs multiple sources of information to populate a panel for an entity, and patents gave us an indication that two separate sources would be enough to influence results. Seeing this (albeit rough-and-ready) experiment play out in the wild gives a solid sign that this may well be the case.

Freebase as a source
Freebase is seen by many (myself included) as key to the growth of the Knowledge Graph and other semantic agents. This experiment shows that Freebase can also act as a channel through which user-generated information passes into the Knowledge Graph.

Time sensitivity is an issue
Last but not least, this debacle shows that for time-sensitive information such as breaking news, the Knowledge Graph simply isn’t ready. The process of becoming (or editing) an entity isn’t widely known, which will hamper Google’s ability to keep its panels updated in (almost) real time.

As more results begin to show dynamic knowledge panels like these, it’s the job of an SEO consultant to understand how and why these panels are created, and how they can affect our clients in the real world.

Andrew Isidoro