“A system of morality which is based on relative emotional values is a mere illusion, a thoroughly vulgar conception which has nothing sound in it and nothing true” – Socrates
Socrates posited that an ethical argument based on emotion was not one worth discussing, yet as SEO consultants, we have been guilty of this in recent weeks. Back in 2012 Google introduced a feature to the search results called the Knowledge Graph. It gave users a level of interaction with the SERPs that they had never seen before (or since). There has been much talk of this aspect of search over the past few months, but the ethical questions around the implementation of raw data into search deserve to be discussed with logic at the forefront. When discussing ethical matters like this, I’ve long been an advocate of the voice of the collective, so for this post I have decided to surround myself with people much more intelligent than I, such as Bill Slawski, Dr Pete Meyers and Gianluca Fiorelli. We briefly discussed the topics in question over email, and here are their answers:
With Google expanding to include more panels pulled from pages without markup, how do you see information retrieval affecting brands, publishers and retailers alike?
The purpose behind knowledge panels is really twofold. The first is to improve discoverability, to make it easier for people who don’t know a topic well to learn more, so that they have related information and topics to search for. The second purpose is similar to that of a snippet in search results. Knowledge panels provide a representation of the entities they are about, and include some disambiguation information when there are other entities or concepts by the same name so that a searcher can explore those as well. In neither instance is the purpose to replace web pages or documents that might be pointed to by Google, but instead to give people more to search for from the search engine, including, in many instances, topics that people have historically searched for next after searching for the original entity.
I think it depends a lot on the vertical. It’s easy to look at a quick answer derailing a result and see nothing but bad news. It’s fair to ask, though – if your business is nothing but aggregating easy answers (plus ads, most likely), how much value do you add? Sites that listed dates for major holidays provided a service for a while and made good money on ads, but now that Google can answer a question like “When is Christmas?”, that business model is over. Being brutally honest, though – it wasn’t a very strong model to begin with. On the other hand, imagine you’re a local restaurant, and Google is serving up a rich knowledge panel with your photos, address, telephone and today’s operating hours. Have they potentially taken a click from your website? Sure, but does that matter? They’ve made your brand look more credible and given people the information they need to find you. If those people walk in the door, it doesn’t matter where the information comes from. I’m not arguing about Google’s intent or responsibility to webmasters (I think they’ve milked “good for users” a bit too hard lately). I’m just saying that the impact on your business can vary wildly. Some people will do well.
I think it is already doing so, if what the implementation data tell us about the real use of schema.org and other structured data is true, since that use is quite small compared with the total number of web documents indexed by Google. A very simple example is how Google is able (well, not always) to interpret authorship from the byline even when rel="author" is absent. How are brands, publishers et al. going to be affected? I think that at first they will probably notice a traffic decrease… But what they will also see will be – IMHO – a better quality in the traffic they still receive, including traffic from Knowledge Graph navigation. They will lose traffic that tends to bounce a lot or that is never going to convert. Moreover, if website owners/SEOs are able to monitor and control what Google is “scraping” from them, they can gain visibility above the fold in the SERPs, which is quite precious right now, given that the visibility of organic search snippets is shrinking.
Many see Google’s expansion of the Knowledge Graph to include more and more terms as aggressive. Do you, or would you ever, recommend against schema.org or other microformats to limit the information passed to search engines?
Search engines have been working to extract structured data from the somewhat unstructured nature of web pages for a long time. The labels from microformats and schema might make it easier for a search engine to extract information from a page, and if you want your page to be a source of such information, including that kind of markup isn’t a bad idea. I can envision some people portraying Google’s knowledge base as “aggressive”, and there have been people who have written about search engine bias, and a desire for search engines to show their own properties instead of those from original sources. But often those other properties are just more finely focused vertical searches.
There may be isolated cases, but in general, I wouldn’t recommend that. Google is going to find ways to extract data from someone, somehow. You can control that data and make sure it comes from you, or it can (a) come from a competitor, or (b) come from you however Google finds and mangles it. From a purely commercial standpoint, I’m not sure what choice we have but to play the evolving game.
No, I wouldn’t. What I would suggest, and what I have actually been suggesting to my clients for some time now, is to craft their content so that it has “answers” ready to be used by Google in the Knowledge Graph and Answers box, but to put special effort into offering in-depth content on the same page. For instance, take a site offering IP information: if it only answers a question like “what’s my IP” with just the IP number of a domain name, then that site is going to sink because of the Answers box. But if on the same page the site offers deeper information, such as what other domains are hosted on the same IP, what country that IP is assigned to, what historical information we can find about that IP, whether that IP was ever flagged for malware and what kind of malware, and so on, then we are offering information that will be valuable to users and that Google cannot offer with a simple answer.
Many webmasters have complained about results containing scraped data, but in your opinion is Google doing anything wrong? Is there any logical or ethical argument (from a user perspective) against Google presenting scraped data within panels?
One of the tenets of copyright is the concept of fair use, and there’s a four-pronged test for whether a use of someone’s artistic work is or isn’t fair use. Facts themselves aren’t something you can copyright, though unique compilations of facts have been shown to be. So, Abraham Lincoln’s height isn’t something that you can copyright, and the fact that Bill Clinton plays the saxophone isn’t either. If a summary of facts is shown in a knowledge panel from a templated Wikipedia biography box, that information isn’t necessarily going to stop people from visiting the Wikipedia page, and may actually encourage more people to visit it.
I think they’re starting to tip the balance. Google will argue that this data is good for users and that they’ve made webmasters a lot of money over the years. This is true, and we should be honest and admit it. Many of us have made a lot of money off of Google and they leveled the playing field for a while for small businesses. On the other hand, they make $60B/year, and the vast majority of that comes from either putting advertisements on search results extracted from our sites (AdWords) or placing ads directly on our sites (AdSense). There’s always been an implied promise – Google will make money from our data, but in return they’ll drive traffic back to us. Once they start to extract answers or create knowledge panels that just link to more Google searches, the relationship starts to break. Is that illegal? No. Is it unethical? I think it’s a broken promise, even if the promise is implied. I think they run the risk that, pushed too hard, we may block our sites and abandon Google. They still hold most of the power, admittedly, but I don’t think they should take the balance lightly.
My first reaction, as a marketer, is not really a happy one when I see Google “scraping” an answer from a site. But as a user I must admit that it really makes my life easier, and if the answer is followed by a link to the source (and that link should be more visible as such, not in light grey), I find myself clicking on that link many times, and with far more genuine interest than when I find the same hint in an organic search snippet. And that is surely better for the website owner too. So… after more careful reflection, I think what Google is doing is not really scraping, but: a) offering an immediate answer to those looking for just that, especially on mobile; b) doing a sort of curation of its own indexed data.
Where do you see the Knowledge Graph expanding to by 2020?
I can see more people working to help expand the amount of information shown in knowledge panels by 2020. We will see information that is publicly accessible but not necessarily publicly available on a wide scale showing up in knowledge panels, Google Now cards, or Google Field Trip cards. This will include things like information from historical marker programs, inscriptions on landmarks and memorials, or documents like historical register applications.
I strongly expect the on-the-fly Knowledge Graph to expand rapidly. Google can’t rely on human-edited databases for entity data – they have to be able to create entities and relationships directly from their index. Honestly, though, that expansion will happen in 2014-2015. By 2020, Google will have made the SERP completely modular, allowing for any variation of device, screen, resolution, etc. Ten-result pages will be gone and replaced with fully dynamic combinations of knowledge panels, targeted results (maybe just one or a handful, depending on the use case), and entity/relationship browsing. I’d expect something less linear and more mind-map style, especially for data on people, places, and things. I’d also expect the Knowledge Graph to expand into social and be more and more personalized. Part of that is already available in Google Now cards, but I’m not just talking about things like your flight status. I think Google will try to extract your own relationships and build on your network. There’s a huge untapped commercial potential in being able to personalize product recommendations built on your trust of your own connections, for example. Your Knowledge Graph experience and mine in 2020 may be completely different.
It’s hard to know or even predict. What I expect is that Google will start looking at ways to prevent people from “spamming” the Knowledge Graph itself, which is now theoretically possible (and easy), as we can manipulate the sources from which a big part of the information is pulled.
The question of ethics surrounding the Knowledge Graph will no doubt continue for many months or years, but there is one fact that is not going away: users love it. Providing answers within the search results not only gives users access to information at a glance but also lets them do all this within Google’s environment. That’s good UX. To paraphrase Socrates once more: “From the users’ deepest desires often come the SEOs’ deadliest hate.” As long as the Knowledge Graph continues to give users a superior search experience, we can expect Google to display more and more information within the SERPs. Ethical or not…
More on influencing the Knowledge Graph here, but as always, let’s discuss in the comments!