
It took Google’s Web crawlers 15 years to come to terms with JavaScript


JavaScript was created in 1995. Google’s search engine debuted in 1998. Yet it took 15 years for the two to fully intertwine.

Up until a few months ago, Google's search crawlers couldn't widely and accurately render one of the Web's most fundamental programming languages: JavaScript. Nearly two decades old, the language powers much of the Web's interactivity, and because Google's crawlers couldn't render it accurately, they missed quite a bit of content.

The reason for this delayed marriage is simple: When Google's search engine was created, JavaScript was still immature, and Flash powered much of the Web's interactivity. Even as Flash gradually died off, Google stuck to what it knew: HTML and, later, CSS.

Today Google shared in a blog post that over “the past few months, our indexing system has been rendering a substantial number of web pages more like an average user’s browser with JavaScript turned on.”

Google is now finally able to interpret the Web much like a modern browser can. For Google, doing so is imperative to its survival. JavaScript is often used today to display content — text, images, and files that Google must understand in order to grow as an ad firm.
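To see why rendering matters, here is a minimal, hypothetical sketch of the kind of page this change is aimed at (the element IDs and text are invented for illustration): the HTML a non-rendering crawler downloads is essentially empty, and the visible content only exists after a script runs in a browser.

A hypothetical page as served to the crawler:

    <body>
      <div id="article"></div>
      <script src="app.js"></script>
    </body>

app.js injects the content on the client:

    // Runs only when the page is loaded in a browser
    // (or in a crawler that actually renders JavaScript).
    document.addEventListener('DOMContentLoaded', function () {
      var container = document.getElementById('article');
      // A crawler that only reads the raw HTML never sees this heading or paragraph.
      container.innerHTML =
        '<h1>Hello from JavaScript</h1>' +
        '<p>This text exists only after the script executes.</p>';
    });

Before this change, Googlebot would have indexed little more than the empty div; rendering the page the way a browser does surfaces the injected heading and paragraph.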

This is not a sudden development for Google. Back in 2012, Google "webspam" team head Matt Cutts urged developers not to hide their JavaScript from its crawlers because Google was "getting better" at crawling it.

As in any marriage, Google isn't a flawless partner: the company admits that, despite all the progress announced today, its crawlers still aren't perfect.


44 comments
Timothy Carroll

Fantastic!  Now the annoying javascript pop-over ads will be indexed too!  THE INDUSTRY IS SAVED!

Michael Low

Hope they can index AngularJS sites now.

Richard Ortega

I think the only sites this greatly affects are full-blown single-page JS apps, rather than the typical JS sprinkled around your site. Not a freaking big deal; the majority of crawlers have had headless JS renderers for years.

Tahir Raza

Flash-based content has always been an issue when it comes to SEO; Google treats it badly. But Google has indexed JavaScript-based code for sure, that's what I know from my past year of experience. Maybe web crawlers are smarter now, and surely they are, and they crawl pages more effectively!

Siva Nadarajah

This is incorrect. Rendering and Indexing are two different things. Google always indexed JavaScript.

Robert Ressmann

Shit. Does this mean that I have to obfuscate every line of my evil code?

jessygrossi

@FGrante Links made in JS do get detected eventually. Now we're talking about obfuscation. It's a game of cat and mouse.

GinnyHoge

@VentureBeat ? "Eric was chief technology officer and corporate executive officer at Sun Microsystems, where he led the development of Java"

elliotlewis

@hereinthehive We assume all the time: assume they have an internet connection & a browser of some kind & the version of that browser.

FGrante

@jessygrossi Curious to see how far JS parsing can go. I think you quickly run into its limits.

hereinthehive

@elliotlewis Just feels like a slippery slope. Even if something can use JS we can be more intelligent about whether it should

hereinthehive

@elliotlewis And we should broadly assume less. There's movement to not assume a connection. We shouldn't be assuming browser or version now

elliotlewis

@colinrotherham Nah, be good to get a very bespoke project like that. Is poss to write JS that can run either/both client&server side?

colinrotherham

@elliotlewis Blank page syndrome. Would like something with server-side render, shared templates with NodeJS etc. Tried anything like that?

elliotlewis

@colinrotherham Good read that. I saw an impressive app recently using JS framework. Req initial 1MB download, cached, but still. JS only :-/

elliotlewis

@hereinthehive ooo dirty tactics, bringing in Flash! I guess my thinking is, we worry about all this but maybe landscape changed anyway.

hereinthehive

@elliotlewis As with almost everything 'it depends'. I'd maintain as far as possible we don't assume JS, it's like assuming Flash (almost!)

elliotlewis

@hereinthehive It is hard on complex interaction sites. All that effort when JS is always on touch & if it doesn't download? Hit refresh.

elliotlewis

@hereinthehive Agreed. 95% of what we build is consumed content. But that's changing. It's not hard to use PE on most sites.

hereinthehive

@elliotlewis Fundamentals should be consistent. It's not hard to change or layer even great complexity. Case by case how, when & why!

elliotlewis

@hereinthehive Qu if natural progression of web not slippery slope. Visual = GDeg, Interaction = ProgE. But web getting v interactive.