This is a great approach, but detecting the user-agent is the wrong way to decid...

Isofarro · on Oct 7, 2013

That Google Ajax crawler spec is no magic bullet.

Nick Denton: "Dip in uniques largely because of drop in Google refers. Pageviews (which are driven more by core audience) less affected." -- http://twitter.com/nicknotned/status/61152134929981440

Nick Denton: "Google does not fully support "hashbang" URLs. So we're eliminating them rather than waiting for Mountain View." -- http://twitter.com/nicknotned/status/61465859079671808

Nick Denton: "Yeah, I'd advise against hashbang urls. Will kill search traffic -- even if you abide by Google protocol." -- http://twitter.com/nicknotned/status/62595141927583745

alanlewis · on Oct 7, 2013

These tweets are from 2.5 years ago. Has Googlebot improved since then? (Honest question)

Isofarro · on Oct 7, 2013

Considering that the intention behind the Google document is to enable support for existing Ajax applications, not to be the cornerstone of crawlability for newly built apps, probably not.

Also, the same document that's quoted in defence of these Web (unfriendly) Apps is https://developers.google.com/webmasters/ajax-crawling/

In the first section of that document (https://developers.google.com/webmasters/ajax-crawling/docs/...) there is this:

"If you're starting from scratch, one good approach is to build your site's structure and navigation using only HTML. Then, once you have the site's pages, links, and content in place, you can spice up the appearance and interface with AJAX. Googlebot will be happy looking at the HTML, while users with modern browsers can enjoy your AJAX bonuses."

abraham · on Oct 7, 2013

You can use the meta tags with normal URLs and completely ignore hashbang urls.
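
For readers unfamiliar with the scheme: as I read the Ajax crawling document linked above, a page at a normal URL opts in with `<meta name="fragment" content="!">`, and the crawler then re-requests it with an `_escaped_fragment_` query parameter. A minimal sketch of that URL mapping follows; the function name `crawler_url` and the exact set of unescaped characters are my own assumptions, not something taken from the spec text.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped in the fragment value. This set is an
# approximation chosen for illustration, not copied from the spec.
SAFE_CHARS = "!$'()*,/:;=?@"


def crawler_url(url: str) -> str:
    """Return the URL a crawler following the scheme would request.

    Assumes the page has opted in, either with a "#!" fragment or with
    the fragment meta tag on a fragment-less URL.
    """
    parts = urlsplit(url)
    if parts.fragment.startswith("!"):
        # Hashbang URL: everything after "#!" moves into the query string.
        value = quote(parts.fragment[1:], safe=SAFE_CHARS)
    elif parts.fragment == "":
        # Normal URL with the meta tag: the parameter is present but empty.
        value = ""
    else:
        # Ordinary "#anchor" fragments are not part of the scheme.
        return url
    separator = "&" if parts.query else ""
    query = f"{parts.query}{separator}_escaped_fragment_={value}"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(crawler_url("http://example.com/page"))
print(crawler_url("http://example.com/ajax.html#!key=value"))
```

The server only has to recognise the `_escaped_fragment_` parameter and answer with prerendered HTML; the URLs users see stay untouched.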

bceagle · on Oct 7, 2013

> This is a great approach

No, it is not. While this will certainly help your client side app get indexed, it is not 'great'. Other commenters on this thread bring up a number of valid concerns, but in my mind it comes down to two very simple things.

One is that when you are fighting for the top spot in organic traffic, this won't cut it. Off-page SEO is more important than on-page optimizations, but on-page optimizations still have value.

The other issue is that this approach assumes that the client side rendered view at a particular hash is exactly what should be initially rendered on the server side. While this could work in some cases, it is my experience that it creates a weird user experience, or you end up doing hacks on the client side to ensure PhantomJS captures the right HTML, or both.

This is a fine solution for some use cases, but I really hope that the community doesn't think this is the future. This is a temporary hack until we get a good server/client rendering framework in place OR all search engines evolve to capture pure client side apps without any of this.

thoop · on Oct 7, 2013

There definitely are valid concerns with this approach...but until the search engines pick up the slack, this is the solution that we're left with. It's definitely not ideal, but I prefer it over having non-DRY code just to serve incomplete HTML to crawlers.

bceagle · on Oct 7, 2013

The alternative solution is pushState, with an acceptance that you will need to either duplicate some effort on server side rendering or create something along the lines of the Airbnb Rendr framework. I am in the process of doing the former, with the plan to do the latter in a future iteration.

benaiah · on Oct 7, 2013

This is a great point. It might seem extreme, but I would advocate never using the User-Agent string to make decisions about what to serve a client. There is too much hackery and history that clouds up the User-Agent (such as every browser identifying itself as Mozilla), and it's almost always a proxy for something else that you actually want to test for.

In some rare situations, it's unavoidable, but even then I'd urge trying to rearchitect the solution to avoid it.

robmcm · on Oct 7, 2013

Actually Google recommends verifying the IP address of the bot via a DNS lookup:

http://googlewebmastercentral.blogspot.co.uk/2006/09/how-to-...
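
The check described in that post is, as far as I understand it, a reverse DNS lookup on the requesting IP, a test that the resulting hostname is in the googlebot.com domain, and then a forward lookup to confirm the hostname resolves back to the same IP. A rough sketch is below. The function name `is_verified_googlebot` and the injectable `reverse`/`forward` parameters are hypothetical conveniences of mine (they let the logic be exercised without network access), and the suffix list is an assumption you should check against current guidance.

```python
import socket
from typing import Callable, List, Tuple


def _reverse_dns(ip: str) -> str:
    # Real reverse (PTR) lookup using the standard library.
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host: str) -> List[str]:
    # Real forward lookup; returns the IPv4 addresses for the host.
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse_dns,
    forward: Callable[[str], List[str]] = _forward_dns,
    allowed_suffixes: Tuple[str, ...] = (".googlebot.com",),  # assumption
) -> bool:
    """Reverse lookup, domain check, then forward-confirm the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        # No PTR record or resolver failure: treat as unverified.
        return False
    if not host.endswith(allowed_suffixes):
        return False
    try:
        # The forward lookup guards against spoofed PTR records.
        return ip in forward(host)
    except OSError:
        return False
```

Note that this still keys the decision to one search engine, and DNS lookups are slow enough that results would need caching in practice.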

benaiah · on Oct 7, 2013

That makes sense, though in an ideal world there would be an abstracted header that said "hey, I'm not gonna render JS the way a regular browser will, so send me something prerendered". Then you could write something that would actually be future-proof and work with other search engines.

The way Google suggests there actually seems a little bit nefarious, as it makes it hard-coded to Google instead of working for any search engine.

mk3 · on Oct 7, 2013

You do realize that the link you gave is from 2006? More recent recommendations do not include that. [EDIT] OK, since I was downvoted I will clarify my point: https://developers.google.com/webmasters/ajax-crawling/docs/... This is the recommended practice for crawling JavaScript-generated pages; there is no need to look up the spider's IP address as someone mentioned.

robmcm · on Oct 7, 2013

thoop · on Oct 7, 2013

Thanks, I'll work on adding that. I'd still like to use a useragent fall-back for the crawlers that might not use the _escaped_fragment_ protocol.
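
A sketch of what that combination might look like: check for the `_escaped_fragment_` parameter first, and only fall back to the User-Agent for crawlers that do not send it. This is illustrative only, not the project's actual code. The name `should_prerender` is hypothetical, and the token list is an assumed example that would need maintaining.

```python
from urllib.parse import parse_qs

# Example User-Agent substrings for crawlers assumed not to follow the
# _escaped_fragment_ protocol. Illustrative only; adjust to your traffic.
FALLBACK_BOT_TOKENS = ("baiduspider", "facebookexternalhit", "twitterbot")


def should_prerender(query_string: str, user_agent: str) -> bool:
    """Decide whether a request should receive prerendered HTML."""
    # keep_blank_values=True matters: crawlers of pages using the meta
    # tag send "_escaped_fragment_=" with an empty value.
    params = parse_qs(query_string, keep_blank_values=True)
    if "_escaped_fragment_" in params:
        return True
    # Fallback: substring match on the lower-cased User-Agent header.
    ua = user_agent.lower()
    return any(token in ua for token in FALLBACK_BOT_TOKENS)


print(should_prerender("_escaped_fragment_=", "Mozilla/5.0"))
print(should_prerender("page=2", "Mozilla/5.0 (X11; Linux x86_64) Firefox/24.0"))
```

Putting the protocol check first means the User-Agent list only has to cover the stragglers, which limits how much of the sniffing criticised above is actually needed.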

mk3 · on Oct 10, 2013

The escaped fragment is used by the major search engines: Google, Bing, and Yandex (if you are interested in Russian markets). I'm not sure about Yahoo!, as it has been ages since I used Yahoo for anything.

thomasfromcdnjs · on Oct 7, 2013

I wrote a similar open source library that uses this approach last year

http://github.com/apiengine/seoserver

and the blog post related to it

http://backbonetutorials.com/seo-for-single-page-apps/

gcb1 · on Oct 7, 2013

no it is not a great approach for 99% of the cases.

the issue with getting content from scripted sites is not the initial part... you could use noscript and be done much easier.

the real issue is that most sites require user interaction to get to most content. this does nothing besides providing a convenient DoS entry point.

nice hack though.