
This is a great approach, but detecting the user-agent is the wrong way to decide if you should pre-render the page. If you include the following meta tag in the header:

   <meta content="!" name="fragment">
then Google will request the page with the "_escaped_fragment_" query param. That's when you should serve the pre-rendered version of the page.
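
A minimal sketch of that check, assuming a Python backend; the function names and the example.com URLs are made up for illustration and are not from the linked documentation. It only decides whether a request is asking for the pre-rendered page. Producing the snapshot is a separate step.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Query parameter named in the comment above.
ESCAPED_FRAGMENT = "_escaped_fragment_"


def escaped_fragment(url: str) -> Optional[str]:
    """Return the decoded _escaped_fragment_ value, or None if it is absent."""
    # keep_blank_values matters: as far as I understand the scheme, a page that
    # only carries the meta tag is requested with an empty value
    # ("?_escaped_fragment_="), which parse_qs would otherwise drop.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = params.get(ESCAPED_FRAGMENT)
    return values[0] if values else None


def should_prerender(url: str) -> bool:
    """True when the crawler explicitly asked for the pre-rendered page."""
    return escaped_fragment(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in (
        "https://example.com/pricing",
        "https://example.com/pricing?_escaped_fragment_=",
        "https://example.com/?_escaped_fragment_=%2Fpricing",
    ):
        print(url, should_prerender(url), repr(escaped_fragment(url)))
```

Keying on the query parameter rather than the User-Agent means regular visitors never hit the pre-render path, whatever their browser claims to be.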

Google has documentation on this here: https://developers.google.com/webmasters/ajax-crawling/docs/... and we've been using this method at https://circleci.com for the past year.

Waiting for Google to request the page with _escaped_fragment_ should also prevent you from getting penalized for slow load times or for showing Googlebot different content.



That Google Ajax crawler spec is no magic bullet.

Nick Denton: "Dip in uniques largely because of drop in Google refers. Pageviews (which are driven more by core audience) less affected." -- http://twitter.com/nicknotned/status/61152134929981440

Nick Denton: "Google does not fully support "hashbang" URLs. So we're eliminating them rather than waiting for Mountain View." -- http://twitter.com/nicknotned/status/61465859079671808

Nick Denton: "Yeah, I'd advise against hashbang urls. Will kill search traffic -- even if you abide by Google protocol." -- http://twitter.com/nicknotned/status/62595141927583745


These tweets are from 2.5 years ago. Has the google bot improved since then? (Honest question)


Considering that the intention behind the Google document is to support existing Ajax applications, not to serve as the cornerstone of crawlability for newly built apps, probably not.

Also, the same document that's quoted in defence of these Web (unfriendly) Apps is https://developers.google.com/webmasters/ajax-crawling/

The first section of that document (https://developers.google.com/webmasters/ajax-crawling/docs/...) says this:

"If you're starting from scratch, one good approach is to build your site's structure and navigation using only HTML. Then, once you have the site's pages, links, and content in place, you can spice up the appearance and interface with AJAX. Googlebot will be happy looking at the HTML, while users with modern browsers can enjoy your AJAX bonuses."


You can use the meta tags with normal URLs and completely ignore hashbang urls.


> This is a great approach

No, it is not. While this will certainly help your client side app get indexed, it is not 'great'. Other commenters on this thread bring up a number of valid concerns, but in my mind it comes down to two very simple things.

One is that when you are fighting for the top spot in organic traffic, this won't cut it. Off-page SEO is more important than on-page optimizations, but on-page optimizations still have value.

The other issue is that this approach assumes that the client side rendered view at a particular hash is exactly what should be initially rendered on the server side. While this could work in some cases, in my experience it either creates a weird user experience or forces you into hacks on the client side to ensure PhantomJS captures the right HTML.

This is a fine solution for some use cases, but I really hope that the community doesn't think this is the future. This is a temporary hack until we get a good server/client rendering framework in place OR all search engines evolve to capture pure client side apps without any of this.


There definitely are valid concerns with this approach...but until the search engines pick up the slack, this is the solution that we're left with. It's definitely not ideal, but I prefer it over having non-DRY code just to serve incomplete HTML to crawlers.


The alternative solution is pushState, with an acceptance that you will need to either duplicate some effort on server-side rendering or create something along the lines of the Airbnb Rendr framework. I am in the process of doing the former, with a plan to do the latter in a future iteration.


This is a great point. It might seem extreme, but I would advocate never using the User-Agent string to make decisions about what to serve a client. There is too much hackery and history that clouds up the User-Agent (such as every browser identifying itself as Mozilla), and it's almost always a proxy for something else that you actually want to test for.

In some rare situations, it's unavoidable, but even then I'd urge trying to rearchitect the solution to avoid it.


Actually, Google recommends verifying the bot's IP address via a DNS lookup:

http://googlewebmastercentral.blogspot.co.uk/2006/09/how-to-...
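
A rough sketch of that kind of check, assuming Python: a reverse DNS lookup on the requesting IP, a check that the name falls under a trusted domain, then a forward lookup to confirm the name resolves back to the same IP. The function names are hypothetical, and the `.googlebot.com` suffix is an assumption to verify against the linked post before relying on it. The lookups are passed in as parameters so the logic can be exercised without touching the network.

```python
import socket
from typing import Callable, List, Tuple

# Assumption: the domain suffix to trust. Confirm it against Google's own docs.
TRUSTED_SUFFIXES: Tuple[str, ...] = (".googlebot.com",)


def _reverse(ip: str) -> str:
    """Reverse (PTR) lookup: IP address -> primary host name."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Forward lookup: host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
    suffixes: Tuple[str, ...] = TRUSTED_SUFFIXES,
) -> bool:
    """True only if the reverse and forward lookups agree on a trusted host."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(suffixes):
            return False
        # The PTR record alone can be forged by whoever controls the IP's
        # reverse zone, so confirm the name maps back to the same address.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # treat any failed lookup as "not verified".
        return False
```

The default forward lookup here is IPv4-only, and real DNS calls are slow enough that results would need caching in practice.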


That makes sense, though in an ideal world there would be an abstracted header that said "hey, I'm not gonna render JS the way a regular browser will, so send me something prerendered". Then you could write something that would actually be future-proof and work with other search engines.

The way Google suggests there actually seems a little bit nefarious, as it makes it hard-coded to Google instead of working for any search engine.


You do realize that the link you gave is from 2006? More recent recommendations do not include that. [EDIT] OK, as I was downvoted I will clarify my point: https://developers.google.com/webmasters/ajax-crawling/docs/... This is the recommended practice for crawling JavaScript-generated pages; there is no need to look up the spider's IP address, as someone mentioned.


True


Thanks, I'll work on adding that. I'd still like to use a User-Agent fallback for the crawlers that might not use the _escaped_fragment_ protocol.


The escaped fragment is used by the major search engines: Google, Bing, and Yandex (if you are interested in Russian markets). I'm not sure about Yahoo!, as it has been ages since I used Yahoo for anything.


I wrote a similar open source library that uses this approach last year

http://github.com/apiengine/seoserver

and the blog post related to it

http://backbonetutorials.com/seo-for-single-page-apps/


no it is not a great approach for 99% of the cases.

the issue with getting content from scripted sites is not the initial part... you could use noscript and be done much easier.

the real issue is that most sites require user interaction to get to most content. this does nothing besides providing a convenient DoS entry point.

nice hack though.



