I personally would recommend not blocking that. For example, the White House recently rolled out a new robots.txt, and I think they blocked the images directory, or CSS or JavaScript, something like that.
You really don’t need to do that. In fact, letting us crawl those files can sometimes be very helpful, for example if we think something spammy is going on with JavaScript, or if somebody is doing a sneaky redirect or something like that.
My personal advice would be to let Googlebot go ahead and crawl that. It’s not like these files are huge anyway, so it doesn’t consume a lot of bandwidth. Just go ahead and let Googlebot have access to all that stuff; most of the time, we won’t ever fetch it. But on the rare occasion when we’re doing a quality check on behalf of someone, or we receive a spam report, we can go ahead and fetch it and make sure that your site is clean and not having any sort of problems.
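If you want to check whether your own robots.txt is shutting Googlebot out of CSS or JavaScript, a rough sketch with Python's standard `urllib.robotparser` might look like the following. The two rule sets, the `example.com` asset URLs, and the `blocked_assets` helper are all made up for illustration, and the standard-library parser only approximates how Googlebot itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks asset directories
# (the pattern the advice above argues against).
BLOCKING_RULES = """\
User-agent: *
Disallow: /css/
Disallow: /js/
"""

# Hypothetical robots.txt that leaves CSS and JavaScript crawlable.
OPEN_RULES = """\
User-agent: *
Disallow: /private/
"""

# Illustrative asset URLs; substitute the real paths from your own site.
ASSETS = [
    "https://example.com/css/site.css",
    "https://example.com/js/app.js",
]


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    print("Blocked by first rule set: ", blocked_assets(BLOCKING_RULES, ASSETS))
    print("Blocked by second rule set:", blocked_assets(OPEN_RULES, ASSETS))
```

Under these assumptions, the first rule set reports both asset URLs as blocked and the second reports none, which is the state the advice above recommends.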
Related posts:
- Now that Google can crawl JavaScript links, what is going to happen to all those paid links that were behind JavaScript code? Will Google start penalizing them?
- What is the best way to serve different content according to user country IP (legal reasons)?
- Does Googlebot use inference when spidering – having crawled site.com/article/page1.htm and /page2.htm, can it guess at the existence of a /page3.htm and crawl it? Or does it stick entirely to what it finds via the link graph and/or Sitemaps/feeds?
- How does Google calculate site load times in the data it exposes in Google’s Webmaster statistics? Is the calculation simply average time to get and receive the HTML content for a page?
- What impact does “page bloat” have on Google rankings? Most of the winners in SEO seem to have very simple pages (very few images, HTML-only design), sometimes to the detriment to the user in a poorly designed page.
