Delivery URL Blocked by Googlebot

Hello!

We've built a single-page application in JavaScript using the Kentico API. The main purpose of this site is SEO ranking and indexing. However, when we test the site with Googlebot, it reports that the robots.txt on the Kentico server is blocking it, so it cannot see how our page is going to be built.

Is there any way to get this unblocked, or do you have any advice on how to work around it? Without a solution, we won't be able to use Kentico on our SEO sites.

Comments

  • JanL@kentico.com · Czech Republic · Member, Administrator, Kentico Staff

    Hello,

    Let us understand your scenario better. Where is your robots.txt served from: Kentico CMS/EMS, or your custom web application? If the latter, what is the HTTP server: IIS, Apache, or something else? Do you serve robots.txt as static content, or is it generated dynamically, and by what technology: ASP.NET, PHP, Node.js?

    Jan

  • Hi Jan,

    The file in question is being served from Kentico CMS, specifically https://deliver.kenticocloud.com/robots.txt.

    It looks like Disallow is set for all pages, so Googlebot cannot read the data coming from the API calls that build our pages.
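
    That is, presumably something along these lines; we haven't copied the exact file, but a "disallow everything" rule set looks like this:

    User-agent: *
    Disallow: /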

  • JanL@kentico.com · Czech Republic · Member, Administrator, Kentico Staff

    Hi Natasha,

    The file in question is being served from Kentico CMS, specifically https://deliver.kenticocloud.com/robots.txt.

    I see. You're trying to use the robots.txt file that serves the internal purposes of the Delivery/Preview API endpoint. By its nature, that robots.txt file cannot cover all the content served by client applications using Kentico Cloud as a backend service.

    You should use your own robots.txt file, served from your own server. That way you'll be able to edit it according to your needs.

    But let me understand the problem a bit more. Have you run into issues with content served directly by our API? Such an issue might have been the reason you referred to our robots.txt in your Googlebot settings.

    Jan

  • nhanna@envisiagroup.com · Member
    edited August 2017

    Hi Jan,

    We do have a robots.txt on our own server with the following settings:
    User-agent: *
    Disallow:
    Allow: .js
    Allow: .css
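
    As an aside, Google's documented robots.txt syntax expects each rule path to begin with "/" and supports "*" and "$" wildcards, so explicit file-type allows would look more like this sketch (though with an empty Disallow, everything is crawlable already):

    User-agent: *
    Disallow:
    Allow: /*.js$
    Allow: /*.css$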

    However, since Googlebot is hitting the Delivery/Preview API endpoint, it refers to your robots.txt, since the endpoint is on your host. We have not run into any other issues while serving content via the API.

  • JanL@kentico.com · Czech Republic · Member, Administrator, Kentico Staff

    Hi Natasha,

    However, since Googlebot is hitting the Delivery/Preview API endpoint, it refers to your robots.txt, since the endpoint is on your host.

    OK, that's something to think through thoroughly. I'll discuss the issue with our colleagues on the Delivery/Preview API team and let you know.

    Jan

  • JanL@kentico.com · Czech Republic · Member, Administrator, Kentico Staff

    Hi Natasha,

    We haven't been able to discuss the topic yet, but we should get to it early next week. I'm sorry to keep you waiting.

    Jan

  • Ok, thanks for the update, Jan!

  • JanL@kentico.com · Czech Republic · Member, Administrator, Kentico Staff

    Hi Natasha,
     
    We've discussed the topic with several colleagues. Nowadays, the best SEO technique for SPA apps is still to have a backend server that renders the initial content of the main page (or pages) as static HTML output, not just a "Loading ..." placeholder. That way, all search engines can crawl the content. Only the HTML fragments that the user sees after an interaction, such as pagination, infinite scrolling, tabs, and accordions, should then be fetched directly by JS from the Kentico Cloud Delivery/Preview API endpoint.
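
    For illustration, here is a minimal sketch of that approach in Node.js with Express; the project ID, item codename, and element name below are hypothetical placeholders, not values from this thread:

    const express = require('express');
    const https = require('https');

    const app = express();

    // Hypothetical placeholders; substitute your own project values.
    const PROJECT_ID = 'your-project-id';
    const ITEM_CODENAME = 'home_page';

    // Fetch one content item from the Kentico Cloud Delivery API.
    function getItem(codename) {
      const url = 'https://deliver.kenticocloud.com/' + PROJECT_ID + '/items/' + codename;
      return new Promise((resolve, reject) => {
        https.get(url, (res) => {
          let body = '';
          res.on('data', (chunk) => { body += chunk; });
          res.on('end', () => resolve(JSON.parse(body)));
        }).on('error', reject);
      });
    }

    // Render the initial page as static HTML so crawlers see real content;
    // the client-side script then takes over for subsequent interactions.
    app.get('/', (req, res) => {
      getItem(ITEM_CODENAME)
        .then((data) => {
          // Assumes the item has a text element named 'body_text'.
          const text = data.item.elements.body_text.value;
          res.send('<!DOCTYPE html><html><head><title>Home</title></head>' +
                   '<body><main>' + text + '</main>' +
                   '<script src="/app.js"></script></body></html>');
        })
        .catch(() => res.status(500).send('Rendering failed'));
    });

    app.listen(3000);

    Crawlers then index the server-rendered markup, while /app.js fetches the interaction-driven fragments (pagination, tabs, and so on) from the Delivery API on the client.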
     
    Although Googlebot can evaluate JS logic in SPA apps, it deliberately ignores content that only becomes visible upon user actions (https://moz.com/blog/javascript-seo); only the initial content of the page is crawled. Other search engines have little to no support for JS. Given that Baidu takes about 80% of the Chinese search market, Yandex about 50% in Russia, and Bing claims about 25% in the US, SPA apps still lose a significant amount of traffic these days.
     
    As an alternative, Angular 2 onwards offers a technology called Angular Universal. If you're familiar with Angular, you may be interested in it. It works in such a way that the server not only serves the client-side app to browsers, but at the same time runs a server-side app (on a different port) that serves pre-rendered HTML to crawlers and mobile browsers. You may wish to consider that too.
     
    Hopefully I've helped you make the right decision about going the SPA way. Should you have questions, feel free to ask.
     
    Jan

  • Hi Jan,

    Thanks for all the info and for pointing us in the right direction! We'll take a second look at our setup and try the suggestions you made.

    -Natasha

  • mabrahams675 · Sydney, Australia · Member

    Are you using a framework such as Angular or React?

    For Angular, there's a solution for this called Angular Universal, where you have a single codebase that runs both server-side and client-side (the server side requires Express and Node.js). I've only gone as far as spinning up some boilerplate projects to see how it works, but I haven't actually developed anything with it so far.

    There are a number of solutions for React from what I can see.

  • Since this was just a proof of concept, we didn't use a framework, just pure JS, though we'd probably end up going with React if we keep it on the client side.

  • JanL@kentico.com · Czech Republic · Member, Administrator, Kentico Staff

    Hi guys, it's great to see more people with experience with Universal. If you create a Universal project with Kentico Cloud, paste a link to it here. We'd love to see it!
