Google Is Dropping Support for Noindex in Robots.txt
Google announced that they are “retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019.” This means that as of that date, Googlebot will ignore any “noindex” directive placed in the robots.txt file.
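For illustration, the kind of unofficial rule that will stop working looks something like the following (the path is just a placeholder):

User-agent: Googlebot
Noindex: /example-page/

Google never formally documented this syntax, which is part of why support for it is being dropped.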
If you relied on this rule to control what Google indexes, you still have options to keep pages out of the search results. According to Google’s post on the Webmaster Central Blog, sites still have five ways to prevent pages from being indexed and displayed in the SERPs:
“Noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.
404 and 410 HTTP status codes: Both status codes mean that the page does not exist, which will drop such URLs from Google’s index once they’re crawled and processed.
Password protection: Unless markup is used to indicate subscription or paywalled content, hiding a page behind a login will generally remove it from Google’s index.
Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won’t be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.
Search Console Remove URL tool: The tool is a quick and easy method to remove a URL temporarily from Google’s search results.” (A note on unsupported rules in robots.txt, 7/2/2019)
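For example, the noindex directive from the first option can be set either in a page’s HTML or, for non-HTML files such as PDFs, via the HTTP response header (both snippets below are generic illustrations, not site-specific values):

<meta name="robots" content="noindex">
X-Robots-Tag: noindex

A Disallow rule in robots.txt, by contrast, only blocks crawling; as Google notes above, a blocked URL can still be indexed if other pages link to it:

User-agent: *
Disallow: /private-directory/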
If you need to switch from a noindex directive in the robots.txt file to a new method, make sure you research which alternative best fits your situation.