According to Google, SSL pages can be disallowed by serving a separate robots.txt file for each protocol (http and https). So for example:
For your http protocol: http://www.example.com/robots.txt
For your https protocol: https://www.example.com/robots.txt
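As a sketch of what those two files might contain (the directives below are an assumption about your setup, not a prescription): the robots.txt served over http allows crawling, while the one served over https blocks it entirely.

```text
# robots.txt served at http://www.example.com/robots.txt
User-agent: *
Disallow:

# robots.txt served at https://www.example.com/robots.txt
User-agent: *
Disallow: /
```

An empty `Disallow:` line permits everything, while `Disallow: /` blocks the whole site for that protocol.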
Linux vs IIS
This approach works fine on Linux/Apache, but what about Windows/IIS? The easiest way to implement this technique is to put your SSL content on a separate domain entirely, or on a subdomain. If the SSL is on the same domain and you no longer need it, you can redirect your ASP pages to the non-SSL (http) pages. But in most cases you are not trying to get rid of the https pages; you simply don't want them indexed. In that case you can ask Google to remove the few pages you want removed. At any rate, you should begin using absolute links to prevent this from happening again.
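The redirect itself would be written in ASP on IIS, but the URL rewriting logic is simple enough to sketch in Python — this is an illustration of the mapping, not the actual IIS configuration:

```python
from urllib.parse import urlsplit, urlunsplit

def https_to_http(url):
    """Return the http:// equivalent of an https:// URL.

    URLs that are not https are returned unchanged; an ASP page
    would issue a 301 redirect to the value this computes.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return url
    # Keep host, path, query, and fragment; swap only the scheme.
    return urlunsplit(("http",) + tuple(parts[1:]))

print(https_to_http("https://www.example.com/cart.asp?id=1"))
# → http://www.example.com/cart.asp?id=1
```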
Absolute vs Relative Links
Commonly the problem involves the spider following a relative link from an http page to an https page. Since the link doesn't contain "https", the spider doesn't realize the destination is https and indexes it. To prevent this from happening I recommend using absolute links, so the spiders recognize the https protocol and differentiate it from the http page. Indexing of multiple https versions of pages could also lead to duplicate content penalties.
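To see why a relative link is the culprit, here is a minimal sketch using Python's standard `urllib` to resolve links the way a crawler would (the page and path names are hypothetical):

```python
from urllib.parse import urljoin

# A relative link found on an https page inherits the https protocol...
secure_page = "https://www.example.com/checkout.asp"
print(urljoin(secure_page, "/about.asp"))
# → https://www.example.com/about.asp

# ...while an absolute link pins the protocol explicitly,
# so the crawler lands on the http version regardless.
print(urljoin(secure_page, "http://www.example.com/about.asp"))
# → http://www.example.com/about.asp
```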
Solutions I don’t recommend
There are some solutions to this problem floating around the development community that I recommend you avoid.
The Google Remove URL Tool
One is using the Google URL remove tool to ban the https pages. I consider this method to be unreliable, and taking the chance that your entire site will be removed from the index is not worth it.
The Meta NoIndex Tags
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
This tag will prevent the robots from indexing the page. While in theory this method should work (you simply add the meta tag to your https pages), I'm hesitant about it because you don't want to risk blocking the http version as well. For example, if Google follows a link to the https page and then sees it should not index that page, it may do the same for all variants of that link, including http.