Want to remove a page from the UW search results? Here's how.
Posted March 30th, 2009 by Anonymous
Do old links from your site still appear in search results? Have you moved a page to a new URL but UW search is still showing the old link? If so, this article is for you. We will go through four methods of removing links from search engines.
Many problems are caused in UW search results if old pages are left up or are moved without redirecting them properly. You can help improve the quality of our search results by correctly removing and redirecting your old pages.
Search results come from Google
The UW search results come directly from external search tools - usually Google. This means that UW has no control over what links appear in the results and how they are ordered. In order to remove a page from the search results, you need to use other methods to tell the search engines that this page no longer exists, or has moved to a new address.
Because UW search results usually come from Google, we will be discussing methods for removing pages from Google search results in this article. However, most of these techniques work in Yahoo, MSN, and Ask.com as well.
What happens when you delete a page
Search engines index pages by their URLs. They don't check these URLs every day, and they won't immediately remove a URL if they can't find the page. If you remove a page from your site, it may take weeks or months for search engines to notice that the page no longer exists. This means that outdated URLs may continue to appear in the search results for weeks or even longer.
How to remove a link from the search results
There are a few things you can do to get links out of the search results:
Moving a page to a new URL
If you are replacing one page with a new version at a different URL, you should consider keeping the page at its existing address. Moving pages can cause many problems, including poor search results and broken links from other pages. Keeping pages at their existing URLs is usually the best option.
If this is not possible or desirable, and you need to move to a new URL, you should tell the search engines that the page has moved. How do you do that? If you are on an Apache (Linux) server you can use directives in a file called .htaccess to tell the search engine the page has moved.
(If you don't know what kind of server your site is on, or this seems too technical for you, please contact your technical support staff for assistance. Most UW servers use Apache).
The .htaccess file is a simple text file containing some instructions for the server. To create it, simply make a new text document using a text editor (e.g. Notepad on Windows) or your HTML editor. Name this file .htaccess.
Note: Depending on what you are using to manage your site, you may need to turn on hidden files to see the .htaccess file. In Dreamweaver's Files panel, choose View > Show Hidden Files from the Files drop-down menu. If your operating system doesn't let you create a file with no extension, simply create htaccess.txt, then rename it to .htaccess.
In this file we need to use a 301 Redirect to tell the server that the page has moved permanently to a new address. This line looks like this:
Redirect 301 /folder/file.html http://yoursite.uwaterloo.ca/newfolder/file.html
To redirect an entire folder, use this:
Redirect 301 /folder http://yoursite.uwaterloo.ca/newfolder
Upload the .htaccess file to the root directory of your website (where the index.html file is) and check to see if it works! Be very careful with .htaccess - small errors could cause your site not to work at all.
For more information on .htaccess, see Redirect Visitors To a New Page/Site from Webweaver.
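As a rough illustration of the redirect steps above, the following Python sketch formats Redirect 301 lines and asks the server what it returns for an old address without following the redirect. The helper names redirect_line and check_redirect are hypothetical and exist only for this example; only the Redirect 301 syntax itself comes from this article.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def redirect_line(old_path: str, new_url: str) -> str:
    """Format one 'Redirect 301' line for a .htaccess file (hypothetical helper)."""
    if not old_path.startswith("/"):
        raise ValueError("old_path must be a URL path starting with '/'")
    return f"Redirect 301 {old_path} {new_url}"


def check_redirect(old_url: str) -> Tuple[int, Optional[str]]:
    """Request old_url without following redirects.

    Returns the HTTP status code and the Location header (None if absent).
    """
    parts = urlsplit(old_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        # http.client never follows redirects, so a 301 is reported as-is.
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Once the .htaccess file is uploaded, calling check_redirect on the old address should report status 301 together with the new address. Any other status suggests the directive is not being applied.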
Remove pages from your site
This may seem obvious, but the first thing to do if you want to remove a page from the search results is to actually remove the file from your web server. If you remove links to a page, but the file still exists on the server, search engines will continue to index it.
Why is this? As mentioned above, search engines index URLs. Once they have discovered a URL they will continue to index it until they can't find it or you provide alternate instructions – even if you aren't linking to it anymore.
If you want to keep these pages for future reference, you may also choose to block them from search engines using robots.txt. If you have a lot of archived pages, it might be a good idea to keep them in a separate archive directory - that way you only have to block one folder. More information on how to do that is in the next section.
The problem here is that – as mentioned above – search engines might not visit your site every day. In fact, they might not visit every week or every month! It may take weeks or months for them to realize this page is missing and remove it from their index. In that case, a few more options are available.
Removing URLs using robots directives
There are a few other things you can do to tell search engines to remove your page (or not to index it in the first place).
Meta Noindex tag
This is an option for pages that should remain available on the web but should not appear in the search results. In this case, simply add the following tag to the <head> section of your web page:
<meta name="robots" content="noindex">
To complicate matters, including this tag does not mean that the page will not be crawled. Search engines will continue to visit the page - they just won't include it in the index. To prevent them from visiting the page at all, you need to block them using robots.txt.
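To confirm that a page actually carries the tag, a small check along these lines may help. This is a minimal sketch using Python's standard html.parser module; the class RobotsMetaParser and the function has_noindex are hypothetical names made up for this example, and the sketch only looks at the generic "robots" meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html_text: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives
```

Feeding the function the saved source of a page shows whether the tag was added correctly.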
Block search engines using a robots.txt file
A robots.txt file is similar to a .htaccess file in that it is a simple text file that sits in the root directory of your website. In this case, it provides instructions to search engines (robots). You can use this file to tell search engines not to visit a certain section or an individual page on your website. If they encounter this directive, search engines will stop visiting and indexing any URLs included in the directive. Keep in mind that a search engine blocked from a page by robots.txt cannot read that page, so it will not see a noindex tag placed there.
To block all search engines from an individual page:
User-agent: *
Disallow: /directory/file.html
To block all search engines from a folder:
User-agent: *
Disallow: /folder/
To block all search engines from your entire site:
User-agent: *
Disallow: /
Now, because it takes some time for search engines to index your page, it may take them some time to find your noindex or disallow directives. If this process is too slow for you, you have one other option.
More information at Robots.txt in a Nutshell from Robotstxt.org and at Wikipedia.
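Before uploading a robots.txt file, it can be worth testing its rules against a few of your own URLs. The sketch below uses Python's standard urllib.robotparser module. The function name is_blocked is hypothetical, and the parser applies simple prefix matching, so individual search engines may interpret unusual rules differently.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Rules in the same form as the folder example in this article.
FOLDER_RULES = "User-agent: *\nDisallow: /folder/\n"
```

With FOLDER_RULES, is_blocked reports that http://yoursite.uwaterloo.ca/folder/page.html is blocked, while pages outside /folder/ are still allowed.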
Removing URLs using Google Webmaster Tools
If you need something to be removed from the index right away, you can request to have it removed from the Google index using Google Webmaster Tools. You will need to create an account using a Google ID (or sign up for one if you don't already use other Google tools). Once you've signed in, Webmaster Tools will provide you with instructions for verifying your site. This way they will know that you are the rightful owner of the website.
After you have verified your site, Webmaster Tools will give you access to a wealth of information about your site, including:
- File not found errors
- Search terms used to find your site
- Crawl stats (how often your pages are visited by Google spiders)
- Duplicate title tags & meta descriptions
- Option to remove URLs
- + more
The Remove URLs tool is the one we are interested in today. It can be found from the left side menu under Site Configuration > Crawler Access > Remove URL tab. Before requesting removal of a particular URL, make sure it returns a File Not Found (error 404) page when you try to access it and/or is blocked with robots.txt.
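The precondition described above (the URL returns a 404 and/or is blocked with robots.txt) can be checked before submitting a request. The following is a minimal sketch using Python's standard urllib; the function names http_status and ready_for_removal are hypothetical and are not part of Webmaster Tools.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url.

    Note: urllib follows redirects, so a redirected URL reports the
    status of its final destination.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is on the exception.
        return error.code


def ready_for_removal(status: int, blocked_by_robots: bool) -> bool:
    """True when the page returns 404 or is blocked by robots.txt."""
    return status == 404 or blocked_by_robots
```

A page that still returns status 200 and is not blocked by robots.txt does not meet the condition and should be removed or blocked first.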
After submitting the form, Google will attempt to process your request within 3-5 days. Feedback on the status of your request will be provided in Webmaster Tools.
If you choose to use this option, make sure that all links pointing to your page have been removed. If the search engine encounters your page again (from links on other pages), it will re-index it. The Remove URL option is a one-time removal only.
Yahoo also has a Delete URL function.