Kahuki Webmaster Forum and Discussion Community  

  #1 (permalink)  
Old 05-25-2006, 01:36 AM
Member
 
Join Date: Apr 2006
Posts: 72
Peter is on a distinguished road
robots.txt Question?

My understanding is that you just place the text file in the root folder, BUT what kind of syntax, if any, do you use? It would be great if someone could give some insight!

  #2 (permalink)  
Old 05-25-2006, 01:40 AM
Rookie
 
Join Date: Aug 2007
Posts: 7
gchsolutions is on a distinguished road
You need to supply more info. What kind of robot are you talking about?

  #3 (permalink)  
Old 05-25-2006, 01:41 AM
Member
 
Join Date: Nov 2005
Posts: 64
MrScott is on a distinguished road
Here is an example of Google's robots.txt
If that helps any...

User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
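
If you want to test rules like these before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The rules are the excerpt quoted above; the paths being checked are made up for illustration. Keep in mind that Python's parser applies the first rule that matches, and not every crawler is guaranteed to resolve Allow/Disallow conflicts the same way.

```python
from urllib.robotparser import RobotFileParser

# The excerpt quoted above, as a list of lines.
RULES = """\
User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
""".splitlines()

parser = RobotFileParser()
parser.parse(RULES)  # parse() takes an iterable of lines; no network access needed

# Hypothetical paths, chosen only to exercise the rules.
for path in ("/searchhistory/", "/search", "/images", "/maps"):
    verdict = "allowed" if parser.can_fetch("*", path) else "blocked"
    print(path, "->", verdict)
```

The Allow line comes first, so /searchhistory/ stays open even though it also starts with /search.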


Use this JavaScript command to easily find the robots.txt file on any website...

javascript:void(location.href='http://' + location.host + '/robots.txt')
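
The same idea outside the browser, as a rough Python sketch: keep the scheme and host of any page URL and swap the path for /robots.txt. The function name robots_url and the example.com address are my own placeholders, not anything from the snippet above.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Placeholder URL for illustration only.
print(robots_url("http://www.example.com/forum/thread.php?id=1"))
```

Unlike the one-liner above, this keeps whatever scheme the page used instead of always assuming http.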

  #4 (permalink)  
Old 05-25-2006, 02:16 AM
Member
 
Join Date: Nov 2005
Posts: 53
Smisha is on a distinguished road
Hi

robots.txt is just a text file which tells robots (automated indexers for search engines) where they may and may not visit.

If an indexing robot knows about a document, it may decide to parse it, and insert it into its database. How this is done depends on the robot: Some robots index the HTML Titles, or the first few paragraphs, or parse the entire HTML and index all words, with weightings depending on HTML constructs, etc. Some parse the META tag, or other special hidden tags.


STUFF TO TAKE INTO CONSIDERATION
=================================
► If you want to tell robots (major search engines) whether to index a page and follow its links, you can put a robots meta tag in the <head> section of that page. This one allows both, which is also what robots do by default:
<meta name="robots" content="all" />

► If you do NOT want robots to index a page
<meta name="robots" content="noindex, nofollow" />

That tells the search engine NOT to index that page and NOT to follow the links on it.
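
To see what a robot would read from such a tag, here is a small sketch using Python's standard html.parser. The class and function names (RobotsMetaParser, robots_directives) are made up for this example, and real crawlers may treat edge cases differently.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )

def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><meta name="robots" content="noindex, nofollow" /></head><body></body></html>'
print(robots_directives(page))
```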

► ROBOTS.TXT in root
User-agent:
Disallow:

Where User-agent is the robot name
Where Disallow is the URL path (matched as a prefix) that the user-agent should not visit

The * means ALL USER AGENTS (Google, MSN, Yahoo ...)
The / in Disallow means 'entire server'
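
In other words, a Disallow value is matched against the start of the URL path. A deliberately simplified sketch of just that prefix check (is_blocked is a made-up helper, and it ignores everything else a real crawler does, such as picking the right User-agent section):

```python
def is_blocked(path, disallow_rules):
    """True if any non-empty Disallow value is a prefix of path."""
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(rule != "" and path.startswith(rule) for rule in disallow_rules)

print(is_blocked("/any/page.html", ["/"]))               # "/" is a prefix of every path
print(is_blocked("/any/page.html", [""]))                # empty value: nothing is blocked
print(is_blocked("/tmp/cache.html", ["/cgi-bin/", "/tmp/"]))
```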


EXAMPLES
=================================
► To exclude all robots from the entire server
User-agent: *
Disallow: /

► To allow all robots complete access
User-agent: *
Disallow:

Or create an empty "/robots.txt" file.

► To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

► To exclude a single robot
User-agent: BadBot
Disallow: /

► To allow a single robot
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
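
The examples so far can be checked with Python's standard urllib.robotparser. This is only a sketch: the helper allowed() and the robot names AnyBot and OtherBot are placeholders of mine, and Python's parser is just one implementation, so a particular crawler may behave differently.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, path):
    """Parse robots_txt (a string) and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

EXCLUDE_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
PARTIAL = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /tmp/\nDisallow: /private/\n"
ONLY_BADBOT_BLOCKED = "User-agent: BadBot\nDisallow: /\n"
ONLY_WEBCRAWLER = "User-agent: WebCrawler\nDisallow:\n\nUser-agent: *\nDisallow: /\n"

print(allowed(EXCLUDE_ALL, "AnyBot", "/index.html"))
print(allowed(ALLOW_ALL, "AnyBot", "/index.html"))
print(allowed(PARTIAL, "AnyBot", "/tmp/x.html"), allowed(PARTIAL, "AnyBot", "/index.html"))
print(allowed(ONLY_BADBOT_BLOCKED, "BadBot", "/"), allowed(ONLY_BADBOT_BLOCKED, "OtherBot", "/"))
print(allowed(ONLY_WEBCRAWLER, "WebCrawler", "/"), allowed(ONLY_WEBCRAWLER, "OtherBot", "/"))
```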

► To exclude all files except one
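This is a bit awkward, as the original robots.txt standard has no "Allow" field (some crawlers, Google's among them, understand it as an extension, but you can't rely on every robot honoring it). The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file in the level above this directory: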

User-agent: *
Disallow: /~joe/docs/

Alternatively you can explicitly disallow all disallowed pages:

User-agent: *
Disallow: /~joe/private.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
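
Both variants can be sanity-checked the same way with Python's standard urllib.robotparser. Again just a sketch: the helper allowed() and the file names index.html and notes.html are invented for the example.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, path, agent="*"):
    """Parse robots_txt (a string) and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Variant 1: everything private lives under one directory.
DIRECTORY_RULES = "User-agent: *\nDisallow: /~joe/docs/\n"

# Variant 2: each private page is listed explicitly.
EXPLICIT_RULES = (
    "User-agent: *\n"
    "Disallow: /~joe/private.html\n"
    "Disallow: /~joe/foo.html\n"
    "Disallow: /~joe/bar.html\n"
)

print(allowed(DIRECTORY_RULES, "/~joe/docs/notes.html"))  # inside the blocked directory
print(allowed(DIRECTORY_RULES, "/~joe/index.html"))       # the one file left outside it
print(allowed(EXPLICIT_RULES, "/~joe/foo.html"))          # listed explicitly
print(allowed(EXPLICIT_RULES, "/~joe/index.html"))        # not listed
```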


I hope I helped...

All times are GMT.