Robots, spiders, worms and crawlers

All major search engines send out a small program called a 'spider' to index your pages. Some search engines use spiders to index your entire site, others only one or two pages. A spider takes a 'snapshot' of your page and determines what the page is about by looking at the text on the page, its META tags, and various other page factors. Most directories, such as Looksmart, Zeal, and the Open Directory Project, also send out a spider. However, since they are not search engines, the primary function of their spiders is to ensure your site is still up and running.
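To make the 'snapshot' idea concrete, here is a minimal Python sketch of the kind of information a spider might pull from a page: the title, the META tags, and the visible text. It is only an illustration and not how any particular search engine works. The class name PageSnapshot, the snapshot() helper, and the sample page are all made up for this example.

```python
from html.parser import HTMLParser


class PageSnapshot(HTMLParser):
    """Collects the title, named META tags, and visible text of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}          # META name (lowercased) -> content
        self.text = []          # visible text fragments, in page order
        self._in_title = False
        self._skip_depth = 0    # greater than 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text.append(data.strip())


def snapshot(html):
    """Parse an HTML string and return the filled-in PageSnapshot."""
    parser = PageSnapshot()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical page used only for this example.
SAMPLE_PAGE = """<html><head>
<title>Example Page</title>
<meta name="description" content="A short page used as an example.">
<meta name="keywords" content="example, spider, snapshot">
</head><body><h1>Welcome</h1><p>This is the page text.</p></body></html>"""

page = snapshot(SAMPLE_PAGE)
print(page.title)                 # Example Page
print(page.meta["description"])   # A short page used as an example.
print(" ".join(page.text))        # Welcome This is the page text.
```

A real spider weighs many more factors than these three, but the sketch shows why the text and META tags on your page matter: they are what the robot actually reads.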

These robots leave a trace of their access attempts in your server log files, just as a human visitor does, so if you have access to your stats you will be able to spot them. The best hint of an indexing attempt is a request for the 'robots.txt' file in the root of your web directory. If you don't have a 'robots.txt' file, that is because you never created one. Don't worry, however: a spider will still crawl your site without one. Search engine spiders check for this little text file, which tells the crawler where it may and may not go. It can also allow or disallow specific robots, if you find a particular spider to be nasty in nature. The main purpose of the file is to keep robots from indexing directories or files they aren't supposed to, such as cgi directories, administration files, etc. If you wish to create a 'robots.txt' file but don't know where to start, visit the 'official' robots.txt site by clicking here.
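As a rough illustration of what such a file can look like and how a well-behaved robot would read it, the Python sketch below parses a small sample 'robots.txt' with the standard library's urllib.robotparser module. The robot names BadBot and GoodBot, the directory names, and the example.com URLs are placeholders chosen for this example, not rules you must use.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt (hypothetical): shut out one unwanted robot entirely,
# and keep every other robot out of the cgi and administration directories.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
"""


def build_rules(robots_txt):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(SAMPLE_ROBOTS_TXT)

# An ordinary robot may fetch normal pages but not the cgi directory.
print(rules.can_fetch("GoodBot", "http://www.example.com/index.html"))        # True
print(rules.can_fetch("GoodBot", "http://www.example.com/cgi-bin/form.pl"))   # False

# The robot named in its own section is kept out of everything.
print(rules.can_fetch("BadBot", "http://www.example.com/index.html"))         # False
```

Keep in mind that the file is only a request. A polite robot checks it the way the sketch does, but nothing forces a robot to obey it.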

Unfortunately, there are also malicious spiders on the web that are used for purposes other than search. Some are designed to copy your website to the client's hard drive; others are designed to collect e-mail addresses to be used for sending unsolicited e-mail.


Find out how to stop SPAM and junk e-mail by clicking here.

Click here for a list of spiders that have visited this site.
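If you would like to build a similar list for your own site, one possible approach is to scan your server log for requests to 'robots.txt' and collect the User-Agent strings that made them. The Python sketch below assumes the log is in the common Apache 'combined' format; your server may log differently. The function name robots_txt_visitors, the sample log lines, and the robot names in them are invented for this example.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format, for example
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def robots_txt_visitors(lines):
    """Count requests for /robots.txt per User-Agent string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?")[0]
        if path == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts


# Hypothetical log excerpt: two robots and one human visitor.
SAMPLE_LOG = [
    '192.0.2.10 - - [12/Mar/2004:06:25:24 +0000] "GET /robots.txt HTTP/1.0" '
    '404 209 "-" "ExampleBot/1.0 (+http://www.example.com/bot.html)"',
    '192.0.2.10 - - [12/Mar/2004:06:25:26 +0000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "ExampleBot/1.0 (+http://www.example.com/bot.html)"',
    '198.51.100.7 - - [12/Mar/2004:07:01:02 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '203.0.113.5 - - [12/Mar/2004:09:14:51 +0000] "GET /robots.txt HTTP/1.0" '
    '404 209 "-" "OtherCrawler/0.9"',
]

visitors = robots_txt_visitors(SAMPLE_LOG)
for agent, hits in sorted(visitors.items()):
    print(hits, agent)
```

The 404 status in the sample lines is what you would see if you never created a 'robots.txt' file: the spider still asks for it, and the request still shows up in your log.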

Spiderhunter has excellent resources about spiders, if you wish to learn more.

 

 

 

 

 


© 2000-2004 IceHouse Designs, Inc. View Privacy Statement.
