
= Googlebots crawling =

We had problems being visible in Google:
 * a search for site:cwi.unik.no reveals no results in Google, while all pages show up in Microsoft's search

== Searching for misconfiguration ==

 * the robots.txt file is visible: http://cwi.unik.no/robots.txt
 * installed the User Agent Switcher extension in Firefox; switching the user agent to Googlebot still shows robots.txt and the site "as normal"
 * Thanks to jamesattard.com for information on curl

== Using curl to find the pages ==

curl can fetch all other web pages, but not cwi.unik.no. Why?
 * virtual hosts are defined in /etc/apache2/sites-available. The default site blocks a number of address ranges, e.g. deny from 180.76.0.0/255; 66.249, 62.142, 152.94, 38.101, 83.103, 208.115, 193.37.0.0.... Note that 66.249 is the range Google's crawlers come from; the check below lists all such rules.
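
To get the complete picture of what is blocked, something like the following lists every deny directive in the Apache configuration (a quick sketch; the path matches a standard Debian/Ubuntu Apache 2 layout):

$ grep -rni "deny from" /etc/apache2/sites-available/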

$ curl -A "Googlebot" cwi.unik.no
 * does not show anything; the check below inspects the response headers
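
To see what the server actually answers rather than just an empty body, the response headers can be requested (a hedged sketch):

$ curl -i -A "Googlebot" http://cwi.unik.no/

A 403 status line here points at an access rule rather than a network problem (see the 403 section below).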

$ curl -A "Googlebot" aftenposten.no
 * works nicely for aftenposten

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
301 Moved Permanently
The document has moved here.
Apache Server at aftenposten.no Port 80

$ curl -A "Googlebot" wiki.unik.no
 * works nicely for wiki.unik.no

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 

== Googlebot is not allowed to crawl the machine ==

What does that mean?
 * there could be IP-specific firewall rules on ports 80 and 443 that block the Googlebot; see the check after this list
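
A minimal way to look for such rules, assuming the firewall is iptables:

$ sudo iptables -L INPUT -n --line-numbers | grep -E "dpt:(80|443)"

Any DROP or REJECT rule whose source range covers Google's addresses would have the same effect as the Apache deny directives.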

== Error Message 403 ==

The server returns error 403 to Googlebot, which means: ''The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated.''
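
One way to confirm that the refusal is tied to the client rather than the page is to compare status codes for different user agents (a sketch; the long string is the full user agent the real Googlebot sends):

$ curl -s -o /dev/null -w "%{http_code}\n" http://cwi.unik.no/
$ curl -s -o /dev/null -w "%{http_code}\n" -A "Googlebot" http://cwi.unik.no/
$ curl -s -o /dev/null -w "%{http_code}\n" -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://cwi.unik.no/

If only the Googlebot variants come back as 403, the block depends on the user agent or the source address, not on the content.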

== Other things to check ==

 * check your page head for an incorrect robots meta tag,
 * and your server headers for an incorrect configuration.
 * the ".htaccess" file - in our case it contained: deny from 66.249 (see the sketch after this list)
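
Since 66.249 is the range Google crawls from, that deny line is exactly what keeps Googlebot out. A sketch for locating and disabling it (the .htaccess path below is a hypothetical example; use whatever file the grep actually turns up):

$ sudo grep -rn "66.249" /etc/apache2/ /var/www/
$ sudo sed -i 's/^deny from 66.249/#deny from 66.249/' /var/www/.htaccess
$ sudo /etc/init.d/apache2 reload

Changes to a .htaccess file take effect immediately; the reload is only needed when the rule sits in the sites-available configuration.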

== Analysis of the robots.txt file ==

A very good tool is provided by the guys at http://tool.motoricerca.info/robots-checker.phtml
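
Independent of the checker, it is worth fetching the file the way a crawler would, to confirm the bot can read it at all:

$ curl -A "Googlebot" http://cwi.unik.no/robots.txt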