A question for Lauren: How does your spider cope with URLs that
sniff-n-serve? If a URL is crawled and found to have web content, does
that mean you will not crawl it again?
I really would like to see your crawler crawl a site twice, if requested
in the robots.txt file, with different user agents - the first time
eliciting web content, the second time eliciting i-mode content... (or
whatever).
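
To make the "crawl it twice" idea concrete, here is a rough sketch of
what I have in mind, not a description of how Lauren's spider works. It
fetches the same URL once per user agent and checks whether the bodies
differ. The user-agent strings, the function names and the injectable
fetcher are all my own assumptions for illustration; how a site would
actually ask for this in robots.txt is left open.

```python
from urllib.request import Request, urlopen

# Hypothetical user-agent strings, for illustration only.
USER_AGENTS = {
    "web": "Mozilla/4.0 (compatible; ExampleSpider/0.1)",
    "imode": "DoCoMo/1.0/ExampleSpider",
}


def build_request(url, user_agent):
    """Build a request that announces the given user agent."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Fetch url once and return (content_type, body)."""
    with urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.headers.get_content_type(), resp.read()


def crawl_variants(url, user_agents=USER_AGENTS, fetcher=fetch):
    """Fetch the same URL once per user agent, keyed by label."""
    return {label: fetcher(url, ua) for label, ua in user_agents.items()}


def sniffs_user_agent(results):
    """True if at least two user agents received different bodies."""
    bodies = {body for _ctype, body in results.values()}
    return len(bodies) > 1
```

A spider could index each variant separately whenever
sniffs_user_agent() returns True, instead of treating the URL as done
after the first fetch. Comparing raw bodies is crude (a page with a
clock or an ad rotator would also look "different"), so treat it as a
starting point only.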
I am still for the meta tag approach. People could use it or not as they
wish. Are we really arguing over 20 or so bytes...? (I take all the
points about non-standard meta-tags - but dammit Jim - we are in as good
a position as any to suggest a standard....)
Nick
<shitsukoi>URLs should not be taken to specify the format in which data
is served.... </shitsukoi>
Received on Fri Feb 9 13:49:17 2001