Hi again,
Thank you for all the suggestions. We've been tossing them around, and
our plan is to take a "comprehensive" approach - robots.txt, tags, text,
etc.
We plan on having a separate i-mode crawl in which we'll use a
DoCoMo-style user agent for spidering. (We already have separate crawls
for WML and HDML, in which we use UP/Nokia user agents w/ a parenthesized
"Googlebot..." at the end.) [Incidentally, will the parenthesized
"Googlebot" throw anyone off? I'll find out soon enough, I suppose.]
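To make the bracketed question concrete, here is a rough sketch of how a site's handset detection might or might not be thrown off by a trailing parenthesized token. Everything in it is an assumption for illustration: the crawler string, the `KNOWN_MODELS` table, and both function names are hypothetical, and real sites may sniff user agents quite differently.

```python
# Hypothetical lookup table a site might keep for per-handset output.
KNOWN_MODELS = {"P503i", "N503i", "F503i"}


def strict_model(user_agent):
    """Naive parse: assumes the string is exactly DoCoMo/<version>/<model>[/...]."""
    parts = user_agent.split("/")
    if len(parts) < 3 or parts[0] != "DoCoMo":
        return None
    # A trailing comment glued to the model field makes this lookup fail.
    return parts[2] if parts[2] in KNOWN_MODELS else None


def tolerant_model(user_agent):
    """Drops everything after the first whitespace before parsing."""
    tokens = user_agent.split()
    if not tokens:
        return None
    return strict_model(tokens[0])


# Assumed example strings, not actual crawler or handset output.
HANDSET_UA = "DoCoMo/1.0/P503i/c10"
CRAWLER_UA = "DoCoMo/1.0/P503i (Googlebot-example)"
```

With these assumed strings, `strict_model(CRAWLER_UA)` fails to recognize the handset while `tolerant_model(CRAWLER_UA)` still finds `P503i`, so whether the suffix matters depends on how strictly a given site parses.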
At first glance, however, it seems infeasible/undesirable to crawl each
potential page multiple times, using a different i-mode user agent every
time. The final index should have just one representative copy of the
i-mode page that we can accurately search against. However, once the user
clicks on the search result, he/she will go directly to the page & will be
served whatever customized output the site has, regardless of the specific
version we indexed. Nick & Craig commented on this a bit... but tell me -
in general do sites redirect users to customized pages with different URLs
depending on the user agent or does the same URL have different content
depending on the user agent? For example, if I go to site www.xxx.com/i
with, say, a P503i, will I be automatically redirected to another
page, such as www.xxx.com/i/p503i? This could be problematic given the
scheme I described above.
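One way to answer the redirect-versus-same-URL question empirically would be to fetch a start URL with a few different user agents and compare what comes back. The sketch below covers only the comparison step and assumes the fetching has already happened; `classify_site` and its labels are made-up names, and checking redirects before content is just one possible ordering.

```python
import hashlib


def classify_site(responses):
    """Classify how a site reacts to different user agents.

    `responses` maps a user-agent string to a (final_url, body_bytes)
    pair, where final_url is the URL reached after following redirects.
    """
    final_urls = {url for url, _ in responses.values()}
    if len(final_urls) > 1:
        # Different agents ended up at different URLs.
        return "redirects-by-agent"

    digests = {hashlib.sha1(body).hexdigest() for _, body in responses.values()}
    if len(digests) > 1:
        # Same URL for everyone, but the bytes differ per agent.
        return "same-url-different-content"

    return "agent-independent"
```

A site in the first category is the problematic one for a single-copy index, since the URL recorded during the crawl would be specific to the handset the crawler claimed to be.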
I like the robots.txt idea (User-agent: DoCoMo/*) a *lot*, but once again
- is this something that most i-mode developers already do, or would we
have to spearhead it?
And, yes, we do take robots.txt files seriously. From what I've seen,
most "violations" are because the robots file didn't exist at the time of
the crawl or, if it did, the format was wrong.
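As an illustration of the robots.txt idea, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, paths, and user-agent strings are invented for the example. Note that this parser matches the `User-agent` value as a substring of the agent name before the first slash, so the group is written as plain `DoCoMo` rather than `DoCoMo/*`; other parsers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: i-mode agents may crawl /i/ but not /pc/,
# and every other agent is kept out of /i/.
ROBOTS_TXT = """\
User-agent: DoCoMo
Disallow: /pc/

User-agent: *
Disallow: /i/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `can_crawl(ROBOTS_TXT, "DoCoMo/1.0/P503i", "http://www.xxx.com/i/index.html")` is allowed under this file, while the same URL requested as `"Mozilla/4.0"` is not. A crawler would normally load the file from the site with `set_url()` and `read()` instead of parsing a literal string.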
We'll also be looking for good start sites. So, I'd appreciate any
suggestions, including the URLs of your own i-mode sites. (Please mail me
directly w/ these, as I don't want to clutter the list!)
Lauren
Received on Thu Feb 15 09:11:46 2001