ICC-Crawler
Introduction
ICC-Crawler was developed by the Chikayama-Taura Laboratory at the University of Tokyo and is operated by the Knowledge Clustered Group at NICT. It was developed to collect Web pages for research on Web search and data mining, and we are now planning to use it to crawl weblogs as well. The crawler is used by members of the Knowledge Clustered Group at NICT to crawl Web pages for research purposes only. Our crawling policy strictly follows generally accepted crawling norms. We understand the concerns of webmasters and would like to assure you that our crawler collects pages only for research and not for any business use. Please see our crawling policy below for details. We sincerely appreciate your cooperation and support.
Policy
Our crawler always follows common crawling norms, as described below:
  • If a Web page includes a robots meta tag such as the following, our crawler never crawls that page.
    Example: <meta name="robots" content="nofollow, noindex">
  • It always reads "robots.txt" and never crawls restricted pages. Example:
    User-agent: *
    Disallow: /cgi-bin
    User-agent: ICC-Crawler
    Disallow: /
  • If a Crawl-Delay is given in /robots.txt, our crawler waits that long between successive connections. Otherwise, the access rate is controlled so that the crawler does not place excessive load on the servers it accesses.
  • If you do not want your pages to be crawled at all, please contact us, and we will make sure your request is respected from then on.
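For webmasters who want to check how rules like these are read, here is a minimal Python sketch built on the standard library. It is an illustration only and not the crawler's actual implementation. The names load_rules, delay_for, meta_excludes, WITH_DELAY and DEFAULT_DELAY are hypothetical. The one-second fallback delay is an assumption, since the policy gives no figure. The sketch also assumes that either "noindex" or "nofollow" is enough to exclude a page, which the policy does not spell out.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

AGENT = "ICC-Crawler"
DEFAULT_DELAY = 1.0  # assumed fallback in seconds; the policy states no value

# The robots.txt example from the policy above.
POLICY_EXAMPLE = """\
User-agent: *
Disallow: /cgi-bin
User-agent: ICC-Crawler
Disallow: /
"""

# Hypothetical site that admits the crawler but asks for a 10-second delay.
WITH_DELAY = """\
User-agent: ICC-Crawler
Crawl-delay: 10
Disallow: /cgi-bin
"""


def load_rules(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def delay_for(rules, agent=AGENT):
    """Return the Crawl-Delay for the agent, or the assumed default."""
    delay = rules.crawl_delay(agent)
    return float(delay) if delay is not None else DEFAULT_DELAY


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def meta_excludes(html):
    """True if the page carries a noindex or nofollow robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"noindex", "nofollow"})
```

With the policy's example file, load_rules(POLICY_EXAMPLE).can_fetch("ICC-Crawler", url) is false for every URL on the site, while other crawlers are kept out of /cgi-bin only. A polite fetch loop would sleep for delay_for(rules) seconds between requests to the same host.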
Information on Current Crawling
The hostnames and IP addresses of the machines currently crawling are:

202.180.34.186
Goal
We would like to emphasize again that our crawler collects pages solely for research purposes. We are interested in crawling a large volume of pages for the following ongoing research in our group:
  • Analysis of Web pages to provide useful criteria on information credibility,
  • Clustering of Web pages,
  • Observational analysis of weblogs.
Contact
For any questions, comments, or requests, please send us an e-mail.


Information Credibility Criteria Project
Knowledge Clustered Group,
Knowledge Creating Communication Research Center,
National Institute of Information and Communications Technology.
Phone. +81-774-98-6825
Fax. +81-774-98-6960
Copyright ©2006 Knowledge Clustered Group