Introduction
ICC-Crawler was developed by the Chikayama-Taura Laboratory at the University of Tokyo and is operated by the Knowledge Clustered Group at NICT. The main purpose of the crawler is to collect Web pages for research on Web search and data mining. We are also planning to use it to crawl weblogs. The crawler is used by members of the Knowledge Clustered Group at NICT to crawl Web pages for research purposes only, and our crawling policy strictly follows generally accepted crawling norms. While we fully understand webmasters' concerns, we would like to assure you that our crawler collects pages only for research and not for any commercial use. Please take a moment to read our crawling policy below. We sincerely appreciate your cooperation and support.
Policy
Our crawler always follows common crawling norms, as described below:
- If a Web page contains the following meta tag, our crawler does not collect the page.
  For example: <meta name="robots" content="nofollow, noindex">
- It always reads "robots.txt" and never crawls restricted pages. For example:
    User-agent: *
    Disallow: /cgi-bin

    User-agent: ICC-Crawler
    Disallow: /
- If /robots.txt specifies a Crawl-Delay, our crawler waits at least that interval between successive connections to the server. Otherwise, it limits its access rate so that it does not place an excessive load on the servers it visits.
- If you do not want your pages to be crawled at all, please contact us, and we will make sure that your request is respected from then on.
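The rules in this list rely on standard mechanisms that any crawler can check. As an illustration only, and not ICC-Crawler's actual code, the sketch below shows how a crawler might apply them with Python's standard `urllib.robotparser` and `html.parser` modules. The helper names, the example.com URLs, the "OtherBot" agent, and the 5-second `DEFAULT_DELAY` fallback are hypothetical, and fetching robots.txt or pages over HTTP is left out.

```python
import time
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

AGENT = "ICC-Crawler"

# The robots.txt example from the policy above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin

User-agent: ICC-Crawler
Disallow: /
"""

# Hypothetical fallback interval (seconds) used when robots.txt gives no
# Crawl-Delay; ICC-Crawler's real default rate is not stated on this page.
DEFAULT_DELAY = 5.0


class RobotsMetaParser(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list such as "nofollow, noindex"
        for token in (attrs.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def meta_directives(html):
    """Return the set of robots meta directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def load_robots(robots_txt):
    """Parse robots.txt text (hypothetical helper; no HTTP fetch here)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def min_interval(rp, agent=AGENT):
    """Seconds between requests: the Crawl-Delay if given, else DEFAULT_DELAY."""
    delay = rp.crawl_delay(agent)
    return float(delay) if delay is not None else DEFAULT_DELAY


def wait_politely(last_request, interval):
    """Sleep until `interval` seconds have passed since `last_request`,
    a timestamp taken from time.monotonic()."""
    remaining = last_request + interval - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)


if __name__ == "__main__":
    rules = load_robots(ROBOTS_TXT)
    print(rules.can_fetch(AGENT, "http://example.com/index.html"))  # False
    print(rules.can_fetch("OtherBot", "http://example.com/cgi-bin/search"))  # False
    print(rules.can_fetch("OtherBot", "http://example.com/index.html"))  # True
    print(min_interval(load_robots("User-agent: *\nCrawl-delay: 10\n")))  # 10.0
    print(meta_directives('<meta name="robots" content="nofollow, noindex">'))
```

Because a group naming a specific user agent takes precedence over the `*` group, the sample robots.txt keeps ICC-Crawler out of the whole site while other crawlers are only kept out of /cgi-bin. Note that `urllib.robotparser` only accepts whole-number Crawl-delay values.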
Information on Current Crawling
The hostnames and IP addresses of the machines currently crawling are:
202.180.34.186
Goal
We would like to emphasize again that our crawler collects pages solely for research purposes. We need to crawl a large volume of pages for the following ongoing research projects in our group:
- Analysis of Web pages to provide useful criteria for assessing information credibility,
- Clustering of Web pages,
- Observation and analysis of weblogs.
Contact
For any questions, comments, or requests, please send us an email.
Information Credibility Criteria Project
Knowledge Clustered Group,
Knowledge Creating Communication Research Center,
National Institute of Information and Communications Technology.
Phone: +81-774-98-6825
Fax: +81-774-98-6960