<?xml version="1.0" encoding="UTF-8"?><!-- generator="WordPress/2.6.3" -->
<rss version="0.92">
<channel>
	<title>G2 Crawler News</title>
	<link>http://crawler.trillinux.org/news</link>
	<description>G2 Crawler News</description>
	<lastBuildDate>Sat, 01 Nov 2008 15:25:35 +0000</lastBuildDate>
	<docs>http://backend.userland.com/rss092</docs>
	<language>en</language>
	
	<item>
		<title>The Architecture of a Crawler</title>
		<description>I'm going to explain how crawlers work. There are three main tasks that a crawler has to take care of.

	1. Find new hosts to crawl.
	2. Request data from a host that is being crawled.
	3. Display the gathered data to the user.

This design lends itself well to being distributed. Several host crawlers (those that ...</description>
		<link>http://crawler.trillinux.org/news/2008/11/01/the-architecture-of-a-crawler/</link>
			</item>
	<item>
		<title>Recent Updates</title>
		<description>My focus lately has been on hub uptimes. There is a new page showing hub uptime distribution graphs. It gives a visual representation of some of the categories on the uptimes page. The overall hub uptime distribution graph also features two vertical lines. The red line shows where the average ...</description>
		<link>http://crawler.trillinux.org/news/2008/10/19/recent-updates/</link>
			</item>
	<item>
		<title>Quick g2paranha update</title>
		<description>The crawler has been running pretty well, with only minor tweaks from day to day, which sometimes show up as blips in the graph. It was also down for a few days due to a failing hard drive.

Yesterday the crawler got into the Foxy network again which uses the same ...</description>
		<link>http://crawler.trillinux.org/news/2008/07/10/quick-g2paranha-update/</link>
			</item>
	<item>
		<title>g2paranha - The New G2 Crawler</title>
		<description>Anyone who has read through this blog knows that the crawler has tended to crash fairly often. Recently it was crashing too much to even continue running it. But rather than give up entirely, I decided to write my own crawler. Five weeks later, g2paranha has emerged. ...</description>
		<link>http://crawler.trillinux.org/news/2008/06/23/g2paranha/</link>
			</item>
	<item>
		<title>The State of G2</title>
		<description>I was reading the Gnutella2 article on Wikipedia today and noticed that both entries in the External Links section point to my sites (crawler.trillinux.org and g2.trillinux.org). The latter is the new home for the G2 specs after gnutella2.com was allowed to expire. This got me thinking that it looks like ...</description>
		<link>http://crawler.trillinux.org/news/2008/02/28/the-state-of-g2/</link>
			</item>
	<item>
		<title>More Crawler Downtime</title>
		<description>I spent last weekend replacing my router with another computer. The transition was a bit bumpy but things are starting to get sorted out. More extended periods of downtime are possible over the next few weeks as I get things completely transitioned and working reliably. </description>
		<link>http://crawler.trillinux.org/news/2008/01/30/more-crawler-downtime/</link>
			</item>
	<item>
		<title>Crawler Downtime</title>
		<description>The crawler has been down since Friday because I'm doing hardware work on my router. It is also the computer that does backups and it has had a slowly failing hard drive for the last few months. I finally bought a new hard drive and have been deciding how to ...</description>
		<link>http://crawler.trillinux.org/news/2007/10/08/crawler-downtime/</link>
			</item>
	<item>
		<title>New Graphs</title>
		<description>I added some new graphs back in April on the hub density page. They show the percentage of hubs with a certain number of leaves. This way the capacity of hubs can be tracked more granularly than just the average leaves per hub statistic.

Let me know about other improvements you'd ...</description>
		<link>http://crawler.trillinux.org/news/2007/05/24/new-graphs/</link>
			</item>
	<item>
		<title>Country Database Update</title>
		<description>You may have noticed that the number of "Unknown"/"??" countries has been increasing since the graphs came back in June. This is because the country is determined by using MaxMind's GeoLite Country database which maps IP addresses to countries. The version the crawler was using hadn't been updated since September ...</description>
		<link>http://crawler.trillinux.org/news/2006/10/11/country-database-update/</link>
			</item>
	<item>
		<title>New best hub uptime</title>
		<description>Today a new best hub uptime was established, beating the old one of 209h 10m 45s. This is because, for some reason unknown to me, the crawler has decided to run for over a week continuously without dying. I did, however, check system logs recently and it does look like ...</description>
		<link>http://crawler.trillinux.org/news/2006/10/06/new-best-hub-uptime/</link>
			</item>
</channel>
</rss>

