slashdot picked up the infrasearch story yesterday. there were a few good comments on problems with the concept:
“Imagine a network of a million hosts (a small subset of all webservers). Each of these is running a gnutella-based search-engine. On one of the servers is an interface to search the network for some information. The query is forwarded onto the overlay network, to say 10 nodes at each node, assuming some mechanism is in place to avoid loops. If the network is well interconnected, it will take about 5-6 hops to reach an edge of the cloud (probably a couple of times more to reach all the nodes). As soon as the first nodes get the search-request, they send back results, say limited to the first 5-10 most significant hits. Each reply has a number of tuples consisting of (URLs, a description and an indication of how close the match is and a timestamp and probably some more), maybe 1-2 kB per reply. Say 10% of servers have a match, then 100000 hosts will at some point send back results. I calculate, roughly a 100 MB of results will be arriving at the searching node within a few minutes, if it can process the dataflow.
This is only one search, both the searching nodes and the servers will have to deal with a lot of searches if you look at other search-engines as a comparison.”
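the commenter's arithmetic is easy to re-run. here is a minimal python sketch of it, using only the figures from the quote (a million hosts, a fan-out of 10, 10% of hosts matching, 1-2 kB per reply). the function name, the idealised tree with no duplicate delivery, and the 1 MB = 1000 kB convention are my assumptions, not anything infrasearch or gnutella defines:

```python
def flood_search_estimate(hosts, fanout, match_percent, reply_kb):
    """Rough cost of one flooded query (hypothetical helper, idealised model)."""
    # Count hops until fanout**hops covers every host.
    # Assumes a perfect tree: no loops and no host reached twice.
    hops = 0
    reached = 1
    while reached < hosts:
        reached *= fanout
        hops += 1

    # Hosts that have a match and send a reply back to the searcher.
    responders = hosts * match_percent // 100

    # Total reply volume arriving at the searching node, in MB (1 MB = 1000 kB).
    total_mb = responders * reply_kb / 1000
    return hops, responders, total_mb


# Low and high ends of the quoted 1-2 kB reply size.
for kb in (1, 2):
    hops, responders, total_mb = flood_search_estimate(1_000_000, 10, 10, kb)
    print(f"{kb} kB replies: {hops} hops, {responders} responders, {total_mb:.0f} MB")
```

with 1 kB replies this gives 6 hops, 100000 responders and 100 MB, which matches the comment; at 2 kB per reply the total doubles to 200 MB for a single search.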
at least that’s better than the bandwidth hogginess that jacob levy noticed while evaluating gnutella:
“…most importantly, it sucks bandwidth. I can easily see how network admins will want to outlaw this beast, if they can. For my evening of experimentation, I downloaded a total of 65 MBytes of files, while my total incoming consumed bandwidth was 365 MBytes, and my upstream bandwidth was 755 MBytes. Yes, really — all that, in a measly five hours.”
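to put levy's numbers in proportion, here is a small python sketch that turns them into an overhead ratio and an average rate. the inputs are the four figures he reports (65 MB of files, 365 MB incoming, 755 MB upstream, five hours); the function name and the 1 MB = 1000 kB convention are my assumptions:

```python
def overhead_report(files_mb, incoming_mb, upstream_mb, hours):
    """Summarise traffic against files actually downloaded (hypothetical helper)."""
    total_mb = incoming_mb + upstream_mb
    seconds = hours * 3600
    return {
        # All traffic in both directions over the session.
        "total_mb": total_mb,
        # MB of traffic moved for every MB of files fetched.
        "traffic_per_file_mb": total_mb / files_mb,
        # Average sustained upstream rate in kB/s (1 MB = 1000 kB).
        "avg_upstream_kb_per_s": upstream_mb * 1000 / seconds,
    }


report = overhead_report(files_mb=65, incoming_mb=365, upstream_mb=755, hours=5)
for name, value in report.items():
    print(f"{name}: {value:.1f}")
```

that works out to 1120 MB moved in total, a bit over 17 MB of traffic for each MB of files he actually got, and roughly 42 kB/s of upstream sustained for the whole evening.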