Following are the slides from my paper presentation at ICISA 2010 in Seoul, Korea.
This paper considers tradeoffs in web crawler design, especially from the perspective of events versus threads [1,2]. It also makes some recommendations for better OS support for web crawling. It points out that the two principal problems with web crawling are:
- Choosing the right pages to crawl
- Basic architecture for performing the crawl
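To make the events-versus-threads choice a little more concrete, here is a minimal sketch in Python. It is not taken from the paper: the URLs, the function names, and the simulated latency are all assumptions for illustration, and no real network requests are made. It only shows the shape of the two designs, a pool of blocking worker threads versus a single event loop multiplexing many pending fetches.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical URLs; nothing is actually fetched over the network.
URLS = [f"http://example.com/page{i}" for i in range(5)]


def fetch_blocking(url: str) -> str:
    """Stand-in for a blocking HTTP fetch: the calling thread waits."""
    time.sleep(0.01)  # simulated network latency (assumption)
    return f"body of {url}"


async def fetch_async(url: str) -> str:
    """Stand-in for a non-blocking fetch: the event loop runs other tasks meanwhile."""
    await asyncio.sleep(0.01)  # simulated network latency (assumption)
    return f"body of {url}"


def crawl_with_threads(urls):
    # Thread-based design: worker threads, each blocking on its own fetch.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fetch_blocking, urls))


async def crawl_with_events(urls):
    # Event-driven design: one thread, many fetches in flight at once.
    return await asyncio.gather(*(fetch_async(u) for u in urls))


if __name__ == "__main__":
    threaded = crawl_with_threads(URLS)
    evented = asyncio.run(crawl_with_events(URLS))
    print(len(threaded), len(evented))
```

Both versions return the same pages in the same order; what differs is how waiting on I/O is scheduled, which is where the tradeoffs discussed in [1,2] come in.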
If you are interested in more details, please contact me by email. You can also request a copy of the paper by personal email.
[1] von Behren, R., Condit, J., and Brewer, E. Why Events Are a Bad Idea (for High-concurrency Servers). In 10th Workshop on Hot Topics in Operating Systems (HotOS IX), Lihue, Hawaii, May 2003.
[2] Ousterhout, J. Why Threads Are a Bad Idea (for Most Purposes). Invited talk at the 1996 USENIX Annual Technical Conference, San Diego, CA, October 1996.
[3] Engler, D. R., Kaashoek, M. F., and O'Toole, J. Exokernel: An Operating System Architecture for Application-level Resource Management. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP '95), Copper Mountain, Colorado, December 1995, pp. 251-266.