Cassiopeia – Towards a Distributed and Composable Crawling Platform
DOI:
https://doi.org/10.26636/jtit.2014.2.1026
Keywords:
composable software, distributed Web crawling framework, event-driven architecture, event-driven processing, SEDA, Web crawler
Abstract
When designing and implementing crawling systems or Internet robots, efficiency and scalability must be addressed first, from both a technical and an architectural point of view, because of the enormous size and structural complexity of the World Wide Web. For a significant number of users, however, flexibility and ease of execution are as important as efficiency. Criminal analysts in particular need to define, compose, and run Internet robots and crawlers according to dynamically changing requirements and use cases in the easiest possible way (e.g. in a graphical, drag-and-drop manner). This paper presents the idea, design, crucial architectural elements, Proof-of-Concept (PoC) implementation, and preliminary experimental assessment of the Cassiopeia framework, an all-in-one studio that addresses both of these aspects.
License
Copyright (c) 2014 Journal of Telecommunications and Information Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.