Cassiopeia – Towards a Distributed and Composable Crawling Platform

Authors

  • Leszek Siwik
  • Robert Marcjan
  • Kamil Włodarczyk

DOI:

https://doi.org/10.26636/jtit.2014.2.1026

Keywords:

composable software, distributed Web crawling framework, event-driven architecture, event-driven processing, SEDA, Web crawler

Abstract

When designing and implementing crawling systems or Internet robots, it is of the utmost importance to first address efficiency and scalability (from both a technical and an architectural point of view), because of the enormous size and structural complexity of the World Wide Web. There are, however, a significant number of users for whom flexibility and ease of execution are as important as efficiency. Running, defining, and composing Internet robots and crawlers according to dynamically changing requirements and use cases in the easiest possible way (e.g. in a graphical, drag-and-drop manner) is especially necessary for criminal analysts. The goal of this paper is to present the idea, design, crucial architectural elements, Proof-of-Concept (PoC) implementation, and preliminary experimental assessment of the Cassiopeia framework, i.e. an all-in-one studio addressing both of the above-mentioned aspects.

Published

2014-06-30

Section

ARTICLES FROM THIS ISSUE

How to Cite

[1]
L. Siwik, R. Marcjan, and K. Włodarczyk, “Cassiopeia – Towards a Distributed and Composable Crawling Platform”, JTIT, vol. 56, no. 2, pp. 79–89, Jun. 2014, doi: 10.26636/jtit.2014.2.1026.
