Publication date:
2009
Abstract:
Document clustering techniques have been applied in several areas, with the web among the most recent and influential. Both general-purpose and text-oriented techniques exist and can be used to cluster a collection of documents in many ways. This work proposes a novel heuristic online document clustering model that can be specialized with a variety of text-oriented similarity measures. The proposed model was evaluated experimentally in the e-commerce domain. Performance was measured using a clustering-oriented metric based on the F-measure and compared with that of other well-known approaches. The results confirm the validity of the proposed method in both batch scenarios and online scenarios, where document collections can grow over time.
CRIS type:
Journal article
Keywords:
Online clustering; Short documents analysis; Similarity measures
Authors:
Carullo, M.; Binaghi, Elisabetta; Gallo, Ignazio
Published in: