THE SIXTH INTERNATIONAL CONFERENCE ON FORENSIC COMPUTER SCIENCE (ICOFCS 2011)

Print ISBN 978-85-65069-07-6 - Online ISBN 978-85-65069-05-2, pp 166-172
DOI: 10.5769/C2011019 and http://dx.doi.org/10.5769/C2011019

Computer Forensics via Hierarchical Document Clustering (original title: Computação Forense via agrupamento hierárquico de documentos)

By Luís Filipe da Cruz Nassif and Eduardo Raul Hruschka

ABSTRACT

In computer forensic analysis, hundreds of thousands of files are usually examined. Most of the data in these files consists of unstructured text that is hard for human beings to analyze. In this context, automated techniques based on text mining are of great relevance. In particular, clustering algorithms can help to find new, useful, and potentially actionable knowledge in text files. This work presents an approach that applies document clustering algorithms to the forensic analysis of computers seized in police investigations. A comparative study was carried out of three hierarchical clustering algorithms (Single Link, Complete Link, and Average Link) applied to five textual databases derived from real cases. In addition, the Silhouette relative validity index was used to estimate the number of groups automatically. To the best of our knowledge, studies of this nature, especially ones considering hierarchical algorithms and the automatic estimation of the number of clusters, have not been reported in the computer forensics literature. This study can thus serve as a starting point for researchers interested in further work in this application domain. In brief, the experiments show that the Average Link algorithm performed best. This study also presents and discusses several practical results for both researchers and practitioners of computer forensic analysis.
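
As an illustration only, the general technique named in the abstract (agglomerative clustering with Single, Complete, or Average Link, followed by choosing the number of clusters with the Silhouette index) can be sketched in plain Python. This is not the authors' implementation. The whitespace tokenization, raw term counts, cosine distance, the restriction of k to the range 2 to n-1, the toy documents, and all function names are assumptions made for the example; the abstract does not specify any of them.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def distance_matrix(docs):
    """Pairwise distances between documents, tokenized on whitespace."""
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(vecs)
    return [[0.0 if i == j else cosine_distance(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]


def linkage_distance(c1, c2, dist, method):
    """Distance between two clusters under the chosen linkage criterion."""
    pairs = [dist[i][j] for i in c1 for j in c2]
    if method == "single":
        return min(pairs)
    if method == "complete":
        return max(pairs)
    return sum(pairs) / len(pairs)  # "average"


def agglomerate(dist, method="average"):
    """Merge the two closest clusters repeatedly; return {k: partition with k clusters}."""
    clusters = [[i] for i in range(len(dist))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        _, a, b = min((linkage_distance(clusters[a], clusters[b], dist, method), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions


def silhouette(clusters, dist):
    """Mean silhouette over all objects; objects in singleton clusters score 0."""
    scores = []
    for ci, c in enumerate(clusters):
        for i in c:
            if len(c) == 1:
                scores.append(0.0)
                continue
            a = sum(dist[i][j] for j in c if j != i) / (len(c) - 1)
            b = min(sum(dist[i][j] for j in o) / len(o)
                    for oi, o in enumerate(clusters) if oi != ci)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_partition(docs, method="average"):
    """Pick the partition (2 <= k < n) with the highest mean silhouette."""
    dist = distance_matrix(docs)
    partitions = agglomerate(dist, method)
    candidates = {k: p for k, p in partitions.items() if 2 <= k < len(docs)}
    k = max(candidates, key=lambda k: silhouette(candidates[k], dist))
    return k, candidates[k]


# Toy documents, invented for this example (not data from the paper).
docs = [
    "bank transfer invoice payment",
    "invoice payment bank account",
    "payment invoice transfer account",
    "meeting schedule monday office",
    "office meeting schedule friday",
]
k, clusters = best_partition(docs, method="average")
print(k, clusters)  # 2 [[0, 1, 2], [3, 4]]
```

Passing `method="single"` or `method="complete"` switches the linkage criterion, which is the only difference among the three algorithms compared. The brute-force search over cluster pairs is meant for small inputs; a collection of the size mentioned in the abstract would call for an optimized library implementation.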

KEYWORDS

Computer Forensics; data clustering; text mining.