Dipartimento di Ingegneria "Enzo Ferrari"


2023 - DICE: a Dataset of Italian Crime Event news [Conference Proceedings Paper]
Bonisoli, Giovanni; Pia Di Buono, Maria; Po, Laura; Rollo, Federica

2022 - Online News Event Extraction for Crime Analysis [Conference Proceedings Paper]
Rollo, F.; Po, L.; Bonisoli, G.

Event Extraction is a complex and interesting topic in Information Extraction that includes methods for identifying an event's type, participants, location, and date from free text or web data. The output of event extraction systems can be used in several fields, such as online monitoring systems or decision support tools. In this paper, we introduce a framework that combines several techniques (lexical, semantic, machine learning, neural networks) to extract events from Italian news articles for crime analysis purposes. Furthermore, we focus on representing the extracted events in a Knowledge Graph. An evaluation on crimes in the province of Modena is reported.
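
As a rough illustration of how an extracted event could be represented in a Knowledge Graph, the sketch below turns one event record into subject-predicate-object triples. The four fields (type, participants, location, date) come from the abstract; the `CrimeEvent` class, the `to_triples` function, and the predicate names such as `hasType` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class CrimeEvent:
    """Hypothetical container for the fields an event extractor might output."""
    event_id: str
    crime_type: str
    location: Optional[str] = None
    date: Optional[str] = None
    participants: Tuple[str, ...] = ()


def to_triples(event: CrimeEvent) -> List[Triple]:
    """Flatten one event into (subject, predicate, object) triples.

    Predicate names are illustrative assumptions, not a published schema.
    """
    triples: List[Triple] = [(event.event_id, "hasType", event.crime_type)]
    # Optional fields are emitted only when the extractor found a value.
    if event.location is not None:
        triples.append((event.event_id, "hasLocation", event.location))
    if event.date is not None:
        triples.append((event.event_id, "hasDate", event.date))
    for person in event.participants:
        triples.append((event.event_id, "hasParticipant", person))
    return triples


# Made-up example record, for illustration only.
example = CrimeEvent(
    event_id="event/1",
    crime_type="theft",
    location="Modena",
    participants=("person/1",),
)
example_triples = to_triples(example)
```

Keeping each field as a separate triple means a missing date or location simply produces no edge, rather than a placeholder node.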

2022 - Supervised and Unsupervised Categorization of an Imbalanced Italian Crime News Dataset [Conference Proceedings Paper]
Rollo, F.; Bonisoli, G.; Po, L.

The automatic categorization of crime news is useful for creating statistics on the types of crime occurring in a certain area. This assignment can be treated as a text categorization problem. Several studies have shown that the use of word embeddings improves outcomes in many Natural Language Processing (NLP) tasks, including text categorization. The scope of this paper is to explore the use of word embeddings for Italian crime news text categorization. The approach followed is to compare different document pre-processing steps, Word2Vec models, and methods to obtain word embeddings, including the extraction of bigrams and keyphrases. Then, supervised and unsupervised Machine Learning categorization algorithms are applied and compared. In addition, the imbalance of the input dataset is addressed by using the Synthetic Minority Oversampling Technique (SMOTE) to oversample the elements in the minority classes. Experiments conducted on an Italian dataset of 17,500 crime news articles collected from 2011 to 2021 show very promising results. The supervised categorization proved better than the unsupervised categorization, exceeding 80% in both precision and recall and reaching an accuracy of 0.86. Furthermore, lemmatization, bigrams, and keyphrase extraction did not prove decisive. Finally, the availability of our model on GitHub, together with the code we used to extract word embeddings, allows our approach to be replicated on other corpora, either in Italian or in other languages.
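
The oversampling step mentioned in the abstract can be sketched in a few lines. This is a minimal, standard-library illustration of the basic SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the authors' code; the function name `smote_sketch`, its parameters, and the toy vectors are assumptions made for this example. In practice a maintained implementation would normally be used, applied to the document embeddings.

```python
import math
import random
from typing import List, Sequence


def smote_sketch(
    minority: Sequence[Sequence[float]],
    n_new: int,
    k: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """Create n_new synthetic vectors from the minority-class vectors.

    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy 2-dimensional "embeddings" for a minority class (made-up values).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_synthetic = smote_sketch(toy_minority, n_new=4, k=2)
```

Because every new vector is a convex combination of two existing minority vectors, the synthetic samples stay inside the region already occupied by that class.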

2021 - Using Word Embeddings for Italian Crime News Categorization [Conference Proceedings Paper]
Bonisoli, Giovanni; Rollo, Federica; Po, Laura