Best Paper Award: A Distributed Framework for NLP-Based Keyword and Keyphrase Extraction From Web Pages and Documents
Best Paper Award (first place), 2015: A Distributed Framework for NLP-Based Keyword and Keyphrase Extraction From Web Pages and Documents (i.e., NLP on Hadoop).
P. Nesi, G. Pantaleo and G. Sanesi
The 21st International Conference on Distributed Multimedia Systems (DMS 2015), Hyatt Regency, Vancouver, Canada, August 31 to September 2, 2015.
The rapid and accelerating growth of the World Wide Web, together with the number of resources available online, represents a massive source of knowledge for a range of research and business interests. For the most part, this knowledge is embedded in the textual content of web pages and documents, largely as unstructured natural language. When it comes to automatically ingesting and processing such large amounts of data, single-machine, non-distributed architectures are proving inefficient for tasks such as Big Data mining and intensive text processing and analysis. Natural Language Processing (NLP) systems are growing in complexity, and their computational requirements have increased significantly, calling for solutions such as distributed frameworks and parallel programming paradigms. This paper presents a distributed framework for executing NLP tasks in a parallel environment. This has been achieved by integrating the APIs of the widely used open source GATE NLP platform into a multi-node cluster built on the open source Apache Hadoop file system. The proposed framework has been evaluated on a real corpus of web pages and documents.
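To illustrate the general idea of splitting keyword extraction into parallelizable map and reduce steps, here is a minimal single-process sketch. It is an assumption-laden illustration, not the authors' implementation: it does not use GATE or Hadoop, it simulates the map, shuffle and reduce phases with plain Python functions, and it scores terms with TF-IDF, a swapped-in technique that the paper does not necessarily use. The function names (`map_phase`, `shuffle`, `reduce_phase`, `extract_keywords`), the stopword list and the sample corpus are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (assumption, not from the paper).
STOPWORDS = {"the", "and", "with", "of", "on", "in", "to", "for", "is", "are"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens longer than two characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]


def map_phase(doc_id, text):
    """Map step: for one document, emit (term, (doc_id, term_frequency)) pairs."""
    for term, tf in Counter(tokenize(text)).items():
        yield term, (doc_id, tf)


def shuffle(pairs):
    """Shuffle step: group all mapped values by term, as a MapReduce framework would."""
    groups = defaultdict(list)
    for term, value in pairs:
        groups[term].append(value)
    return groups


def reduce_phase(term, postings, n_docs):
    """Reduce step: compute the term's IDF and emit a TF-IDF score per document."""
    idf = math.log(n_docs / len(postings))
    for doc_id, tf in postings:
        yield doc_id, (term, tf * idf)


def extract_keywords(corpus, top_k=3):
    """Return the top_k highest-scoring terms per document (ties broken alphabetically)."""
    mapped = (pair for doc_id, text in corpus.items()
              for pair in map_phase(doc_id, text))
    scores = defaultdict(list)
    for term, postings in shuffle(mapped).items():
        for doc_id, scored in reduce_phase(term, postings, len(corpus)):
            scores[doc_id].append(scored)
    return {doc_id: [t for t, _ in sorted(items, key=lambda x: (-x[1], x[0]))[:top_k]]
            for doc_id, items in scores.items()}


# Hypothetical sample corpus standing in for crawled web pages and documents.
CORPUS = {
    "d1": "Hadoop clusters distribute storage; Hadoop jobs run on many nodes.",
    "d2": "GATE pipelines annotate text with tokens, sentences and named entities.",
    "d3": "Keyword extraction ranks terms; keyphrase extraction ranks phrases of text.",
}

if __name__ == "__main__":
    for doc, keywords in extract_keywords(CORPUS).items():
        print(doc, keywords)
```

In a real cluster, each call to `map_phase` could run on a different node over its own slice of the corpus, and the grouping done by `shuffle` would be performed by the framework itself; the sketch only keeps the same data flow so the decomposition is easy to follow.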