Data Mining: Geonames and VIP Names, towards DBpedia and internal resolution

In almost all content collections, much of the classification and performing arts metadata cites person names and, in many cases, geographical names, that is, locations and places (cities, provinces, regions, states, etc.).
In most cases, many metadata fields are free text ingested from external archives. The cited names are therefore linked neither to other citations of the same name in the same portal nor to any qualified database of well-known person names, such as DBpedia, or of geographical names, such as GeoNames. See, for example, DC.description, DC.abstract, DC.Title, DC.creator, the lists of actors and performers, etc.

In ECLAP, we have solved the above problems by:
  • implementing natural language processing (NLP) algorithms that mine the multilingual data set to extract person names, identify synonyms, and look them up on DBpedia as well as on GeoNames. The NLP algorithms identify names and are supported by a back-office NameMiningManager (see the figure at the end of this page). The name manager makes it possible to navigate the resolved names, assess their quality, perform any needed corrections, and activate specific tasks.
  • classifying the identified person names as ECLAP users, VIP names on DBpedia, or generic person names. For each of these categories, links have been established to show the citations from each object, to report them in the Linked Open Data, and to make them accessible on the Social Graph.
  • identifying geographical names that are related to the content via the metadata and via the Social Graph. These are thus also made accessible as Linked Open Data.
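
The resolution and classification steps listed above can be illustrated with a minimal Python sketch. It is not the ECLAP implementation: the function names, the synonym table, the user list and the VIP table are hypothetical sample data, and a real system would obtain them from the portal user base and from DBpedia lookups rather than from hard-coded dictionaries.

```python
import unicodedata
from typing import Optional, Tuple

# Hypothetical sample data (assumptions for illustration, not actual ECLAP data).
SYNONYMS = {"fo dario": "dario fo"}   # name variant -> canonical form
ECLAP_USERS = {"paolo nesi"}          # names assumed to match portal users
VIP_NAMES = {"dario fo": "http://dbpedia.org/resource/Dario_Fo"}


def normalize(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def classify(name: str) -> Tuple[str, str, Optional[str]]:
    """Return (canonical name, category, DBpedia link or None) for a cited name."""
    key = normalize(name)
    canonical = SYNONYMS.get(key, key)  # merge known synonyms
    if canonical in ECLAP_USERS:
        return canonical, "eclap_user", None
    if canonical in VIP_NAMES:
        return canonical, "vip", VIP_NAMES[canonical]
    return canonical, "generic", None


for cited in ["Fo Dario", "Paolo Nesi", "Franca Rame"]:
    print(classify(cited))
```

Checking ECLAP users before VIP names is a design choice made only for this sketch; the source does not state which category takes precedence when a name could belong to both.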

Across 170000 objects, a total of 870000 citations of person names have been identified, corresponding to more than 24000 distinct person names (with 140 synonyms). Among them, 153 are ECLAP users who contributed as contributors and authors, while 2094 have been identified as VIP names. These names are highlighted in the metadata shown on the right side of the content while it is played, and you can click on them, just as you can click on the related geonames/places.

Thus, clicking on the highlighted names in the metadata gives access to views such as the following:
Example of the NameMiningManager: on the left, the list of names with any synonyms; on the top right, examples of the occurrences identified in the ECLAP big data archive; on the bottom right, the list of any DBpedia links identified, in this case for Dario Fo.

An example of the Social Graph with "cited by" and "cited names" relationships. In more detail, from the object it is possible to see the names cited in the metadata (cited names: Dario Fo, Franca Rame, Paolo Nesi, Mariateresa Pizza, etc.). Among them we can see the creator (Mariateresa Pizza). Note that, by expanding the relationships of Paolo Nesi, we can see the other content in which he is cited (cited by). This linked object can bring you to the content at the center of the graph.
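
The "cited names" and "cited by" relationships of the Social Graph can be thought of as a mapping and its inverse. The following Python sketch shows the idea under stated assumptions: the object identifiers and the helper functions are hypothetical, and the second object is invented only so that Paolo Nesi appears in more than one place.

```python
from collections import defaultdict

# "cited names": object -> names found in its metadata.
# Object identifiers are hypothetical; "object-2" is invented for the example.
cited_names = {
    "object-1": ["Dario Fo", "Franca Rame", "Paolo Nesi", "Mariateresa Pizza"],
    "object-2": ["Paolo Nesi"],
}


def build_cited_by(cited_names):
    """Invert the mapping to get "cited by": name -> set of objects citing it."""
    cited_by = defaultdict(set)
    for obj, names in cited_names.items():
        for name in names:
            cited_by[name].add(obj)
    return cited_by


def related_objects(obj, cited_names, cited_by):
    """Other objects reachable from obj by expanding one of its cited names."""
    related = set()
    for name in cited_names.get(obj, []):
        related |= cited_by.get(name, set())
    related.discard(obj)  # the starting object is not related to itself
    return related


cited_by = build_cited_by(cited_names)
print(sorted(cited_by["Paolo Nesi"]))
print(related_objects("object-1", cited_names, cited_by))
```

Expanding a name in the graph corresponds to reading its "cited by" set, and following one of those objects corresponds to moving it to the center of the graph.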