OSIM (project): Open Space Innovative Mind

Although many systems exist for developing and managing structured taxonomies and/or SKOS models for a domain in which only a small set of documents is accessible, producing and maintaining such domain knowledge bases is still a very expensive and time-consuming process. OSIM proposes a solution that assists expert users in developing and managing a knowledge base, including SKOS and ontologies that model structures and relationships. The solution accelerates knowledge production by crawling and exploiting different kinds of sources, which may be in multiple languages and inconsistent with one another. The tool supports experts in defining relationships among the most recurrent concepts, reducing the time needed to produce the SKOS and enabling assisted production. The validity of the produced knowledge base has been assessed by using a SPARQL query interface and a precision and recall model. The solution has been developed for Open Space Innovative Mind, with the aim of creating a portal that allows industries to pose semantic queries to discover potential competences in a large institution such as the University of Florence, where several distinct domains are each associated with their own departments.
 

LINK TO THE OSIM KNOWLEDGE ENGINE AND SEARCH INTERFACE.

 
The main objective of the Open Space Innovative Mind project is to realize a portal on which industries, students and interested researchers can pose questions in order to identify competences, in terms of researchers and groups, within the large body of knowledge of the University of Florence. In the literature, a number of systems have been proposed to address the problem of helping to model knowledge bases, possibly by matching the demand (semantic query) against the offer (knowledge about the domain). The accessible version is part of a wider project called OSIM, which has been partially funded by Fondazione Monte dei Paschi di Siena. The OSIM project is presently under development; this page shows only the latest public version of OSIM.
 
As previously stated, the main goal of OSIM is to provide a service on which industries can pose questions in order to identify the researchers and groups with the needed competences and knowledge among those of the University of Florence. The University of Florence includes more than 50 different departments covering all scientific sectors and areas, and hosts about 2000 researchers and more than 400 labs with their web pages. Each researcher may also teach 2-3 courses, so there are about 6000 course programs that may be considered competence descriptors as well. Moreover, the research departments and researchers participate in research projects, for a total of about 20,000 further descriptors, etc. In such a context, it is very hard to identify a manageable number of people who have the skills to create a shared common SKOS. This is because the whole knowledge model has to be extracted from a huge amount of information, ranging from health care to geometry and math, from engineering to agriculture, from mechanics to statistics and pharmaceutics, etc. Moreover, the sources of this knowledge change quite dynamically: every year the courses are updated, people's CVs change, new publications and projects arrive, etc.

OSIM architecture

 

OSIM General Architecture

On the basis of the above description, the available information can be ingested from a large number of different sources. This highly dynamic collection of sources may be gathered automatically through software agents and crawling tasks. The information gained can be used by a semantic search engine to answer user queries with a high degree of precision, for example through an assisted semantic query interface with natural language query engines.

The domain knowledge is composed of three self-supporting ontologies, which are linked by semantic relationships. The basic elements of the knowledge base are therefore:

  • Friend of a Friend (FOAF) ontology: used to model many properties of the Person and Organization classes (professors, PhD students, students, researchers, contractors, their relationships, research classifications such as SSD, CUN, etc.), namely the name, surname and e-mail properties and the knows relationship (applicable to individuals belonging to the Person class).
  • Academic life ontology: an ontology developed by DISIT specifically for the structure and terminology of Italian universities. It defines elements for describing universities and the activities that take place in them (labs, departments, faculties, research centers, groups, projects, courses, curricula, subjects, integrated labs, etc.). The main OWL entities and classes described by the ontology are:
    • Organization classes, which describe the physical structures of the university, such as research centers, departments and laboratories;
    • People and role classes, which describe instances such as full professors, researchers and PhD students, related to and derived from FOAF concepts;
    • Activity entities, which cover concepts such as past projects, ongoing projects and academic publications. Each person's publications are added as well, which also establishes relationships among the different authors.
  • Competences SKOS: the SKOS ontology that describes the hierarchy of the technical skills of the structures and people belonging to the given application context, taking into account multilingual aspects, synonyms, etc. This part of the knowledge is the most dynamic.
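
As a rough illustration of how the three vocabularies can be linked, the following Python sketch stores a handful of statements as plain (subject, predicate, object) triples and serializes them as N-Triples. The FOAF, SKOS and RDF namespaces are the standard ones; the ACAD and DATA namespaces, the memberOf and hasCompetence properties, and the example person and concepts are assumptions invented for this example and are not taken from the OSIM model.

```python
# Standard vocabulary namespaces.
FOAF = "http://xmlns.com/foaf/0.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespaces, invented for this sketch only.
ACAD = "http://example.org/academic#"
DATA = "http://example.org/osim/"

def build_example_graph():
    """Return a tiny set of triples linking a person, a department and a competence."""
    person = DATA + "person/example"
    dept = DATA + "dept/example"
    skill = DATA + "competence/semantic-web"
    parent = DATA + "competence/computer-science"
    return {
        (person, RDF_TYPE, FOAF + "Person"),
        (person, FOAF + "name", '"Example Researcher"'),
        (dept, RDF_TYPE, ACAD + "Department"),          # hypothetical class
        (person, ACAD + "memberOf", dept),              # hypothetical property
        (skill, RDF_TYPE, SKOS + "Concept"),
        (skill, SKOS + "prefLabel", '"Semantic Web"@en'),
        (skill, SKOS + "prefLabel", '"Web semantico"@it'),  # multilingual label
        (skill, SKOS + "broader", parent),
        (person, ACAD + "hasCompetence", skill),        # hypothetical property
    }

def to_ntriples(triples):
    """Serialize the triples as N-Triples; objects starting with a quote are literals."""
    lines = []
    for s, p, o in sorted(triples):
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

def people_with_competence(triples, skill_uri):
    """Return the subjects linked to the given competence concept."""
    return sorted(s for s, p, o in triples
                  if p == ACAD + "hasCompetence" and o == skill_uri)

graph = build_example_graph()
print(to_ntriples(graph))
print(people_with_competence(graph, DATA + "competence/semantic-web"))
```

In a real deployment these statements would live in a triple store and be queried with SPARQL; the plain Python set is used here only to keep the example self-contained.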

The components related to the Academic life ontology and to FOAF are initialized and directly populated by gathering information from the University database and from other institutions, among them the central CINECA servers. This operation is performed by a set of crawling tasks that use a SOAP client implemented in Java with JAX-WS.

On the basis of the described architecture, the most critical aspect is the modeling and population of the above-mentioned Competence SKOS for the whole university. Typically, the solution proposed in these cases is to manually produce a coarse classification. What is really needed, however, is a SKOS strongly related to the real sources of descriptors, so as to allow automated classification and reasoning.
For these reasons we started with the idea of producing a solution for assisting expert users in the collaborative development and management of a Competence SKOS, the Collaborative SKOS Accelerator and Manager (CoSKOSAM), with the aim of accelerating the process of SKOS production and population. In the next section the identified requirements are presented.
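
One generic way to help experts relate the most recurrent concepts is to count in how many documents each keyword appears and which recurrent keywords appear together. The Python sketch below shows that idea only; it is not the CoSKOSAM algorithm, and the function names, the min_docs threshold and the toy documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def recurrent_keywords(docs, min_docs=2):
    """Keywords appearing in at least min_docs documents, most frequent first."""
    doc_freq = Counter()
    for keywords in docs:
        doc_freq.update(set(keywords))  # count each keyword once per document
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(kw, n) for kw, n in ordered if n >= min_docs]

def candidate_relations(docs, min_docs=2):
    """Pairs of recurrent keywords that co-occur in at least min_docs documents."""
    frequent = {kw for kw, _ in recurrent_keywords(docs, min_docs)}
    pairs = Counter()
    for keywords in docs:
        present = sorted(set(keywords) & frequent)
        pairs.update(combinations(present, 2))
    ordered = sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(a, b, n) for (a, b), n in ordered if n >= min_docs]

# Toy keyword lists, one per document (invented data).
docs = [
    ["ontology", "sparql", "reasoning"],
    ["ontology", "sparql"],
    ["ontology", "robotics"],
    ["robotics", "control"],
]
print(recurrent_keywords(docs))   # [('ontology', 3), ('robotics', 2), ('sparql', 2)]
print(candidate_relations(docs))  # [('ontology', 'sparql', 2)]
```

The pairs returned are only suggestions: an expert would still decide whether a pair should become a broader, narrower or related link in the SKOS.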

Furthermore, the ontology is produced according to the OWL/RDF/SKOS rules and can benefit from emerging technologies and innovations offered by the semantic web and natural language processing. The generated ontology is used as the information domain of a demand and supply system for academic skills. It is currently stored in a semantic database that is queried through SPARQL queries, allowing:

  • semantic search to retrieve ranked information. The ranking algorithm can use term frequency as a weighting factor, which makes it robust to uncertainties;
  • semantic indexing for search engine optimization and fuzzy queries, thus correcting possible typos;
  • exploitation of an inferential engine to increase the system's intelligence, increasing robustness via similarities and relationships among terms;
  • improvement of the engine that provides results to users, allowing them to navigate the mesh of relationships among FOAF entities and results.
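
The features listed above can be sketched together in a few lines of Python: a misspelled query term is mapped to the closest known term, the term is expanded to its narrower concepts, and people are ranked by raw term frequency. This is a minimal sketch under stated assumptions, not the OSIM engine: the toy data, the function names and the 0.75 similarity cutoff are invented, and difflib string similarity stands in for whatever fuzzy indexing the real system uses.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical toy data, invented for this sketch.
# narrower concept -> broader concept (in the spirit of skos:broader)
BROADER = {"ontology": "semantics", "sparql": "semantics"}
# researcher id -> keywords extracted from CVs, courses, etc.
PROFILES = {
    "researcher_a": "ontology ontology sparql reasoning",
    "researcher_b": "robotics control ontology",
    "researcher_c": "statistics sampling",
}

def vocabulary():
    """All known terms: concept labels plus profile keywords."""
    words = set(BROADER) | set(BROADER.values())
    for text in PROFILES.values():
        words.update(text.split())
    return sorted(words)

def correct_term(term, vocab, cutoff=0.75):
    """Fuzzy matching: map a possibly misspelled term to the closest known term."""
    matches = get_close_matches(term.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else term.lower()

def expand(term):
    """Simple inference: a concept also covers its direct narrower concepts."""
    return {term} | {narrow for narrow, broad in BROADER.items() if broad == term}

def rank(query):
    """Rank researchers by the raw frequency of the expanded query terms."""
    vocab = vocabulary()
    terms = set()
    for word in query.split():
        terms |= expand(correct_term(word, vocab))
    scores = {}
    for person, text in PROFILES.items():
        counts = Counter(text.split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[person] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

print(rank("ontolgy"))    # the typo is corrected to "ontology"
print(rank("semantics"))  # expanded to "ontology" and "sparql"
```

Raw counts are used as the weight to keep the example short; a production ranking would normally normalize them, for instance by document length or inverse document frequency.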
 

Project status

The status of the project can be summarized as follows:

  • Departments: formerly 49; since 1 January 2013 the University has been reorganized into 24 new departments
  • Keywords: 249000 from documents, and 140746 from CVs, courses, etc.
  • Documents: more than 18000 (among them CVs, courses, etc.)
  • People with courses: 2344; researchers: more than 1700
  • Publications: all those present in the CINECA database, about 80000 publications and 30000 authors, reconstructed from the registrations made by more than 4000 UNIFI people (professors, researchers, PhD students, etc.) in the CINECA database of research products
  • The whole semantic database consists of some tens of millions of triples.
 

Status of the main departments on which the validation was first performed (data updated in September 2013).

Department (UniFI, new) | No. of keywords | No. of documents | No. of people
Dipartimento di Architettura (DiDA) | 4385 | 1120 | 123
Biologia | 5713 | 258 | 42
Chirurgia e Medicina Traslazionale (DCMT) | 7231 | 1174 | 62
Chimica "Ugo Schiff" | 11147 | 489 | 88
Fisica e Astronomia | 12688 | 457 | 64
Gestione Sistemi Agrari, Alimentari e Forestali (GESAAF) | 4181 | 324 | 56
Ingegneria Civile e Ambientale (DICEA) | 3569 | 341 | 43
Ingegneria Industriale | 4796 | 579 | 58
Ingegneria dell'Informazione | 6462 | 399 | 59
Lettere e Filosofia | 2457 | 645 | 72
Lingue, Letterature e Studi Interculturali | 2826 | 647 | 49
Matematica e Informatica "Ulisse Dini" | 4597 | 570 | 90
Medicina Sperimentale e Clinica | 11738 | 2449 | 157
Neuroscienze, Psicologia, Area del Farmaco e Salute del Bambino (NEUROFARBA) | 10649 | 883 | 84
Storia, Archeologia, Geografia, Arte e Spettacolo (SAGAS) | 3372 | 882 | 88
Scienze Biomediche, Sperimentali e Cliniche | 8744 | 1263 | 100
Scienze per l'Economia e l'Impresa | 4527 | 1109 | 107
Scienze della Terra | 5685 | 286 | 42
Scienze della Formazione e Psicologia | 2114 | 368 | 40
Scienze Giuridiche (DSG) | 1448 | 1201 | 90
Statistica, Informatica, Applicazioni "G. Parenti" (DiSIA) | 4596 | 563 | 49
Scienze delle Produzioni Agroalimentari e dell'Ambiente (DISPAA) | 9099 | 405 | 76
Scienze Politiche e Sociali | 2332 | 945 | 51
Scienze della Salute (DSS) | 6390 | 830 | 63
TOTAL | 140746 | 18187 | 1753
       

 

The validation has been performed against queries on these departments, while all the other departments have been processed for keyword extraction and document analysis. The proposed tool has been used to develop the full knowledge base for indexing the knowledge of all the structures of the University of Florence. It presently consists of 24 departments, about 250000 distinct keywords (140000 of them indexed) coming from about 18000 documents (CVs, courses, etc.), and 1753 people who have courses and CVs, while the total number of researchers is much larger. Moreover, about 80000 publications have been collected from CINECA, with about 30000 authors including professors, PhD students, visiting professors, temporary researchers, contractors, etc.
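
Since the knowledge base is assessed with a precision and recall model, it may help to recall how the two measures are computed for a single query. The Python sketch below uses the standard definitions; the function name and the example result sets are assumptions made for illustration and are not OSIM validation data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant results retrieved / all results retrieved
    recall    = relevant results retrieved / all relevant results
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: the engine returns four people, an expert marks three as relevant.
returned_by_engine = ["p1", "p2", "p3", "p4"]
judged_relevant = ["p1", "p2", "p5"]
p, r = precision_recall(returned_by_engine, judged_relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Averaging these values over a set of test queries per department gives one summary figure for each measure.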
 


 

Project team

The OSIM framework is part of the project "Development of a Connection Platform Between University and Small and Medium Enterprises"; it has been partially funded by Fondazione MPS and is coordinated by Prof. M. Lombardi. The teams of Prof. M. Lombardi and Prof. P. Rissone are working on creating the competence knowledge models for their departments by using the CoSKOSAM tools. The design and development of the OSIM infrastructure and solution have been carried out by DISIT of DSI (https://www.disit.org/disitmn/) under the coordination of Prof. Paolo Nesi. The OSIM project is presently under development and validation in its new shape and semantic model. Several other aspects not mentioned here are under development and will be presented once they are integrated into the publicly accessible version of OSIM.

The activities of the OSIM project have been partially moved into the new SACVAR project.
 

Contact:

Paolo Nesi, paolo.nesi@unifi.it