Result: An open-source toolkit for mining Wikipedia

Title:
An open-source toolkit for mining Wikipedia
Source:
Artificial Intelligence, Wikipedia and Semi-Structured Resources. Artificial intelligence (General ed.). 194:222-239
Publisher Information:
Oxford: Elsevier, 2013.
Publication Year:
2013
Physical Description:
print, 39 ref
Original Material:
INIST-CNRS
Subject Terms:
Cognition, Computer science, Informatique, Sciences exactes et technologie, Exact sciences and technology, Sciences appliquees, Applied sciences, Informatique; automatique theorique; systemes, Computer science; control theory; systems, Logiciel, Software, Systèmes informatiques et systèmes répartis. Interface utilisateur, Computer systems and distributed systems. User interface, Organisation des mémoires. Traitement des données, Memory organisation. Data processing, Traitement des données. Listes et chaînes de caractères, Data processing. List processing. Character string processing, Génie logiciel, Software engineering, Intelligence artificielle, Artificial intelligence, Reconnaissance et synthèse de la parole et du son. Linguistique, Speech and sound recognition and synthesis. Linguistics, Annotation, Anotación, Base de données, Database, Base dato, Discrimination, Discriminación, Désambiguïsation, Disambiguation, Desambiguisación, Fouille donnée, Data mining, Busca dato, Langage JAVA, JAVA language, Lenguaje JAVA, Langage XML, XML language, Lenguaje XML, Langage naturel, Natural language, Lenguaje natural, Logiciel libre, Open source software, Software libre, Multilinguisme, Multilingualism, Multilingüismo, Mémoire partagée, Shared memory, Memoria compartida, Ontologie, Ontology, Ontología, Outil logiciel, Software tool, Herramienta software, Relation sémantique, Semantic relation, Relación semántica, Ressource naturelle, Natural resources, Recurso natural, Réseau social, Social network, Red social, Service web, Web service, Servicio web, Structure document, Document structure, Estructura documental, Sémantique, Semantics, Semántica, Traitement langage, Language processing, Tratamiento lenguaje, Ontology extraction, Semantic relatedness, Toolkit, Wikipedia
Document Type:
Journal Article
File Description:
text
Language:
English
Author Affiliations:
Computer Science Department, The University of Waikato, Private Bag 3105, Hamilton, New Zealand
ISSN:
0004-3702
Rights:
Copyright 2014 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.26867913
Database:
PASCAL Archive

Further Information

The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques.