Title:
Excavating grey literature: A case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources
Source:
Content architecture: exploiting and managing diverse resources. Aslib proceedings: New information perspectives. 62(4-5):466-475
Publisher Information:
Bradford: Emerald, 2010.
Publication Year:
2010
Physical Description:
print; 10; 3/4 p
Original Material:
INIST-CNRS
Document Type:
Conference Paper
File Description:
text
Language:
English
Author Affiliations:
Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan, Pontypridd, United Kingdom
English Heritage, Portsmouth, United Kingdom
Department of Information Studies, University College London, London, United Kingdom
Rights:
Copyright 2015 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Accession Number:
edsfra.23716633
Database:
FRANCIS Archive

Further Information

Purpose - This paper discusses the use of information extraction (IE), a natural language-processing (NLP) technique, to assist rich semantic indexing of diverse archaeological text resources. The research aims to produce semantic-aware rich indexing of diverse natural language resources, with properties capable of supporting information retrieval from the online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project.

Design/methodology/approach - The paper proposes using the English Heritage extension (CRM-EH) of CIDOC CRM, the standard core ontology in cultural heritage, together with domain thesauri, to drive and enhance an Ontology-Oriented Information Extraction process. Semantic indexing is based on a rule-based information extraction technique, implemented with the General Architecture for Text Engineering (GATE) toolkit and expressed as Java Annotation Pattern Engine (JAPE) rules.

Findings - Initial results suggest that combining information extraction with knowledge resources and standard conceptual models can support semantic-aware term indexing. Further work is required to exploit the technique more fully and to adopt formal evaluation methods that assess its performance in measurable terms.

Originality/value - The value of the paper lies in the semantic indexing of 535 unpublished online documents, often referred to as grey literature, from the Archaeology Data Service OASIS corpus (Online Access to the Index of archaeological investigations), with respect to the CRM ontological concepts E49.Time Appellation and E19.Physical Object.
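
The thesaurus-driven, rule-based indexing described under Design/methodology/approach can be illustrated with a minimal Python sketch. It is not the paper's GATE/JAPE pipeline: the term lists, the `Annotation` type, and the functions `build_pattern`, `annotate`, and `index_document` are all hypothetical, introduced here only to show how gazetteer matches might be tagged with CIDOC CRM classes (E49 Time Appellation, E19 Physical Object) and collected into a per-document index.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteers standing in for the domain thesauri.
# The terms are illustrative and are not taken from the paper.
GAZETTEERS = {
    "E49.Time Appellation": ["roman", "medieval", "bronze age", "iron age"],
    "E19.Physical Object": ["pottery", "coin", "brooch", "flint"],
}


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # matched surface string
    concept: str  # ontology class assigned to the match


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for a term list."""
    # Longest terms first so multi-word entries win over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


PATTERNS = {concept: build_pattern(terms) for concept, terms in GAZETTEERS.items()}


def annotate(text):
    """Return gazetteer matches in document order, each tagged with a concept."""
    found = []
    for concept, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(
                Annotation(match.start(), match.end(), match.group(0), concept)
            )
    return sorted(found, key=lambda ann: ann.start)


def index_document(doc_id, text):
    """Build a concept -> sorted, lower-cased terms index for one document."""
    index = {}
    for ann in annotate(text):
        index.setdefault(ann.concept, set()).add(ann.text.lower())
    return {doc_id: {concept: sorted(terms) for concept, terms in index.items()}}
```

For instance, `index_document("doc1", "A Roman coin and Iron Age pottery were recovered.")` would group "roman" and "iron age" under the time-appellation class and "coin" and "pottery" under the physical-object class. A real system of the kind the abstract describes would layer contextual rules over such lookups; this sketch covers only the lookup-and-tag step.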