Title:
Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing.
Authors:
Endara L; Department of Biology, University of Florida, Gainesville, Florida 32611, USA.
Cui H; School of Information, University of Arizona, Tucson, Arizona 85719, USA.
Burleigh JG; Department of Biology, University of Florida, Gainesville, Florida 32611, USA.
Source:
Applications in plant sciences [Appl Plant Sci] 2018 Mar 31; Vol. 6 (3), pp. e1035. Date of Electronic Publication: 2018 Mar 31 (Print Publication: 2018).
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: John Wiley & Sons, Inc
Country of Publication: United States
NLM ID: 101590473
Publication Model: eCollection
Cited Medium: Print
ISSN: 2168-0450 (Print)
Linking ISSN: 21680450
NLM ISO Abbreviation: Appl Plant Sci
Subsets: PubMed not MEDLINE
Imprint Name(s):
Publication: 2018-: Hoboken, NJ : John Wiley & Sons, Inc.
Original Publication: St. Louis, MO : Botanical Society of America, 2013-
References:
Front Plant Sci. 2015 Aug 10;6:619. (PMID: 26322060)
BMC Bioinformatics. 2011 May 12;12:148. (PMID: 21569390)
Curr Opin Plant Biol. 2015 Apr;24:93-9. (PMID: 25733069)
Syst Biol. 2018 Jan 01;67(1):49-60. (PMID: 29253296)
Appl Plant Sci. 2015 Feb 09;3(2):. (PMID: 25699217)
Am J Bot. 2018 Mar;105(3):549-564. (PMID: 29730880)
Am J Bot. 2017 Apr;104(4):505-508. (PMID: 28400413)
Comp Funct Genomics. 2005;6(7-8):388-97. (PMID: 18629207)
PLoS One. 2011;6(10):e25630. (PMID: 21991324)
BMC Bioinformatics. 2016 Nov 17;17(1):471. (PMID: 27855645)
PLoS Curr. 2013 Jun 26;5:. (PMID: 23827969)
Methods Mol Biol. 2016;1374:89-114. (PMID: 26519402)
Syst Biol. 2015 Nov;64(6):936-52. (PMID: 26018570)
Trends Plant Sci. 2011 Dec;16(12):635-44. (PMID: 22074787)
J Biomed Semantics. 2016 Nov 14;7(1):65. (PMID: 27842607)
Appl Plant Sci. 2018 Mar 31;6(3):e1035. (PMID: 29732265)
Contributed Indexing:
Keywords: morphological matrices; natural language processing; phenotypic traits; taxonomic descriptions
Entry Date(s):
Date Created: 20180508
Latest Revision: 20240314
Update Code:
20250114
PubMed Central ID:
PMC5895189
DOI:
10.1002/aps3.1035
PMID:
29732265
Database:
MEDLINE

Further Information

Premise of the Study: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time-consuming. We describe a semi-automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms.
Methods and Results: Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon-by-character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae.
Conclusions: The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses.
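
Illustrative sketch: the abstract describes two stages, assembling a taxon-by-character matrix and then discretizing the extracted characters. The Python below is a minimal illustration of that idea only. It is not the ETC or MatrixConverter implementation, and every name in it (`build_matrix`, `code_matrix`, the taxon labels, the character names, the 5.0 cm threshold) is a hypothetical assumption chosen for the example, not data or an API from the study.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy input: character values per taxon, shaped like the output
# a text-mining step might produce. Placeholder values, not data from the study.
EXTRACTED: Dict[str, Dict[str, object]] = {
    "Taxon A": {"leaf length (cm)": 2.0, "leaf shape": "lanceolate"},
    "Taxon B": {"leaf length (cm)": 7.5, "leaf shape": "ovate"},
    "Taxon C": {"leaf shape": "lanceolate"},  # leaf length not reported
}


def build_matrix(
    records: Dict[str, Dict[str, object]],
) -> Tuple[List[str], Dict[str, List[Optional[object]]]]:
    """Return (character names, taxon -> row), with None for missing values."""
    characters = sorted({name for chars in records.values() for name in chars})
    matrix = {
        taxon: [chars.get(name) for name in characters]
        for taxon, chars in records.items()
    }
    return characters, matrix


def code_matrix(
    records: Dict[str, Dict[str, object]],
    thresholds: Dict[str, float],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Discretize each character into state symbols; '?' marks missing data.

    Numeric characters listed in `thresholds` become binary states
    ('0' below the threshold, '1' otherwise). All other characters are
    coded by the index of their value among the sorted observed values.
    """
    characters, matrix = build_matrix(records)
    coded: Dict[str, List[str]] = {}
    for taxon, row in matrix.items():
        coded_row: List[str] = []
        for name, value in zip(characters, row):
            if value is None:
                coded_row.append("?")
            elif name in thresholds:
                coded_row.append("0" if float(value) < thresholds[name] else "1")
            else:
                states = sorted({r[name] for r in records.values() if name in r})
                coded_row.append(str(states.index(value)))
        coded[taxon] = coded_row
    return characters, coded


if __name__ == "__main__":
    names, coded = code_matrix(EXTRACTED, {"leaf length (cm)": 5.0})
    print(names)
    for taxon, row in coded.items():
        print(taxon, " ".join(row))
```

In practice the choice of thresholds and state definitions is a judgment the user makes per character, which is why the abstract describes the protocol as semi-automated.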