Title:
Text Transformation Pipeline: An example of the multi-stage text transformation pipeline applied to a sample abstract (PMID: 30609739).
Publication Year:
2025
Document Type:
Still image
Language:
unknown
DOI:
10.1371/journal.pone.0322498.g002
Rights:
CC BY 4.0
Accession Number:
edsbas.A7179BA5
Database:
BASE

Further Information

The process begins with the original text, followed by biomedical entity recognition and standardization using PubTator, which replaces medical terms and their synonyms with standardized identifiers (e.g., MeSH IDs). The text is then processed by MAREA, which simplifies and prepares it for machine learning by retaining standardized biomedical terms and ensuring consistent tokenization. In the final stage, non-biomedical synonyms are replaced using WordNet to further refine the embeddings. This figure illustrates the transformation applied across 30 million abstracts.
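
The three stages described above can be sketched as a chain of small functions. This is a minimal illustration only, not the authors' implementation: it does not call PubTator, MAREA, or WordNet, and instead uses small hypothetical lookup tables (`CONCEPT_IDS`, `SYNONYMS`, `STOPWORDS`) as stand-ins. The identifier format, the stop-word filtering, and all function names are assumptions made for the example.

```python
import re

# Hypothetical stand-in for PubTator annotations: surface forms mapped to
# MeSH-style identifiers (values are for illustration only).
CONCEPT_IDS = {
    "heart attack": "meshd009203",
    "myocardial infarction": "meshd009203",
    "aspirin": "meshd001241",
}

# Hypothetical stand-in for WordNet: non-biomedical synonyms mapped to one
# representative word.
SYNONYMS = {"decreases": "reduces", "lowers": "reduces"}

# Assumed stop-word list for the simplification stage.
STOPWORDS = {"the", "of", "a", "an", "in"}


def replace_concepts(text):
    """Stage 1: replace biomedical terms and synonyms with concept IDs."""
    # Longest terms first, so multi-word terms are matched before shorter ones.
    for term in sorted(CONCEPT_IDS, key=len, reverse=True):
        pattern = rf"\b{re.escape(term)}\b"
        text = re.sub(pattern, CONCEPT_IDS[term], text, flags=re.IGNORECASE)
    return text


def simplify(text):
    """Stage 2: lowercase, tokenize consistently, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def replace_synonyms(tokens):
    """Stage 3: map non-biomedical synonyms to a single representative."""
    return [SYNONYMS.get(t, t) for t in tokens]


def transform(text):
    """Run the three stages in order and return the final token list."""
    return replace_synonyms(simplify(replace_concepts(text)))


if __name__ == "__main__":
    print(transform("Aspirin lowers the risk of heart attack."))
    print(transform("Aspirin decreases risk of myocardial infarction"))
```

In this sketch, two differently worded sentences reduce to the same token sequence, which is the property the pipeline aims for before embeddings are trained.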