Title:
Text Transformation Pipeline: An example of the multi-stage text transformation pipeline applied to a sample abstract (PMID: 30609739).
Authors:
Publication Year:
2025
Subject Terms:
Cancer, Science Policy, Space Science, Environmental Sciences not elsewhere classified, Biological Sciences not elsewhere classified, Mathematical Sciences not elsewhere classified, Information Systems not elsewhere classified, semantically meaningful representations, received less attention, python code implementing, ensuring consistent representation, combined approach aimed, cell lung carcinoma, additionally leveraged wordnet, 8 %, suggesting, reduce embedding noise, improving embedding quality, higher embedding quality, biomedical synonym replacement, biomedical concept representations, embeddings, word2vec algorithm applied, span multiple words, mean pairwise distance, single concept identifier, biomedical concept synonyms, embedding techniques, biomedical terms, biomedical synonyms
Document Type:
Image
still image
Language:
unknown
DOI:
10.1371/journal.pone.0322498.g002
Availability:
Rights:
CC BY 4.0
Accession Number:
edsbas.A7179BA5
Database:
BASE
Further Information
The process begins with the original text, followed by biomedical entity recognition and standardization using PubTator, which replaces medical terms and their synonyms with standardized identifiers (e.g., MeSH IDs). The text is then processed by MAREA, which simplifies and prepares it for machine learning by retaining standardized biomedical terms and ensuring consistent tokenization. In the final stage, non-biomedical synonyms are replaced using WordNet to further refine the embeddings. This figure illustrates the transformation applied across 30 million abstracts.
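The three stages described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the lookup tables stand in for PubTator and WordNet, the concept identifiers are made-up placeholders rather than real MeSH IDs, and the normalization step (lowercasing and punctuation stripping) is an assumption about what a MAREA-like stage might do. All function names are hypothetical.

```python
import re

# Hypothetical stand-in for PubTator annotations: surface forms -> concept ID.
# The IDs are placeholders, NOT real MeSH identifiers.
CONCEPT_IDS = {
    "non-small cell lung carcinoma": "MESH_X0001",
    "nsclc": "MESH_X0001",
}

# Hypothetical stand-in for WordNet: general-language synonyms -> one canonical word.
GENERAL_SYNONYMS = {
    "revealed": "showed",
    "demonstrated": "showed",
}


def replace_concepts(text):
    """Stage 1: replace biomedical terms and their synonyms with one identifier."""
    # Longest surface forms first, so multi-word terms are matched as a whole.
    for surface in sorted(CONCEPT_IDS, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(surface) + r"(?!\w)", re.IGNORECASE)
        text = pattern.sub(CONCEPT_IDS[surface], text)
    return text


def normalize_tokens(text):
    """Stage 2 (assumed behavior): strip punctuation, lowercase, keep concept IDs intact."""
    concept_ids = set(CONCEPT_IDS.values())
    tokens = re.findall(r"[A-Za-z0-9_]+", text)
    return [tok if tok in concept_ids else tok.lower() for tok in tokens]


def replace_general_synonyms(tokens):
    """Stage 3: map non-biomedical synonyms to a single canonical word."""
    return [GENERAL_SYNONYMS.get(tok, tok) for tok in tokens]


def transform(text):
    """Run the three stages in the order described in the figure caption."""
    return replace_general_synonyms(normalize_tokens(replace_concepts(text)))


if __name__ == "__main__":
    sample = "The study revealed that NSCLC differs from non-small cell lung carcinoma subtypes."
    print(transform(sample))
```

In this sketch both "NSCLC" and "non-small cell lung carcinoma" collapse to the same token, which is the effect the caption attributes to concept standardization: an embedding model trained on the output sees one vocabulary entry per concept instead of several.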