Title:
A teaching proposal for a short course on biomedical data science.
Authors:
Chicco D (Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy; Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada); Coelho V (Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy)
Source:
PLoS computational biology [PLoS Comput Biol] 2025 Apr 14; Vol. 21 (4), pp. e1012946. Date of Electronic Publication: 2025 Apr 14 (Print Publication: 2025).
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Public Library of Science; Country of Publication: United States; NLM ID: 101238922; Publication Model: eCollection; Cited Medium: Internet; ISSN: 1553-7358 (Electronic); Linking ISSN: 1553734X; NLM ISO Abbreviation: PLoS Comput Biol; Subsets: MEDLINE
Imprint Name(s):
Original Publication: San Francisco, CA : Public Library of Science, [2005]-
Entry Date(s):
Date Created: 20250414; Date Completed: 20250414; Latest Revision: 20250420
Update Code:
20250420
PubMed Central ID:
PMC11996213
DOI:
10.1371/journal.pcbi.1012946
PMID:
40228204
Database:
MEDLINE

Abstract:
As the availability of big biomedical data grows, there is an increasing need for university students professionally trained to analyze these data and to interpret their results correctly. Here we propose a study plan for a master's degree course on biomedical data science, based on our experience during the last academic year. In our course, we explained how to find an open biomedical dataset, how to clean it correctly, and how to prepare it for a computational statistics or machine learning phase. In doing so, we introduced common health data science terms and explained how to avoid common mistakes in the process. Moreover, we clarified how to perform an exploratory data analysis (EDA) and how to interpret its results reasonably. We also described how to properly execute a supervised or unsupervised machine learning analysis, and how to understand and interpret its outcomes. Finally, we explained how to validate the findings obtained. We illustrated all these steps in the context of open science principles, by encouraging students to use only open source programming languages (R or Python in particular), open biomedical data (if available), and open access scientific articles (if possible). We believe our teaching proposal can be useful and of interest to anyone planning to prepare a course on biomedical data science.
(Copyright: © 2025 Chicco and Coelho. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)

The authors have declared that no competing interests exist.
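The abstract above does not specify which cleaning steps or common mistakes the course covered. As a hedged illustration only, the following sketch shows one widely taught point about preparing data and validating results: preprocessing statistics (here, the mean and standard deviation used for standardization) are estimated on the training split alone and then applied unchanged to a held-out test split. The toy records, all function names, and the single-threshold classifier are hypothetical stand-ins, not the authors' course material; only the Python standard library is used.

```python
import random
import statistics

# Hypothetical toy records: (biomarker value, binary outcome); None marks a missing value.
RECORDS = [
    (5.1, 0), (6.3, 1), (None, 0), (4.8, 0), (7.2, 1), (6.9, 1),
    (5.5, 0), (None, 1), (7.8, 1), (4.9, 0), (6.1, 1), (5.0, 0),
]


def clean(records):
    """Drop records whose feature value is missing (one simple cleaning choice)."""
    return [(x, y) for x, y in records if x is not None]


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records, then hold out a fraction as a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train):
    """Estimate mean and standard deviation from the TRAINING split only."""
    values = [x for x, _ in train]
    return statistics.mean(values), statistics.stdev(values)


def scale(records, mean, sd):
    """Apply training-split statistics to any split, so no test information leaks in."""
    return [((x - mean) / sd, y) for x, y in records]


def accuracy(records, threshold):
    """Fraction of records where 'value >= threshold' matches the outcome label."""
    return sum((x >= threshold) == bool(y) for x, y in records) / len(records)


def fit_threshold(train):
    """Choose, among training values, the threshold with the best training accuracy."""
    candidates = sorted(x for x, _ in train)
    return max(candidates, key=lambda t: accuracy(train, t))


if __name__ == "__main__":
    data = clean(RECORDS)
    train, test = train_test_split(data)
    mean, sd = fit_scaler(train)
    train_s, test_s = scale(train, mean, sd), scale(test, mean, sd)
    threshold = fit_threshold(train_s)
    print(f"training accuracy: {accuracy(train_s, threshold):.2f}")
    print(f"test accuracy:     {accuracy(test_s, threshold):.2f}")
```

Only the test accuracy says something about how the finding might generalize; the training accuracy is optimistic by construction, because the threshold was chosen on those same records.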