Result: LDmat: efficiently queryable compression of linkage disequilibrium matrices.

Title:
LDmat: efficiently queryable compression of linkage disequilibrium matrices.
Authors:
Weiner RJ; Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.; New York Genome Center, New York, NY 10013, USA.; Department of Computer Science, Columbia University, New York, NY 10027, USA.
Lakhani C; New York Genome Center, New York, NY 10013, USA.
Knowles DA; New York Genome Center, New York, NY 10013, USA.; Department of Computer Science, Columbia University, New York, NY 10027, USA.; Department of Systems Biology, Columbia University, New York, NY 10032, USA.
Gürsoy G; Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.; New York Genome Center, New York, NY 10013, USA.; Department of Computer Science, Columbia University, New York, NY 10027, USA.
Source:
Bioinformatics (Oxford, England) [Bioinformatics] 2023 Feb 03; Vol. 39 (2).
Publication Type:
Journal Article; Research Support, N.I.H., Extramural
Language:
English
Journal Info:
Publisher: Oxford University Press; Country of Publication: England; NLM ID: 9808944; Publication Model: Print; Cited Medium: Internet; ISSN: 1367-4811 (Electronic); Linking ISSN: 13674803; NLM ISO Abbreviation: Bioinformatics; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Oxford : Oxford University Press, c1998-
References:
Methods Mol Biol. 2007;376:59-70. (PMID: 17984538)
Brief Bioinform. 2004 Dec;5(4):355-64. (PMID: 15606972)
Nature. 2020 Sep;585(7825):357-362. (PMID: 32939066)
Nat Rev Genet. 2008 Jun;9(6):477-85. (PMID: 18427557)
Nat Genet. 2020 Dec;52(12):1355-1363. (PMID: 33199916)
Bioinformatics. 2011 Mar 1;27(5):718-9. (PMID: 21208982)
Anim Genet. 2014 Oct;45(5):754-7. (PMID: 25040320)
Methods Mol Biol. 2007;376:1-15. (PMID: 17984534)
Nat Genet. 2015 Mar;47(3):291-5. (PMID: 25642630)
Science. 2002 Jun 21;296(5576):2225-9. (PMID: 12029063)
Front Genet. 2020 Feb 28;11:157. (PMID: 32180801)
Am J Hum Genet. 2017 Oct 5;101(4):539-551. (PMID: 28942963)
Grant Information:
R00 HG010909 United States HG NHGRI NIH HHS; R35 GM147004 United States GM NIGMS NIH HHS; U01 AG068880 United States AG NIA NIH HHS
Entry Date(s):
Date Created: 20230216; Date Completed: 20230301; Latest Revision: 20230313
Update Code:
20250114
PubMed Central ID:
PMC9969815
DOI:
10.1093/bioinformatics/btad092
PMID:
36794924
Database:
MEDLINE

Further Information

Motivation: Linkage disequilibrium (LD) matrices derived from large populations are widely used in population genetics for fine-mapping, LD score regression, and linear mixed models in genome-wide association studies (GWAS). However, these matrices can become very large when they are derived from millions of individuals; hence, moving, sharing, and extracting granular information from such data can be cumbersome.
Results: We sought to address the need for compressing and easily querying large LD matrices by developing LDmat. LDmat is a standalone tool that compresses large LD matrices into the HDF5 file format and queries the compressed matrices. It can extract submatrices corresponding to a sub-region of the genome, a list of selected loci, or loci within a minor allele frequency range. LDmat can also rebuild the original file formats from the compressed files.
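
The three kinds of query named in the Results can be illustrated with a minimal sketch in plain Python. This is not LDmat's API or its HDF5 storage layout; the function names, the toy positions, the minor allele frequencies, and the LD values are all hypothetical and chosen only to show how each query reduces to selecting a set of row and column indices from a symmetric matrix.

```python
from bisect import bisect_left, bisect_right

# Toy data (hypothetical, not from the paper): sorted base-pair positions,
# one minor allele frequency per locus, and a symmetric LD matrix.
positions = [100, 250, 400, 900, 1500]
maf = [0.05, 0.20, 0.45, 0.01, 0.30]
ld = [
    [1.0, 0.8, 0.1, 0.0, 0.0],
    [0.8, 1.0, 0.3, 0.1, 0.0],
    [0.1, 0.3, 1.0, 0.2, 0.1],
    [0.0, 0.1, 0.2, 1.0, 0.4],
    [0.0, 0.0, 0.1, 0.4, 1.0],
]


def submatrix(matrix, idx):
    """Return the square submatrix restricted to the given row/column indices."""
    return [[matrix[i][j] for j in idx] for i in idx]


def indices_in_region(positions, start, end):
    """Indices of loci with start <= position <= end (positions must be sorted)."""
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, end)
    return list(range(lo, hi))


def indices_for_loci(positions, wanted):
    """Indices of the requested positions, ignoring any that are absent."""
    lookup = {pos: i for i, pos in enumerate(positions)}
    return sorted(lookup[pos] for pos in wanted if pos in lookup)


def indices_in_maf_range(maf, lower, upper):
    """Indices of loci whose minor allele frequency lies in [lower, upper]."""
    return [i for i, freq in enumerate(maf) if lower <= freq <= upper]


# Example queries: a genomic sub-region, a list of loci, and a MAF range.
region = submatrix(ld, indices_in_region(positions, 200, 1000))
selected = submatrix(ld, indices_for_loci(positions, [1500, 100]))
common = submatrix(ld, indices_in_maf_range(maf, 0.1, 0.5))
```

In a compressed on-disk format the same index sets would presumably be used to read only the needed blocks rather than the full matrix, but how LDmat does this is not described in the abstract.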
Availability and Implementation: LDmat is implemented in Python and can be installed on Unix systems with the command 'pip install ldmat'. It can also be accessed through https://github.com/G2Lab/ldmat and https://pypi.org/project/ldmat/.
Supplementary Information: Supplementary data are available at Bioinformatics online.
(© The Author(s) 2023. Published by Oxford University Press.)