Title:
Gnocis: An integrated system for interactive and reproducible analysis and modelling of cis-regulatory elements in Python 3.
Authors:
Bredesen-Aa BA; Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway.
Rehmsmeier M; Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany.
Source:
PloS one [PLoS One] 2022 Sep 09; Vol. 17 (9), pp. e0274338. Date of Electronic Publication: 2022 Sep 09 (Print Publication: 2022).
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Public Library of Science; Country of Publication: United States; NLM ID: 101285081; Publication Model: eCollection; Cited Medium: Internet; ISSN: 1932-6203 (Electronic); Linking ISSN: 19326203; NLM ISO Abbreviation: PLoS One; Subsets: MEDLINE
Imprint Name(s):
Original Publication: San Francisco, CA : Public Library of Science
Substance Nomenclature:
9007-49-2 (DNA)
Entry Date(s):
Date Created: 20220909; Date Completed: 20220913; Latest Revision: 20220926
Update Code:
20250114
PubMed Central ID:
PMC9462789
DOI:
10.1371/journal.pone.0274338
PMID:
36084008
Database:
MEDLINE

Further Information

Gene expression is regulated through cis-regulatory elements (CREs), which include promoters, enhancers, Polycomb/Trithorax Response Elements (PREs), silencers and insulators. Computational prediction of CREs can be achieved using a variety of statistical and machine learning methods combined with different feature space formulations. Although Python packages for DNA sequence feature sets and for machine learning are available, no existing package facilitates the combination of DNA sequence feature sets with machine learning methods for the genome-wide prediction of candidate CREs. Here we present Gnocis, a Python package that streamlines the analysis and modelling of CRE sequences by providing extensible APIs and implementing the glue required for combining feature sets and models for genome-wide prediction. Gnocis implements a variety of base feature sets, including motif pair occurrence frequencies and the k-spectrum mismatch kernel. It integrates with Scikit-learn and TensorFlow for state-of-the-art machine learning. Gnocis additionally implements a broad suite of tools for the handling and preparation of sequence, region and curve data, which can be useful for general DNA bioinformatics in Python. We also present Deep-MOCCA, a neural network architecture inspired by SVM-MOCCA that achieves moderate to high generalization without prior motif knowledge. To demonstrate the use of Gnocis, we applied multiple machine learning methods to the modelling of D. melanogaster PREs, including a Convolutional Neural Network (CNN), making this the first study to model PREs with CNNs. The models are readily adapted to new CRE modelling problems and to other organisms. In order to produce a high-performance, compiled package for Python 3, we implemented Gnocis in Cython. Gnocis can be installed from the Python Package Index (PyPI) by running 'pip install gnocis'. The source code is available on GitHub, at https://github.com/bjornbredesen/gnocis.
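To make the idea of combining a sequence feature set with a model for genome-wide prediction more concrete, the following is a minimal, standard-library sketch of a k-spectrum mismatch feature map and a sliding-window scorer. It does not use or reproduce the Gnocis API: the function names (`neighbours`, `mismatch_spectrum`, `mismatch_kernel`, `score_windows`), the default parameters and the linear weight dictionary are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, ambiguous bases are skipped


def neighbours(kmer, m):
    """Return all k-mers within Hamming distance <= m of `kmer`."""
    result = {kmer}
    for d in range(1, min(m, len(kmer)) + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(ALPHABET, repeat=d):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def mismatch_spectrum(seq, k=3, m=1):
    """Count, for every k-mer, how many k-mers in `seq` lie within m mismatches."""
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):  # ignore windows containing e.g. 'N'
            for nb in neighbours(kmer, m):
                counts[nb] += 1
    return counts


def mismatch_kernel(a, b, k=3, m=1):
    """Kernel value: dot product of the two mismatch spectrum vectors."""
    fa, fb = mismatch_spectrum(a, k, m), mismatch_spectrum(b, k, m)
    return sum(count * fb[kmer] for kmer, count in fa.items())


def score_windows(genome, weights, window=500, step=250, k=3, m=1):
    """Yield (start, end, score) per sliding window under a linear model.

    `weights` is a hypothetical dict mapping k-mers to model weights.
    """
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        end = min(start + window, len(genome))
        feats = mismatch_spectrum(genome[start:end], k, m)
        score = sum(weights.get(kmer, 0.0) * c for kmer, c in feats.items())
        yield start, end, score
```

In practice the weights would come from a trained classifier (for example a linear SVM fitted on the feature vectors of known CREs and background sequences), and windows scoring above a chosen threshold would be reported as candidate CREs.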

The authors have declared that no competing interests exist.