Title:
uShuffle: a useful tool for shuffling biological sequences while preserving the k-let counts.
Authors:
Jiang M (Department of Computer Science, Utah State University, Logan, UT 84322-4205, USA; mjiang@cc.usu.edu); Anderson J; Gillespie J; Mayne M
Source:
BMC bioinformatics [BMC Bioinformatics] 2008 Apr 11; Vol. 9, pp. 192. Date of Electronic Publication: 2008 Apr 11.
Publication Type:
Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
Language:
English
Journal Info:
Publisher: BioMed Central; Country of Publication: England; NLM ID: 100965194; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1471-2105 (Electronic); Linking ISSN: 1471-2105; NLM ISO Abbreviation: BMC Bioinformatics; Subsets: MEDLINE
Imprint Name(s):
Original Publication: [London] : BioMed Central, 2000-
Entry Date(s):
Date Created: 20080415; Date Completed: 20080603; Latest Revision: 20211020
Update Code:
20250114
PubMed Central ID:
PMC2375906
DOI:
10.1186/1471-2105-9-192
PMID:
18405375
Database:
MEDLINE

Further Information

Background: Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of distinct letters but also higher-order statistics such as doublet counts, triplet counts, and, in general, k-let counts.
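To make the notion of k-let counts concrete, here is a minimal Python sketch that tallies every overlapping substring of length k in a sequence. The function name `count_klets` is a hypothetical helper chosen for illustration and is not part of uShuffle.

```python
from collections import Counter


def count_klets(seq, k):
    """Count every overlapping length-k substring (k-let) of seq.

    Hypothetical helper for illustration; not part of uShuffle.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # A sequence of length n contains n - k + 1 overlapping k-lets.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# "ACGACG" and "AACCGG" contain the same letters (k = 1) but differ in
# their doublets (k = 2), so a plain letter shuffle would not preserve
# the doublet counts.
same_letters = count_klets("ACGACG", 1) == count_klets("AACCGG", 1)
same_doublets = count_klets("ACGACG", 2) == count_klets("AACCGG", 2)
```

A shuffle that preserves k-let counts must leave the result of `count_klets(seq, k)` unchanged, which is a stricter requirement than preserving the letter composition alone.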
Results: We present a sequence analysis tool (named uShuffle) for generating uniform random permutations of biological sequences (such as DNA, RNA, and protein sequences) that preserve the exact k-let counts. The uShuffle tool implements the latest variant of the Euler algorithm and uses Wilson's algorithm in the crucial step of arborescence generation. It is carefully engineered and extremely efficient. The uShuffle tool achieves maximum flexibility by allowing arbitrary alphabet sizes and k-let sizes. It can be used as a command-line program, a web application, or a utility library. Source code in C, Java, and C#, and integration instructions for Perl and Python are provided.
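As a rough illustration of the approach named above, the following Python sketch treats each (k-1)-let as a graph vertex and each k-let as a directed edge, draws a random arborescence rooted at the final vertex with a Wilson-style loop-erased random walk, and then reads off a random Eulerian path. It is not the uShuffle source code: the names `shuffle_preserving_klets` and `klet_counts` are hypothetical, only plain strings are handled, and sequences no longer than k are returned unchanged as a simplifying assumption.

```python
import random
from collections import Counter


def klet_counts(seq, k):
    """Count overlapping length-k substrings (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_preserving_klets(seq, k, rng=None):
    """Return a random rearrangement of seq with identical k-let counts.

    Illustrative sketch of an Euler-path shuffle, not uShuffle itself.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if rng is None:
        rng = random.Random()
    n = len(seq)
    if k == 1:
        # Only letter counts matter, so an ordinary shuffle suffices.
        letters = list(seq)
        rng.shuffle(letters)
        return "".join(letters)
    if n <= k:
        return seq  # assumption: too short to rearrange, keep as is

    # Build the multigraph: vertices are (k-1)-lets, edges are k-lets.
    out = {}
    for i in range(n - k + 1):
        u, v = seq[i:i + k - 1], seq[i + 1:i + k]
        out.setdefault(u, []).append(v)
        out.setdefault(v, [])
    start, root = seq[:k - 1], seq[n - k + 1:]

    # Wilson-style step: random walks towards the tree, remembering only
    # the last exit taken from each vertex (this erases loops implicitly).
    in_tree = {root}
    last_exit = {}
    for v in out:
        u = v
        while u not in in_tree:
            last_exit[u] = rng.randrange(len(out[u]))
            u = out[u][last_exit[u]]
        u = v
        while u not in in_tree:
            in_tree.add(u)
            u = out[u][last_exit[u]]

    # Shuffle each vertex's outgoing edges, keeping its tree edge last.
    order = {}
    for u, succs in out.items():
        succs = list(succs)
        if u != root:
            tree_edge = succs.pop(last_exit[u])
            rng.shuffle(succs)
            succs.append(tree_edge)
        else:
            rng.shuffle(succs)
        order[u] = succs

    # Follow the edges in order; each step appends one new letter.
    used = {u: 0 for u in order}
    result, u = [start], start
    for _ in range(n - k + 1):
        v = order[u][used[u]]
        used[u] += 1
        result.append(v[-1])
        u = v
    return "".join(result)
```

For example, `shuffle_preserving_klets("ACGTACGGTCAACGT", 2)` returns a string for which `klet_counts(result, 2)` equals the doublet counts of the input. Passing a seeded `random.Random` instance as `rng` makes the output reproducible.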
Conclusion: The uShuffle tool surpasses existing implementations of the Euler algorithm in both performance and flexibility. It is a useful tool for the bioinformatics community.