Artificial DNA sequence generator

Long Description
A common practice in computational genomic analysis is to use a set of 'background' sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detecting cis-regulatory elements. Such background sequences are generally either taken from genomic regions presumed to be intergenic or generated synthetically by 'shuffling' real sequences. The latter method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, for ab initio prediction of coding genes and transcribed regions, and for detecting non-coding genes. We found that RepeatMasker is more accurate than PClouds, that Augustus has the lowest false-positive rate of the coding-gene prediction programs tested, and that Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and models for human and many other species are freely available at http://repeatmasker.org/garlic/.

Source code: https://github.com/caballero/Garlic
Contact: jcaballero@uaq.mx
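The description contrasts shuffling real sequences with generating sequences from a model of real intergenic DNA. The sketch below illustrates that contrast under simplifying assumptions: it trains a single fixed-order Markov chain (order `k`, a hypothetical parameter) on one training sequence and samples a new sequence from it, next to a plain mononucleotide shuffle. It is not Garlic's implementation. It ignores regional composition changes, sequence complexity and the interspersed repeat content that the described models capture, and the function names `train_markov`, `generate` and `shuffle` are illustrative only.

```python
import random
from collections import Counter, defaultdict


def train_markov(seq: str, k: int) -> dict[str, Counter]:
    """Count which base follows each k-mer context (simplified model, assumption)."""
    if k < 1:
        raise ValueError("order k must be at least 1")
    model: dict[str, Counter] = defaultdict(Counter)
    seq = seq.upper()
    for i in range(len(seq) - k):
        context, nxt = seq[i:i + k], seq[i + k]
        # Skip windows containing N or other ambiguity codes.
        if set(context + nxt) <= set("ACGT"):
            model[context][nxt] += 1
    return dict(model)


def generate(model: dict[str, Counter], k: int, length: int,
             rng: random.Random) -> str:
    """Sample an artificial sequence whose (k+1)-mer statistics follow the model."""
    if not model:
        raise ValueError("empty model")
    contexts = list(model)
    out = list(rng.choice(contexts))  # random starting context of length k
    while len(out) < length:
        counts = model.get("".join(out[-k:]))
        if not counts:
            # Context never followed by a base in training: restart from a random context.
            out.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*counts.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


def shuffle(seq: str, rng: random.Random) -> str:
    """Mononucleotide shuffle: keeps base composition, scrambles local order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)
```

For example, `generate(train_markov(real, 3), 3, 1000, random.Random(42))` returns 1,000 bases whose 4-mer statistics follow `real`, while `shuffle(real, random.Random(42))` only preserves the count of each base and scrambles dinucleotide and longer patterns. Any feature a prediction tool reports on such artificial sequence can be counted as a false positive, which is what makes these sequences usable as negative controls.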
Current Citations/Applications
Caballero J, Smit AF, Hood L, Glusman G. Realistic artificial DNA sequences as negative controls for computational genomics. Nucleic Acids Res. 07-01-2014. PubMed ID: 24803667. https://www.ncbi.nlm.nih.gov/pubmed/?term=24803667 (Primary Citation)