GSR: Simulator - GARLIC
Attribute | Value |
---|---|
Title | GARLIC |
Short Description | Artificial DNA sequence generator |
Long Description | A common practice in computational genomic analysis is to use a set of 'background' sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such 'background' sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by 'shuffling' real sequences. The latter approach can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for interspersed repeat detection and for ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/. |
Project Started | 2011 |
Last Release | 5 years, 3 months ago |
Homepage | https://github.com/caballero/Garlic |
Citations | Caballero J, Smit AF, Hood L, Glusman G. Realistic artificial DNA sequences as negative controls for computational genomics. Nucleic Acids Res, 07-01-2014 |
GSR Certification | ✔ Accessibility |
Last evaluated | 05-15-2023 |
Author verification | The basic description provided was derived from a website or publications by the GSR team and has not yet been verified by the simulation author. To modify this entry or add more information, propose changes to this simulator. |
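The long description contrasts shuffling real sequences with generating artificial sequences from a model of real intergenic DNA. The sketch below shows that contrast under stated assumptions and is not GARLIC's Perl implementation. It stands in for GARLIC's composition and complexity models with a single fixed-order Markov chain trained on one input sequence. It leaves out interspersed-repeat insertion entirely. The function names (`shuffle_sequence`, `train_markov`, `generate_markov`, `kmer_counts`, `false_positives_per_mb`) are hypothetical. A mononucleotide shuffle preserves only base composition. A Markov chain of order k also approximately reproduces (k+1)-mer frequencies, such as those of simple repeats, which a shuffle breaks apart. Reporting false positives per megabase is another assumption of this sketch. It relies on the premise that an artificial sequence contains no genuine features, so every prediction on it counts as a false positive.

```python
# Illustrative sketch only: not GARLIC's implementation; no repeat insertion.
import random
from collections import Counter, defaultdict


def shuffle_sequence(seq, rng):
    """Mononucleotide shuffle: keeps base composition, destroys k-mer structure."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def train_markov(seq, order):
    """Count which base follows each context of length `order` (order >= 1)."""
    if order < 1:
        raise ValueError("order must be >= 1")
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        counts[seq[i:i + order]][seq[i + order]] += 1
    return counts


def generate_markov(counts, order, length, rng):
    """Sample `length` bases from the trained chain."""
    contexts = sorted(counts)  # sorted so a seeded rng gives reproducible output
    seq = list(rng.choice(contexts))
    while len(seq) < length:
        successors = counts.get("".join(seq[-order:]))
        if not successors:
            # Context never seen with a successor: jump to a random known context.
            seq.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*successors.items())
        seq.append(rng.choices(bases, weights=weights)[0])
    return "".join(seq[:length])


def kmer_counts(seq, k):
    """Count overlapping k-mers, e.g. to compare real and artificial sequences."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def false_positives_per_mb(num_predictions, seq_length):
    """On a sequence with no real features, every prediction is a false positive."""
    return num_predictions / (seq_length / 1_000_000)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy "intergenic" sequence with a (CA)n simple repeat appended to it.
    real = "".join(rng.choice("ACGT") for _ in range(300)) + "CA" * 40
    shuffled = shuffle_sequence(real, rng)
    artificial = generate_markov(train_markov(real, 3), 3, len(real), rng)
    for name, s in [("real", real), ("shuffled", shuffled), ("markov", artificial)]:
        print(name, kmer_counts(s, 6)["CACACA"])
```

Running the script prints how often the hexamer CACACA occurs in each sequence. The shuffled copy typically loses most occurrences of the embedded (CA)n repeat, while the Markov-generated copy tends to retain some. A higher-order chain captures longer-range structure, but it needs more training data per context.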
Attribute Category | Attribute |
---|---|
Target | |
Type of Simulated Data | Haploid DNA Sequence |
Variations | |
Simulation Method | Other |
Input | |
Data Type | Reference genome |
File format | |
Output | |
Data Type | Genotype or Sequence |
Sequencing Reads | |
File Format | FASTA or FASTQ |
Sample Type | |
Phenotype | |
Trait Type | |
Determinants | |
Evolutionary Features | |
Demographic | |
Population Size Changes | |
Gene Flow | |
Spatiality | |
Life Cycle | |
Mating System | |
Fecundity | |
Natural Selection | |
Determinant | |
Models | |
Recombination | |
Mutation Models | |
Events Allowed | |
Other | |
Interface | Script-based, Web-based |
Development | |
Tested Platforms | Linux and Unix |
Language | Perl |
License | GNU General Public License |
GSR Certification | Accessibility, Documentation, Support |
Number of Primary Citations: 1
Number of Non-Primary Citations: 0
No example publication using GARLIC has been provided.
Please propose new citations if you are aware of publications that use this software.