GSR: Compare Simulators by Attribute

AdmixSim2 is an individual-based forward-time simulation tool that can flexibly and efficiently simulate population genomics data under complex evolutionary scenarios. It is based on the extended Wright-Fisher model and implements many common evolutionary parameters, covering gene flow, natural selection, recombination, and mutation. AdmixSim2 can be used to simulate data for dioecious or monoecious populations, autosomes, or sex chromosomes.

Full AdmixSim2 Profile

Aladyn

http://www.katja-schiffers.eu/research.html

Tools to investigate how demographic parameters, population genetics and abiotic conditions affect the rate of adaptation

Description

Tools to investigate how demographic parameters, population genetics and abiotic conditions affect the rate of adaptation

Full Aladyn Profile

ALF

http://alfsim.org/#index

A Simulation Framework for Genome Evolution

Description

Artificial Life Framework (ALF) aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), or speciation. ALF is available as a stand-alone application and a user-friendly yet powerful web interface.

Full ALF Profile

AliSim

http://www.iqtree.org/doc/AliSim

A fast and versatile phylogenetic sequence simulator

Description

AliSim is a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 hours and 1.3 GB RAM to simulate alignments with one million sequences or sites, whereas popular software like Seq-Gen, Dawg, and INDELible require 2-5 hours and 50-500 GB of RAM for the same task.
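
The two approaches named above can be contrasted with a minimal sketch. This is not AliSim's code: it assumes the simplest substitution model (JC69, scaled to one expected substitution per site per unit of branch length), and the function names are hypothetical. The probability matrix approach draws each site's end state from the transition probabilities for the whole branch, while the rate matrix approach draws waiting times between substitution events. The rule AliSim uses to switch between them is not described here, so it is not shown.

```python
import math
import random

BASES = "ACGT"

def evolve_probability_matrix(seq, t, rng):
    """Evolve seq along a branch of length t using JC69 transition probabilities."""
    # Under JC69, a site ends in its starting state with this probability.
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    out = []
    for base in seq:
        if rng.random() < p_same:
            out.append(base)
        else:
            # The three other bases are equally likely end states.
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)

def evolve_rate_matrix(seq, t, rng):
    """Evolve seq along a branch of length t by drawing substitution events."""
    sites = list(seq)
    if not sites:
        return ""
    # Each site leaves its current state at rate 1 (3 targets at rate 1/3 each).
    total_rate = float(len(sites))
    elapsed = rng.expovariate(total_rate)
    while elapsed < t:
        i = rng.randrange(len(sites))
        sites[i] = rng.choice([b for b in BASES if b != sites[i]])
        elapsed += rng.expovariate(total_rate)
    return "".join(sites)

rng = random.Random(1)
ancestor = "".join(rng.choice(BASES) for _ in range(60))
print(evolve_probability_matrix(ancestor, 0.1, rng))
print(evolve_rate_matrix(ancestor, 0.1, rng))
```

In this sketch the event-based version does work proportional to the number of substitutions, so it is cheap on short branches, whereas the matrix version always touches every site once. That trade-off is one plausible reason to combine the two.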

Full AliSim Profile

AliSim-HPC

https://github.com/iqtree/iqtree2/releases

parallel sequence simulator for phylogenetics

Description

AliSim-HPC, for the first time, employs high-performance computing for phylogenetic simulations. AliSim-HPC parallelizes the simulation process at both multi-core and multi-CPU levels using the OpenMP and message passing interface (MPI) libraries, respectively. AliSim-HPC is highly efficient and scalable: it reduces the runtime to simulate 100 large gap-free alignments (30 000 sequences of one million sites) from over one day to 11 min using 256 CPU cores from a cluster with six computing nodes, a 153-fold speedup. While the OpenMP version can only simulate gap-free alignments, the MPI version supports insertion-deletion models like the sequential AliSim.

Full AliSim-HPC Profile

Ana-FiTS

http://sco.h-its.org/exelixis/web/software/anafits/index.html

an efficient tool for simulating polymorphism data forward-in-time on the chromosome and genome level

Description

AnA-FiTS is an efficient tool for simulating polymorphism data forward-in-time on the chromosome and genome level. Its most striking feature is high runtime efficiency, specifically when a part of the sequence to be simulated is neutral. Furthermore, for the neutral part of the sequence, AnA-FiTS stores (and outputs) a graph structure that makes it possible to reconstruct, at any point in time, the ancestral part of each haplotype that survived into the present.

Full Ana-FiTS Profile

ARGON

https://github.com/pierpal/ARGON

Fast, whole-genome simulation of the discrete time Wright-Fisher process

Description

ARGON simulates the discrete time Wright Fisher process (DTWF) backwards in time. The coalescent is equivalent to the DTWF process if the sample size is small compared to the effective population size, but will deviate from it as the sample size increases (Wakeley and Takahashi, MBE 2003; Bhaskar, Clark and Song, PNAS 2014). ARGON supports arbitrary demographic history, migration, variable mutation/recombination rates and gene conversion, and efficiently outputs pairwise identical-by-descent (IBD) sharing data.
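
The backward-in-time DTWF process described above can be illustrated with a minimal sketch. It is not ARGON's algorithm: it assumes a single haploid population of constant size with no recombination, migration, or mutation, and the function name is hypothetical. Each generation, every surviving lineage picks a parent uniformly at random, and lineages that pick the same parent merge. Unlike the standard coalescent, more than two lineages may merge in one generation, which is where the two models diverge for large samples.

```python
import random

def dtwf_generations_to_mrca(sample_size, pop_size, rng):
    """Trace a sample backwards in time until one common ancestor remains.

    Assumes a haploid population of constant size pop_size.
    Returns the number of generations to the most recent common ancestor.
    """
    lineages = sample_size
    generations = 0
    while lineages > 1:
        # Each lineage chooses a parent; identical choices collapse into one lineage.
        parents = {rng.randrange(pop_size) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations

rng = random.Random(42)
times = [dtwf_generations_to_mrca(10, 200, rng) for _ in range(200)]
mean_tmrca = sum(times) / len(times)
print(f"mean generations to MRCA: {mean_tmrca:.1f}")
```

For a small sample relative to the population, the mean should be close to the coalescent expectation for this haploid model, 2N(1 - 1/n) generations. Raising `sample_size` toward `pop_size` is a simple way to see the deviation mentioned in the description.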

Full ARGON Profile

ART

http://www.niehs.nih.gov/research/resources/software/biostatistics/art/

ART is a set of simulation tools to generate synthetic next-generation sequencing data by mimicking the real sequencing process with empirical error models or quality profiles.

Description

ART is a set of simulation tools to generate synthetic next-generation sequencing data by mimicking the real sequencing process with empirical error models or quality profiles. ART supports simulation of single-end, paired-end and mate-pair reads of three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can perform regular genome sequencing simulation as well as amplicon sequencing simulation. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in the FASTQ format, and alignments in the ALN/MAP and/or SAM format. ART can also generate alignments in UCSC BED file format.

Full ART Profile

art_modern

https://github.com/YU-Zhejian/art_modern/

A modernized ART for Illumina read simulation.

Description

High-performance simulation of realistic next-generation sequencing (NGS) data is a must for various algorithm development and benchmarking tasks. However, most existing simulators are either slow or generate data that does not reflect the real-world error profile of sequencers. Here we introduce art_modern, a modern re-implementation of the popular ART simulator with enhanced performance and functionality. It can be used by anyone who wants to simulate sequencing data for their own research, for example to benchmark DNA- or RNA-Seq alignment algorithms, to test whether the RNA-Seq pipeline built by their lab performs well, or to stress-test pipelines on a cluster. This simulator is best suited for GNU/Linux-based High-End Desktops (HEDTs) with multiple cores and a fast SSD. However, it can also work on laptops or high-performance clusters (HPCs) with only one node. We believe that with such simulators, the testing and benchmarking of NGS-related bioinformatics algorithms can be greatly accelerated.

Full art_modern Profile

BAMSurgeon

https://github.com/adamewing/bamsurgeon

Methods for realistic simulation of mutations in real data.

Description

BAMSurgeon can add SNVs, INDELs, and several forms of structural variant (SV) to existing BAM files using multiple alignment methods, which is useful for testing mutation detection software in a variety of contexts.

Full BAMSurgeon Profile

Bayesian Serial SimCoal

http://www.stanford.edu/group/hadlylab/ssc/index.html

Bayesian Serial SimCoal (BayeSSC) is a modification of SIMCOAL 1.0, a program written by Laurent Excoffier, John Novembre, and Stefan Schneider.

Description

BayeSSC is powerful because it allows flexible coalescent modelling from a variety of different priors. This enables parameter estimation, likelihood calculations, and Bayesian inference. Typically, BayeSSC generates thousands of hypothetical trees using slightly different population parameters. The simulated genetics of these trees can then be compared to the actual genetics of the user's samples to investigate which of these many simulated histories is the most likely to have generated the samples.

Full Bayesian Serial SimCoal Profile

BaySICS

https://sites.google.com/site/baysicsabc/

An integral platform with a graphical interface for statistical inference based on approximate Bayesian computation.

Description

BaySICS is made of five programs accessible from the same graphical interface. The first program performs coalescent simulations and creates reference tables containing summary statistics from simulated DNA alignments. The second and third programs perform post-simulation analysis using the reference tables and obtain parameter estimates or model choice (hypothesis contrasts), respectively. The fourth and fifth programs perform validation procedures for assessing the statistical power as well as the robustness of the inference by means of pseudo-observed datasets. BaySICS was designed to be user-friendly and to optimize studies of ancient DNA.

Full BaySICS Profile

BEERS

http://cbil.upenn.edu/BEERS/

BEERS was designed to benchmark RNA-Seq alignment algorithms and also algorithms that aim to reconstruct different isoforms and alternate splicing from RNA-Seq data

Description

By default BEERS simulates either mouse or human paired-end RNA-Seq data modeled on the Illumina platform. It starts with a large number of gene models (approx 500K) taken from about ten different published annotation efforts, and then chooses a fixed number of these genes at random (30,000 by default). This avoids biasing for or against any particular set of annotations. BEERS then introduces substitutions, indels, alternate splice forms, sequencing errors, and intron signal. BEERS can also simulate strand-specific reads. BEERS does not simulate quality scores. There are four configuration files required (available below).

Full BEERS Profile

bmsim

https://github.com/pingchen09990102/BMSIM

BioNano Molecule SIMulator

Description

BioNano Molecule SIMulator (BMSIM) explicitly incorporates BioNano data models (BioNano molecule length distribution, FN and FP signals, DNA molecule stretching variations, variation in optical resolution, and fragile sites) and methods to generate chimeric molecules and assign SNR scores for simulated BioNano molecules. We simulated noisy maps from ‘perturbed’ versions of the reference map. Using genomic sequences (.fasta file) as input, BMSIM simulates noisy maps in five main steps: I) generate BioNano molecules with random fragmentation and a fragile site bias model; II) label nicking sites for BioNano molecules by in silico restriction digestion. Our program supports all available nicking enzymes currently used in the BioNano system (i.e., Nt.BspQI, Nb.BbvCI, Nb.BsmI and Nb.BsrDI), as well as any artificial nicking sequences that users choose to define; III) incorporate data models for FN sites, FP sites, stretching variations, optical resolution, and chimerism for BioNano molecules; IV) assign SNR and intensity scores for labelling sites; V) iterate for targeted coverage depth. The output of BMSIM is a BNX format text file (.BNX, see example BNX file) which contains molecule map length, label positions, label signal score, etc.
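
Step II above, labelling nicking sites by in silico digestion, can be sketched as a motif search. This is an illustration rather than BMSIM's implementation: the function names are hypothetical, the recognition motif used for Nt.BspQI (GCTCTTC) is an assumption to check against the enzyme supplier's documentation, and reporting matches on both strands by their 0-based start position is a simplification.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_label_sites(sequence, motif):
    """Return sorted 0-based start positions where motif occurs on either strand."""
    sequence = sequence.upper()
    motif = motif.upper()
    # A motif on the reverse strand appears as its reverse complement on the forward strand.
    targets = {motif, reverse_complement(motif)}
    sites = set()
    for target in targets:
        start = sequence.find(target)
        while start != -1:
            sites.add(start)
            # Step by one so overlapping occurrences are also found.
            start = sequence.find(target, start + 1)
    return sorted(sites)

# Assumed recognition motif for Nt.BspQI; replace with any user-defined sequence.
NT_BSPQI_MOTIF = "GCTCTTC"
molecule = "AAGCTCTTCAAAGAAGAGCTT"
print(find_label_sites(molecule, NT_BSPQI_MOTIF))
```

The later steps (false negatives, false positives, stretching, and limited optical resolution) would then perturb this list of positions before it is written out.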

Full bmsim Profile

Boquila

https://github.com/CompGenomeLab/boquila

NGS read simulator to eliminate read nucleotide bias in sequence analysis

Description

Boquila can be configured to generate reads from only specified regions of the reference genome. It also allows the use of input DNA sequencing to correct the bias due to the copy number variations in the genome. Boquila uses standard file formats for input and output data, and it can be easily integrated into any workflow for high-throughput sequencing applications.

Full Boquila Profile

BOTTLENECK

http://www1.montpellier.inra.fr/CBGP/software/Bottleneck/bottleneck.html

Bottleneck is a program for detecting recent effective population size reductions from allele frequency data

Description

The program BOTTLENECK computes, for each population sample and for each locus, the distribution of the heterozygosity expected from the observed number of alleles (k), given the sample size (n), under the assumption of mutation-drift equilibrium. This distribution is obtained by simulating the coalescent process of n genes under two possible mutation models, the IAM and the SMM. This enables the computation of the average (Hexp), which is compared to the observed heterozygosity (Hobs, in the sense of Nei's gene diversity) to establish whether there is a heterozygosity excess or deficit at this locus. In addition, the standard deviation (SD) of the mutation-drift equilibrium distribution of the heterozygosity is used to compute the standardized difference for each locus ((Hobs-Hexp)/SD). The distribution obtained through simulation also enables the computation of a P-value for the observed heterozygosity.
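
The per-locus arithmetic described above can be sketched as follows. This is a minimal illustration, not BOTTLENECK's code: it assumes the simulated equilibrium heterozygosities are already available as a list, uses the sample standard deviation, and computes a simple one-tailed empirical P-value for heterozygosity excess, which may differ from the exact definition the program uses. The function name and the input values are hypothetical.

```python
import statistics

def heterozygosity_test(h_obs, simulated_h):
    """Compare an observed heterozygosity to a simulated equilibrium distribution.

    Returns (Hexp, SD, standardized difference, one-tailed P-value for excess).
    """
    h_exp = statistics.fmean(simulated_h)
    sd = statistics.stdev(simulated_h)  # sample standard deviation (assumption)
    standardized = (h_obs - h_exp) / sd
    # Fraction of simulated values at least as large as the observed one.
    p_excess = sum(h >= h_obs for h in simulated_h) / len(simulated_h)
    return h_exp, sd, standardized, p_excess

# Made-up values for illustration only.
simulated = [0.50, 0.55, 0.60, 0.65, 0.70]
h_exp, sd, z, p = heterozygosity_test(0.70, simulated)
print(f"Hexp={h_exp:.3f} SD={sd:.4f} (Hobs-Hexp)/SD={z:.3f} P(excess)={p:.2f}")
```

A positive standardized difference points to a heterozygosity excess at the locus and a negative one to a deficit.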

Full BOTTLENECK Profile

BottleSim

https://github.com/chihhorngkuo/BottleSim

a simulation program for changes in genetic diversity during the process of population bottlenecks

Description

Population bottlenecks reduce genetic diversity and thus cause great concern in conservation biology. Previous theoretical studies often assume discrete generations in projecting declines in genetic diversity caused by bottlenecks. This assumption creates complexities when applying the models to long-lived species with overlapping generations. BottleSim is a program for simulating bottlenecks to estimate the impact on genetic diversity; the novelties include an overlapping-generation model, a wide range of reproductive systems, and flexible population size settings. With these features, BottleSim will be a useful tool for estimating the genetic consequences of bottlenecks, evaluating conservation plans, and performing power analysis.

Full BottleSim Profile

CAMISIM

https://github.com/CAMI-challenge/CAMISIM

Simulating metagenomes and microbial communities

Description

CAMISIM is software for modelling abundance distributions of microbial communities and simulating corresponding shotgun metagenome datasets. It was mainly developed for the Critical Assessment of Metagenome Interpretation (CAMI) challenge, but should be suitable for general use. Please don't hesitate to open a new issue if you run into problems or need help.

Full CAMISIM Profile

CAMPAREE

https://github.com/itmat/CAMPAREE

a robust and configurable RNA expression simulator

Description

CAMPAREE is an RNA expression simulator that is primed using real data to give realistic output. CAMPAREE needs as input a reference genome with transcript annotations as well as fastq files of samples of the species to base the output on. For each sample, CAMPAREE outputs a simulated set of RNA transcripts mimicking expression levels within the fastq files and accounting for isoform-level expression and allele-specific expression. It also outputs simulated diploid genomes and their corresponding annotations with phased SNP and indel calls in the transcriptome from fastq reads. Additionally, the simulation outputs the underlying distributions used for expressing the transcripts.

Full CAMPAREE Profile

CancerInSilico

https://github.com/FertigLab/CancerInSilico

The CancerInSilico package provides an R interface for running mathematical models of tumor progression. This package has the underlying models implemented in C++ and the output and analysis features implemented in R.

Description

Full CancerInSilico Profile

CASS

https://liberles.cst.temple.edu/Software/CASS/index.html

Protein Sequence Simulation

Description

CASS provides simulated protein (codon) sequences from a population genetic context with a protein structure-dependent explicit genotype-phenotype map.

Full CASS Profile

CastNet

https://github.com/carlosj-rr/CastNet

a systems-level sequence evolution simulator

Description

CastNet is a genome evolution simulator that assumes each genome is a collection of genes with constantly evolving regulatory interactions in between them. The regulatory interactions produce a phenotype in the form of gene expression profiles, upon which fitness is calculated. A genetic algorithm is then used to evolve a population of such entities through a user-defined phylogeny. Importantly, the regulatory mutations are a response to sequence mutations, thus making a 1-1 relationship between the rate of evolution of sequences and of regulatory parameters.

Full CastNet Profile

CDPOP

https://github.com/ComputationalEcologyLab/CDPOP

CDPOP is a landscape genetics tool for simulating the emergence of spatial genetic structure in populations resulting from specified landscape processes governing organism movement behavior.

Description

CDPOP (Cost Distance POPulations) is an individual-based simulator of gene flow in complex landscapes to explain observed population responses and provide a foundation for landscape genetics. It models genetic exchange among spatially located individuals as a function of individual-based movement through mating and dispersal, incorporating population dynamics and all the factors that affect the frequency of an allele in a population (mutation, gene flow, genetic drift, and selection). Users initially specify individual locations, environmental conditions governing gene flow, spatially-explicit fitness landscapes governing selection, and various genic configurations, and CDPOP models divergence through time as a function of individual-based movement, breeding and dispersal, which are in turn functions of the given landscape surfaces.

Full CDPOP Profile

CellCoal

https://github.com/dapogon/cellcoal

CellCoal: Coalescent Simulation of Single-Cell Sequencing Samples

Description

CellCoal simulates the somatic evolution of single cells. CellCoal generates a coalescent genealogy (without recombination) for a sample of somatic cells obtained from a growing population, together with another cell as outgroup, introduces mutations along this genealogy, and produces single-cell diploid genotypes (single-nucleotide variants or SNVs). CellCoal implements multiple mutation models (0/1, DNA, infinite and finite site models, deletion, copy-neutral LOH, 30 cancer signatures) and is able to generate read counts and genotype likelihoods considering allelic dropout, sequencing and amplification error, plus doublet cells.

Full CellCoal Profile

Clotho

https://github.com/putnampp/clotho

a C++ library of efficient data structures, algorithms, and tools for use in Forward Time Population Genetic Simulation

Description

Clotho is a C++ library of efficient data structures, algorithms, and tools for use in Forward Time Population Genetic Simulation. The name is in reference to the youngest sister of the Three Fates or Moirai. She was responsible for spinning the thread of human life.

Full Clotho Profile

Coala

https://github.com/statgenlmu/coala

Coala is an R package that simulates biological sequences according to a given model of evolution.

Description

Coala is an R package that simulates biological sequences according to a given model of evolution. The package calls simulators based on coalescent theory. All the simulators can simulate finite site mutation models when combined with Seq-gen. Coala then imports the output of the simulators into R and is capable of calculating their summary statistics.

Full Coala Profile

CoaSim

https://github.com/mailund/CoaSim

CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models.

Description

CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models. It effectively constructs the ancestral recombination graph for a given number of individuals and uses this to simulate samples of SNP, micro-satellite, and other haplotypes/genotypes. The generated sample can afterwards be separated into cases and controls, depending on the states of selected individual markers. The tool can accordingly also be used to construct case and control data sets for association studies. CoaSim is written in C++, Guile Scheme and Python, and is available as source code (under the GNU General Public License, GPL) and as binary versions as Linux RPM files. The source code has been successfully compiled on various Linux and UNIX systems, under OS X and under Windows with Cygwin. As I have only limited access to architectures other than Linux, it is not possible for me to make binary distributions for other platforms, but if anyone is willing to build the distributions I will be more than happy to put them on this site.

Full CoaSim Profile

cophesim

https://sites.duke.edu/barusoftware/othersoft/cophesim/

A Comprehensive Simulator of Phenotype-Genotype Connections for Testing Methods of Genetic Analysis

Description

A tool for simulating phenotypes (continuous, dichotomous, and survival) for common variants, starting from existing genotype data simulated with another tool.

Full cophesim Profile

CoreSimul

https://github.com/lbobay/CoreSimul

a forward-in-time simulator of genome evolution for prokaryotes modeling homologous recombination

Description

CoreSimul is a forward-in-time simulator of core genome evolution for prokaryotes modeling homologous recombination. Simulations are guided by a phylogenetic tree and incorporate different substitution models, including models of codon selection.

Full CoreSimul Profile

cosi

http://www.broadinstitute.org/~sfs/cosi/

A coalescent-based simulator with a demographic model calibrated from empirical data.

Description

Population genetic models play an important role in human genetic research, connecting empirical observations about sequence variation with hypotheses about underlying historical and biological causes. More specifically, models are used to compare empirical measures of sequence variation, linkage disequilibrium (LD), and selection to expectations under a "null" distribution. In the absence of detailed information about human demographic history, and about variation in mutation and recombination rates, simulations have of necessity used arbitrary models, usually simple ones. With the advent of large empirical data sets, it is now possible to calibrate population genetic models with genome-wide data, permitting for the first time the generation of data that are consistent with empirical data across a wide range of characteristics. We present here the first such calibrated model and show that, while still arbitrary, it successfully generates simulated data (for three populations) that closely resemble empirical data in allele frequency, linkage disequilibrium, and population differentiation. No assertion is made about the accuracy of the proposed historical and recombination model, but its ability to generate realistic data meets a long-standing need among geneticists. We anticipate that this model, for which software is publicly available, and others like it will have numerous applications in empirical studies of human genetics.

Full cosi Profile

cosi2

https://software.broadinstitute.org/mpg/cosi2/

an efficient coalescent simulator with support for simulating selection

Description

cosi2 is an efficient coalescent simulator with support for selection, population structure, variable recombination rates, and gene conversion. It supports exact and approximate simulation modes.

Full cosi2 Profile

CuReSim

http://www.pegase-biosciences.com/curesim-a-customized-read-simulator/

A customized read simulator

Description

CuReSim (Customized Read Simulator) is a customized tool which generates synthetic New-Generation Sequencing reads, supporting read simulation for major letter-base sequencing platforms. CuReSim is developed in Java and is distributed as an executable jar file. Wrappers to integrate CuReSim in Galaxy are also available.

Full CuReSim Profile

DAWG

https://github.com/reedacartwright/dawg

An application designed to simulate the evolution of recombinant DNA sequences in continuous time

Description

DNA Assembly with Gaps (Dawg) is an application designed to simulate the evolution of recombinant DNA sequences in continuous time based on the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of gap formation. The application accepts phylogenies in Newick format and can return the sequence of any node, allowing for the exact evolutionary history to be recorded at the discretion of users. Dawg records the gap history of every lineage to produce the true alignment in the output. Many options are available to allow users to customize their simulations and results.

Full DAWG Profile

DeepSimulator

https://github.com/liyu95/DeepSimulator

The first deep learning based Nanopore simulator which can simulate the process of Nanopore sequencing

Description

DeepSimulator mimics the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals by a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. The thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface to users so that they can obtain reads with different accuracies ranging from 83 to 97%. The reads generated by the default parameters have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low coverage SNP detection.

Full DeepSimulator Profile

DHOEM

http://dhoem.sourceforge.net/

a statistical simulation software for simulating new markers in real SNP marker data

Description

DHOEM (densification of haplotypes by loess regression and maximum likelihood) is a simulation tool that is free from population assumptions and simulates new markers in real SNP marker data. The main objective of DHOEM is to generate, by statistical learning from an initial population, a new population that incorporates real and simulated SNPs and matches the realized features of the initial population.

Full DHOEM Profile

discoal

https://github.com/kr-colab/discoal

flexible coalescent simulations with selection

Description

discoal is a coalescent simulation program capable of simulating models with recombination, selective sweeps, and demographic changes including population splits and admixture events.

Full discoal Profile

DWGSIM

https://github.com/nh13/DWGSIM

Whole Genome Simulator for Next-Generation Sequencing

Description

Whole genome simulation can be performed with dwgsim. dwgsim is based on wgsim, found in SAMtools and written by Heng Li, and was forked from DNAA. It was modified to handle ABI SOLiD and Ion Torrent data, as well as various assumptions about aligners and positions of indels. Many new features have been subsequently added.

Full DWGSIM Profile

dyngen

https://github.com/dynverse/dyngen

Spearheading future omics analyses using dyngen, a multi-modal simulator of single cells

Description

We present dyngen, a multi-modal simulation engine for studying dynamic cellular processes at single-cell resolution. dyngen is more flexible than current single-cell simulation engines, and allows better method development and benchmarking, thereby stimulating development and testing of computational methods. We demonstrate its potential for spearheading computational methods on three applications: aligning cell developmental trajectories, cell-specific regulatory network inference and estimation of RNA velocity.

Full dyngen Profile

EAGLE

https://github.com/sequencing/EAGLE

Enhanced Artificial Genome Engine: next generation sequencing reads simulator

Description

The Enhanced Artificial Genome Engine (EAGLE) software is designed to simulate the behaviour of Illumina's Next Generation Sequencing instruments, in order to facilitate the development and testing of downstream applications.

Full EAGLE Profile

Easypop

https://www.unil.ch/dee/en/home/menuinst/open-positions-and-public-resources/softwares--dataset/softwares/easypop.html

EASYPOP is an individual based model intended to simulate datasets under a very broad range of conditions

Description

EASYPOP can simulate haploid, diploid or haplodiploid data. For diploids there is the choice between hermaphrodites or sexuals. For hermaphrodites, the proportion of clonal reproduction and selfing can be chosen, whereas for sexuals, complex breeding structures can be simulated (e.g. monogamy with a given proportion of extra-pair matings). The number of individuals can be selected for each population and dispersal is sex-specific. There are various migration models such as the two-dimensional stepping stone or hierarchical island model. In addition there is an isolation-by-distance option which works with the coordinates of the populations on any number of dimensions. There are also several mutation models implemented, which are particularly oriented toward the simulation of microsatellite loci. Genotypes are truly multilocus (i.e. loci are not simulated as independent replicates). All mutation parameters can be set individually for each locus. EASYPOP is able to handle very large simulations on standard personal computers and is limited only by the memory of the machine. The computer code has been optimized for maximum speed, which allows very large simulations to run on personal computers in a reasonable amount of time. In order to fit analytical expectations, in particular for variances, the functions implemented in EASYPOP are probabilistic and not deterministic. In other words, the simulations rely on the generation of random numbers.

Full Easypop Profile

EggLib

http://egglib.sourceforge.net/

EggLib is a C++/Python library and program package for evolutionary genetics and genomics.

Description

EggLib is a C++/Python library and program package for evolutionary genetics and genomics. Main features are sequence data management, sequence polymorphism analysis, coalescent simulations and Approximate Bayesian Computation. EggLib is a flexible Python module with a performant underlying C++ library (which can be used independently), and allows fast and intuitive development of Python programs and scripts. A number of pre-programmed applications of EggLib possibilities are available interactively.

Full EggLib Profile

EpiGEN

https://github.com/daisybio/epigen

An epistasis simulation pipeline

Description

EpiGEN is an easy-to-use epistasis simulation pipeline written in Python. It supports epistasis models of arbitrary size, which can be specified either extensionally or via parametrized risk models. Moreover, the user can specify the minor allele frequencies (MAFs) of both noise and disease SNPs, and provide a biased target distribution for the generated phenotypes to simulate observation bias.
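
To make the idea of an extensionally specified epistasis model concrete, here is a minimal sketch in which the model is a lookup table from disease-SNP genotype combinations to disease risk. It does not use EpiGEN's API or file formats: all names are hypothetical, genotypes are drawn independently under Hardy-Weinberg proportions from the given MAFs (so there is no linkage disequilibrium), and the penetrance values are made up.

```python
import random

def sample_genotype(maf, rng):
    """Draw a minor-allele count (0, 1 or 2), assuming Hardy-Weinberg proportions."""
    return int(rng.random() < maf) + int(rng.random() < maf)

def simulate_individual(disease_mafs, noise_mafs, penetrance, rng):
    """Simulate one individual: disease SNPs, noise SNPs, and a binary phenotype.

    penetrance maps a tuple of disease-SNP genotypes to a disease probability,
    i.e. the model is given extensionally, one entry per genotype combination.
    """
    disease = tuple(sample_genotype(m, rng) for m in disease_mafs)
    noise = [sample_genotype(m, rng) for m in noise_mafs]
    phenotype = int(rng.random() < penetrance[disease])
    return disease, noise, phenotype

# Made-up two-locus model: elevated risk only when both SNPs carry a minor allele.
penetrance = {(a, b): (0.6 if a > 0 and b > 0 else 0.05)
              for a in range(3) for b in range(3)}

rng = random.Random(0)
cohort = [simulate_individual([0.3, 0.2], [0.1, 0.4, 0.25], penetrance, rng)
          for _ in range(1000)]
cases = sum(phenotype for _, _, phenotype in cohort)
print(f"{cases} cases out of {len(cohort)} individuals")
```

A parametrized risk model would replace the table with a function of the genotypes. Observation bias could be imitated by resampling the cohort toward a target case fraction.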

Full EpiGEN Profile

EpiReSIM

https://github.com/CDMB-lab/EpiReSIM

A resampling method of epistatic model without marginal effects using under-determined system of equations

Description

EpiReSIM provides two strategies for solving eNME models. One is to calculate eNME models using prevalence constraints, and another is by joint constraints of prevalence and heritability. We transform the computation of the model into the problem of solving the under-determined system of equations. Introducing the complete orthogonal decomposition method and Newton’s method, EpiReSIM calculates the solution of the underdetermined system of equations to obtain the eNME model, especially the solution of the high-order model, which is the highlight of EpiReSIM. Second, based on the computed eNME model, EpiReSIM generates simulation data by a resampling method. Experimental results show that EpiReSIM has advantages in preserving the biological properties of minor allele frequencies and calculating high-order models, and it is a convenient and effective alternative method for curre

Full EpiReSIM Profile

EpiSIM

https://sourceforge.net/projects/episimsimulator/files/

EpiSIM: simulation of multiple epistasis, linkage disequilibrium patterns and haplotype blocks for genome-wide interaction analysis

Description

Epistasis is a ubiquitous phenomenon in genetics, and is considered to be one of the main factors in current efforts to detect missing heritability for complex diseases. Simulation is a critical tool in developing methodologies that can more effectively detect and study epistasis. Here we present a simulator, epiSIM (epistasis SIMulator), that can simulate some of the statistical properties of genetic data. EpiSIM is capable of expanding the range of the epistasis models that current simulators offer, including epistasis models that display marginal effects and those that display no marginal effects. One or more of these epistasis models can be embedded simultaneously into a single simulation data set, jointly determining the phenotype. In addition, epiSIM is independent of any outside data source in generating linkage disequilibrium patterns and haplotype blocks. We demonstrate the wide applicability of epiSIM by performing several data simulations, and examine its properties by comparing it with current representative simulators and by comparing the data that it generates with real data. Our experiments demonstrate that epiSIM is a valuable addition and a nice complement to the existing epistasis simulators. The software package is available online at https://sourceforge.net/projects/episimsimulator/files/.

Full EpiSIM Profile

ESCO

https://github.com/JINJINT/ESCO

ESCO: single cell expression simulation incorporating gene co-expression

Description

Ensemble Single-cell expression simulator incorporating gene CO-expression, ESCO, is constructed as an ensemble of the best features among current simulators to preserve the marginal performance, while making it easy to incorporate co-expression structure among genes using a copula. In particular, ESCO allows realistic simulation of a homogeneous cell group, heterogeneous cell groups, as well as complex cell group relationships such as tree and trajectory structure, together with a flexible input of co-expression. As for technical noise, ESCO integrates the parametric and non-parametric approaches in current literature and gives the user flexibility to choose. In order to mimic a specific real dataset, ESCO can estimate all the hyperparameters in a feasible way for either a homogeneous cell group or heterogeneous cell groups. ESCO is implemented in the R package ESCO, which is built upon the R package Splatter (Zappia et al., 2017), in order to provide a unified software framework.

Full ESCO Profile

EvolSimulator

http://bioinformatics.org.au/tools/evolsim/

A simulation test bed for hypotheses of genome evolution

Description

EvolSimulator is a program that allows the simulation of evolution at the level of genes, gene families, and whole genomes. It was designed with the goal of investigating evolutionary phenomena like biased mutation regimes in different lineages, complicated patterns of selective pressure across sequences, and the confounding effects of paralogy and lateral genetic transfer.

Full EvolSimulator Profile

EvolveAGene

https://sourceforge.net/projects/evolveagene/?source=navbar

A realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions

Description

EvolveAGene 3 is a realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions, including variable regions of selection intensity within the sequence and variation in intensity of selection over branches. Variation includes base substitutions, insertions, and deletions.

Full EvolveAGene Profile

FASTQSim

https://sourceforge.net/projects/fastqsim

platform-independent data characterization and in silico read generation for NGS datasets

Description

FASTQSim is a tool that provides the dual functionality of Next-Gen Sequencing dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with matching error profiles.
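
The characterization step described above can be illustrated with a minimal sketch. It is not FASTQSim's code: the function name `characterize_fastq`, the four-lines-per-record layout, and the Phred+33 quality encoding are assumptions made for the example.

```python
from collections import Counter


def characterize_fastq(text, phred_offset=33):
    """Return (read-length histogram, mean base quality) for FASTQ text.

    Assumes plain four-line records and Phred+33 quality encoding.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    length_hist = Counter()
    qual_total = 0
    base_total = 0
    # Each record is: header, sequence, '+' separator, quality string.
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1], lines[i + 3]
        length_hist[len(seq)] += 1
        qual_total += sum(ord(c) - phred_offset for c in qual)
        base_total += len(qual)
    mean_quality = qual_total / base_total if base_total else 0.0
    return length_hist, mean_quality


example = "@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n######\n"
hist, mean_q = characterize_fastq(example)
print(dict(hist), round(mean_q, 2))
```

A real characterization would also tally indel and substitution rates against a reference, which this sketch leaves out.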

Full FASTQSim Profile

fastsimcoal2

http://cmpg.unibe.ch/software/fastsimcoal2/

A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios

Description

While preserving all the simulation flexibility of simcoal2, fastsimcoal is now implemented under a faster continuous-time sequential Markovian coalescent approximation, allowing it to efficiently generate genetic diversity for different types of markers along large genomic regions, for both present and ancient samples. It includes a parameter sampler allowing its integration into Bayesian or likelihood parameter estimation procedures. fastsimcoal can handle very complex evolutionary scenarios including an arbitrary migration matrix between samples, historical events allowing for population resize, population fusion and fission, admixture events, changes in migration matrix, or changes in population growth rates. The time of sampling can be specified independently for each sample, allowing for serial sampling in the same or in different populations. Different markers, such as DNA sequences, SNPs, STRs (microsatellites) or multi-locus allelic data can be generated under a variety of mutation models (e.g. finite- and infinite-site models for DNA sequences, stepwise or generalized stepwise mutation model for STR data, infinite-allele model for standard multi-allelic data). fastsimcoal can simulate data in genomic regions with arbitrary recombination rates, thus allowing for recombination hotspots of different intensities at any position. fastsimcoal implements a new approximation to the ancestral recombination graph in the form of a sequential Markov coalescent, allowing it to very quickly generate genetic diversity for >100 Mb genomic segments. fastsimcoal2 now allows one to estimate demographic parameters from the (joint) site frequency spectrum (SFS) using simulations to compute the expected SFS and a robust method for the maximization of the composite likelihood.
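
For readers unfamiliar with the site frequency spectrum mentioned above, the following sketch shows how an unfolded SFS could be tallied from a small 0/1 haplotype matrix. It does not use fastsimcoal2 or its file formats; the function name and the list-of-lists input are assumptions for illustration.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: sfs[k] counts sites where k haplotypes carry the derived allele.

    `haplotypes` is a list of equal-length lists of 0 (ancestral) / 1 (derived).
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0]) if n else 0
    sfs = [0] * (n + 1)
    for site in range(n_sites):
        derived = sum(h[site] for h in haplotypes)
        sfs[derived] += 1
    return sfs


haps = [
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
sfs = site_frequency_spectrum(haps)
print(sfs)  # [1, 2, 0, 1]
```

Entries 0 and n (monomorphic sites) are kept here; whether to include them depends on the inference method.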

Full fastsimcoal2 Profile

FastSLINK

https://watson.hgen.pitt.edu/register/soft_doc.html

Simulation of Marker and Phenotype Data in Pedigrees

Description

FastSLINK permits simulation of marker and phenotype data in large pedigrees. Both power and significance can be evaluated. FastSLINK also supports locus heterogeneity.

Full FastSLINK Profile

FAVITES

https://github.com/niemasd/FAVITES

FrAmework for VIral Transmission and Evolution Simulation

Description

FAVITES (FrAmework for VIral Transmission and Evolution Simulation) is a robust modular framework for the simultaneous simulation of a transmission network and viral evolution, as well as simulation of sampling imperfections of the transmission network and of the sequencing process (Moshiri et al., 2018). The framework is robust in that the simulation process has been broken down into a series of interactions between abstract module classes, and the user can simply plug in each desired module implementation (or implement one from scratch) to customize any stage of the simulation process.

Full FAVITES Profile

FFPopSim

http://webdav.tuebingen.mpg.de/ffpopsim/

C++/Python library for population genetics.

Description

FFPopSim is a C++ and Python library to simulate large populations that are polymorphic at many loci. It allows for complex fitness functions, including pairwise and higher order epistasis. It is designed to study the effects of linked selection, the rare processes in large populations, and can be used to address a large variety of population genetics problems.

Full FFPopSim Profile

FIGG

http://insilicogenome.sourceforge.net/

Description

FIGG is a genome simulation tool that uses known or theorized variation frequency, per a given fragment size and grouped by GC content across a genome to model new genomes in FASTA format while tracking applied mutations for use in analysis tools or population simulations. FIGG uses Apache MapReduce and HBase to rapidly generate individual genomes and allow users to scale up generation to fit specific project needs.

Full FIGG Profile

FLUX SIMULATOR

http://confluence.sammeth.net/display/SIM/Home

The Flux Simulator aims at providing a deterministic in silico reproduction of the experimental pipelines for RNA-Seq, employing a minimal set of parameters.

Description

The FluxSimulator is the part of the FLUX project that aims at providing an in silico reproduction of the experimental pipelines for RNA-Seq, adopting a minimal set of parameters. Corresponding models were established after analyzing RNA-Seq experiments from different cell types, sample preparation protocols and sequencing platforms. The first step of the FLUX project is, in fact, a transcriptome simulator. Subsequently, common sources of systematic bias in the abundance and distribution of produced reads are mimicked, whether they occur during library construction or in the sequencing process. The FluxSimulator provides a flexible base to design benchmark experiments based on the new sequencing technologies, as for instance abundance predictions of the FluxCapacitor.

Full FLUX SIMULATOR Profile

forqs

https://bitbucket.org/dkessner/forqs

Forward-in-time simulation of Recombination, Quantitative Traits, and Selection

Description

forqs is a forward-in-time population genetics simulation that tracks individual haplotype chunks as they recombine each generation. forqs also models quantitative traits and selection on those traits. forqs is implemented as a command-line C++ program, using a modular design that gives the user great flexibility in creating custom simulations. It is freely available with a permissive BSD license.

Full forqs Profile

FPG

https://bio.cst.temple.edu/~hey/software

Forward Population Genetic simulation

Description

FPG (for Forward Population Genetic simulation) simulates a population of constant size that is undergoing various evolutionary processes, including: mutation, recombination, natural selection, and migration. The meaning of "forward" in this context is simply that time, within the simulation, moves forward just as it does in the real world. This is in contrast to coalescent population genetic simulation in which time, as represented within the simulation, proceeds back into the past. Coalescent simulations have many advantages, but they are unwieldy if they incorporate natural selection on multiple sites.
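
The idea of moving forward in time can be made concrete with a toy example. The sketch below is not FPG: it tracks a single biallelic locus in a haploid population of constant size, with selection, symmetric mutation, and drift applied once per generation. Migration and recombination are omitted, and all names and parameters are assumptions.

```python
import random


def wright_fisher_forward(pop_size, generations, init_freq,
                          sel_coef=0.0, mut_rate=0.0, seed=0):
    """Return the allele-frequency trajectory of allele A, one entry per generation.

    Assumes sel_coef > -1 (allele A has relative fitness 1 + sel_coef).
    """
    rng = random.Random(seed)
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: reweight allele A by its relative fitness.
        weighted = freq * (1.0 + sel_coef)
        p = weighted / (weighted + (1.0 - freq))
        # Symmetric mutation between the two alleles.
        p = p * (1.0 - mut_rate) + (1.0 - p) * mut_rate
        # Drift: draw the next generation of constant size.
        count = sum(1 for _ in range(pop_size) if rng.random() < p)
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory


traj = wright_fisher_forward(pop_size=200, generations=50,
                             init_freq=0.1, sel_coef=0.05, mut_rate=1e-4)
print(len(traj), traj[-1])
```

Every individual is resampled in every generation, which is why forward simulators cost more than coalescent ones but handle selection naturally.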

Full FPG Profile

FreeHi-C

https://github.com/yezhengSTAT/FreeHiC

FreeHi-C simulates high fidelity Hi-C data for benchmarking and data augmentation

Description

FreeHi-C (v2.0) is short for Fragment interactions empirical estimation for fast simulation of Hi-C data. It is a data-driven Hi-C data simulator for simulating and augmenting Hi-C datasets. FreeHi-C employs a non-parametric strategy for estimating an interaction distribution of genome fragments and simulates Hi-C reads from interacting fragments. Data from FreeHi-C exhibit higher fidelity to the biological Hi-C data. FreeHi-C not only can be used to study and benchmark a wide range of Hi-C analysis methods but also boosts power and enables false discovery rate control for differential interaction detection algorithms through data augmentation. Different from FreeHi-C (v1.0), a spike-in module is added enabling the simulation of true differential chromatin interactions. FreeHi-C is designed for studies that are prone to simulate Hi-C interactions from the real data and add deviations from the true ones. Therefore, FreeHi-C requires real Hi-C sequencing data (FASTQ format) as input along with user-defined simulation parameters. FreeHi-C will eventually provide the simulated genomics contact counts in a sparse matrix format (BED format) which is compatible with the standard input of downstream Hi-C analysis.

Full FreeHi-C Profile

FREGENE

http://www.ebi.ac.uk/projects/BARGEN

FREGENE is a C++ program that simulates sequence-like data over large genomic regions in large diploid populations.

Description

FREGENE works forwards-in-time which allows a wide range of demographic and selection scenarios to be implemented. Many such models are already incorporated into FREGENE, and since it is open source users can modify or extend these. Coalescent methods have difficulty incorporating large amounts of gene conversion or crossover (Hoggart et al. 2007), whereas these pose no particular problem for FREGENE. FREGENE offers a flexible model for recombination hotspots, and can readily simulate regions up to tens of Mb on a standard desktop computer. The principal limitation of forward-in-time algorithms is computational, since the entire population must be tracked through time, not only the chromosomes that are ancestral to the observed sample. FREGENE implements many features to enhance computational efficiency, and includes a rescaling option that greatly reduces computation time at the cost of some approximation.

Full FREGENE Profile

fwdpp

https://github.com/molpopgen/fwdpp

A C++ template library for implementing efficient forward simulations.

Description

Fwdpp is a C++11 library intended to help implement forward-time population genetic simulations.

Full fwdpp Profile

G2P

https://github.com/XiaoleiLiuBio/G2P

A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation

Description

A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation

Full G2P Profile

GAMETES

https://sourceforge.net/projects/gametes/?source=navbar

Genetic Architecture Model Emulator for Testing and Evaluating Software: Simulates complex SNP models with pure, strict epistatic interactions with n-loci.

Description

Rapid, user-friendly software package, able to generate whole populations of “worst-case-scenario” complex genetic models with random architectures, but a user-specified set of constraints (i.e. number of loci, heritability, allele frequencies, prevalence). Intended for testing and evaluating algorithms or software for their ability to detect and model epistatic interactions in the absence of any main effects. The next version will add the ability to generate heterogeneous datasets (specifically, datasets which concurrently contain both epistatic and heterogeneous effects).

Full GAMETES Profile

GARLIC

https://github.com/caballero/Garlic

Artificial DNA sequence generator

Description

A common practice in computational genomic analysis is to use a set of 'background' sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such 'background' sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by 'shuffling' real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
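
The 'shuffling' baseline that the description contrasts with can be sketched as follows. This is not GARLIC's method: it is a plain mononucleotide shuffle, shown only to illustrate that base composition is preserved while higher-order structure such as dinucleotide content is not. Function names are assumptions.

```python
import random
from collections import Counter


def shuffle_sequence(seq, seed=0):
    """Return a random permutation of the bases in `seq`."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in `seq`."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


real = "ACACACACACGTGTGTGTGT"
shuffled = shuffle_sequence(real)

# Base composition is identical by construction.
print(Counter(real) == Counter(shuffled))
# Dinucleotide content generally differs after shuffling.
print(dinucleotide_counts(real).most_common(2))
print(dinucleotide_counts(shuffled).most_common(2))
```

Losing repeat and complexity structure in this way is one plausible reason shuffled controls can understate false-positive rates.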

Full GARLIC Profile

GASP

http://research.nhgri.nih.gov/gasp/

Genometric Analysis Simulation Program. A software tool for testing and investigating methods in statistical genetics by generating samples of family data based on user specified models.

Description

The Genometric Analysis Simulation Program (G.A.S.P.) is a software tool that can generate samples of family data based on user specified genetic models. Data generated can be as simple as a single sample of random individuals with a single normally distributed trait or as complex as thousands of samples of extended families with multiple traits based on a linear combination of major locus, polygenic, common sibship environment and covariate components. Traits can be generated based on a number of user specified components, and components can be unique to a single trait or shared by multiple traits. The user first specifies a list of all desired components and then creates each trait by specifying the desired component weighted by its contribution to the phenotypic variance. G.A.S.P. can be used in two ways. First, data can be generated in a standalone fashion. The resulting family data can be saved and then used as sample data for demonstrating applications and methods of genetic analysis or for testing and verifying newly developed algorithms in statistical genetics. A simple driver ("dataonly") is provided for this application. Second, data can be generated and analyzed immediately using an existing statistical package. A driver can be designed to call subroutine versions of widely available genetic analysis programs.
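
The idea of building a trait from components weighted by their contribution to the phenotypic variance can be sketched as below. This is a simplification, not G.A.S.P.'s implementation: every component is drawn as an independent standard normal per individual, so family structure (shared sibship environment, segregating major loci) is ignored. The function name and component labels are assumptions.

```python
import math
import random


def simulate_trait(component_weights, n_individuals, seed=0):
    """Simulate a unit-variance trait as a weighted sum of independent components.

    `component_weights` maps a component name to its share of the
    phenotypic variance; the shares must sum to 1.
    """
    total = sum(component_weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("variance shares must sum to 1")
    rng = random.Random(seed)
    traits = []
    for _ in range(n_individuals):
        value = 0.0
        for weight in component_weights.values():
            # Scaling a unit-variance draw by sqrt(w) gives it variance w.
            value += math.sqrt(weight) * rng.gauss(0.0, 1.0)
        traits.append(value)
    return traits


weights = {"polygenic": 0.4, "sibship_env": 0.2, "residual": 0.4}
traits = simulate_trait(weights, n_individuals=5000)
print(len(traits))
```

Because the components are independent, the variances add, so the simulated trait has total variance close to 1.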

Full GASP Profile

GCTA

http://cnsgenomics.com/software/gcta/

Genome-wide Complex Trait Analysis

Description

GCTA (Genome-wide Complex Trait Analysis) was originally designed to estimate the proportion of phenotypic variance explained by genome- or chromosome-wide SNPs for complex traits (the GREML method), and has subsequently been extended for many other analyses to better understand the genetic architecture of complex traits. GCTA currently supports the following functionalities: 1) Estimate the genetic relationship from genome-wide SNPs; 2) Estimate the inbreeding coefficient from genome-wide SNPs; 3) Estimate the variance explained by all the autosomal SNPs; 4) Partition the genetic variance onto individual chromosomes; 5) Estimate the genetic variance associated with the X-chromosome; 6) Test the effect of dosage compensation on genetic variance on the X-chromosome; 7) Predict the genome-wide additive genetic effects for individual subjects and for individual SNPs; 8) Estimate the LD structure encompassing a list of target SNPs; 9) Simulate GWAS data based upon the observed genotype data; 10) Convert Illumina raw genotype data into PLINK format; 11) Conditional & joint analysis of GWAS summary statistics without individual level genotype data; 12) Estimate the genetic correlation between two traits (diseases) using SNP data; 13) Mixed linear model association analysis.

Full GCTA Profile

GemSIM

http://sourceforge.net/projects/gemsim/

Next generation sequencing read simulator

Description

GemSIM is a software package for generating realistic simulated next generation sequencing reads with quality score values. Both Illumina and Roche/454 reads (single or paired end) can be simulated using empirically derived error models.

Full GemSIM Profile

GeneEvolve

https://github.com/rtahmasbi/GeneEvolve

A fast and memory efficient forward-time simulator of realistic whole-genome sequence and SNP data

Description

GeneEvolve is a user-friendly and efficient population genetics simulator that handles complex evolutionary and life history scenarios and generates individual-level phenotypes and realistic whole-genome sequence or SNP data. GeneEvolve runs forward-in-time, which allows it to provide a wide range of scenarios for mating systems, selection, population size and structure, migration, recombination and environmental effects. The software is designed to use as input data from real or previously simulated phased haplotypes, allowing it to mimic very closely the properties of real genomic data.

Full GeneEvolve Profile

GeneSPIDER

https://bitbucket.org/sonnhammergrni/genespider/src/master/

Gene regulatory network inference benchmarking with controlled network and data properties

Description

Inference of gene regulatory networks (GRNs) is a central goal in systems biology. It is therefore important to evaluate the accuracy of GRN inference methods in the light of network and data properties. Although several packages are available for modelling, simulating, and analysing GRN inference, they offer limited control of network topology together with system dynamics, experimental design, data properties, and noise characteristics. Independent control of these properties in simulations is key to drawing conclusions about which inference method to use in a given condition and what performance to expect from it, as well as to obtain properties representative of real biological systems.

Full GeneSPIDER Profile

GENLIB

https://github.com/R-GENLIB/GENLIB

An R package for the analysis of genealogical data

Description

GENLIB is an R package specifically designed to analyze large genealogical datasets. Genealogical data from human founder populations can contribute to research in diverse fields from genetic epidemiology to historical geography, along with population genetics, evolutionary biology, demography and social history. Animal and plant geneticists also need to analyze large pedigrees. GENLIB has several functionalities ranging from descriptive statistics specifically developed for genealogical data to simulations of genomic segments passed down the genealogies from the founders. GENLIB functions can be grouped into 4 categories: i) genealogical data management, ii) data description and visualisation, iii) computation of relevant statistics (e.g., kinship coefficients for pairs of individuals) and iv) simulations.

Full GENLIB Profile

GENOME

http://csg.sph.umich.edu/liang/genome/

A rapid coalescent-based whole genome simulator

Description

GENOME is a program to simulate sequences drawn from a population under the Wright-Fisher neutral model (Ewens 1979). It is based on a standard coalescent model (Hudson 1983, 1990; Donnelly & Tavaré 1995). Starting with the sampled sequences and moving backward in time, coalescent, recombination and migration events are simulated at each generation. These events could occur multiple times and could happen in the same generation. Each coalescent event is recorded and the resulting genealogy tree is constructed. Demographic events such as population bottlenecks and expansions or population merges and splits can also be simulated. In addition to uniform recombination rates, it is possible to allow recombination rates to vary so as to mimic the pattern of hotspots along the genome. After simulating a coalescent tree, mutations are placed along each branch. The number of mutations on each branch follows a Poisson distribution with mean equal to the product of the mutation rate and the branch length. The infinite-site mutation model is assumed, so no recurrent mutation can occur. The genealogy tree can also be output in Newick format, which is identical to that used by programs such as PHYLIP (Felsenstein 2005) and seq-gen (Rambaut & Grassly 1997). The program is written in C++ and is portable to multiple operating systems. The following sections will describe how to download and compile the program and how to specify the parameters for the program.
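
The mutation step described above (a Poisson number of mutations per branch, with mean equal to mutation rate times branch length) can be sketched with the standard library alone. This is not GENOME's code: the branch dictionary, function names, and the use of Knuth's multiplication method for Poisson sampling are assumptions for illustration.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw from Poisson(mean) with Knuth's multiplication method.

    Suitable for the small means typical of a single branch; exp(-mean)
    underflows for very large means.
    """
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mutations_per_branch(branch_lengths, mutation_rate, seed=0):
    """Map each branch to a Poisson(mutation_rate * length) mutation count."""
    rng = random.Random(seed)
    return {
        branch: poisson_sample(mutation_rate * length, rng)
        for branch, length in branch_lengths.items()
    }


# Hypothetical branch lengths in generations for a three-sample tree.
branches = {"a": 120.0, "b": 120.0, "ab": 300.0, "c": 420.0, "root_stub": 0.0}
counts = mutations_per_branch(branches, mutation_rate=0.01)
print(counts)
```

Under the infinite-site model each of these mutations would then be assigned to a new, distinct site.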

Full GENOME Profile

GenomePop2

http://acraaj.webs.uvigo.es/GenomePop2.htm

GenomePop2 is a specialization of the program GenomePop just to manage SNPs under more flexible and useful settings. If you need models with more than 2 alleles please use the GenomePop program version.

Description

This new version allows the forward simulation of sequences of biallelic positions. As in the previous version, a number of evolutionary and demographic settings are allowed. Several populations under any migration model can be implemented, as can contraction-expansion scenarios and directional or divergent selection. A theoretical or simulated initial equilibrium population can be computed, as can speciation processes via the simulation of user-defined population splits. Each population consists of a number N of individuals. Each individual is represented by one or more chromosomes with constant or variable (hotspots) recombination between binary sites.

Full GenomePop2 Profile

GenomeSimla

https://ritchielab.org/research/research-areas/statistical-genetics-and-gen-epi/methods/genomesimla

GenomeSIMLA is currently under development; however, we have a beta release that we are asking users to test.

Description

GenomeSimla uses Hardy-Weinberg mating to advance simulated genetic data forward through time from generation to generation. Next, we included two distinct algorithms to aid the user in developing various types of disease models: SIMLA for diseases with interactions and main effects and simPEN for embedding purely epistatic models.

Full GenomeSimla Profile

Genomic Variant Simulator

https://cadd.gs.washington.edu/simulator

generating simulated single nucleotide and indel variants

Description

The script for generating simulated single nucleotide and indel variants as well as the parameter files used to simulate the variants for the above manuscript are available for download here. This software is released under a MIT license (license text available from the ZIP-archive). Please see the README file contained in the ZIP-archive for further information about the software.

Full Genomic Variant Simulator Profile

GenPhyloData

https://code.google.com/p/jprime/

realistic simulation of gene family evolution

Description

PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and—perhaps more interestingly—also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock.

Full GenPhyloData Profile

GENS2

https://sourceforge.net/projects/gensim/

Simulates interactions among two genetic and one environmental factor and also allows for epistatic interactions.

Description

The Gene-Environment iNteraction Simulator 2 (GENS2) simulates interactions among two genetic and one environmental factor and also allows for epistatic interactions. GENS2 is based on data with realistic patterns of linkage disequilibrium, and imposes no limitations either on the number of individuals to be simulated or on the number of non-predisposing genetic/environmental factors to be considered. The GENS2 tool is able to simulate gene-environment and gene-gene interactions. To make the Simulator more intuitive, the input parameters are expressed as standard epidemiological quantities. GENS2 is written in Python and takes advantage of operators and modules provided by the simuPOP simulation environment. GENS2 is not intended to simulate the evolution of a population, but to simulate complex gene-environment interactions in case-control samples. It should be used along with simuPOP, a software package that allows realistic evolutionary simulation (or an equivalent simulator), to simulate the dataset to which the disease model is applied.

Full GENS2 Profile

Geonomics

https://github.com/erthward/geonomics

A Python package for simulation of genomic evolution on complex and dynamic landscapes

Description

Geonomics is a Python package for forward-time, individual-based, continuous-space, population genomic simulations on complex and dynamic landscapes. Geonomics models are parameterized by way of an informatively annotated parameters file that provides the user a straightforward means of building models of arbitrary complexity while offering reasonable default settings and “off switches” for parameters and components unrelated to the user’s interests. Models consist of 1) a landscape with one or more environmental layers, each of which can undergo arbitrarily complex environmental change events and 2) one or more species having genomes with realistic architecture and any number of associated phenotypes. Species undergo non-Wright-Fisher evolution in continuous space, with localized mating and mortality, such that species-level phenomena and simulation dynamics are emergent properties of a model’s parameterization. Evolution is comprehensively tracked by way of tskit data structures that record the complete spatial pedigree, providing for the customizable output of rich, 3D data sets in a variety of common formats, including VCF and FASTA for genomic data, GeoTiff for landscape data, and CSV, Shapefile, and GeoJSON for individuals’ nongenomic data (location, environmental values, phenotypes, age, and sex). All of this allows Geonomics to produce realistic landscape genomic results useful for a wide variety of theoretical and empirical purposes.

Full Geonomics Profile

GPOPSIM

https://github.com/SCAU-AnimalGenetics/GPOPSIMv2

GPOPSIM is a simulation tool for pedigree, phenotypes, and genome data.

Description

GPOPSIM is a simulation tool for pedigree, phenotypes, and genome data. The software uses a variety of population and genome structures as well as trait genetic architectures. GPOPSIM also provides parameter settings for a wide variety of disciplines. The package is capable of simulating multiple genetically correlated traits with given genetic parameters along with underlying genetic architectures.

Full GPOPSIM Profile

GppFst

https://github.com/radamsRHA/GppFst

GppFst is an open source R package that generates posterior predictive distributions of FST and dXY under a neutral coalescent model to identify putative targets of selection from genomic data.

Description

GppFst is a posterior predictive simulation (PPS) framework to generate theoretical distributions of FST and dXY under the neutral coalescent model for two populations that accounts for demographic parameters in a probabilistic framework. Importantly, our method allows users to explicitly test the null hypothesis of genetic drift when conducting genomic scans. PPS is a popular method for evaluating model fit within a Bayesian framework that has been used to test a variety of evolutionary models (Gelman et al., 2004; Reid et al., 2014). GppFst explicitly accounts for the demographic history of two genetically-isolated species, including multiple demographic and experimental parameters (and uncertainty in those parameters), such as sample sizes, demographic parameters, unequal rates of genetic drift within populations (unequal s), and divergence time. Our method allows users to simulate theoretical distributions that are conditioned on sampling multiple linked SNPs per locus – allowing users to take full advantage of large genomic datasets. We provide our PPS model in the package GppFst (Genomic Posterior Predictive distributions of FST), which offers a user-friendly, open-source framework to generate theoretical distributions of FST and dXY under the neutral coalescent model.

Full GppFst Profile

Grinder

https://sourceforge.net/projects/biogrinder/

Grinder is a versatile open-source bioinformatic tool to create simulated omic shotgun and amplicon sequence libraries for all main sequencing platforms.

Description

Grinder is a versatile open-source bioinformatic tool to create simulated omic shotgun and amplicon sequence libraries for all main sequencing platforms.

Full Grinder Profile

GS

http://engr.case.edu/li_jing/gs.html

Generating samples for association studies based on HapMap data

Description

A new version of gs is available. In addition to the functionalities implemented earlier, gs2.0 has implemented a comprehensive yet flexible model to simulate genetic and environmental interactions. The program can be used to generate samples in testing algorithms for tag SNP selection, haplotype inference, as well as epistatic detection.

Full GS Profile

GWAsimulator

https://biostat.app.vumc.org/wiki/Main/GWAsimulator

A rapid whole genome simulation program

Description

GWAsimulator is a C++ program that can simulate genotype data for SNP chips that are used in genome-wide association (GWA) studies. It implements a rapid moving-window algorithm (Durrant et al. 2004. AJHG 75:35-43) to simulate whole genome case-control or population samples. It also can simulate specific regions if desired. For case-control data, the program retrospectively samples cases and controls according to a user-specified multi-locus disease model. The program requires phased data as input, and the simulated data will have LD patterns similar to those of the input data. The program can use HapMap phased data as input and has the flexibility of simulating genotypes for different populations and different SNP chips. Because many large-scale GWA data are becoming available, they can be used instead of the HapMap data as the input, as long as the phase information is generated. These data may provide a better representation of the population under study and more accurate LD information than the HapMap due to much larger sample sizes. See the manual for instructions and a detailed description of the program.

Full GWAsimulator Profile

HAP-SAMPLE

https://sites.google.com/a/umich.edu/leeshawn/software

An association simulator for candidate regions or genome scans

Description

HAP-SAMPLE is a web application for simulating SNP genotypes for case-control and affected-child trio studies by resampling from Phase I/II HapMap SNP data. The user provides a list of SNPs to be "genotyped," along with a disease model file that describes causal SNPs and their effect sizes. The simulation tool is appropriate for candidate regions or whole-genome scans. The stand-alone software is also available.

Full HAP-SAMPLE Profile

HAPGEN

https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html

A simulator for the simulation of case control datasets at SNP markers

Description

HAPGEN2 is an updated version of the program HAPGEN, which simulates case control datasets at SNP markers. The new version can now simulate multiple disease SNPs on a single chromosome, on the assumption that each disease SNP acts independently and is in Hardy-Weinberg equilibrium. We also supply an R package that can simulate interaction between the disease SNPs. We hope to add further facilities to simulate quantitative traits and admixture soon.

Full HAPGEN Profile

HaploDX

https://github.com/remytuyeras/HaploDynamics

A Python library to develop genomic data simulators

Description

The HaploDX library provides a collection of functions to generate simulated population-specific genomic data in VCF format. The library includes parameters and functions to control mutation rates, linkage disequilibrium strength and block lengths, and number of individuals. To generate genomic data, the HaploDX framework offers a pipeline of functions that can be used to simulate: (1) the allele frequency spectra of different populations; (2) the Hardy-Weinberg principle for genotypes and haplotypes; (3) linkage disequilibrium across different populations.
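
As an illustration of item (2), the Hardy-Weinberg principle can be sketched in a few lines of standard-library Python. This is not the HaploDX API; the function names `hardy_weinberg_frequencies` and `sample_genotypes` and the allele frequency of 0.3 are hypothetical and chosen only for the example.

```python
import random

def hardy_weinberg_frequencies(p):
    """Expected genotype frequencies (hom-alt, het, hom-ref) for alt-allele frequency p."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)

def sample_genotypes(p, n_individuals, rng):
    """Draw diploid genotypes as alternate-allele counts (0, 1 or 2) under HWE."""
    # Under Hardy-Weinberg equilibrium the two alleles of an individual
    # are independent draws with probability p of being the alternate allele.
    return [(rng.random() < p) + (rng.random() < p) for _ in range(n_individuals)]

rng = random.Random(42)                      # fixed seed for reproducibility
freqs = hardy_weinberg_frequencies(0.3)      # hypothetical allele frequency
genotypes = sample_genotypes(0.3, 10_000, rng)
observed_het = genotypes.count(1) / len(genotypes)
print(freqs, observed_het)
```

With a large sample, the observed heterozygote fraction should lie close to the expected 2pq.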

Full HaploDX Profile

HapSim

http://cran.r-project.org/web/packages/hapsim/index.html

A simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients

Description

Package for haplotype data simulation. Haplotypes are generated such that their allele frequencies and linkage disequilibrium coefficients match those estimated from an input data set

Full HapSim Profile

HAPSIMU

http://l.web.umkc.edu/liujian/

A program that simulates heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model

Description

HAPSIMU, a program based on real haplotype data from the HapMap ENCODE project, can simulate heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model. Moreover, both qualitative and quantitative traits can be simulated using an additive genetic model with various genetic parameters designated by users.

Full HAPSIMU Profile

IBDsim

http://raphael.leblois.free.fr/

IBDSim is a computer package for the simulation of genotypic data under general isolation by distance models.

Description

IBDSim can consider a large panel of subdivided population models representing discrete subpopulations as well as a large continuous population. Many dispersal distributions, with different tails, can be considered as well as various heterogeneities in space and time of the demographic parameters. For examples of various applications see Leblois et al. (2003), Leblois et al. (2004), Leblois et al. (2006), Rousset & Leblois (2007). The program runs on PC under Windows, Mac or Linux systems, and we provide the source code, which can be easily compiled on any system using an ISO C++ compiler.

Full IBDsim Profile

IgSimulator

http://yana-safonova.github.io/ig_simulator/

a versatile immunosequencing simulator

Description

IgSimulator is a tool for simulation of antibody repertoire and Ig-seq library. IgSimulator is designed for testing and benchmarking tools for reconstruction of Ig repertoires.

Full IgSimulator Profile

indel-Seq-Gen

http://bioinfolab.unl.edu/~cstrope/iSG/

A biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies

Description

indel-Seq-Gen (iSG) is a biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies. This is accomplished through the addition of subsequence length constraints and lineage- and site-specific evolution. iSG tracks insertion and deletion processes that occur during the simulation run. iSG records all evolutionary events and outputs the "true" multiple alignment of the sequences, and can generate a larger simulated sequence space by allowing the use of multiple related root sequences. iSG can be used to test the accuracy of multiple alignment methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein superfamily classification methods.

Full indel-Seq-Gen Profile

Indelible

http://abacus.gene.ucl.ac.uk/software/indelible/

A powerful and flexible simulator of biological evolution

Description

INDELible is a new, portable, and flexible application for biological sequence simulation that combines many features in the same place for the first time. Using a length-dependent model of indel formation it can simulate evolution of multi-partitioned nucleotide, amino-acid, or codon data sets through the processes of insertion, deletion, and substitution in continuous time.

Full Indelible Profile

InSilicoSeq

https://github.com/HadrienG/InSilicoSeq

A sequencing simulator

Description

InSilicoSeq is a sequencing simulator producing realistic Illumina reads. Primarily intended for simulating metagenomic samples, it can also be used to produce sequencing data from a single genome. InSilicoSeq is written in Python and uses kernel density estimators to model the read quality of real sequencing data. InSilicoSeq supports substitution, insertion and deletion errors. If you do not need insertion and deletion errors, a basic error model is provided.

Full InSilicoSeq Profile

interSIM

https://cran.r-project.org/web/packages/InterSIM/index.html

InterSIM: simulation tool for multiple integrative omic datasets. Comput. Methods Prog. Biomed

Description

Generates three inter-related genomic datasets: methylation, gene expression and protein expression. Input: number of samples, proportion of samples in the cluster groups, cluster mean shift parameter delta and a few other options. Output: three datasets (methylation, gene expression and protein expression) with inter- and intra-dataset correlations and cluster group information. The true clustering assignment of each subject is also generated.

Full interSIM Profile

invertFREGENE

http://www.ebi.ac.uk/projects/BARGEN/

InvertFREGENE is a forward-in-time simulator of inversions in population genetic data

Description

invertFREGENE is the forward-in-time simulator of inversions in population genetic data, while SAMPLE samples genotype and haplotype data from the output of invertFREGENE simulations based on specified individual and marker ascertainment criteria, including a continuous and case-control disease model. invertFREGENE has been developed from a beta version of the population genetic simulator FREGENE, and as a result there are a small number of features not included in invertFREGENE (e.g., it does not model natural selection); we therefore provide self-contained documentation for invertFREGENE. O'Reilly PF, Coin LJ, Hoggart CJ. invertFREGENE: software for simulating inversions in population genetic data. Bioinformatics. 2010 Mar 15;26(6):838-40.

Full invertFREGENE Profile

J-SPACE

https://github.com/BIMIB-DISCo/J-Space.jl

J-SPACE: a Julia package for the simulation of spatial models of cancer evolution and of sequencing experiments

Description

J-SPACE is a Julia package to simulate the spatial growth and the genomic evolution of a cell population and the experiment of sequencing the genome of the sampled cells. First, the software simulates the spatial dynamics of the cells as a continuous-time multi-type birth-death stochastic process on a graph, employing different rules of interaction and an optimised Gillespie algorithm. After mimicking a spatial sampling of the tumour cells, J-SPACE returns the phylogenetic tree of the sample and simulates molecular evolution of the genome under the infinite-site model or a set of different substitution models. There is also the possibility of including indels. Finally, employing ART, J-SPACE generates synthetic single-end, paired-end or mate-pair reads of the next-generation sequencing platforms.
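
To make the birth-death step concrete, here is a minimal Python sketch of the plain (direct-method) Gillespie algorithm for a single-type, non-spatial birth-death process. It is a simplification for illustration only: it is neither J-SPACE's optimised variant nor its graph-based, multi-type model, and the function name and rate values are hypothetical.

```python
import random

def gillespie_birth_death(n0, birth_rate, death_rate, t_max, rng):
    """Simulate a linear birth-death process; return a list of (time, population size)."""
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while n > 0:
        # Each of the n cells divides at birth_rate and dies at death_rate.
        total_rate = n * (birth_rate + death_rate)
        # The waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # Pick the event type in proportion to its rate.
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory

rng = random.Random(1)  # fixed seed for reproducibility
traj = gillespie_birth_death(n0=10, birth_rate=1.0, death_rate=0.5, t_max=5.0, rng=rng)
print(len(traj), traj[-1])
```

The loop stops either when the population goes extinct or when the next event would fall after `t_max`.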

Full J-SPACE Profile

kernalPop

http://cran.r-project.org/src/contrib/Archive/kernelPop/

A spatially explicit population genetic simulation engine

Description

Individual-based, spatially explicit models provide a mechanism to understand distributions of individuals on the landscape; however, few models have been coupled with population genetics. The primary benefit of such a combination is to assess the performance of population-genetic estimators in realistic situations. KERNELPOP represents a flexible framework to implement almost any arbitrary population-genetic and demographic model in a spatially explicit context using a variety of dispersal kernels. Estimates of type I error associated with genome scans in metapopulations are provided as an illustration of this software’s utility.

Full kernalPop Profile

LongISLND

https://github.com/bioinform/longislnd

LongISLND is a read simulator which profiles the characteristics of third generation, single-molecule sequencing technologies and simulates accordingly

Description

LongISLND is a read simulator which profiles the characteristics of third generation, single-molecule sequencing technologies and simulates accordingly. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. Please read on to see application examples for PacBio and Oxford Nanopore (ONT) data.

Full LongISLND Profile

LSH-GAN

https://github.com/Snehalikalall/LSH-GAN

LSH-GAN enables in-silico generation of cells for small sample high dimensional scRNA-seq data

Description

A fundamental problem of downstream analysis of scRNA-seq data is the unavailability of enough cell samples compared to the feature size. This is mostly due to the budgetary constraints of single cell experiments or simply because of the small number of available patient samples. Here, we present an improved version of the generative adversarial network (GAN), called LSH-GAN, to address this issue by producing new realistic cell samples. We update the training procedure of the GAN generator using locality sensitive hashing, which speeds up the sample generation and thus maintains the feasibility of applying the standard procedures of downstream analysis. LSH-GAN outperforms the benchmarks for realistic generation of quality cell samples. Experimental results show that samples generated by LSH-GAN improve the performance of downstream analyses such as feature (gene) selection and cell clustering. Overall, LSH-GAN therefore addresses the key challenges of small sample scRNA-seq data analysis.

Full LSH-GAN Profile

MaCS

https://github.com/gchen98/macs

Markovian Coalescent Simulator

Description

MaCS is a simulator of the coalescent process that simulates genealogies spatially across chromosomes as a Markovian process. The algorithm is similar to the SMC algorithm (McVean and Cardin, Phil Trans R Soc B 2005) in that the algorithm scales linearly in time with respect to sample size and sequence length. However, it more accurately models the true coalescent, while supporting all demographic scenarios found in the popular program MS (Hudson, Bioinformatics 2002), making this program appropriate for simulating data for structured populations in genome wide association studies.

Full MaCS Profile

Marlin

http://www.patrickmeirmans.com/software/Marlin.html

Marlin provides a user-friendly interface for performing forward-in-time population genetic simulations.

Description

Marlin is a program for running spatially explicit forward-in-time population genetic simulations. It provides an intuitive user interface with which realistic geographic scenarios can easily be created and simulated. But Marlin goes further than that and directly analyses and plots the results. This combination of creation, simulation, and analysis makes Marlin ideal for teaching and for scientists who are interested in doing simulations without having to learn command-line operations.

Full Marlin Profile

Mason

http://www.seqan.de/projects/mason/

A package for the simulation of nucleotide data.

Description

Mason is a package for the simulation of nucleotide data. Starting with a genome, you can simulate variants and optionally also methylation levels. From this, reads of different technologies can be simulated, optionally simulating bisulphite treatment. The variants can also be specified as a VCF file. The results are FASTQ files with the reads and optionally a SAM file with the alignment to the reference sequence. Substeps of the process are available as standalone tools, e.g. for the simulation of reads from preselected or presimulated fragments, or for computing genomic sequences with variants. The time-intensive part of read simulation has been parallelized.

Full Mason Profile

MaSS-Simulator

https://github.com/pcdslab/MaSS-Simulator

MaSS-Simulator: A Highly Configurable Simulator for Generating MS/MS Datasets for Benchmarking of Proteomics Algorithms

Description

MaSS-Simulator offers many configuration options to allow the user a great degree of control over the test datasets, which can enable rigorous and large-scale testing of any proteomics algorithm. MaSS-Simulator is assessed by comparing its performance against experimentally generated spectra and spectra obtained from NIST collections of spectral libraries. The results show that MaSS-Simulator generated spectra match closely with real spectra and have a relative-error distribution centered around 25%. In contrast, the theoretical spectra for the same peptides have a relative-error distribution centered around 150%. MaSS-Simulator will enable developers to specifically highlight the capabilities of their algorithms and provide strong proof of any pitfalls they might face. Source code, executables, and a user manual for MaSS-Simulator can be downloaded from https://github.com/pcdslab/MaSS-Simulator.

Full MaSS-Simulator Profile

mbs

http://www.sendou.soken.ac.jp/esb/innan/InnanLab/software.html

modifying Hudson's ms software to generate samples of DNA sequences with a biallelic site under selection

Description

A software application to generate samples of DNA sequences when there is a biallelic site targeted by selection. mbs is developed by modifying Hudson's ms. The mbs software is so flexible that it can incorporate any arbitrary histories of population size changes and any mode of selection as long as selection is operating on a biallelic site.

Full mbs Profile

Mendel's Accountant

http://mendelsaccount.sourceforge.net/

Mendel's Accountant (MENDEL) is an advanced numerical simulation program for modeling genetic change over time and was developed collaboratively by Sanford, Baumgardner, Brewer, Gibson and ReMine

Description

MENDEL is a genetic accounting program that allows realistic numerical simulation of the mutation/selection process over time. MENDEL is applicable to either haploid or diploid organisms, having either sexual or clonal reproduction. Each mutation that enters the simulated population is tracked from generation to generation to the end of the experiment - or until that mutation is lost either as a result of selection or random drift. Using a standard personal computer, the MENDEL program can be used to generate and track millions of mutations within a single population. MENDEL's input variables include such things as mutation rate, distribution specifications for mutation effects, extent of dominance, mating characteristics, selection method, average fertility, heritability, non-scaling noise, linkage block properties, chromosome number, genome size, population size, population sub-structure, and number of generations. The MENDEL program outputs, both in tabular and graphic form, provide several types of data including: deleterious and beneficial mutation counts per individual, mean individual fitness as a function of generation count, distribution of mutation effects, and allele frequencies. MENDEL provides biologists with a new tool for research and teaching, and allows for the modeling of complex biological scenarios that would have previously been impossible.

Full Mendel's Accountant Profile

MetaPopGen

https://github.com/MarcoAndrello/MetaPopGen

Simulates genetics in large size metapopulations

Description

MetaPopGen is a population genetics simulator. Features included in the model are age-structure, monoecious and dioecious (or separate sexes) life-cycles, mutation, dispersal and selection. All demographic parameters can be genotype-, sex-, age-, deme- and time-dependent. MetaPopGen is therefore suited to studying large populations and very complex demographic scenarios.

Full MetaPopGen Profile

MetaSim

https://software-ab.informatik.uni-tuebingen.de/download/metasim/welcome.html

A tool to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets

Description

The aim of MetaSim is to provide a tool for the simulation of reads based on given genome sequences reflecting (adaptable) error models of current sequencing technologies. Additionally, the user is able to determine the abundance of the chosen taxa. Therefore, MetaSim integrates an induced tree view of the NCBI taxonomy that can be used to interactively select taxa and inner nodes of the taxonomy to configure their relative abundances. Another feature of MetaSim allows the user to simulate an evolved population of a single genome sequence, using a population simulator. This feature is aimed at simulating the common real world situation that many different, but closely related strains of a lineage coexist in the same habitat. The resulting data sets can be used to plan and design metagenome studies and for evaluation and improvement of metagenomic software tools and assembly algorithms.

Full MetaSim Profile

metaSPARSim

https://gitlab.com/sysbiobig/metasparsim

metaSPARSim is a sparse count matrix simulator intended for usage in the development of pipelines for 16S rRNA metagenomic data processing.

Description

metaSPARSim is a sparse count matrix simulator intended for usage in the development of pipelines for 16S rRNA metagenomic data processing. metaSPARSim implements a new generative process that models the sequencing process with a Multivariate Hypergeometric distribution in order to realistically reproduce these data considering their characteristic aspects, such as compositionality and sparsity. It provides ready-to-use count matrices and comes with the possibility to reproduce different pre-coded scenarios or to tune internal parameters in order to create a tailored count matrix that better fits some prior information or specific characteristic an expert user may want to consider.
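
The idea of modelling sequencing as a multivariate hypergeometric draw (sampling reads without replacement from a finite pool of molecules) can be sketched in standard-library Python. This is an illustration under stated assumptions, not metaSPARSim's implementation: the function name, the abundance vector and the sequencing depth are all hypothetical.

```python
import random

def multivariate_hypergeometric(counts, n_draws, rng):
    """Draw n_draws items without replacement from categories with the given counts."""
    # Expand the categories into one pool of labelled items (fine for small pools).
    pool = [category for category, count in enumerate(counts) for _ in range(count)]
    if n_draws > len(pool):
        raise ValueError("cannot draw more items than the pool contains")
    sampled = [0] * len(counts)
    for category in rng.sample(pool, n_draws):
        sampled[category] += 1
    return sampled

rng = random.Random(7)                         # fixed seed for reproducibility
true_abundances = [500, 300, 150, 45, 5, 0]    # hypothetical molecules per taxon
observed = multivariate_hypergeometric(true_abundances, n_draws=100, rng=rng)
print(observed)
```

Rare taxa often receive zero reads at low depth, which is how such a draw reproduces sparsity, and the fixed total depth gives the compositional constraint.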

Full metaSPARSim Profile

MichiGAN

https://github.com/welch-lab/MichiGAN

MichiGAN: sampling from disentangled representations of single-cell data using generative adversarial networks

Description

MichiGAN is a novel neural network that combines the strengths of VAEs and GANs to sample from disentangled representations without sacrificing data generation quality. MichiGAN allows us to manipulate semantically distinct aspects of cellular identity and predict single-cell gene expression response to drug treatment.

Full MichiGAN Profile

MimicrEE2

https://sourceforge.net/projects/mimicree2/

MimicrEE2: Genome-wide forward simulations of Evolve and Resequencing studies

Description

MimicrEE2 is a multi-threaded Java program for genome-wide forward simulations of evolving populations. MimicrEE2 enables the convenient usage of available genomic resources, supports biological particulars of model organisms frequently used in E&R studies and offers a wide range of different adaptive models (selective sweeps, polygenic adaptation, epistasis). Due to its user-friendly and efficient design, MimicrEE2 will facilitate simulations of E&R studies even for small labs with limited bioinformatics expertise or computational resources. Additionally, the scripts provided for executing MimicrEE2 on a computer cluster permit the coverage of even a large parameter space. MimicrEE2 runs on any computer with Java installed.

Full MimicrEE2 Profile

Minnow

https://github.com/COMBINE-lab/minnow

Minnow is a read level simulator for droplet based single cell RNA-seq data.

Description

Analysis pipelines usually validate their results by using marker genes and simulated data from gene-count-level simulators. The impact of using different read-alignment or UMI deduplication methods has not been investigated. Assessments usually start by assuming a count matrix where the effects of resolving UMI counts from raw read data are ignored. Minnow differs in this respect by modeling Unique Molecular Identifier (UMI) selection. Minnow is a read level simulator for droplet based single cell RNA-seq data. Minnow simulates reads by either sampling sequences from the de Bruijn graph of the reference transcriptome or by sampling sequences from the reference transcriptome itself.

Full Minnow Profile

mlcoalsim

https://github.com/CRAGENOMICA/mlcoalsim-v2

Multilocus Coalescent Simulations

Description

The application program mlcoalsim (multilocus coalescent simulations) is designed to:
(i) Generate samples and calculate neutrality tests and other statistics under a stationary model, several demographic models or strong positive selection by means of coalescent theory.
(ii) Perform coalescent simulations with the mutational phase given:
1. the population mutation rate θ (θ = 4Nμ, where N is the effective population size and μ is the mutation rate).
2. a fixed number of mutations.
3. a distribution of θ values. Uniform (bounded) and gamma prior distributions are enabled.
4. a fixed number of biallelic segregating sites, taking into account the uncertainty of the population mutation rate (conditioning on biallelic segregating sites). Uniform (bounded) and gamma prior distributions are enabled.
(iii) Perform coalescent simulations with recombination given:
1. the population recombination rate R (R = 4Nr, where r is the recombination rate).
2. a distribution of r values. Uniform (bounded) and gamma prior distributions are enabled.
3. a fixed number of minimum recombination events (Rm), taking into account the uncertainty of the population recombination rate (fixing Rm). Uniform (bounded) and gamma prior distributions are enabled.
4. a fixed number of minimum recombination events (Rm) and a fixed number of haplotypes, considering the uncertainty of the population recombination rate.
(iv) Perform multilocus analyses. Linked loci and unlinked loci are enabled. Multilocus statistics for unlinked loci are the average and the variance for each statistic.
(v) Include recurrent mutations (multiple hits) or not.
(vi) Include heterogeneity in mutation rate across the length of the sequence. A gamma distribution is used. A number of invariant positions can also be defined.
(vii) Include heterogeneity in recombination rate across the length of the sequence. A gamma distribution is used. Hotspots or a constant value for all positions are possible.
This program is based on a previous version of Hudson’s coalescent program ms (Hudson, 2002) and modified for the above purposes. The function to calculate minimum recombinant values is a modification of Wall’s code (Wall, 2000). The gamma function was partially obtained from Grassly, Adachi and Rambaut code (Grassly et al., 1997). This program is distributed under the GNU GPL License. Version 2 includes parallel computation for multiple locus and the possibility to include priors for each of the parameters (useful for ABC computation analysis). The input file has been modified.
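
As a worked example of the scaled parameters defined above, the following Python sketch computes θ = 4Nμ and R = 4Nr and, as an added standard result that the description itself does not mention, the expected number of segregating sites under the neutral infinite-sites model, E[S] = θ · Σ 1/i for i = 1 to n-1 (Watterson's formula). The function names and numeric values are hypothetical, and μ and r are taken per locus per generation.

```python
def population_mutation_rate(n_e, mu):
    """theta = 4 * N * mu for a diploid population of effective size N."""
    return 4.0 * n_e * mu

def population_recombination_rate(n_e, r):
    """R = 4 * N * r for a diploid population of effective size N."""
    return 4.0 * n_e * r

def expected_segregating_sites(theta, n_samples):
    """Watterson's expectation: E[S] = theta * sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n_samples))
    return theta * a_n

# Hypothetical values: N = 10,000, per-locus mu = 1e-4 and r = 5e-5.
theta = population_mutation_rate(n_e=10_000, mu=1e-4)
big_r = population_recombination_rate(n_e=10_000, r=5e-5)
print(theta, big_r, expected_segregating_sites(theta, n_samples=10))
```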

Full mlcoalsim Profile

MOSim

https://www.bioconductor.org/packages/release/bioc/html/MOSim.html

An R package for the simulation of multi-omic experiments that mimic regulatory mechanisms within the cell.

Description

MOSim is an R package for the simulation of multi-omic experiments that mimic regulatory mechanisms within the cell. Gene expression (RNA-seq count data) is the central data type simulated by MOSim, while the rest of available omic data types provide gene regulation information and include ATAC-seq (DNase-seq), ChIP-seq, small RNA-seq and Methyl-seq. In addition to these omics, regulation by transcription factors (TFs) can also be modeled.

Full MOSim Profile

ms

http://home.uchicago.edu/~rhudson1/source/mksamples.html

The purpose of this program is to allow one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets.

Description

The program ms can be used to generate many independent replicate samples under a variety of assumptions about migration, recombination rate and population size to aid in the interpretation of such polymorphism studies. The samples are generated using the now standard coalescent approach in which the random genealogy of the sample is first generated and then mutations are randomly placed on the genealogy (Kingman, 1982; Hudson, 1990; Nordborg, 2001). The usual small sample approximations of the coalescent are used. An infinite-sites model of mutation is assumed, and thus multiple hits and back mutations do not occur. However, when used in conjunction with other programs, finite-site mutation models or micro-satellite models can be studied. For example, the gene trees themselves can be output, and these gene trees can be used as input to other programs which will evolve the sequences under a variety of finite-site models. These are described later. The program is intended to run on Unix, or Unix-like operating systems, such as Linux or Mac OS X. The next section describes how to download and compile the program. The subsequent sections describe how to run the program and in particular how to specify the parameter values for the simulations.
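
The two-step coalescent approach described above (first the genealogy, then the mutations) can be sketched in standard-library Python for the simplest case: one panmictic population of constant size, with no recombination or migration. This is an illustrative sketch, not ms itself. It measures time in units of 2N generations, tracks only the total branch length (which is all that the number of segregating sites needs), and uses hypothetical function names and parameter values.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw by Knuth's method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def segregating_sites(n_samples, theta, rng):
    """Simulate one neutral genealogy and return the number of mutations on it."""
    total_length = 0.0
    for k in range(n_samples, 1, -1):
        # With k lineages, the time to the next coalescence (in units of
        # 2N generations) is exponential with rate k(k-1)/2.
        t_k = rng.expovariate(k * (k - 1) / 2.0)
        total_length += k * t_k
    # Infinite-sites mutations fall on the tree as a Poisson process
    # with rate theta/2 per unit of branch length.
    return poisson(theta / 2.0 * total_length, rng)

rng = random.Random(2002)  # fixed seed for reproducibility
replicates = [segregating_sites(10, 5.0, rng) for _ in range(2000)]
mean_s = sum(replicates) / len(replicates)
expected = 5.0 * sum(1.0 / i for i in range(1, 10))  # Watterson's expectation
print(mean_s, expected)
```

Averaged over many replicates, the simulated number of segregating sites should approach Watterson's expectation, which gives a quick sanity check on the sketch.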

Full ms Profile

msHOT

http://home.uchicago.edu/~rhudson1/

Description

This addition to Hudson’s (2002) ms, called msHOT, allows for implementation of multiple crossover hotspots and/or multiple gene conversion hotspots in the simulated genetic region. Crossover hotspots may overlap with gene conversion hotspots, but crossover hotspots may not overlap with each other and gene conversion hotspots may not overlap with each other.

Full msHOT Profile

msms

http://www.mabs.at/ewing/msms/index.shtml

A coalescent simulation tool with selection.

Description

This document describes how to use msms, a tool to generate sequence samples under both neutral models and a single locus selection model. msms permits the full range of demographic models provided by ms (Hudson, 2002). In particular, it allows for multiple demes with arbitrary migration patterns, population growth and decay in each deme, and for population splits and mergers. Selection (including dominance) can depend on the deme and also change with time. The program is designed to be command line compatible with ms; however, no prior knowledge of ms is assumed for this document. Applications of this program include power studies, analytical comparisons, and approximate Bayesian computation, among many others. Because most applications require the generation of a large number of independent replicates, the code is designed to be efficient and fast. For the neutral case, it is comparable to ms and even faster for large recombination rates. For selection, the performance is only slightly slower, making this one of the fastest tools for simulation with selection. The program has been developed with a wide number of possible operating systems and hardware in mind. For this reason, the code has been developed in Java and can run on any hardware that supports Java 1.6. This includes Mac OS X, all current versions of MS Windows, and most Unix flavors (Linux, Sun, BSD). The Java programming language is also popular and widely known, which should facilitate the writing of extensions for the program.

Full msms Profile

msnsam

https://github.com/rossibarra/msnsam

Hudson's ms with variable sample size across loci

Description

This version is the October 2007 version of the ms code with the added ability to include the number of samples (nsam) as a tbs argument. Please see Hudson's website for details on ms as well as installation instructions, but please email me for questions or bug reports, as bugs are likely mine and NOT part of the original code. The primary motivation for this is to allow efficient simulation of datasets with unequal sampling across loci. Running this version of ms using sample size as a tbs argument appears to be much faster than running an independent ms run for each of many loci.

Full msnsam Profile

msprime

https://pypi.python.org/pypi/msprime

A fast and accurate coalescent simulator.

Description

Msprime is a reimplementation of Hudson’s classical ms program for modern datasets.

Full msprime Profile

Mutation-Simulator

https://github.com/mkpython3/Mutation-Simulator

Mutation-Simulator: fine-grained simulation of random mutations in any genome

Description

Mutation-Simulator allows the introduction of various types of sequence alterations in reference sequences, with reasonable compute-time even for large eukaryotic genomes. Its intuitive system for fine-grained control over mutation rates along the sequence enables the mimicking of natural mutation patterns. Using standard file formats for input and output data, it can easily be integrated into any development and benchmarking workflow for high-throughput sequencing applications.

Full Mutation-Simulator Profile

MySSP

http://www.rosenberglab.net/software.html

A program for the simulation of DNA sequence evolution across a phylogenetic tree

Description

MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but not all have previously been available in a single package.

Full MySSP Profile

nanosim

https://github.com/bcgsc/NanoSim

Nanopore sequence read simulator

Description

NanoSim is a fast and scalable read simulator that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of nanopore sequencing technology.

Full nanosim Profile

NEAT

https://github.com/zstephens/neat-genreads

NEAT read simulation tools

Description

NEAT-genReads is a fine-grained read simulator. GenReads simulates real-looking data using models learned from specific datasets. There are several supporting utilities for generating models used for simulation.

Full NEAT Profile

Nemo

https://nemo2.sourceforge.io/

A forward-time, individual-based, genetically explicit, and stochastic simulation program designed to study the evolution of genetic markers, life history traits, and phenotypic traits in a flexible (meta-)population framework.

Description

Nemo implements many different life cycles and evolvable traits with a large variety of genetic architectures. Species interaction between a parasite and its host can also be modeled (i.e., Cytoplasmic-Incompatibility inducing endosymbiont: Wolbachia). All this is framed within a flexible metapopulation model that allows for patch-specific carrying capacities, dispersal rates (dispersal matrices), stochastic extinction/harvesting rates, and demographic stochasticity. Populations can be dynamically modified during a simulation, allowing for population bottlenecks, patch fusion/fission, population expansion, etc. Spatially heterogeneous selection on quantitative traits can also be modeled. Nemo's interface is a simple text file containing the simulation parameters. Large batches of simulations can be run from a single parameter file with multiple parameter values. Many complex evolutionary and demographic scenarios can be modeled easily by providing temporally varying parameter values.

Full Nemo Profile

NeSSM

http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php

A Next-Generation Sequencing Simulator for Metagenomics

Description

NeSSM is a tool to generate Next-Generation Sequencing (NGS) reads with parameters set by users. The goal of NeSSM is to generate metagenome sequencing reads that are close to reality. Currently, the 454 and Illumina sequencing platforms are supported. It can help develop methods or systems for metagenomics analysis.

Full NeSSM Profile

NetRecodon

http://code.google.com/p/netrecodon/

Coalescent simulation of coding DNA sequences with recombination (inter and intracodon), migration and demography

Description

NetRecodon is a population genetic simulator that generates samples of nucleotide and codon sequences from haploid/diploid populations with inter and intracodon recombination, migration, growth and dated tips. It can also run on several processors using MPI. Source code and a makefile are provided for compilation on any OS with a C compiler, along with some compiled executables.

Full NetRecodon Profile

NPBSS

https://github.com/NWPU-903PR/NPBSS_Octave

PacBio sequencing simulator

Description

By analyzing the characteristic features of CLR data from PacBio SMRT (single molecule real time) sequencing, we developed a new PacBio sequencing simulator (called NPBSS) for producing CLR reads. The NPBSS simulator first samples the read sequences according to a log-normal read length distribution, and chooses different base quality values with different proportions. Then, NPBSS computes the overall error probability of each base in the read sequence with an empirical model, and calculates the deletion, substitution and insertion probabilities from the overall error probability to generate the PacBio CLR reads. Alignment results demonstrate that NPBSS fits the error rate of the PacBio CLR reads better than PBSIM and FASTQSim. In addition, the assembly results also show that simulated sequences of NPBSS are more like real PacBio CLR data.
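
The two sampling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NPBSS code: the function names, the mean and standard deviation of the read lengths, and the deletion/substitution/insertion fractions are all hypothetical placeholders rather than values taken from NPBSS or its empirical model.

```python
import math
import random


def sample_read_lengths(n_reads, mean_len, sd_len, rng):
    """Draw read lengths from a log-normal distribution with the given mean and SD."""
    # Convert the mean/SD of the lengths into mu/sigma of the underlying normal.
    sigma2 = math.log(1.0 + (sd_len / mean_len) ** 2)
    mu = math.log(mean_len) - sigma2 / 2.0
    sigma = math.sqrt(sigma2)
    # Round to whole bases and never return an empty read.
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(n_reads)]


def split_error_probability(p_error, fractions=(0.3, 0.1, 0.6)):
    """Split an overall per-base error probability into
    (deletion, substitution, insertion) probabilities.

    The default fractions are placeholders chosen for illustration only.
    """
    total = sum(fractions)
    return tuple(p_error * f / total for f in fractions)


rng = random.Random(1)
lengths = sample_read_lengths(5, mean_len=8000, sd_len=4000, rng=rng)
p_del, p_sub, p_ins = split_error_probability(0.15)
print(lengths)
print(p_del, p_sub, p_ins)
```

In a fuller simulator the overall error probability would vary per base (for example, as a function of the sampled quality value) instead of being a single constant as in this sketch.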

Full NPBSS Profile

OmicsSIMLA

A simulation tool for generating multi-omics data with disease status

Description

OmicsSIMLA is a simulation tool for generating multi-omics data with disease status. Currently, OmicsSIMLA has four main modules: SeqSIMLA, pWGBSSimla, RNA-Seq, and RPPA. SeqSIMLA can simulate sequence data in families with multiple affected and unaffected siblings or unrelated case-control samples under different disease models. pWGBSSimla is a profile-based whole-genome bisulphite sequencing data simulator, which can simulate whole-genome DNA methylation (WGBS), reduced representation bisulfite sequencing (RRBS), and oxidative bisulfite sequencing (oxBS-seq) data while modeling methylation quantitative trait loci, allele-specific methylations, and differentially methylated regions. RNA-Seq uses a negative binomial distribution to simulate NGS read counts for gene expression. Finally, RPPA uses a mass-action kinetic action model to simulate protein expression data.
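
As a rough illustration of the negative binomial read-count idea mentioned for the RNA-Seq module, the sketch below draws counts through the standard gamma-Poisson mixture. It is not OmicsSIMLA code: the function names, the mean `mu`, and the dispersion `phi` (with variance `mu + phi * mu**2`) are assumptions made for this example.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method.

    Suitable for small to moderate lam only (exp(-lam) underflows for very large lam).
    """
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def negative_binomial_count(mu, phi, rng):
    """Draw a read count with mean mu and variance mu + phi * mu**2."""
    # Gamma-distributed rate with shape 1/phi and scale mu*phi has mean mu.
    rate = rng.gammavariate(1.0 / phi, mu * phi)
    # A Poisson draw around that rate yields a negative binomial count.
    return poisson(rate, rng)


rng = random.Random(2024)
counts = [negative_binomial_count(mu=20.0, phi=0.5, rng=rng) for _ in range(10)]
print(counts)
```

Larger values of `phi` give more overdispersed counts; as `phi` approaches zero the counts approach a plain Poisson distribution.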

Full OmicsSIMLA Profile

OncoSimulR

https://github.com/rdiaz02/OncoSimul

BioConductor package for Forward Genetic Simulation of Cancer Progression with Epistasis

Description

An R/BioConductor package that provides functions for forward population genetic simulation in asexual populations, with special focus on cancer progression. Fitness can be an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, order restrictions in mutation accumulation, and order effects. Mutation rates can differ between genes, and we can include mutator/antimutator genes (to model mutator phenotypes). Simulations use continuous-time models and can include driver and passenger genes and modules. Also included are functions for simulating random DAGs of the type found in Oncogenetic Trees, Conjunctive Bayesian Networks, and other cancer progression models; plotting and sampling from single or multiple realizations of the simulations, including single-cell sampling; plotting the parent-child relationships of the clones; generating random fitness landscapes (Rough Mount Fuji, House of Cards, and additive models) and plotting them.

Full OncoSimulR Profile

PARA-suite

https://github.com/akloetgen/PARA-suite

PAR-CLIP specific sequence read simulation and processing

Description

PAR-CLIP Analyzing suite. Useful tools for handling short and error-prone sequence reads. Note that the PARA-suite addon of the Burrows-Wheeler Aligner (BWA) is necessary for the mapping tool of the PARA-suite.

Full PARA-suite Profile

PaSS

https://cgm.sjtu.edu.cn/PaSS/

PaSS is an effective sequence simulator for PacBio sequencing

Description

PacBio Sequencing Simulator (PaSS) can learn sequence patterns from PacBio sequencing data currently available. In addition to the distribution of read lengths and error rates, we included a context-specific sequencing error model. Compared to existing PacBio sequencing simulators such as PBSIM, LongISLND and NPBSS, PaSS performed better in many aspects. Assembly tests also suggest that reads simulated by PaSS are the most similar to experimental sequencing data.

Full PaSS Profile

PBSIM2

https://github.com/yukiteruono/pbsim2

a simulator for long-read sequencers with a novel generative model of quality scores

Description

PacBio sequencers produce two types of characteristic reads: CCS (short and low error rate) and CLR (long and high error rate), both of which can be useful for de novo assembly of genomes. PBSIM simulates those PacBio reads by using either a model-based or sampling-based simulation.

Full PBSIM2 Profile

PEDAGOG

https://bcrc.bio.umass.edu/pedigreesoftware/node/5

Software for simulating eco-evolutionary population dynamics

Description

PEDAGOG is a Windows program that simulates population dynamics at the individual level, allows for heritability and selection of traits, records individual genotype and pedigree information, and allows for several types of errors to manifest in the output which can be formatted for 57 existing software programs. In all, parameters can be specified for genetics, demographics, mating strategy, mutations and genetic/demographic errors, growth models, heritability and selection, and output. Demographic parameters can be either age or size based, and all parameters can be drawn from twelve statistical distributions where appropriate.

Full PEDAGOG Profile

pg-gan

https://github.com/mathiesonlab/pg-gan

create realistic simulated data that matches real population genetic data.

Description

This software can be used to create realistic simulated data that matches real population genetic data. It implements a GAN-based algorithm (Generative Adversarial Network).

Full pg-gan Profile

PGsim

https://github.com/lrjuan/PGsim

A Comprehensive and Highly Customizable Personal Genome Simulator

Description

We designed and developed PGsim, a comprehensive and highly customizable individual genome simulator that fully uses existing knowledge, such as variant allele frequencies in global or major world populations, mutation probability differences between protein-coding regions and non-coding regions, transition/transversion (Ti/Tv) ratios, indel incidence, indel length distribution, structural variation sites, and pathogenic mutation sites. Users can flexibly control the proportion and quantity of known variants, common variants, novel variants in both coding and non-coding regions, and special variants through detailed parameter settings. To ensure that the simulated personal genome has sufficient randomness, PGsim makes the generated variants more real and reliable in terms of variant distribution, proportion, and population characteristics. PGsim is able to employ a huge volume database as background data to simulate personal genomes and does not require SQL database support. Users can easily change the variant databases used as needed.

Full PGsim Profile

phastSim

https://github.com/NicolaDM/phastSim

phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets

Description

We present phastSim, a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.

Full phastSim Profile

phenosim

http://evoplant.uni-hohenheim.de/downloads/

A tool to add phenotypes to simulated genotypes

Description

phenosim reads the output of commonly used coalescent simulators and simulates a phenotype based on a user-defined trait model for each individual. The simulated data can be used to assess the influence of various factors such as demography, genetic architecture or selection on the statistical power of association methods to detect causal genetic variants under a wide variety of population genetic scenarios.

Full phenosim Profile

PhenotypeSimulator

https://github.com/HannahVMeyer/PhenotypeSimulator

flexible simulation of phenotypes from different genetic and non-genetic (noise) components.

Description

PhenotypeSimulator allows for the simulation of complex phenotypes under different models, including genetic variant effects and infinitesimal genetic effects (reflecting population structure) as well as correlated, non-genetic covariates and observational noise effects. Different phenotypic effects can be combined into a final phenotype while controlling for the proportion of variance explained by each of the components. For each component, the number of variables, their distribution and the design of their effect across traits can be customised.
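
The idea of combining components while controlling the proportion of variance each explains can be sketched as follows. This is a simplified illustration in Python, not the PhenotypeSimulator R API: the function names, the single genetic and single noise component, and the `genetic_share` parameter are assumptions made for this example.

```python
import random
import statistics


def rescale(values, target_var):
    """Centre values and scale them so their population variance equals target_var."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    if var == 0:
        raise ValueError("component has zero variance and cannot be rescaled")
    factor = (target_var / var) ** 0.5
    return [(v - mean) * factor for v in values]


def simulate_phenotype(genotypes, effects, genetic_share, rng):
    """Combine a genetic and a noise component with fixed variance shares."""
    # Genetic component: additive sum of allele counts times effect sizes.
    genetic = [sum(g * b for g, b in zip(row, effects)) for row in genotypes]
    # Noise component: independent standard normal draws.
    noise = [rng.gauss(0.0, 1.0) for _ in genotypes]
    genetic = rescale(genetic, genetic_share)
    noise = rescale(noise, 1.0 - genetic_share)
    phenotype = [g + e for g, e in zip(genetic, noise)]
    return phenotype, genetic, noise


rng = random.Random(42)
genotypes = [[rng.choice((0, 1, 2)) for _ in range(10)] for _ in range(200)]
effects = [rng.gauss(0.0, 1.0) for _ in range(10)]
phenotype, genetic, noise = simulate_phenotype(genotypes, effects, 0.4, rng)
print(statistics.pvariance(genetic), statistics.pvariance(noise))
```

Each component has exactly its target variance after rescaling; the variance of the summed phenotype is only approximately 1 because the sample covariance between the two components is not exactly zero.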

Full PhenotypeSimulator Profile

phylodyn

https://github.com/mdkarcher/phylodyn

Phylodyn facilitates phylodynamic inference and analysis in an approachable R package.

Description

phylodyn is an R package for phylodynamic analysis based on gene genealogies. The package applies Bayesian nonparametric estimation for population size fluctuations over time. The software includes Markov chain Monte Carlo-based methods and an integrated nested Laplace approximation-based approach for phylodynamic inference. The genealogical data describes the timed ancestral relationships of individuals sampled from a population of interest. The individuals within the software are simulated according to isochronous sampling or heterochronous sampling. The purpose of phylodyn is to facilitate phylodynamic inference and analysis in an approachable R package.

Full phylodyn Profile

PhyloSim

http://www.ebi.ac.uk/goldman-srv/phylosim/

An R package for the Monte Carlo simulation of sequence evolution

Description

PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. Key features of PhyloSim include 1) Simulation of the evolution of a set of discrete characters with arbitrary states evolving by a continuous-time Markov process with an arbitrary rate matrix. 2) Explicit implementations of the most popular substitution models (nucleotide, amino acid and codon substitution models). 3) Simulation under the popular models of among-sites rate variation, like the gamma (+G) and invariant sites plus gamma (+I+G) models. 4) The possibility to simulate under arbitrarily complex patterns of among-sites rate variation by setting the site specific rates according to any R expression. 5) ... please refer to our documentation for details.
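
To make the Gillespie idea concrete, the following sketch simulates substitutions along a single branch with site-specific rates. It is a deliberately minimal Python illustration, not PhyloSim's R interface: the function name, the equal-rates (Jukes-Cantor-like) choice of the new base, and the omission of indels are all simplifying assumptions.

```python
import random

BASES = "ACGT"


def evolve_branch(seq, branch_length, site_rates, rng):
    """Gillespie simulation of substitutions along one branch.

    Each site has its own total substitution rate; the new base is chosen
    uniformly among the three other bases, so site rates never change.
    """
    seq = list(seq)
    total_rate = sum(site_rates)
    t = 0.0
    while total_rate > 0:
        # Waiting time to the next event anywhere in the sequence.
        t += rng.expovariate(total_rate)
        if t >= branch_length:
            break
        # Pick the site where the event happens, proportional to its rate.
        site = rng.choices(range(len(seq)), weights=site_rates)[0]
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
    return "".join(seq)


rng = random.Random(0)
ancestor = "ACGTACGTACGTACGT"
# Hypothetical rate pattern: the first half evolves slowly, the second half quickly.
rates = [0.1] * 8 + [2.0] * 8
print(evolve_branch(ancestor, 0.5, rates, rng))
```

Because events are drawn one at a time, further concurrent processes (such as insertions and deletions) could be added by extending the list of competing rates, at the cost of recomputing the total rate whenever the sequence length changes.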

Full PhyloSim Profile

piBUSS

https://rega.kuleuven.be/cev/ecv/software/pibuss

a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios

Description

πBUSS is a BEAST/BEAGLE utility for sequence simulation, which provides an easy to use interface that allows flexible and extensible phylogenetic data fabrication, delegating computationally intensive tasks to the BEAGLE library and thus making full use of multi-core architectures.

Full piBUSS Profile

pIRS

https://code.google.com/p/pirs/

Profile-based Illumina pair-end reads simulator

Description

It simulates Illumina reads with empirical Base-Calling and GC%-depth profiles trained from real re-sequencing data. It considers error & quality distributions, as well as coverage bias patterns. In addition, pIRS also comes with a tool to simulate the heterozygous diploid genomes.

Full pIRS Profile

Polyester

http://bioconductor.org/packages/release/bioc/html/polyester.html

simulating RNA-seq datasets with differential transcript expression

Description

Simulates RNA-seq reads from differential expression experiments with replicates. The reads can then be aligned and used to perform comparisons of methods for differential expression.

Full Polyester Profile

POWSC

https://github.com/suke18/POWSC

POWSC is a computational tool that is used for power evaluation and sample size estimation in scRNA-seq.

Description

POWSC is an R package designed for sc-RNA-seq. The software plays three roles: parameter estimator, data simulator, and power assessor. As a parameter estimator, POWSC accurately captures the characterized parameters for any specific cell type from expression data. The simulator generates synthetic data based on a rigorous simulation mechanism that includes zero expression values. POWSC also performs comprehensive power analysis and reports stratified target powers for DE genes.

Full POWSC Profile

powsimR

https://github.com/bvieth/powsimR

powsimR assesses power and sample size requirements for differential expression (DE) analysis of single cell and bulk RNA-seq experiments.

Description

powsimR assesses power and sample size requirements for differential expression (DE) analysis of single cell and bulk RNA-seq experiments. The number of replicates required to achieve the desired statistical power is determined by technical noise and biological variability. Both of these variables are considerably larger if the biological replicates are single cells. powsimR not only estimates the sample sizes necessary to achieve a certain power, but also reports the power to detect DE in a data set at hand.

Full powsimR Profile

PReFerSim

https://github.com/LohmuellerLab/PReFerSim

PReFerSim is an ANSI C program that performs forward simulations under the PRF model.

Description

PReFerSim is an ANSI C program that performs forward simulations under the PRF model. PReFerSim models changes in population size, inbreeding, dominance, and distributions of selective effects. PReFerSim allows the tracking of summaries for genetic variations over time along with the output trajectories of selected alleles.

Full PReFerSim Profile

PROSSTT

https://github.com/soedinglab/prosstt

PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processes

Description

PROSSTT (PRObabilistic Simulations of ScRNA-seq Tree-like Topologies) is a package with code for the simulation of scRNAseq data for dynamic processes such as cell differentiation. PROSSTT is open source GPL-licensed software implemented in Python. Single-cell RNAseq is revolutionizing cellular biology, and many algorithms are developed for the analysis of scRNAseq data. PROSSTT provides an easy way to test the performance of trajectory inference methods on realistic data with a known "gold standard". The algorithm can produce datasets with user-defined topologies while simulating any number of sampled cells and genes.

Full PROSSTT Profile

ProteinEvolver

http://code.google.com/p/proteinevolver/

Simulation of protein evolution along phylogenies under structure-based substitution models

Description

ProteinEvolver generates samples of protein-coding genes and protein sequences evolved along phylogenies under structure-based substitution models. These models consider the protein structure to evaluate candidate mutations, which can be accepted (substitutions) or rejected depending on the energy of the protein structure of the mutated sequence. The simulation of molecular evolution occurs along phylogenetic histories, which can be either user-specified or simulated by the coalescent modified with recombination (including recombination hotspots), migration, demographics and longitudinal sampling.

Full ProteinEvolver Profile

pSBVB

https://github.com/lauzingaretti/pSBVB

Polyploid sequence based virtual breeding (pSBVB) is a modification of SBVB software that allows simulating traits of an arbitrary genetic complexity in polyploids.

Description

pSBVB is a modification of SBVB software that simulates traits of an arbitrary genetic complexity in polyploids. pSBVB simulates complex traits and genotype data starting with a vcf file that contains the genotypes of founder individuals and follows a given pedigree. The main output is the genotypes of all individuals in the pedigree and/or molecular relationship matrices (GRM) using all sequence or a series of SNP lists, together with phenotype data. The program implements very efficient algorithms where only the recombination breakpoints for each individual are stored, therefore allowing the simulation of thousands of individuals very quickly.
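
The storage idea mentioned above, keeping only recombination breakpoints instead of full genotypes, can be sketched as follows. This is an illustrative Python data structure under stated assumptions, not pSBVB code: a haplotype is represented here as a sorted list of hypothetical `(start_position, founder_id)` segments whose first segment starts at position 0, and breakpoints are assumed to be positive positions.

```python
import bisect


def founder_at(segments, pos):
    """Return the founder haplotype id covering position pos."""
    starts = [start for start, _ in segments]
    return segments[bisect.bisect_right(starts, pos) - 1][1]


def recombine(hap_a, hap_b, breakpoints):
    """Build a gamete that starts on hap_a and switches parental
    haplotype at every breakpoint."""
    parents = (hap_a, hap_b)
    bounds = [0] + sorted(breakpoints)
    gamete = []
    for i, start in enumerate(bounds):
        end = bounds[i + 1] if i + 1 < len(bounds) else float("inf")
        source = parents[i % 2]
        # Segment active at the block start, plus segments beginning inside the block.
        block = [(start, founder_at(source, start))]
        block += [(s, f) for s, f in source if start < s < end]
        for segment in block:
            # Merge neighbouring segments that trace back to the same founder.
            if not gamete or gamete[-1][1] != segment[1]:
                gamete.append(segment)
    return gamete


founder_1 = [(0, "F1")]
founder_2 = [(0, "F2")]
child = recombine(founder_1, founder_2, breakpoints=[500])
print(child)
print(founder_at(child, 499), founder_at(child, 500))
```

With this representation the allele of an individual at any variant site can be recovered on demand by looking up the founder at that position and reading the founder's genotype, so memory use grows with the number of crossovers, not with the number of sites.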

Full pSBVB Profile

pWGBSSimla

https://omicssimla.sourceforge.io/index.html

a profile-based whole-genome bisulphite sequencing data simulator

Description

a profile-based whole-genome bisulphite sequencing data simulator

Full pWGBSSimla Profile

Pysim-sv

https://github.com/xyc0813/pysim/

Pysim-sv: a package for simulating structural variation data with GC-biases

Description

Pysim-sv is a package for simulating HTS data to evaluate performance of SV detection algorithms. Pysim-sv can introduce a wide spectrum of germline and somatic genomic variations. The package contains functionalities to simulate tumor data with aneuploidy and heterogeneous subclones, which is very useful in assessing algorithm performance in tumor studies. Furthermore, Pysim-sv can introduce GC-bias, the most important and prevalent bias in HTS data, in the simulated HTS data.

Full Pysim-sv Profile

Pyvolve

https://github.com/sjspielman/pyvolve

A Flexible Python Module for Simulating Sequences along Phylogenies

Description

Pyvolve is an open-source Python module for simulating sequences along a phylogenetic tree according to continuous-time Markov models of sequence evolution.

Full Pyvolve Profile

QMSim

http://www.aps.uoguelph.ca/~msargol/qmsim/

QTL and Marker Simulator

Description

Linkage disequilibrium (LD) and linkage analyses have been used extensively to identify quantitative trait loci (QTL) in humans and livestock. Owing to the recent developments in genotyping technologies, dense marker maps are now available for several livestock species. Even though genotyping costs have substantially declined, large scale genome-wide association studies are still costly. For this reason many studies in livestock suffer from small sample size or from low density of markers. However, simulation is a highly valuable tool for assessing and validating new proposed methods for association studies at very low cost. During the last few decades, simulation has played a major role in answering a wide variety of questions in genomics. Several software tools have been developed for simulating genomes, especially in human research. However, most of the developed software tools do not provide the functionality required for many of the applications in livestock. QMSim was developed to simulate large scale genomic data in livestock populations. QMSim is a family based simulator, which can also take into account predefined evolutionary features, such as LD, mutation, bottlenecks and expansions. The simulation is carried out in two steps: in the first step, a historical population is simulated to establish mutation-drift equilibrium and, in the second step, recent population structures are generated, which can be complex.

Full QMSim Profile

quantiNEMO

http://www2.unil.ch/popgen/softwares/quantinemo/

An individual-based program for the analysis of quantitative traits with explicit genetic architecture potentially under selection in a structured population

Description

quantiNEMO is an individual-based, genetically explicit stochastic simulation program. It was developed to investigate the effects of selection, mutation, recombination, and drift on quantitative traits with varying architectures in structured populations connected by migration and located in a heterogeneous habitat. quantiNEMO is highly flexible at various levels: population, selection, trait(s) architecture, genetic map for QTL and/or markers, environment, demography, mating system, etc. quantiNEMO is a console program, and is coded in standard C++ using an object oriented approach, runs on any computer platform, and is distributed under an open source license.

Full quantiNEMO Profile

quantinemo 2

https://www2.unil.ch/popgen/softwares/quantinemo/

A Swiss Army knife to simulate complex demographic and genetic scenarios, forward and backward in time.

Description

QuantiNemo 2 is a stochastic simulation program for quantitative population genetics. It was developed to investigate the effects of selection, mutation, recombination and drift on quantitative traits and neutral markers in structured populations connected by migration and located in heterogeneous habitats. A specific feature is that it allows switching between an individual-based full-featured mode and a population-based, faster mode. Several demographic, genetic and selective parameters can be fine-tuned in QuantiNemo 2: population, selection, trait(s) architecture, genetic map for QTL and/or markers, environment, demography, and mating system are the main features.

Full quantinemo 2 Profile

readsim

https://sourceforge.net/projects/readsim/

Simple reads simulator for pacbio & nanopore

Description

Simple reads simulator for pacbio & nanopore

Full readsim Profile

RECOAL

https://github.com/cjkang/RECOAL

Simulates new haplotype data from a reference population of haplotypes.

Description

RECOAL simulates new haplotype data from a reference population of haplotypes. A coalescent genealogy for the reference haplotype data is sampled from the appropriate posterior probability distribution, then a coalescent genealogy is simulated which extends the sampled genealogy to include new haplotype data. The new haplotype data will therefore contain both some of the existing polymorphic sites and new polymorphisms added based on the structure of the simulated coalescent genealogy.

Full RECOAL Profile

Recodon

https://github.com/MiguelArenas/recodon

Coalescent simulation of coding DNA sequences with recombination, migration and demography

Description

Recodon can simulate samples of coding DNA sequences under complex scenarios in which several evolutionary forces can interact simultaneously (namely, recombination, migration and demography). The basic codon model implemented is an extension to the general time-reversible model of nucleotide substitution with a proportion of invariable sites and among-site rate variation. In addition, the program implements non-reversible processes and mixtures of different codon models.

Full Recodon Profile

REGENS

https://github.com/EpistasisLab/regens/tags

Simulates whole autosomes from real genomic segments in a way that preserves the input autosomes' linkage disequilibrium (LD) pattern.

Description

REGENS (REcombinatory Genome ENumeration of Subpopulations) is an open-source Python package that simulates whole genomes from real genomic segments. REGENS recombines these segments in a way that simulates completely new individuals while simultaneously preserving the input genomes' linkage disequilibrium (LD) pattern with extremely high fidelity. It takes plink (bed, bim, fam) file sets of existing genotype data as input and produces new (bed, bim, fam) file sets as output. REGENS can also simulate mono-allelic and epistatic single nucleotide variant (SNV) effects on a continuous or binary phenotype without perturbing the simulated LD pattern. REGENS was measured to be 88.5 times faster and require 6.2 times lower peak RAM on average than a similar algorithm called Triadsim. Our publication (https://doi.org/10.21105/joss.02743) and supplementary repository (https://github.com/EpistasisLab/regens-analysis) both contain more technical details. See our REGENS repository (https://github.com/EpistasisLab/regens) for the source code, as well as detailed instructions and examples.

Full REGENS Profile

ReSeq

https://github.com/schmeing/ReSeq

ReSeq simulates realistic Illumina high-throughput sequencing data

Description

Real Sequence Reproducer shortens the gap between simulated and real data evaluations by adequately reproducing key statistics of real data, like the coverage profile, systematic errors and the k-mer spectrum. When these characteristics are translated into new synthetic computational experiments (i.e. simulated data), the performance can be more accurately estimated. Combining our simulator and real data gives two valuable perspectives on the performance of tools to minimize biases.

Full ReSeq Profile

REvolver

http://www.cibiv.at/software/revolver/

Modeling sequence evolution under domain constraints

Description

REvolver is a program to simulate protein sequence evolution. REvolver automatically integrates domain information described by a profile Hidden Markov Model (pHMM) into the simulation. In the simulation of protein evolution it has often been assumed that sites evolve identically and independently from each other. This simplification is necessary since information concerning site specific evolution is frequently unavailable. However, homologous sequences and domains have been collected, aligned, and pHMMs built. The pHMM describes the variability and shared characteristics of sequences that share a common ancestor. Here we do have knowledge about what sites are conserved, at what positions in the sequences insertions are more likely, or what sites can be deleted. Pfam (Finn et al., 2010) and SMART (Letunic, Doerks and Bork, 2009) are examples of databases providing such data. REvolver is the first method for simulating protein sequence evolution that integrates this pre-existing information about evolution in an automatic fashion.

Full REvolver Profile

rlsim

http://bit.ly/rlsim-git

A package for simulating RNA-seq library preparation with parameter estimation

Description

The rlsim package is a collection of tools for simulating RNA-seq library construction, aiming to reproduce the most important factors which are known to introduce significant biases in the currently used protocols: hexamer priming, PCR amplification and size selection. It allows for a systematic exploration of the effects of the individual biasing factors and their interactions on downstream applications by simulating data under a variety of parameter sets. The implicit simulation model implemented in the main tool (rlsim) is inspired by the actual library preparation protocols and it is more general than the models used by the bias correction methods hence it allows for a fair assessment of their performance. Although the simulation model was kept as simple as possible in order to aid usability, it still has too many parameters to be inferred from data produced by standard RNA-seq experiments. However, simulating datasets with properties similar to specific datasets is often useful. To address this, the package provides a tool (effest) implementing simple approaches for estimating the parameters which can be recovered from standard RNA-seq data (GC-dependent amplification efficiencies, fragment size distribution, relative expression levels).

Full rlsim Profile

Rmetasim

http://cran.r-project.org/web/packages/rmetasim/index.html

Rmetasim is a front-end for the metasim engine that is implemented as a package that runs in the statistical computing environment R

Description

Rmetasim provides a flexible environment in which to perform individual-based population genetic simulations. A wide range of landscape-level dynamics, population structures, and within-population demographies can be represented using the framework implemented in this software. In addition, temporal variation in all demographic characteristics can be simulated, both deterministically and stochastically. Such simulations can be used to produce null distributions of genotypes under realistic conditions. These genotypic data can then be used by a variety of analytical programs to develop null expectations of any population genetic statistic estimated from genotypic data.

Full Rmetasim Profile

RNA Seq Simulator

https://github.com/HuntsmanCancerInstitute/USeq

RSS takes SAM alignment files from RNA-Seq data and simulates over dispersed, multiple replica, differential, non-stranded RNA-Seq datasets.

Description

RSS takes SAM alignment files from RNA-Seq data and simulates over dispersed, multiple replica, differential, non-stranded RNA-Seq datasets.

Full RNA Seq Simulator Profile

Rose

http://bibiserv.techfak.uni-bielefeld.de/rose/

Random model of sequence evolution

Description

Rose implements a new probabilistic model of the evolution of RNA-, DNA-, or protein-like sequences. Guided by an evolutionary tree, a family of related sequences is created from a common ancestor sequence by insertion, deletion and substitution of characters. During this artificial evolutionary process, the 'true' history is logged and the 'correct' multiple sequence alignment is created simultaneously. The model also allows for varying rates of mutation within the sequences, making it possible to establish so-called sequence motifs. The data created by Rose are suitable for the evaluation of methods in multiple sequence alignment computation and the prediction of phylogenetic relationships. It can also be useful when teaching courses in or developing models of sequence evolution and in the study of evolutionary processes.

Full Rose Profile

RSVSim

https://bioconductor.org/packages/release/bioc/html/RSVSim.html

an R/Bioconductor package for the simulation of structural variations

Description

RSVSim is a tool for the simulation of deletions, insertions, inversions, tandem duplications and translocations of various sizes in any genome available as FASTA-file or data package in R. The structural variations can be generated randomly, based on user-supplied genomic coordinates or associated to various kinds of repeats. The package further comprises functions to estimate the distribution of structural variation sizes from real datasets.

Full RSVSim Profile

santa-sim

https://github.com/santa-dev/santa-sim

SANTA simulates the evolution of gene sequences.

Description

SANTA is a Java software application that simulates the evolution of a population of gene sequences forwards through time. It models the underlying biological processes as discrete components: replication (including recombination), mutation (including indels), fitness and selection. SANTA is easy to use and is well-suited to simulate pathogen evolution according to different scenarios.

Full santa-sim Profile

scDesign

https://github.com/Vivianstats/scDesign

scDesign assesses scRNA-seq experimental design in the context of differential gene expression analysis.

Description

scDesign quantitatively assesses scRNA-seq experimental design. The software also assists in computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. scDesign is reproducible across biological replicates and independent studies.

Full scDesign Profile

scDesign3

https://github.com/SONGDONGYUAN1994/scDesign3

scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics

Description

The R package scDesign3 is an all-in-one single-cell data simulation tool that uses reference datasets with different cell states (cell types, trajectories, or spatial coordinates), different modalities (gene expression, chromatin accessibility, protein abundance, DNA methylation, etc.), and complex experimental designs. The transparent parameters enable users to alter models as needed; the model evaluation metrics (AIC, BIC) and convenient visualization function help users select models.

Full scDesign3 Profile

cscGAN

https://github.com/imsb-uke/scGAN

cscGAN learns non-linear gene-gene dependencies from cell type samples in order to generate realistic cells of defined types.

Description

cscGAN learns non-linear gene-gene dependencies from cell type samples in order to generate realistic cells of defined types. Augmenting sparse cell populations improves the detection of marker genes, the robustness and reliability of classifiers, as well as the assessment of novel analysis algorithms.

Full cscGAN Profile

scrm

https://scrm.github.io/

A coalescent simulator optimized for long sequences and large samples.

Description

The Sequential Coalescent with Recombination Model (SCRM) is a new method that efficiently and accurately approximates the coalescent with recombination. It closes the gap between current approximations and the exact model and can be used to simulate genomic-scale data sets with an essentially correct linkage structure. The efficient C++ implementation scrm is available for all major platforms and as an R package on CRAN.

Full scrm Profile

SCSilicon

https://github.com/xikanfeng2/SCSilicon

SCSilicon: a tool for synthetic single-cell DNA sequencing data generation

Description

SCSilicon efficiently generates single-cell in silico DNA reads with minimum manual intervention. SCSilicon first creates the genome sequence (FASTA file) for each single cell by automatically simulating a collection of genomic aberrations, including SNP, SNV, Indel, and CNV. Likewise, SCSilicon yields the ground truth of CNV segmentation breakpoints and subclone cell labels. Then, SCSilicon amplifies the genome and generates FASTQ reads. We have manually inspected a series of synthetic variations (SNP, SNV, Indel, and CNV breakpoint) generated by SCSilicon, and evaluated three state-of-the-art single-cell CNV callers.

Full SCSilicon Profile

SCSIM

https://github.com/flahertylab/scsim

SCSIM: Jointly simulating correlated single-cell and bulk next-generation DNA sequencing data

Description

SCSIM simulates DNA sequencing data from hierarchically grouped (correlated) samples where each sample is designated bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing of variant callers or other genomic tools.

Full SCSIM Profile

SECNVs

https://github.com/YJulyXing/SECNVs

SECNVs: A Simulator of Copy Number Variants and Whole-Exome Sequences From Reference Genomes

Description

SECNVs (Simulator of Exome Copy Number Variants) is a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants.

Full SECNVs Profile

SELECTOR

https://ua.unige.ch/en/agp/outils/selector/

SELECTOR is a program to simulate lineages under selection in a spatially explicit population framework, written in C++ and running under MS Windows and Linux.

Description

SELECTOR investigates the evolution of multi-allelic genes under balancing or positive selection while also simulating complex evolutionary scenarios that integrate demographic growth and migration in a spatially explicit population framework. The parameters can be varied in both space and time in order to account for geographical, environmental and cultural heterogeneity. The software can be used to investigate genetic differentiation of loci under balancing selection in interconnected demes with spatially heterogeneous gene flow. SELECTOR is intended to be used for building insight into human settlement history and evolution.

Full SELECTOR Profile

SelSim

https://github.com/trvrb/selsim

population genetic simulation (Not SelSim from Spencer & Coop 2004, which is currently unavailable)

Description

With selsim, an evolving population of sequences is simulated according to a haploid Wright-Fisher model with discrete generations. This uses a Jukes-Cantor mutation model with a specified mutation rate. In each subsequent generation, the population is reconstituted by sampling sequences with replacement proportional to their frequency multiplied by their fitness. Mutations can be advantageous or deleterious and affect fitness in a multiplicative fashion (additive on a log-scale). Sequences are sampled at random time points after a period of burn-in.
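
The generation cycle described above can be illustrated with a minimal sketch. This is not selsim's own code: the function names, the default parameters, and the simplification that every difference from the ancestral sequence carries the same fitness effect are assumptions made for illustration only.

```python
import random

BASES = "ACGT"

def mutate(seq, mu, rng):
    """Jukes-Cantor step: each site changes with probability mu
    to one of the three other bases, chosen uniformly."""
    out = []
    for base in seq:
        if rng.random() < mu:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def fitness(seq, ancestor, effect):
    """Multiplicative fitness (additive on a log scale).
    Assumption: every difference from the ancestor has the same
    effect; effect > 0 is advantageous, -1 < effect < 0 deleterious."""
    k = sum(a != b for a, b in zip(seq, ancestor))
    return (1.0 + effect) ** k

def next_generation(pop, ancestor, mu, effect, rng):
    """Resample the whole population with replacement, weighting each
    individual by its fitness, then apply mutation to the offspring."""
    weights = [fitness(s, ancestor, effect) for s in pop]
    parents = rng.choices(pop, weights=weights, k=len(pop))
    return [mutate(p, mu, rng) for p in parents]

def simulate(n=50, length=20, mu=0.01, effect=-0.05, generations=100, seed=1):
    """Evolve a haploid population of constant size n with discrete generations."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    pop = [ancestor] * n
    for _ in range(generations):
        pop = next_generation(pop, ancestor, mu, effect, rng)
    return ancestor, pop

if __name__ == "__main__":
    ancestor, pop = simulate()
    print(ancestor)
    print(len(set(pop)), "distinct sequences in the final generation")
```

Weighting each individual by its fitness is equivalent to sampling sequence types in proportion to frequency multiplied by fitness, because identical sequences contribute one weight per copy. Burn-in and sampling at random time points are left out of the sketch.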

Full SelSim Profile

SELVa

https://github.com/bazykinlab/SELVa

Simulator of evolution with landscape variation

Description

SELVa is a simulator of sequence evolution that allows the fitness landscape to vary according to user-specified rules. It is geared towards exploring the effects of landscape change on molecular sequence evolution. SELVa has a variety of options for specifying the rules of landscape change, allowing the user to tailor the simulation to his or her needs and to explore various evolutionary scenarios.

Full SELVa Profile

Seq-Gen

http://tree.bio.ed.ac.uk/software/seqgen/

An application for the Monte Carlo simulation of molecular sequence evolution along phylogenetic trees.

Description

Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. State frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution.

Full Seq-Gen Profile

SeqNet

https://github.com/tgrimes/SeqNet

An R package for simulating RNA-seq counts from gene-gene association networks.

Description

Methods to generate random gene-gene association networks and simulate RNA-seq data from them, as described in Grimes and Datta (2021). Includes functions to generate random networks of any size and perturb them to obtain differential networks. Network objects are built from individual, overlapping modules that represent pathways. The resulting network has various topological properties that are characteristic of gene regulatory networks. RNA-seq data can be generated such that the association among gene expression profiles reflects the underlying network. A reference RNA-seq dataset can be provided to model realistic marginal distributions. Plotting functions are available to visualize a network, compare two networks, and compare the expression of two genes across multiple networks.

Full SeqNet Profile

SEQPower

http://bioinformatics.org/spower/

Statistical power analysis for sequence-based association studies

Description

SEQPower is software for simulating rare-variant data associated with complex traits and for performing power and sample size estimation for sequence-based association studies. It features analytic sample size estimates and power comparison of rare variant association methods, as well as validation and evaluation of novel association tests under various study designs.

Full SEQPower Profile

SeqSIMLA

http://seqsimla.sourceforge.net/

SeqSIMLA can simulate sequence data with user-specified disease and quantitative trait models. Family or unrelated case-control data can be simulated.

Description

SeqSIMLA can simulate sequence data in families with multiple affected and unaffected siblings or unrelated case-control data under different disease models. SeqSIMLA accepts a population of sequences generated by other sequence generators. We implemented two disease models, in which the user can flexibly specify the number of disease loci, effect sizes or population attributable risk, disease prevalence, and risk or protective loci. We also implemented a quantitative trait model, in which the user can specify the number of quantitative trait loci (QTL), proportions of variance explained by the QTL, and genetic models. In 2014, we extended SeqSIMLA to create SeqSIMLA2, which can simulate correlated traits and considers the shared environmental effects. SeqSIMLA2 can also simulate prespecified large pedigree structures. There are no restrictions on the number of individuals that can be simulated in a pedigree. In 2015, we implemented SeqSIMLA2_exact, which can simulate sequences with multiple disease sites in large pedigrees with given disease status for each pedigree member, assuming that the disease prevalence is low.

Full SeqSIMLA Profile

SERGIO

https://github.com/PayamDiba/SERGIO

Description

SERGIO is a simulator for single-cell gene expression data that models the stochastic nature of the transcription and regulation of genes via transcription factors according to a user-provided gene regulatory network. The package can simulate cell types in steady states or cells differentiating to multiple fates. The datasets generated by SERGIO are statistically comparable to experimental data generated by Illumina HiSeq2000, Drop-Seq, Illumina 10x Chromium and Smart-seq.

Full SERGIO Profile

Serial NetEvolve

https://biorg.cis.fiu.edu/SNE/index.htm

A flexible utility for generating serially-sampled sequences along a tree or recombinant network

Description

Serial NetEvolve is a modification of the Treevolve program in which serially sampled sequences are evolved along a randomly generated coalescent tree or network (Grassly et al. 1999; Hudson 1983; Kingman 1982). Treevolve offers a variety of evolutionary model and population parameters including a rate of recombination and as such it was chosen over other programs to be adapted for the simulation of serially sampled data. The new features include the choice of either a clock-like model of evolution or a variable rate of evolution, simulation of serial samples and the output of the randomly generated tree or network in Newick format or in our newly formulated NeTwick format.

Full Serial NetEvolve Profile

SFS_CODE

http://sfscode.sourceforge.net/SFS_CODE/index/index.html

SFS_CODE can perform forward population genetic simulations under a general Wright-Fisher model with arbitrary migration, demographic, selective, and mutational effects.

Description

SFS_CODE (Selection on Finite Sites under COmplex Demographic Events) performs forward population genetic simulations under a general Wright-Fisher model with arbitrary demographic, selective, and mutational effects.

Full SFS_CODE Profile

SIApopr

https://github.com/olliemcdonald/siapopr#readme

Siapopr is an R package that wraps the C++ functions SIApop. These functions simulate birth-death-mutation processes with mutations having random fitnesses to simulate clonal evolution.

Description

Siapopr is an R package that wraps the C++ functions SIApop. These functions simulate birth-death-mutation processes with mutations having random fitnesses to simulate clonal evolution.

Full SIApopr Profile

SIBSIM

http://sourceforge.net/projects/sibsim/

Quantitative phenotype simulation in extended pedigrees

Description

SIBSIM is a modern and powerful computer program to simulate genotype and quantitative trait data in extended pedigrees. In the current release (2.1.2), we put emphasis on the simulation of a quantitative trait in pedigrees of arbitrary size without monozygotic twins. Well-known software such as the SIMULATE package is not as scalable as SIBSIM. As an advantage over both G.A.S.P. and SIMLA, no predefined boundaries restrict SIBSIM in its potential, neither in genome nor in family size. Instead, SIBSIM is as highly scalable as possible to meet any needs. SIBSIM may be used not only in simulation studies, but also in the validation, verification and testing of other applications that implement statistical analysis of genomic data. We successfully used SIBSIM in the latter respect and detected a bug in a widely used genetic epidemiological software package.

Full SIBSIM Profile

sim1000G

https://github.com/adimitromanolakis/sim1000G

sim1000G integrates fully with R and can simulate existing variation from a single VCF file. In addition it can also simulate arbitrary pedigrees.

Description

We develop a new user-friendly and integrated R package, sim1000G, which simulates genomic regions for unrelated individuals or for families. Only a single raw phased Variant Call Format (VCF) file is needed as input. Haplotypes are extracted to compute linkage disequilibrium in the simulated region and then used for the generation of new genotype data for unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Arbitrary pedigree sizes are generated by modeling recombination events within sim1000G. Various simulation scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively for three-generation family data. sim1000G can capture allele frequency diversity, short- and long-range linkage disequilibrium (LD) patterns and subtle population differences in LD structure without the need for any tuning parameters.

Full sim1000G Profile

sim3C

https://github.com/cerebis/sim3C

Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies

Description

We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error.

Full sim3C Profile

SimAdapt

https://www.openabm.org/model/3137

A spatially explicit, individual-based, forward-time, landscape-genetic simulation model combined with a landscape cellular automaton.

Description

SimAdapt is a spatially explicit, individual-based, forward-time, landscape-genetic simulation model combined with a landscape cellular automaton to represent evolutionary processes of adaptation and population dynamics in changing landscapes, using the NetLogo environment.

Full SimAdapt Profile

SimBA

https://github.com/ComputationalGenomics/SimBA

SimBA is a non-generative approach to population simulations based on a combination of stochastic techniques and discrete methods.

Description

SimBA is a non-generative approach to population simulations, based on a combination of stochastic techniques and discrete methods. The package contains a hill climbing algorithm and multiple subpopulation structures. SimBA is very sensitive to the input specifications, i.e., very similar but distinct input characteristics result in distinct outputs with high fidelity to the specified distributions. This property of the simulation is not explicitly modeled or studied by previous methods.

Full SimBA Profile

SimBit

https://github.com/RemiMattheyDoret/SimBit

SimBit is an all-purpose, high-performance forward-in-time population genetics simulator.

Description

SimBit is an all-purpose, high-performance forward-in-time population genetics simulator. The software is capable of simulating selection scenarios, demographic scenarios, and mating systems. SimBit can also simulate multiple species along with their ecological relationships. The package comes with an R wrapper that simplifies the management of research projects from the creation of a grid of parameters, and can run simulations and gather inputs for analysis.

Full SimBit Profile

SIMCOAL2

http://cmpg.unibe.ch/software/simcoal2/

A coalescent program for the simulation of complex recombination patterns over large genomic regions under various demographic models

Description

We present here SIMCOAL2, an extended version of the SIMCOAL program (Excoffier et al. 2000), to simulate the neutral genetic diversity at partially linked loci under different histories and a wide range of migration and demographic models. SIMCOAL2 includes a number of new features compared to the previous version: 1) the possibility of arbitrary recombination rates between adjacent loci; 2) multiple coalescent events per generation, allowing the correct simulation of very large samples and very large recombining genomic regions; 3) the simulation of SNP data with arbitrary minimum frequency, for instance to simulate ascertainment bias; 4) the output of diploid genotypic data generated under the assumption of Hardy-Weinberg equilibrium; and 5) the simulation of a mixture of different data types (DNA sequence, RFLP, STR, or SNP) along a single chromosome.

Full SIMCOAL2 Profile

SimCopy

http://bit.ly/simcopy

An R package simulating the evolution of copy number profiles along a tree.

Description

SimCopy is an R package simulating the evolution of copy number profiles along a tree. It relies on the PhyloSim package for performing the simulations by encoding the genomic regions as sites in sequences and using modified processes acting on them. Please note that the SimCopy simulations are restricted to a single chromosome. The genomes are encoded as a sequence of sites containing integers identifying genomic regions. Negative integers represent inverted genomic regions. SimCopy supports 1) deletion - deletes genomic regions, 2) duplication - duplicates genomic regions, 3) inversion - changes the orientation of the genomic regions by taking the opposite of the corresponding integer, 4) inverted duplication - duplicates genomic regions and flips their orientation and 5) translocation - translocates a stretch of genomic regions.
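
The signed-integer encoding and the five event types listed above can be illustrated with a short Python sketch. It is not SimCopy's R API: the function names, the half-open `[start, end)` index convention, and the choice to place duplicated copies directly after the original segment are all assumptions. The description only mentions the sign change for inversions; reversing the order of the inverted regions is a further assumption of this sketch.

```python
def delete(genome, start, end):
    """Remove the regions in genome[start:end]."""
    return genome[:start] + genome[end:]

def duplicate(genome, start, end):
    """Insert a copy of genome[start:end] directly after the segment
    (placement of the copy is an assumption)."""
    segment = genome[start:end]
    return genome[:end] + segment + genome[end:]

def flip(segment):
    """Flip orientation: negate each region id and reverse the order
    (the order reversal is an assumption, see lead-in)."""
    return [-region for region in reversed(segment)]

def invert(genome, start, end):
    """Invert genome[start:end] in place."""
    return genome[:start] + flip(genome[start:end]) + genome[end:]

def inverted_duplicate(genome, start, end):
    """Insert a flipped copy of genome[start:end] after the segment."""
    return genome[:end] + flip(genome[start:end]) + genome[end:]

def translocate(genome, start, end, dest):
    """Cut genome[start:end] and re-insert it at index dest of the
    remaining sequence."""
    segment = genome[start:end]
    rest = genome[:start] + genome[end:]
    return rest[:dest] + segment + rest[dest:]

if __name__ == "__main__":
    genome = [1, 2, 3, 4, 5]           # five regions on one chromosome
    print(invert(genome, 1, 3))        # [1, -3, -2, 4, 5]
    print(duplicate(genome, 1, 3))     # [1, 2, 3, 2, 3, 4, 5]
    print(translocate(genome, 0, 2, 3))  # [3, 4, 5, 1, 2]
```

Each function returns a new list, so an event history can be replayed by applying the functions in sequence to the ancestral profile.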

Full SimCopy Profile

SIMLA

http://dmpi.duke.edu/simla-simulation-software-version-32

SIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies.

Description

SIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies. It allows the user flexibility in specifying marker and disease placement, locus heterogeneity, and disequilibrium between markers and between markers and disease loci. Output is in the form of a LINKAGE (Lathrop et al., Proc Natl Acad Sci USA 81, 1984) pedigree file and is easily utilized, either directly or with minimal reformatting, as input for various genetic analysis packages.

Full SIMLA Profile

SimLoRD

https://bitbucket.org/genomeinformatics/simlord/src/master/

SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.

Description

SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model. Reads are simulated from both strands of a provided or randomly generated reference sequence.

Full SimLoRD Profile

simNGS

http://www.ebi.ac.uk/goldman-srv/simNGS/

software for simulating observations from Illumina sequencing machines using the statistical models behind the AYB base-calling software.

Description

simNGS is software for simulating observations from Illumina sequencing machines using the statistical models behind the AYB base-calling software. By default, observations only incorporate noise due to sequencing and do not incorporate effects from more esoteric sources of noise that may be present in real data ("dust", bubbles, merged clusters, sequence-heterogeneous clusters, etc.). Many of these additional sources may optionally be applied. simNGS takes FASTA format sequences and a file describing the covariance of noise between bases and cycles observed in an actual run of the machine, randomly generates noisy intensities representing the signals for the sequence at each cycle, and calculates likelihoods for all possible base calls.

Full simNGS Profile

SimPed

http://bioinformatics.org/simped/

A Simulation Program to Generate Haplotype and Genotype Data for Pedigree Structures

Description

SimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data either in linkage disequilibrium or equilibrium can be generated for greater than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed is useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium and estimating empirical p values for linkage and family-based association studies.

Full SimPed Profile

SimPEL

https://github.com/precisionomics/SimPEL

SimPEL is short for Simulation-based Power Estimation for sequencing studies of Low-prevalence conditions.

Description

SimPEL is short for Simulation-based Power Estimation for sequencing studies of Low-prevalence conditions. SimPEL addresses the need for power estimation in low-prevalence condition studies, taking into account their inherently small sample sizes and analytical procedures. SimPEL integrates customizable parameters to realistically model study design outcomes and provide applicable information towards further refinement of experimental procedure. SimPEL is implemented as a function of the established JAWAMix5 tool (Long et al., 2013), an HDF5-based Java implementation for association mapping.

Full SimPEL Profile

SimPhy

https://github.com/adamallo/SimPhy

A comprehensive simulator of gene family evolution

Description

SimPhy simulates the evolution of multiple gene families under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer—all three potentially leading to species tree/gene tree discordance—and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific), that can be fixed or be sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon, and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible.

Full SimPhy Profile

Simprot

http://www.uhnresearch.ca/labs/tillier/software.htm#3

A program to simulate protein evolution by substitution, insertion and deletion

Description

Protein evolution has been largely modelled by considering the amino acid substitution process; however, there have been few studies of the process of insertion and deletion. Simprot allows for several models of amino acid substitution (PAM, JTT and PMB), allows for gamma-distributed site rates according to Yang's model, and implements a parameterised Qian and Goldstein distribution model for insertion and deletion.

Full Simprot Profile

SimRare

http://code.google.com/p/simrare/

Rare variant simulation and analysis tool

Description

A program to generate and analyze sequence-based data for rare variant association studies of quantitative and qualitative traits

Full SimRare Profile

SimSeq

https://github.com/jstjohn/SimSeq

An Illumina paired-end and mate-pair short read simulator.

Description

This project attempts to model as many of the quirks that exist in Illumina data as possible. Some of these quirks include the potential for chimeric reads, and non-biotinylated fragment pull down in mate-pair libraries. Additionally, the program provides the ability to model both site a…

Full SimSeq Profile

simuG

https://github.com/yjx1217/simuG

simuG: a general-purpose genome simulator

Description

Simulated genomes with pre-defined or random genomic variants can be very useful for benchmarking genomic and bioinformatics analyses. Here we introduce simuG as a lightweight tool for simulating the full spectrum of genomic variants (SNPs, INDELs, CNVs, inversions, and translocations). In addition, simuG enables a rich array of fine-tuned controls, such as simulating SNPs in different coding partitions (e.g. coding sites, noncoding sites, 4-fold degenerate sites, or 2-fold degenerate sites); simulating CNVs with different formation mechanisms (e.g. segmental deletions, dispersed duplications, and tandem duplications); and simulating inversions and translocations with specific types of breakpoints. The simplicity and versatility of simuG make it a unique general-purpose genome simulator for a wide range of simulation-based applications.

Full simuG Profile

simuGWAS

https://github.com/BoPeng/simuPOP-examples/tree/master/published/simuGWAS

A forward-time simulator that simulates realistic samples for genome-wide association studies.

Description

simuGWAS evolves a population forward in time, subject to rapid population expansion, mutation, recombination and natural selection. A trajectory simulation method is used to control the allele frequency of optional disease predisposing loci. A scaling approach can be used to improve efficiency when weak, additive genetic factors are used.

Full simuGWAS Profile

simuPOP

http://simupop.sourceforge.net/

simuPOP is a general-purpose individual-based forward-time population genetics simulation environment.

Description

simuPOP is a general-purpose individual-based forward-time population genetics simulation environment. The core of simuPOP is a scripting language (Python) that provides a large number of objects and functions to manipulate populations, and a mechanism to evolve populations forward in time. Using this environment, users can create, manipulate and evolve populations interactively, or write a script and run it as a batch file. Owing to its flexible and extensible design, simuPOP can simulate large and complex evolutionary processes with ease.

Full simuPOP Profile

simuRare

https://ysph.yale.edu/c2s2/software/simurare/

Simulating realistic genomic data with rare variants

Description

simuRare is a regression-based resampling method to use real data and simulate rare variants obtained from the 1000 Genomes Project

Full simuRare Profile

SimuSCoP

https://github.com/qasimyu/simuscop

reliably simulate Illumina sequencing data based on position and context dependent profiles

Description

a novel tool to reliably Simulate Illumina Sequencing data based on position and Context dependent Profiles

Full SimuSCoP Profile

SInC

https://sourceforge.net/projects/sincsimulator/

An accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data

Description

An open-source variant simulator and read generator capable of simulating all three common types of biological variants, taking into account a distribution of base quality scores from one of the most commonly used next-generation sequencing instruments from Illumina. SInC is capable of generating single- and paired-end reads with user-defined insert size and with high efficiency compared to other existing tools. SInC, due to its multi-threaded capability during read generation, has a low time footprint. SInC is currently optimised to work in a limited infrastructure setup and can efficiently exploit the commonly used quad-core desktop architecture to simulate short sequence reads with deep coverage for large genomes.

Full SInC Profile

SISSI

http://www.cibiv.at/software/sissi/

A software tool to generate data of related sequences along a given phylogeny, taking into account user defined system of neighbourhoods and instantaneous rate matrices.

Description

Simulating Site-Specific Interactions (SISSI) simulates the evolution of a nucleotide sequence along a phylogenetic tree, incorporating user-defined site-specific interactions. Furthermore, the method makes it possible to simulate more complex interactions among nucleotide and other character-based sequences.

Full SISSI Profile

skeleSim

https://github.com/christianparobek/skeleSim

an extensible, general framework for population genetic simulation in R

Description

skeleSim is a tool to guide users in choosing appropriate simulations, setting parameters, calculating summary genetic statistics, and organizing data output, all within the R environment. skeleSim is designed to be an extensible environment that can 'wrap' around any simulation software to increase its accessibility and use.

Full skeleSim Profile

SLiM

https://messerlab.org/slim/

A framework for implementing forward genetic simulations, including an interactive development environment and a highly flexible scripting language.

Description

SLiM is an evolutionary simulation framework that combines a powerful engine for population genetic simulations with the capability of modeling arbitrarily complex eco-evolutionary scenarios. Simulations are configured via the integrated Eidos scripting language that allows interactive control over practically every aspect of the simulated scenarios. The underlying individual-based simulation engine is highly optimized to enable modeling of entire chromosomes in large populations. For macOS, Linux, and Windows (native and WSL) users, we also provide a graphical user interface for easy simulation set-up, interactive runtime control, and dynamical visualization of simulation output.

Full SLiM Profile

SMARTPOP

http://smartpop.sourceforge.net/

Simulating Mating Alliance as a Reproductive Tactic for Populations

Description

SMARTPOP is a fast and flexible forward-in-time simulator for population genetics. Specially developed for speed, it is available in serial and parallel versions. Developed for anthropological inference on human populations and eco-anthropological questions, SMARTPOP simulates individuals with sequences of sex-linked DNA (mitochondria, X and Y chromosomes) and autosomes. Studies of social dynamics are enabled by SMARTPOP's flexible demographic model and social rules of mating.

Full SMARTPOP Profile

SNPsim

http://code.google.com/p/phylosoftware/

Coalescent simulation of hotspot recombination

Description

SNPsim is a population genetic simulator that generates samples of single nucleotide polymorphism (SNP) haplotypes and diploid biallelic genotypes. It is based on the coalescent with recombination (Hudson 1983), as modified by Wiuf and Posada (2003) to include recombination hotspots. SNPsim also allows for the specification of demographic periods and different mutation models.

Full SNPsim Profile

SomatoSim

https://github.com/BieseckerLab/SomatoSim

SomatoSim: precision simulation of somatic single nucleotide variants

Description

SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control of the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, where candidate positions are selected for simulation; variant simulation, where reads are selected and mutated; and variant evaluation, where SomatoSim summarizes the simulation results.

Full SomatoSim Profile

SPARSIM

https://gitlab.com/sysbiobig/sparsim

SPARSim is an R tool used for simulating single cell RNA-seq (scRNA-seq) count tables.

Description

SPARSim is an scRNA count data simulator based on a Gamma-Multivariate Hypergeometric model. The package generates count data that resembles real data in terms of count intensity, variability and sparsity. The simulator is capable of simulating count matrices that resemble the distribution of different expression intensities observed in real count data. The package can also simulate single cell RNA-seq count tables.

Full SPARSIM Profile

SpartaABC

http://spartaabc.tau.ac.il/webserver

a web server to simulate sequences with indel parameters inferred using an approximate Bayesian computation algorithm

Description

SpartaABC is an Approximate Bayesian Computation (ABC) rejection algorithm to infer indel parameters from sequence data. It focuses on the inference of three indel parameters: IR, the indel-to-substitution rate ratio; A, the shape parameter of the power-law distribution controlling indel length; and RL, the root length parameter. SpartaABC extracts a vector of summary statistics from its input; it then performs repeated simulations using an integrated sequence simulator (Fletcher and Yang 2009, Cartwright 2016) under various indel parameters. From each simulated dataset it extracts a vector of summary statistics and computes its distance from the vector extracted from the input using a weighted Euclidean distance.
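The rejection scheme described above can be illustrated with a minimal Python sketch. It is not SpartaABC's code: the simulator (`toy_simulator`), the summary statistics (`summarize`), the uniform prior, and the weights are all hypothetical stand-ins chosen only to show the general shape of ABC rejection with a weighted Euclidean distance.

```python
import math
import random


def weighted_euclidean(s_sim, s_obs, weights):
    """Weighted Euclidean distance between two summary-statistic vectors."""
    return math.sqrt(
        sum(w * (a - b) ** 2 for a, b, w in zip(s_sim, s_obs, weights))
    )


def summarize(data):
    """Hypothetical summary statistics: (mean, variance) of the data."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return (mean, var)


def toy_simulator(rate, n, rng):
    """Hypothetical stand-in for a sequence simulator: n exponential draws."""
    return [rng.expovariate(rate) for _ in range(n)]


def abc_reject(observed, prior_low, prior_high, weights, n_sims,
               keep_fraction, rng):
    """Keep the parameter draws whose simulated summaries lie closest
    to the observed summaries (assumed uniform prior on one parameter)."""
    s_obs = summarize(observed)
    scored = []
    for _ in range(n_sims):
        theta = rng.uniform(prior_low, prior_high)
        s_sim = summarize(toy_simulator(theta, len(observed), rng))
        scored.append((weighted_euclidean(s_sim, s_obs, weights), theta))
    scored.sort()  # smallest distance first
    n_keep = max(1, int(n_sims * keep_fraction))
    return [theta for _, theta in scored[:n_keep]]
```

The retained parameter values approximate the posterior under these toy assumptions; a real analysis would replace the simulator and summary statistics with ones appropriate to indel evolution.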

Full SpartaABC Profile

SPIP

https://swfsc.noaa.gov/textblock.aspx?Division=FED&id=3434

SPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user

Description

SPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user. Numerous variables controlling the age structure of the population, the number of offspring produced, the variance in male and female reproductive success, survival rates of different age classes, mate fidelity, duration of simulation, etc. can be specified by the user. The program stores the pedigree of all individuals in the simulated population. This pedigree is used to simulate genetic data on sampled individuals by tracing lineages back through paternal or maternal genes within each sampled individual. Data may be simulated for an arbitrary number of loci that are assumed to be independently segregating and to not be subject to natural selection, nor linked to any selected genes. Genotypes are reported in terms of both "founder alleles" (i.e., each distinct allele amongst the founders of the pedigree is given a distinct label) and also in terms of alleles whose frequencies amongst the founding members of the pedigree may be specified by the user.

Full SPIP Profile

Splatche

http://www.splatche.com/

Spatial and Temporal Coalescences in Heterogeneous Environment

Description

SPLATCHE (for SPatiaL And Temporal Coalescences in Heterogeneous Environment) is a program that incorporates the influence of the environment into the simulation of the migration of a given species from one or several origins. In a second phase, the molecular genetic diversity of one or several samples drawn from the simulated species can be generated. The geographic area and environmental information have to be specified by the user in a series of input files. The virtual world where migrations take place is a matrix of demes, each with its own environmental characteristics according to the input files. A coalescent-based approach generates the molecular diversity of any population sample. The molecular data obtained can then be analyzed to study the signature of the simulated demographic scenario. The goal of this online manual is to describe the technical aspects of the software SPLATCHE (version 1.1). This manual complements the article by Currat, Ray and Excoffier, published in 2004. Further details on the methodology can also be found in Ray (2003) and Currat (2004). A PDF version of the user manual can also be downloaded there.

Full Splatche Profile

Splatter

https://bioconductor.org/packages/devel/bioc/vignettes/splatter/inst/doc/splatter.html

an R package for the simple simulation of single-cell RNA sequencing data. This vignette gives an overview and introduction to Splatter’s functionality.

Description

As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.

Full Splatter Profile

SPsimSeq

https://github.com/CenterForStatistics-UGent/SPsimSeq

SPsimSeq simulates datasets from estimated marginal distributions with Gaussian-copulas.

Description

SPsimSeq uses an exponential family for density estimation to construct distributions of gene expression levels from RNA sequencing data, thereby simulating a new dataset from marginal distributions with Gaussian-copulas in order to retain the dependence between genes. It also allows for the simulation of multiple groups and batches with any required sample size and library size.

Full SPsimSeq Profile

SRTsim

https://github.com/xzhoulab/SRTsim

SRTsim is an independent, reproducible, and flexible Spatially Resolved Transcriptomics (SRT) simulation framework that can be used to facilitate the development of SRT analytical methods for a wide variety of SRT-specific analyses.

Description

SRTsim is an independent, reproducible, and flexible SRT simulation framework that can be used to facilitate the development of SRT analytical methods for a wide variety of SRT-specific analyses. It utilizes spatial localization information to simulate SRT expression count data in a reproducible and scalable fashion. Two major simulation schemes are implemented in SRTsim: reference-based and reference-free. Available both as an R package and a website: https://jiaqiangzhu.shinyapps.io/srtsim/

Full SRTsim Profile

srv

https://github.com/BoPeng/simuPOP-examples/tree/master/published/simuRareVariants

Simulator of Rare Variants (srv) simulates the introduction and evolution of (rare) genetic variants.

Description

srv simulates the introduction and evolution of genetic variants in one or more regions of chromosomes. These regions span roughly 10k to 100k base pairs and can each be considered a gene. During evolution, mutants are introduced into the population and change the fitness of the individuals who carry them. The most distinguishing feature of this script is that it supports multi-locus fitness schemes in which random or locus-specific diploid single-locus selection models are applied to newly arising mutants. A multi-locus selection model is used to assign a fitness value to individuals according to the mutants they carry.

Full srv Profile

SVEngine

https://bitbucket.org/charade/svengine/src/master/

Allele Specific and Haplotype Aware Structural Variants Simulator

Description

SVEngine is a multi-purpose and self-contained simulator for whole genome scale spike-in of thousands of SV events of various types in both single-sample and matched sample scenarios.

Full SVEngine Profile

SymSim

https://github.com/YosefLab/SymSim

SymSim simulates single cell RNA sequencing data thereby allowing users to tune the variation of gene-expression on different levels.

Description

SymSim models the processes that give rise to data observed in single cell RNA-Seq experiments. The components of the SymSim pipeline pertain to three sources of variation in single cell RNA-Seq data: noise intrinsic to the process of transcription, extrinsic variation indicative of different cell states, and technical variation due to low sensitivity and measurement noise and bias.

Full SymSim Profile

Synggen

https://bcglab.cibio.unitn.it/synggen

Fast and data-driven generation of synthetic heterogeneous NGS cancer data

Description

Synggen is a tool written in the C programming language to generate synthetic NGS files, in the form of whole-exome or targeted sequencing experiments, representing heterogeneous cancer genomes and matched controls. The tool provides two execution modes, which allow users to (i) exploit a set of control (non-cancer) NGS sequencing files (BAM format) to generate reference models capturing a collection of data summary statistics; and (ii) combine these reference models and a set of user-specified germline and somatic genomic profiles to create synthetic sequencing files (FASTQ format). Synggen accepts specific lists of germline variants and somatic genomic events, including phased germline SNPs and somatic allele-specific CNAs and SNVs, together with local and global parameters including the clonality of somatic events and the overall sample tumor content, allowing for the emulation of varied and realistic cancer- and patient-specific data across different multi-subclone composition, tumor purity, aneuploidy, and tumor evolution scenarios.

Full Synggen Profile

TreesimJ

http://code.google.com/p/treesimj/

A flexible, forward-time population genetic simulator

Description

TreesimJ is a forward-time simulator of an evolving population that tracks the evolutionary tree of the entire population. The application offers an intuitive GUI, a variety of pre-configured models of fitness, mutation, and demography, and a suite of data collectors that analyze the population and emit data to one or more sources. To the user, TreesimJ offers a simple, easy-to-use interface, a variety of interchangeable 'models' describing many aspects of the evolving population, and many ways to quantify and summarize the state of the population. Since the entire tree of the population is tracked, TreesimJ can easily be used to assess the average time to most recent common ancestor, the level of tree imbalance, or the mean pairwise coalescent time. It can also compute a number of familiar population genetic statistics, such as the nucleotide diversity and the number of segregating sites (if a model of fitness that includes DNA is used). The list of potential data collecting items is long, and getting

Full TreesimJ Profile

Variant Simulation Tools

http://varianttools.sourceforge.net/Simulation/HomePage

A simulation tool for post-GWAS genetic epidemiological studies using whole-genome or whole-exome next-gen sequencing data, with an emphasis on user-friendliness and reproducibility.

Description

Variant Simulation Tools is a module of Variant Tools for the simulation of genetic variants for sequencing-based genetic epidemiological studies. Although multiple simulation engines are provided, the core of VST is a novel forward-time simulation engine that simulates real nucleotide sequences of the human genome using DNA mutation models, fine-scale recombination maps, and a selection model based on amino acid changes of translated protein sequences. The design of VST allows users to easily create and distribute simulation methods and simulated datasets for a variety of applications and encourages fair comparison between statistical methods through the use of existing or reproduced simulated datasets.

Full Variant Simulation Tools Profile

VarSim

https://github.com/bioinform/varsim

A high-fidelity simulation validation framework for high-throughput genome sequencing with cancer applications

Description

VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing.

Full VarSim Profile

VirusTreeSimulator

https://github.com/PangeaHIV/VirusTreeSimulator

Simulates virus trees within a given transmission tree

Description

Simulates virus trees within a given transmission tree

Full VirusTreeSimulator Profile

VISOR

https://github.com/davidebolo1993/VISOR

VISOR is a haplotype-aware structural variants simulator for short and long read sequencing

Description

VISOR is an efficient and versatile command-line application capable of simulating structural variants and small/single-nucleotide variants in a haplotype-resolved manner. VISOR currently supports simulations of bulk short (Illumina) and long (PacBio-ONT) read sequencing data. VISOR also supports simulations of single-cell, strand-seq data and includes a module, actively under development, capable of simulating 10X linked-read data. VISOR is readily applicable to cancer genomics, enabling the simulation of tumour purity (normal in tumour contamination), heterogeneity (mix of several subclones) and aneuploidy. VISOR also incorporates capture biases, a crucial feature for whole-exome data sets and panel sequencing applications.

Full VISOR Profile

Vortex

https://scti.tools/vortex/

VORTEX is an individual-based simulation model for population viability analysis (PVA).

Description

VORTEX is an individual-based simulation model for population viability analysis (PVA). This program will help you understand the effects of deterministic forces as well as demographic, environmental, and genetic stochastic (or random) events on the dynamics of wildlife populations. VORTEX models population dynamics as discrete, sequential events (e.g., births, deaths, catastrophes, etc.) that occur according to defined probabilities. The probabilities of events are modeled as constants or as random variables that follow specified distributions. Since the growth or decline of a simulated population is strongly influenced by these random events, separate model iterations or “runs” using the exact same input parameters will produce different results. Consequently, the model is repeated many times to reveal the distribution of fates that the population might experience under a given set of input conditions.
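The idea of repeating a stochastic model many times to see the spread of outcomes can be sketched in a few lines of Python. This is a toy birth-death model, not VORTEX's actual algorithm: the function names, the per-individual probabilities `p_birth` and `p_death`, and the `capacity` ceiling are assumptions made only for illustration.

```python
import random


def run_once(n0, years, p_birth, p_death, capacity, rng):
    """One iteration: each year, apply deaths and then births as
    discrete, sequential random events. Returns the final population size."""
    n = n0
    for _ in range(years):
        # Each individual dies with probability p_death.
        survivors = sum(1 for _ in range(n) if rng.random() >= p_death)
        # Each survivor produces one offspring with probability p_birth.
        births = sum(1 for _ in range(survivors) if rng.random() < p_birth)
        # Hypothetical hard ceiling on population size.
        n = min(capacity, survivors + births)
        if n == 0:
            break  # extinct; no further events possible
    return n


def run_many(n_runs, seed, n0, years, p_birth, p_death, capacity):
    """Repeat the model with identical inputs; results differ between
    iterations only because of the random events."""
    rng = random.Random(seed)
    return [
        run_once(n0, years, p_birth, p_death, capacity, rng)
        for _ in range(n_runs)
    ]


def extinction_fraction(final_sizes):
    """Fraction of iterations that ended with no individuals left."""
    return sum(1 for n in final_sizes if n == 0) / len(final_sizes)
```

Calling `run_many` with, say, 1000 iterations and summarizing the returned list (for example with `extinction_fraction`) gives a rough picture of the distribution of fates under one set of input conditions.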

Full Vortex Profile

Wessim

http://sak042.github.io/Wessim/

Whole Exome Sequencing SIMulator

Description

Wessim is a simulator for targeted resequencing, generally known as exome sequencing. Wessim basically generates a set of artificial DNA fragments for next generation sequencing (NGS) read simulation. In targeted resequencing, we constrain the genomic regions that are used to generate DNA fragments to be only a part of the entire genome; they are usually exons and/or a few introns and untranslated regions (UTRs).

Full Wessim Profile

WgSim

https://github.com/lh3/wgsim

a small tool for simulating sequence reads from a reference genome.

Description

Wgsim is a small tool for simulating sequence reads from a reference genome. It is able to simulate diploid genomes with SNPs and insertion/deletion (INDEL) polymorphisms, and simulate reads with uniform substitution sequencing errors. It does not generate INDEL sequencing errors, but this can be partly compensated by simulating INDEL polymorphisms. Wgsim outputs the simulated polymorphisms, and writes the true read coordinates as well as the number of polymorphisms and sequencing errors in read names. One can evaluate the accuracy of a mapper or a SNP caller with wgsim_eval.pl that comes with the package.

Full WgSim Profile

XomeBlender

https://github.com/rsemeraro/XomeBlender

Generates synthetic cancer genomes with different contamination level and intra-tumor heterogeneity and devoid of any synthetic element

Description

Xome-Blender is a collection of Python and R scripts based on SAMtools functions that allow users to generate synthetic cancer genomes with user-defined features, such as the number of subclones, the number of somatic variants and the presence of CNV, without the addition of any synthetic element. It is composed of two modules: InXalizer and Xome-Blender. The first module is devoted to initializing the blending process. It takes as input a single BAM file and a set of user-defined parameters, and returns the coverage of the sample and the input files for the second module (Xome-Blender). Optionally, it creates a file containing the coordinates at which to insert CNV in the final product. The second module generates the synthetic heterogeneous sample.

Full XomeBlender Profile

XS

http://bioinformatics.ua.pt/software/xs/

a FASTQ read simulator

Description

XS is a skilled FASTQ read simulation tool, flexible, portable (does not need a reference sequence) and tunable in terms of sequence complexity. XS handles Ion Torrent, Roche-454, Illumina and ABI-SOLiD simulation sequencing types. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality-scores).

Full XS Profile

Zombi

https://github.com/AADavin/Zombi

Zombi generates species trees, gene trees and sequences.

Description

Zombi is a flexible platform of genome evolution which can be of great interest to those who want to test different evolutionary hypotheses under simulations and need a fast and easy-to-use tool to generate species trees, gene trees or sequences. Zombi's output is especially simple and easy to read, understand, and parse.

Full Zombi Profile

Step 1: Select attributes to compare


Target
- Type of Simulated Data
  - Genotype at Genetic Markers
  - Diploid DNA Sequence
  - Haploid DNA Sequence
  - RNA
  - Gene Expression
  - Sex Chromosomes
  - Mitochondrial DNA
  - Protein Sequence
  - Sequencing Reads
  - Phenotype
  - Single-Cell Sequencing
  - Bulk Sequencing
  - Proteomics
  - Chromatin Conformation
- Variations
  - Biallelic Marker
  - Multiallelic Marker
  - Single Nucleotide Variation
  - Amino acid variation
  - Microsatellite
  - Insertion and Deletion
  - CNV
  - Inversion and Rearrangement
  - Alternative Splicing
  - Missing Genotypes
  - Genotype or Sequencing Error
  - Ionization
  - Other
Simulation Method
- Standard Coalescent
- Exact Coalescent
- Machine Learning
- Forward-time
- Resample Existing Data
- Phylogenetic
- Gene dropping
- Neural network
- Other
Input
- Data Type
  - Allele Frequencies
  - Empirical
  - Ancestral Sequence
  - Saved simulation
  - Reference genome
  - Other
- File format
  - Arlequin
  - CREATE
  - Fstat
  - GDA
  - Genepop
  - MIGRATE
  - MS
  - SAM or BAM
  - NEXUS
  - Phylip
  - STRUCTURE
  - XML
  - Tree Sequence
  - Program Specific
  - Other
Output
- Data Type
  - Genotype or Sequence
  - Phenotypic Trait
  - Individual Relationship
  - Phylogenetic Tree
  - Demographic
  - Mutation
  - Methylation
  - Gene Expression
  - Protein Expression
  - Linkage Disequilibrium
  - Diversity Measures
  - Fitness
  - Sequencing Reads
    - Illumina
    - Roche 454
    - SOLiD
    - IonTorrent
    - PacBio
    - Nanopore
    - Other
  - Other
- File Format
  - Arlequin
  - Fasta or Fastq
  - Fstat
  - Genepop
  - Linkage
  - MIGRATE
  - MS
  - PED
  - Phylip
  - NEXUS
  - STRUCTURE
  - VCF
  - SAM or BAM
  - Tree Sequence
  - Program Specific
  - Other
- Sample Type
  - Random or Independent
  - Sibpairs, Trios and Nuclear Families
  - Extended or Complete Pedigrees
  - Case-control
  - Longitudinal
  - Other
Phenotype
- Trait Type
  - Binary or Qualitative
  - Quantitative
  - Multiple
- Determinants
  - Single Genetic Marker
  - Multiple Genetic Markers
  - Sex-linked
  - Gene-Gene Interaction
  - Environmental Factors
  - Gene-Environment Interaction
Evolutionary Features
- Demographic
  - Population Size Changes
    - Constant Size
    - Exponential Growth or Decline
    - Logistic Growth
    - Bottleneck
    - Carrying Capacity
    - User Defined
  - Gene Flow
    - Stepping Stone Models
    - Island Models
    - Continent-Island Models
    - Sex or Age-Specific Migration Rates
    - Influenced by Environmental Factors
    - Admixed Population
    - User-defined Matrix
    - Other
  - Spatiality
    - Discrete Models
    - Continuous Models
    - Landscape Factors
- Life Cycle
  - Discrete Generation Model
  - Age structured
  - Overlapping Generation
  - User-Defined transition matrices
- Mating System
  - Random Mating
  - Monogamous
  - Polygamous
  - Haplodiploid
  - Selfing
  - Age- or Stage-Specific
  - Assortative or Disassortative
  - Other
- Fecundity
  - Constant Number
  - Randomly Distributed
  - Individually Determined
  - Influenced by Environment
  - Other
- Natural Selection
  - Determinant
    - Single-locus
    - Multi-locus
    - Codon-based
    - Fitness of Offspring
    - Phenotypic Trait
    - Environmental Factors
  - Models
    - Directional Selection
    - Balancing Selection
    - Multi-locus models
    - Epistasis
    - Random Fitness Effects
    - Disruptive
    - Phenotype Threshold
    - Frequency-Dependent
    - Other
- Recombination
  - Uniform
  - Varying Recombination Rates
  - Gene Conversion Allowed
- Mutation Models
  - Two-allele Mutation Model
  - Markov DNA Evolution Models
  - k-Allele Model
  - Infinite-allele Model
  - Infinite-sites Model
  - Stepwise Mutation Model
  - Codon and Amino Acid Models
  - Indels and Others
  - Heterogeneity among Sites
  - Others
- Events Allowed
  - Population Merge and Split
  - Varying Demographic Features
  - Population Events
  - Varying Genetic Features
  - Change of Mating Systems
  - Other
- Other
  - Phenogenetic
  - Polygenic background
Interface
- Command-line
- Graphical User Interface
- Integrated Development Environment
- Script-based
- Web-based
Development
- Tested Platforms
  - Windows
  - Mac OS X
  - Linux and Unix
  - Solaris
  - Others
- Language
  - C or C++
  - Java
  - R
  - Python
  - Perl
  - Visual Basic
  - Other
- License
  - GNU Public License
  - BSD
  - Creative Commons
  - MIT
  - Other
GSR Certification
- Accessibility
- Documentation
- Application
- Support

Step 2: Select matching simulators

(ordered by match quality)

How to Use this Tool
Steps
1. Select your desired simulator attributes in the "Select attributes to compare" pane in one of two ways:
   - Navigate the attribute tree
   - Use the text box and its typeahead features to populate the attribute tree
2. Observe the simulators ranked by their match quality in the "Matching simulators" pane to the right
3. Select at least one and at most six simulators by checking the checkbox to the right of each simulator, then click the Compare button to view the comparison table
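How the ranking by match quality might work can be sketched as follows. The page does not say how match quality is computed, so this Python example assumes the simplest plausible rule: the fraction of selected attributes that a simulator has. The function name `rank_by_match` and the simulator names `sim_a`, `sim_b`, and `sim_c` are hypothetical placeholders.

```python
def rank_by_match(simulators, selected):
    """Rank simulators by the fraction of selected attributes they have.

    simulators: dict mapping simulator name -> iterable of attribute names
    selected:   iterable of attribute names chosen in the attribute tree
    Returns a list of (name, score) pairs, best match first; simulators
    with no matching attribute are omitted.
    """
    selected = set(selected)
    if not selected:
        return []
    scored = []
    for name, attrs in simulators.items():
        matched = len(selected & set(attrs))
        if matched:
            scored.append((name, matched / len(selected)))
    # Sort by descending score, then by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Placeholder catalogue; attribute names are taken from the tree above.
catalogue = {
    "sim_a": {"Forward-time", "R"},
    "sim_b": {"Forward-time"},
    "sim_c": {"Perl"},
}
ranking = rank_by_match(catalogue, ["Forward-time", "R"])
# ranking == [("sim_a", 1.0), ("sim_b", 0.5)]
```

Under this assumed rule, a simulator that has every selected attribute scores 1.0, and the six-simulator limit on the comparison table would apply to whichever entries the user then picks from the ranked list.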
Important
This tool is best viewed in one of the following browsers:
- Google Chrome
- Firefox, 4+


12845476959807978777476756362147148152164165176259279284289291294302333334345347366367368458516525531532540546831224837503130163246266283285308335365395398400449454455456461478484489490493510530560561

5165954253254662634994985052945869553457461474530493176490484489164503165507525317475767778543795258033333434554033534745444945145543845653657712812239539840052236536836736639045291539294285283279289284288295474815225926624614730830214830931237301633183510561562563

8216657057162564

58532624995169473530506523333344748152243371633183560

5165958532624995301764904911665073333345404492804724324224630237301633183510550551560561

517747578795308230315441633749317551573511561560563

5635615603758758882835668631

56016336375495105525293031561

6152951716617537318286510560561

516532635304845035183355372663083023930163313283119510560

36119311635123761530560561563

586154965548478530243153313283560562

562560561

56251647561363750279283563560

5635623786561560

54416451739587175593516111957331

593585615898258786563561560562

5853249947353016450652333334540454449121456122474854114730814837301633183119510560561562563

5636582587166561560585962575593

8237166587573563561560593594

1663958755853082573575561560

833730315922915164550561

4547483758523132306283122163175266279283333451493499510530563561

5615605635775795165805873333343353453475403655762

5854954855147653017655050724324224637301633183561562563

56261563561163833711930554560510556534

585325265155051458695534785304935185253747578333334540347449128122365291289288475412441473739163318311982510560561562563

5165953262499515575515304935285255531537475767778793730318336510561562

5635628258756156037

58333532834837383130122243334449476493499510530560561

45474837405958525031323065628312112814714815216216416617524328028829530931233335336539845145848148449051251751851952252953253353453553653753954054254612215316339242244246279281283289291302308334335345347367394400449454455456473478489491506510530560561562563

163375118386119303153724456356051645561

557733352953153251645875373183510526527528530560

5605635165326252953351753052555314748393016331119510561562

5638251737573510562561560

516532546635052945853346147851753049048916450316550453551817578543525803453475361281225223673662915392852892802661473083021483123738301633186560561562

833786511529516560

51657958045445545633333433553253634534754029145475056443812112259635638337511561560449577451

47595863623335035165325348348373012239244259266308334347456457474498507510530561562

585526564534548529474476478164166243153246393031119510560561

51653049352855531537475777879525803333730163313283532557

56256356156082587533303116337166564566567568569570571572

461515335323783510560561564

516542532546530176747537303183557560561

163516373659051305736231561

56251853014753353615354131162163548549166373917517647505859615756533383474347478490242499243246122507561510563560

549546655485515301662422463840303183511552

51658532625265154985045869517530493490164725253747577783333343455403474544491281223983683674529153929428928828647481525375413830163318311982510561562

560564

56264546516587175530515326259542563561560

5165325069533530493518525553777879333334454121128122365452912832792892844715237383930163313282510560561563

45474859588079787774767553555250513132306362828311912112813113214714816216316416517517639244259282280284288302333334368391461491510516518522525531532533535538539540542543546547553560561563122152243246266279283285286289291294295308309335347365367392394395398400438449454455456471473476484489490493495504506530550562

546166395305153211930585736231

515513716656453256683510563561560

62655291663783560562563566574575564

595861626554816655017524215337301633151055183561560

5049951611939560563561

5901281665165958542538549532546625526365645145265155345045869551553533473478517530493176490491495484489164506735181755255531535475447475767778543795258033333434554033534745444945145543812145653613113235312239539839440039239152236536836736639045291281539294285283279289284288282295286280474816215224325953754126624215324624414730830214830931255455637383916331838651056056156256358258357557757958058157330

595854253862505535334785304841645031665185233333434554033534712812239539839452236824330830230931237301633183510560561562

5958152176333516518532541834837313012216339334347478499510530560561

823762588558563562561560

5635628337510119561560

655295305168355140561560

175176516532831223750313016339308347365395455456458471490504510530563

835958526362121175302308333335345347449454478490517532538546373830122147246266438498503510530561562

562560563561

5935855915946151237119562563561560

54416451739401755161119573510

5953250533530491489518523333345403474491281223652912892884754124414737383930163313282510542560561563

51716656156051237461558529825315256658573575563

56356211937511561560

5155185875888347831371665646257556056165

36147128122453816617616450311633230596224330233333434736545648451052753237530560

5165953254662505307352555374757833333434554033534745444945145543812145653612240029128047481621522592421471483093739163318382510560561

512546163516517377352982532305736231563562561560

548373658510560

5625958542564626365575645291665583782510560561563

12114813112245807476751762592802822832842852862892942953093673683914004494544904935165185255275315325335345375395405425468337385030288291302308335353366392395398455456458471478484504530563561562

59585295335175301665274525554556373182511561562

1633716653051833031561563564

563561560693755853051528256611958510575

4994985051548557529561560564563

823751056156059258563