Available applications
Virtual Machine name | Operating System | Software installed | PMES Application / URL |
---|---|---|---|
serial_maker+ | Debian (6.0.5) | Maker (2.28), Exonerate (2.2.0), Snap (2006-07-28), Augustus (2.5.5), Blast (2.2.28+), RepeatMasker (1.295), TRF (4.07b) | Maker (help) Exonerate (help) Augustus (help) |
bwapipeline | Debian (6.0.5) | BWA (0.1.17), bcftools (0.1.18), samtools (0.7.5a) | Bwa (help) |
bowtie+ | Debian (6.0.5) | Bowtie (2-2.2.3), tophat (2.0.12), boost (1_55), samtools (0.1.8) | Bowtie (help) Tophat (help) |
bignasim | Ubuntu (8.04) | MongoDB v.2.6.2, Cassandra 2.1, Curves+ 2.0, R 2.15.0, Gnuplot 4.2, Grace 5.1.21, GROMACS 5.0, JSMol 14.0.5, PCASuite 1.1, Ambertools 14, Netpbm 10.0, VMD 1.8.5, ffmpeg 2.5, MDAnalysis, RnamlView | BigNASim (help) |
nucleosomedynamics | Debian (8.2) | R 2.15.0, Gnuplot 4.2, Grace 5.1.21, JBrowse 1.11.6, MongoDB 3.0.7 | Nucleosome Dynamics (help) |
Description of the applications
PMES applications allow the user to configure and launch the software installed within each cloud virtual machine. This section details the arguments required by each application, the specific software used by each wrapper or pipeline, and the input files that each virtual machine needs to access in order to run the specified software. More detailed information about the execution process and general I/O management can be found in the Dashboard documentation.
Genome Annotation
MAKER
Application Type: stand alone
This application runs the genome annotation pipeline MAKER 2.
- MAKER: identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions and automatically synthesizes these data into gene annotations.
Arguments | configuration file Opts | Maker configuration file detailing general options and input files |
configuration file Bopts | Maker configuration file detailing the similarity parameters | |
cpus blast | Number of CPUs used by BLAST. This should match the total number of CPUs reserved. | |
basename | Base-name of the pipeline output | |
Input Files | The two configuration files need to be uploaded, together with any other file referenced in those control files. | |
Output Files | The application returns a compressed folder called [BASENAME].tar.gz | |
Sample Files | Configuration files: http://transplantdb.bsc.es/documents/samples/maker/maker_opts.ctl http://transplantdb.bsc.es/documents/samples/maker/maker_bopts.ctl
Files referenced in the configuration file Opts: http://transplantdb.bsc.es/documents/samples/maker/dpp_contig.fasta http://transplantdb.bsc.es/documents/samples/maker/dpp_est_fasta |
Special requirements |
AUGUSTUS
Application Type: stand alone
AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It can be used as an ab initio program, but it may also incorporate hints on the gene structure coming from extrinsic sources such as EST, MS/MS, protein alignments and syntenic genomic alignments.
Arguments | query sequence | The query file contains the DNA input sequence and must be in uncompressed (multiple) fasta format |
specie | Choose one of the following species, for which Augustus has been trained:
human, fly, arabidopsis, brugia, aedes, tribolium, schistosoma, tetrahymena, galdieria, maize, toxoplasma, caenorhabditis, aspergillus_fumigatus, aspergillus_nidulans, aspergillus_oryzae, aspergillus_terreus, botrytis_cinerea, candida_albicans, candida_guilliermondii, candida_tropicalis, chaetomium_globosum, coccidioides_immitis, coprinus, coprinus_cinereus, cryptococcus_neoformans_gattii, cryptococcus_neoformans_neoformans_B, cryptococcus_neoformans_neoformans_JEC21, debaryomyces_hansenii, encephalitozoon_cuniculi_GB, eremothecium_gossypii, fusarium_graminearum, histoplasma_capsulatum, kluyveromyces_lactis, laccaria_bicolor, lamprey, leishmania_tarentolae, lodderomyces_elongisporus, magnaporthe_grisea, neurospora_crassa, phanerochaete_chrysosporium, pichia_stipitis, rhizopus_oryzae, saccharomyces_cerevisiae_S288C, saccharomyces_cerevisiae_rm11-1a_1, schizosaccharomyces_pombe, trichinella, ustilago_maydis, yarrowia_lipolytica, nasonia, tomato, chlamydomonas, amphimedon, pneumocystis |
|
optional parameters | Native optional Augustus parameters. Default: --strand=both --genemodel=partial --maxDNAPieceSize=200000
See http://augustus.gobics.de/binaries/README.TXT to modify the default parameters or add others. |
|
output | Base-name of the GFF that will be generated | |
Input Files | Augustus only requires the FASTA file corresponding to the ‘query sequence’ parameter. | |
Output Files | The pipeline generates an output file called [OUTPUT].gff | |
Sample Files |
query sequence: http://transplantdb.bsc.es/documents/samples/augustus/sequence.fa specie: arabidopsis |
|
Special requirements |
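As an illustration of how the Augustus arguments above fit together, here is a minimal Python sketch that assembles a command line from them. It assumes the standard Augustus invocation (`augustus [parameters] --species=SPECIES queryfile`, with predictions written to standard output). The function name `augustus_command` is hypothetical, and the wrapper actually used by the PMES application is not documented here, so treat this as an approximation rather than the real implementation.

```python
import shlex

# Default taken from the 'optional parameters' row of the table above.
DEFAULT_OPTIONS = "--strand=both --genemodel=partial --maxDNAPieceSize=200000"


def augustus_command(query, specie, output, options=DEFAULT_OPTIONS):
    """Return (argv, gff_name) for one Augustus run.

    query   -- uncompressed (multi-)FASTA file with the DNA sequence
    specie  -- one of the trained species identifiers listed above
    output  -- base-name of the GFF file to be generated
    options -- optional Augustus parameters as a single string
    """
    argv = ["augustus", *shlex.split(options), f"--species={specie}", query]
    # Augustus prints to stdout; the caller is assumed to redirect it here.
    return argv, f"{output}.gff"


argv, gff_name = augustus_command("sequence.fa", "arabidopsis", "results")
print(" ".join(argv), ">", gff_name)
```

The list form of `argv` can be passed directly to `subprocess.run` with `stdout` pointed at the GFF file, which avoids shell quoting problems when file names contain spaces.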
Pairwise Sequence Alignment
EXONERATE
Application Type: stand alone
Exonerate is a generic tool for pairwise sequence comparison. It allows you to align sequences using many alignment models, either exhaustive dynamic programming, or a variety of heuristics.
Arguments | Query | Query sequence(s), required. They must be in a FASTA format file. Single or multiple query sequences may be supplied in one or more files. | |||
Target | Target sequence(s), required. They must also be in a FASTA format file. As with the query, single or multiple target sequences and files may be supplied.
The Plant Ensembl genomes (release 20) and the GRCh37 human genome are also available through the shared storage. To use them, specify one of the following options:
|
||||
Options | Native optional Exonerate parameters. See http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.html
Default: --model ungapped --bestn 0 --score 100 --exhaustive FALSE --showtargetgff yes |
||||
Chunks Query | Equivalent to querychunktotal. Number of chunks into which the query will be split in order to run on different nodes (*). | ||||
Chunks Target | Equivalent to targetchunktotal. Number of chunks into which the target will be split in order to run on different nodes (*). | ||||
Output basename | basename of the output | ||||
Input Files | The Query FASTA file(s) need to be uploaded. The Target FASTA file(s) also need to be uploaded, unless the target corresponds to a sequence included in the Plant Ensembl genomes (release 20) or the GRCh37 human release. In those cases, only the ‘Target’ parameter needs to be specified. | ||||
Output Files | The application generates a GZIP file containing the concatenation of all Exonerate outfiles. File: BASENAME.gz | ||||
Sample Files | Query: http://transplantdb.bsc.es/documents/samples/exonerate/TAIR_partial.fa Target: arabidopsis_lyrata Chunks Query: 1 Chunks Target: 4 Advanced tab → Cores: 4 (*)
(*) The total number of cores reserved in the cloud should equal Chunks Query multiplied by Chunks Target. For example, splitting the target database into 3 parts and the query into 2 runs 6 Exonerate jobs, so 6 cores need to be reserved. The granularity of a chunk goes down to a single sequence. |
||||
Special requirements | The application requires access to the DATA2 data storage if no Target file is uploaded and the Ensembl or GRCh37 databases are specified instead. |
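The core-count rule in the footnote (cores = Chunks Query × Chunks Target) can be checked with a short Python sketch. It enumerates one job per query/target chunk pair and attaches Exonerate's `--querychunkid`/`--querychunktotal` and `--targetchunkid`/`--targetchunktotal` options to each. The function name `exonerate_chunk_jobs` is hypothetical, and how the PMES wrapper actually schedules the jobs is an assumption.

```python
from itertools import product


def exonerate_chunk_jobs(chunks_query, chunks_target):
    """Return one list of chunk options per Exonerate job.

    Every query chunk is compared against every target chunk, so the
    number of jobs (and of cores to reserve) is the product of the two.
    """
    if chunks_query < 1 or chunks_target < 1:
        raise ValueError("chunk counts must be positive integers")
    jobs = []
    # Exonerate chunk ids are 1-based.
    for q, t in product(range(1, chunks_query + 1), range(1, chunks_target + 1)):
        jobs.append([
            "--querychunkid", str(q), "--querychunktotal", str(chunks_query),
            "--targetchunkid", str(t), "--targetchunktotal", str(chunks_target),
        ])
    return jobs


jobs = exonerate_chunk_jobs(chunks_query=2, chunks_target=3)
print(f"{len(jobs)} jobs, so {len(jobs)} cores should be reserved")
```

With the sample values from the table (Chunks Query: 1, Chunks Target: 4), the same function yields 4 jobs, which matches the 4 cores set in the Advanced tab.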
NGS Alignment
BWA
Application Type: stand alone
This application is a sequential pipeline that uses BWA to align paired-end reads against a reference genome and converts the resulting alignment into a BAM file using SAM Tools.
- BWA (Burrows-Wheeler Alignment): software package for mapping low-divergent sequences against a large reference genome.
- SAM Tools: provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
Arguments | fastq1 | paired-end reads file 1 in fastq format. | |||
fastq2 | paired-end reads file 2 in fastq format. | ||||
Reference Genome | indexed reference genome (Ensembl release 20). Options:
|
||||
Output basename | Base-name of the pipeline output | ||||
Input Files | The files required to run the application correspond to the arguments fastq1 and fastq2. | ||||
Output Files | The pipeline generates an output file called [BASENAME].bam | ||||
Sample Files |
Fastq1: http://transplantdb.bsc.es/documents/samples/bwa/1.fastq.gz Fastq2: http://transplantdb.bsc.es/documents/samples/bwa/2.fastq.gz Reference Genome: arabidopsis_thaliana Output base-name: results |
||||
Special requirements | The application requires access to the DATA2 data storage, where the Ensembl database is stored. |
TopHat
Application Type: stand alone
This application executes the TopHat program. Additionally, it runs bowtie2-build to build the genome bowtie2 indexes.
[ bowtie2-build ] → TopHat
- TopHat: a program that aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie.
Like the original software, the application behaves differently depending on the arguments given. For instance:
- Align reads:
- Build transcriptome from GTF:
- Resume:
Arguments | read | A comma-separated list of files containing reads in FASTQ or FASTA format. For paired-end reads, this should be the *_1 files. |
read2 | A comma-separated list of files containing reads in FASTQ or FASTA format. Only used for paired-end reads. It contains the *_2 set of files, which must appear in the same order as the *_1 files. | |
index | Genome to be searched. The parameter accepts two types of values:
- 1: Bowtie2 index basename. The program will look for the index*bt2 and index*rev.bt2 files, which need to be uploaded (Input tab). - 2: comma-separated list of files containing genomic sequences in FASTA format. They will be indexed using the bowtie2-build program. |
|
cpus | Number of threads used to align reads. This should match the number of cores reserved in the ‘advanced’ tab. Note that bowtie2-build does not parallelise. | |
output | Basename of the directory in which TopHat will write all of its output. | |
topHat options | Native options of the TopHat program. Note that some options take input files (e.g. -j file.juncs), which therefore need to be uploaded to the cloud through the ‘input’ tab.
See the options at: http://ccb.jhu.edu/software/tophat/manual.shtml |
|
Input Files | When running TopHat to align RNA-Seq reads, the read files need to be uploaded to the virtual machine in FASTQ or FASTA format. The target name or target path (3rd column) should correspond to the arguments ‘read’ and ‘read2’.
When using pre-built indexes in ‘index’, the *.1.bt2 and *.rev.1.bt2 files need to be uploaded. When new indexes are to be built, the original genomic FASTA files should be transferred. Additionally, when users supply their own insertions, deletions, or list of known transcripts, the corresponding .GTF, .BED, .JUNCS, etc. files need to be correctly specified within ‘TopHat options’, as well as uploaded through the ‘input’ tab. |
|
Output Files | The application returns [OUTPUT].tar.gz, a compressed version of the standard TopHat output directory.
When a new transcriptome index is created (--GTF and --transcriptome-index within ‘TopHat options’), it is included in [OUTPUT].tar.gz, so it can be reused in other TopHat runs. |
|
Sample Files |
read : http://transplantdb.bsc.es/documents/samples/bowtie/reads_1.fq read2: http://transplantdb.bsc.es/documents/samples/bowtie/reads_2.fq index: arabidopsis_lyrata cpus: 8 topHat options: -r 20 |
|
Special requirements |
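When a pre-built index basename is given in ‘index’, it can help to check that every index file is present before uploading. The Python sketch below lists the file names that bowtie2-build normally writes for a small index (four forward files and two reverse files, all ending in `.bt2`). The helper names `bowtie2_index_files` and `missing_index_files` are hypothetical, and large indexes, which use the `.bt2l` extension, are not covered.

```python
from pathlib import Path


def bowtie2_index_files(basename):
    """Return the six file names of a small Bowtie2 index for `basename`."""
    forward = [f"{basename}.{i}.bt2" for i in range(1, 5)]
    reverse = [f"{basename}.rev.{i}.bt2" for i in (1, 2)]
    return forward + reverse


def missing_index_files(basename, directory="."):
    """Return the expected index files that are absent from `directory`."""
    folder = Path(directory)
    return [name for name in bowtie2_index_files(basename)
            if not (folder / name).is_file()]


for name in bowtie2_index_files("genome"):
    print(name)
```

Running `missing_index_files("genome", "indexes/")` before an upload returns an empty list only when all six files are in place.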
Bowtie
Application Type: stand alone
This application launches the fast aligner Bowtie2. Additionally, the wrapper includes the option to build the indexes from input reference genomes, in case the pre-built indexes of the ENSEMBL Plants full-version genomes are not suitable. The application includes the following software:
[ bowtie2-build ] → bowtie2 → samtools
- BOWTIE2: It is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters to relatively long (e.g. mammalian) genomes.
- BOWTIE2-build: The program indexes the genome with an FM Index (based on the Burrows-Wheeler Transform or BWT) to keep its memory footprint small. This step is only performed when the user supplies a genome sequence to be indexed.
- SAM Tools: provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
Arguments | read | Unpaired reads to be aligned OR paired-end reads containing mate 1s. FASTQ is the default format. A comma-separated list of read files is also accepted. | |||
read2 | Paired-end reads containing mate 2s, when mate 1s are specified in the argument ‘read’. FASTQ is the default format. A comma-separated list of read files is also accepted. | ||||
reference | Reference genome against which reads are aligned. The parameter accepts two possible types of data:
|
||||
only index | yes or no. Return only the indexes built from the ‘reference’ sequence(s). As no reads will be aligned, the following arguments will be ignored: ‘read’, ‘read2’, ‘cpus’, ‘bowtie2 parameters’. Default: no | ||||
cpus | Number of threads created by bowtie2. This should match the number of cores reserved in the ‘advanced’ tab. Note that bowtie2-build does not parallelise. | ||||
output | base-name of the final packed and compressed output | ||||
bowtie2-build parameters | Native options of the bowtie2-build program. See http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#the-bowtie2-build-indexer
Default: -f --offrate 5 |
||||
bowtie2 parameters | Native options of bowtie2. See http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#command-line
Default: --end-to-end --sensitive --un-gz unpaired.sam.gz --met-file metrics.log --time |
||||
Input Files | Read files in FASTQ or Illumina’s QSEQ format (bowtie2 parameters = --qseq) need to be uploaded to the virtual machine.
The target name or target path (3rd column) should correspond to the arguments ‘read’ and ‘read2’. Additionally, when users supply their own indexes, all *.1.bt2 and *.rev.1.bt2 files need to be uploaded, and their path and base-name set in the argument ‘reference’. However, if the ‘reference’ argument refers to genomic sequences, the files to upload are the FASTA reference genome sequences. |
||||
Output Files | The application returns a [OUTPUT].tar.gz file.
It will contain a variable number of files, created according to the provided options:
|
||||
Sample Files |
read: http://transplantdb.bsc.es/documents/samples/bowtie/reads_1.fq read2: http://transplantdb.bsc.es/documents/samples/bowtie/reads_2.fq reference: arabidopsis_lyrata only index: no |
||||
Special requirements | The application requires access to the DATA2 data storage if the Ensembl database is used as the reference genome. |
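To show how the Bowtie arguments relate to a bowtie2 call, the following Python sketch builds a command line from ‘read’, ‘read2’, ‘reference’, ‘cpus’ and ‘bowtie2 parameters’. It uses the standard bowtie2 options `-x` (index basename), `-1`/`-2` (paired mates), `-U` (unpaired reads), `-p` (threads) and `-S` (SAM output). The function name `bowtie2_command` and the SAM file name `alignment.sam` are hypothetical, and the real wrapper may assemble its call differently.

```python
import shlex

# Default taken from the 'bowtie2 parameters' row of the table above.
DEFAULT_PARAMS = ("--end-to-end --sensitive --un-gz unpaired.sam.gz "
                  "--met-file metrics.log --time")


def bowtie2_command(reference, read, read2=None, cpus=1,
                    params=DEFAULT_PARAMS, sam="alignment.sam"):
    """Return the argv list for one bowtie2 alignment.

    reference -- basename of the Bowtie2 index
    read      -- unpaired reads, or mate 1 files when read2 is given
    read2     -- mate 2 files (comma-separated lists are passed through)
    cpus      -- number of alignment threads
    """
    argv = ["bowtie2", *shlex.split(params), "-p", str(cpus), "-x", reference]
    if read2:
        argv += ["-1", read, "-2", read2]   # paired-end mode
    else:
        argv += ["-U", read]                # unpaired mode
    argv += ["-S", sam]
    return argv


print(" ".join(bowtie2_command("arabidopsis_lyrata", "reads_1.fq",
                               "reads_2.fq", cpus=8)))
```

Keeping the command as a list rather than a single string lets comma-separated read lists and paths with spaces pass through unchanged when the list is handed to `subprocess.run`.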