Software installation guide for whole genome sequencing and transposon mapping

Introduction

This section describes the main pipeline for processing satay datasets, from the raw fastq files produced by sequencing to the bed, wig and pergene text files that can be used directly for analysis. See the image below for a schematic overview of the pipeline, with the software tool used at each step in brackets and the file type produced after each step on the left.

2. Fastqc (Windows or Linux)

https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Fastqc is used for quality checking. To install Fastqc in Linux, open a bash terminal and enter the command sudo apt install fastqc (this might require a password). To run interactively, double-click the 'run_fastqc' batch file in the Fastqc folder. To run non-interactively, enter 'fastqc' in the command line.
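A minimal sketch of a non-interactive Fastqc run, assuming fastqc is on the PATH after the apt installation. The helper name run_fastqc, the output directory and the file names are hypothetical; --outdir and --threads are Fastqc's own options.

```shell
set -euo pipefail

# Hypothetical helper: run Fastqc on one or more fastq files and write the
# HTML/zip reports to the given output directory.
run_fastqc() {
    local outdir="$1"
    shift
    # Fastqc stops with an error if the output directory does not exist
    mkdir -p "$outdir"
    # --threads: number of files processed simultaneously
    fastqc --outdir "$outdir" --threads 2 "$@"
}

# Example (hypothetical file names):
# run_fastqc qc_reports sample_R1.fastq.gz sample_R2.fastq.gz
```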

3a. Trimmomatic-0.39 (Windows or Linux)

http://www.usadellab.org/cms/?page=trimmomatic

Trimmomatic is used for trimming fastq files. It is Java based and therefore does not need to be installed; only a Java runtime is required. To run, enter 'java -jar trimmomatic-0.39.jar' in the command line from the unpacked Trimmomatic-0.39 folder. The adapters folder contains adapter sequence files that can be used during trimming if desired.
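A minimal single-end trimming sketch, assuming a Java runtime and the unpacked Trimmomatic-0.39 folder (containing trimmomatic-0.39.jar and the adapters folder). The helper name trim_single_end and the TRIMMOMATIC_DIR variable are hypothetical, and the trimming steps reuse the single-end example values from the Trimmomatic manual rather than settings tuned for satay reads.

```shell
set -euo pipefail

# Hypothetical location of the unpacked Trimmomatic-0.39 folder
TRIMMOMATIC_DIR="${TRIMMOMATIC_DIR:-./Trimmomatic-0.39}"

# Hypothetical helper: single-end adapter and quality trimming.
# Usage: trim_single_end <input.fq.gz> <output.fq.gz> [adapter.fa]
trim_single_end() {
    local input="$1" output="$2"
    local adapters="${3:-$TRIMMOMATIC_DIR/adapters/TruSeq3-SE.fa}"
    # ILLUMINACLIP:<adapters>:<seed mismatches>:<palindrome clip>:<simple clip>
    # LEADING/TRAILING:3   remove leading/trailing bases with quality below 3
    # SLIDINGWINDOW:4:15   cut once the mean quality of a 4-base window drops below 15
    # MINLEN:36            drop reads shorter than 36 bases after trimming
    java -jar "$TRIMMOMATIC_DIR/trimmomatic-0.39.jar" SE -phred33 \
        "$input" "$output" \
        "ILLUMINACLIP:$adapters:2:30:10" \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
}
```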

123Fastq-v1.1 (Windows) (optional)

https://sourceforge.net/projects/project-123ngs/

123Fastq combines Fastqc and Trimmomatic in one interactive program. It is Java based and therefore does not need to be installed. Double-click the 123Fastq executable jar file to run the program.

3b. BBDuk-38.84 (Windows or Linux)

https://jgi.doe.gov/data-and-tools/bbtools/

BBDuk is an alternative to Trimmomatic for trimming fastq files. It is Java based and therefore does not need to be installed. It is part of the BBTools package (downloaded as BBMap). Once downloaded, unpack the .tar.gz package and run the bbduk.sh script in the bbmap directory. An adapter file, adapters.fa, is included in the resources directory.
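A minimal adapter-trimming sketch with BBDuk, assuming BBMAP_DIR points to the unpacked bbmap directory. The helper name bbduk_trim and the BBMAP_DIR variable are hypothetical; the k-mer settings follow the adapter-trimming example in the BBDuk guide and may need adjusting for satay data.

```shell
set -euo pipefail

# Hypothetical location of the unpacked bbmap directory
BBMAP_DIR="${BBMAP_DIR:-./bbmap}"

# Hypothetical helper: trim adapters from single-end reads.
# Usage: bbduk_trim <input.fq> <output.fq>
bbduk_trim() {
    local input="$1" output="$2"
    # ktrim=r : trim matching adapter k-mers and everything to their right
    # k=23    : k-mer length used to match adapter sequences
    # mink=11 : also match shorter k-mers (down to 11) at the read end
    # hdist=1 : allow one mismatch (Hamming distance) per k-mer
    "$BBMAP_DIR/bbduk.sh" in="$input" out="$output" \
        ref="$BBMAP_DIR/resources/adapters.fa" \
        ktrim=r k=23 mink=11 hdist=1
}
```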

4. BWA (Linux)

http://bio-bwa.sourceforge.net/

BWA is used to index the reference genome and to align the reads to it. After downloading the software in the VM, install it by entering the following commands in the terminal:

bunzip2 bwa-0.7.17.tar.bz2
tar -xvf bwa-0.7.17.tar
cd bwa-0.7.17
sudo apt-get update
sudo apt-get install bwa

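To run, enter bwa in the terminal. Note that apt-get installs the version of BWA packaged for the Linux distribution, not the unpacked bwa-0.7.17 source; to build the downloaded version instead, run make inside the bwa-0.7.17 directory.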
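A minimal sketch of the two BWA steps (indexing and alignment), assuming bwa is on the PATH and the reads are single-end. The helper name align_reads and its arguments are hypothetical; the reference is indexed only when no .bwt index file exists yet.

```shell
set -euo pipefail

# Hypothetical helper: index the reference (once) and align reads with BWA-MEM.
# Usage: align_reads <reference.fasta> <reads.fq> <output.sam> [threads]
align_reads() {
    local reference="$1" reads="$2" out_sam="$3" threads="${4:-4}"
    # bwa index writes .amb, .ann, .bwt, .pac and .sa files next to the fasta;
    # skip this step if the index already exists
    if [ ! -e "$reference.bwt" ]; then
        bwa index "$reference"
    fi
    # -t: number of threads; bwa mem writes SAM records to standard output
    bwa mem -t "$threads" "$reference" "$reads" > "$out_sam"
}
```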

5. SAMTools and bcftools (Linux)

http://www.htslib.org/

Samtools is used for processing after alignment, for example for converting SAM files to BAM files. After downloading the software in the VM, install it by entering the following commands in the terminal:

bunzip2 samtools-1.10.tar.bz2
tar -xvf samtools-1.10.tar
cd samtools-1.10
sudo apt-get update
sudo apt-get install samtools

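To run, enter samtools in the terminal. Follow the same procedure for bcftools. Note that apt-get installs the packaged version for the Linux distribution rather than the unpacked samtools-1.10 source; to build the downloaded version instead, run ./configure, make and sudo make install inside the samtools-1.10 directory.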
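A minimal sketch of the post-alignment steps with samtools (SAM to BAM conversion, sorting, indexing and a mapping summary), assuming samtools is on the PATH. The helper name sam_to_sorted_bam and the output naming scheme are hypothetical.

```shell
set -euo pipefail

# Hypothetical helper: SAM -> BAM -> coordinate-sorted, indexed BAM.
# Usage: sam_to_sorted_bam <input.sam> <output prefix> [threads]
sam_to_sorted_bam() {
    local in_sam="$1" prefix="$2" threads="${3:-4}"
    # -b: write BAM instead of SAM; -@: extra threads
    samtools view -b -@ "$threads" -o "$prefix.bam" "$in_sam"
    # sort by genomic coordinate (required before indexing)
    samtools sort -@ "$threads" -o "$prefix.sorted.bam" "$prefix.bam"
    # writes $prefix.sorted.bam.bai
    samtools index "$prefix.sorted.bam"
    # summary of mapped and unmapped reads
    samtools flagstat "$prefix.sorted.bam"
}
```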

6. Sambamba (Linux)

https://lomereiter.github.io/sambamba/

Sambamba is used for processing after alignment, for example for sorting and indexing the BAM files. After downloading the software in the VM, install it by entering the following commands in the terminal:

gunzip sambamba-0.7.1-linux-static.gz
chmod +x sambamba-0.7.1-linux-static
sudo ln -s /path/to/sambamba-0.7.1-linux-static /usr/local/bin/sambamba

(In the last command, replace /path/to/ with the actual path to the unpacked file; naming the link sambamba makes the program available under that name.) To run, enter sambamba in the terminal.
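A minimal sketch of sorting and indexing a BAM file with sambamba, assuming the program can be called as sambamba from the terminal. The helper name sambamba_sort_index and the file names are hypothetical.

```shell
set -euo pipefail

# Hypothetical helper: coordinate-sort a BAM file and build its index.
# Usage: sambamba_sort_index <input.bam> <sorted.bam> [threads]
sambamba_sort_index() {
    local in_bam="$1" out_bam="$2" threads="${3:-4}"
    # -t: number of threads; -o: name of the sorted output file
    sambamba sort -t "$threads" -o "$out_bam" "$in_bam"
    # explicitly (re)build the .bai index for the sorted file
    sambamba index -t "$threads" "$out_bam"
}

# Example (hypothetical file names):
# sambamba_sort_index aligned.bam aligned.sorted.bam 4
```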

7. IGV (Windows) (optional)

https://software.broadinstitute.org/software/igv/

IGV (Integrative Genomics Viewer) is used to visually check the results. Double-click the IGV_Win_2.8.0-installer and follow the installation steps.

8. Matlab Transposon count (Windows)

https://sites.google.com/site/satayusers/complete-protocol/bioinformatics-analysis

This Matlab code assigns the read counts and transposon counts to genes. It is provided by the Kornmann lab.