How to use the pipeline in the Docker container
Pull the Docker image from Docker Hub:
docker pull mwakok/satay:latest
Alternatively, build the image locally on your computer (run this from the folder that contains the Dockerfile):
docker build . -t mwakok/satay:latest
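If you are not sure whether the image is already on your machine, a small helper can check first and only pull when needed. This is a minimal sketch: ensure_satay_image and SATAY_IMAGE are hypothetical names, not part of the satay repository, and the helper relies only on the standard docker image inspect and docker pull commands.

```shell
# Sketch: reuse a local copy of the image if present, otherwise pull it.
# ensure_satay_image / SATAY_IMAGE are hypothetical names for illustration.
SATAY_IMAGE="mwakok/satay:latest"

ensure_satay_image() {
    # `docker image inspect` exits with a non-zero status when the image is missing
    if docker image inspect "$SATAY_IMAGE" >/dev/null 2>&1; then
        echo "Using local image $SATAY_IMAGE"
    else
        echo "Pulling $SATAY_IMAGE from Docker Hub"
        docker pull "$SATAY_IMAGE"
    fi
}
```

If you built the image yourself with docker build . -t mwakok/satay:latest, the inspect step succeeds and nothing is downloaded.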
Run the pipeline
Move to the folder containing the data you want to mount in the container, so that you can use $(pwd) in the commands below (the simplest option). Otherwise, replace $(pwd) with the absolute path of the folder on your computer that you want to load.
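If your data lives somewhere other than the current directory, the path given to -v still has to be absolute. A minimal sketch of a helper that turns any existing directory into an absolute path, assuming a POSIX shell; abs_data_dir is a hypothetical name used only for illustration.

```shell
# Sketch: resolve a data directory to an absolute path for `-v HOST:/data`.
# abs_data_dir is hypothetical; with no argument it resolves the current directory.
abs_data_dir() (
    dir="${1:-.}"
    if [ ! -d "$dir" ]; then
        echo "error: '$dir' is not a directory" >&2
        exit 1
    fi
    # cd runs in this subshell only, so the caller's working directory is untouched
    cd "$dir" && pwd -P
)
```

For example, on Linux you could write -v "$(abs_data_dir ~/my_fastq):/data" in place of -v $(pwd):/data, where ~/my_fastq stands for your own data folder.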
# For Windows (and WSL):
docker run --rm -it -e DISPLAY=host.docker.internal:0 -v /$(pwd):/data mwakok/satay:latest
# For macOS
docker run --rm -it -e DISPLAY=docker.for.mac.host.internal:0 -v $(pwd):/data mwakok/satay
# For Linux
docker run --rm -it --net=host -e DISPLAY=:0 -v $(pwd):/data mwakok/satay
Access the terminal of the Docker container
# For Windows (and WSL):
docker run --rm -it -e DISPLAY=host.docker.internal:0 -v /$(pwd):/data mwakok/satay:latest bash
# For macOS
docker run --rm -it -e DISPLAY=docker.for.mac.host.internal:0 -v $(pwd):/data mwakok/satay bash
# For Linux
docker run --rm -it --net=host -e DISPLAY=:0 -v $(pwd):/data mwakok/satay bash
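The per-OS commands above differ only in a few flags, so they can be wrapped in one shell function. This is a minimal sketch, assuming a POSIX shell (bash, zsh, dash or Git Bash); satay_run and the DRY_RUN variable are hypothetical names, not part of the satay repository. OS detection uses uname -s, WSL is assumed to be recognizable by the word "microsoft" in /proc/version, and the WSL branch mounts $(pwd) without the leading slash used in the Windows command, which is also an assumption.

```shell
# Sketch: build the docker run command for the current OS, mirroring the
# per-OS commands above. Extra arguments (for example `bash`) are appended
# as the container command. Set DRY_RUN=1 to print instead of execute.
satay_run() (
    image="mwakok/satay:latest"
    data_dir="$(pwd)"
    case "$(uname -s)" in
        Darwin*)
            x_flags="-e DISPLAY=docker.for.mac.host.internal:0" ;;
        MINGW*|MSYS*|CYGWIN*)
            # Git Bash style shells: keep the leading "/" used in the Windows command
            x_flags="-e DISPLAY=host.docker.internal:0"
            data_dir="/$data_dir" ;;
        Linux*)
            if grep -qi microsoft /proc/version 2>/dev/null; then
                # Assumed WSL: use the Windows host's X server
                x_flags="-e DISPLAY=host.docker.internal:0"
            else
                x_flags="--net=host -e DISPLAY=:0"
            fi ;;
        *)
            echo "unsupported OS: $(uname -s)" >&2
            exit 1 ;;
    esac
    # $x_flags is left unquoted on purpose so it splits into separate arguments
    set -- docker run --rm -it $x_flags -v "$data_dir:/data" "$image" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
)
```

Running satay_run starts the container as in the pipeline commands, satay_run bash opens a terminal in it, and DRY_RUN=1 satay_run bash only prints the command so you can check it first.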
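The flag -e sets the DISPLAY environment variable so that the GUI running inside the container can be displayed on the host through an X server.
The flag -v mounts the current directory (pwd) on the host system to the /data folder inside the container.
On Linux, the flag --net=host makes the container share the host's network.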
Creating a custom adapterfile.fa in your data/ folder
One way to find the adapter sequences is to look for overrepresented sequences in your dataset. Steps:
Run the pipeline with these options checked:
[x] Quality checking raw data
[x] Quality check interrupt
When the GUI asks whether to continue, answer No and go to the fastqc_out/ folder in your local data folder (/data/fastqc_out/ inside the container).
Open the corresponding HTML report and go to the "Overrepresented sequences" section.
Copy each sequence whose percentage is above 15%.
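Instead of reading the HTML report, the same table can be pulled from the plain-text fastqc_data.txt file that FastQC stores inside each *_fastqc.zip archive. This is a sketch under two assumptions: that the pipeline keeps the .zip files in fastqc_out/, and that the file uses FastQC's usual module layout (a ">>Overrepresented sequences" line, a "#Sequence Count Percentage Possible Source" header with tab-separated columns, then ">>END_MODULE"). overrepresented_above is a hypothetical helper name.

```shell
# Sketch: print overrepresented sequences above a percentage threshold
# (default 15) from a FastQC fastqc_data.txt file.
overrepresented_above() (
    file="$1"
    threshold="${2:-15}"
    awk -v min="$threshold" '
        BEGIN { FS = "\t" }                               # FastQC tables are tab-separated
        /^>>Overrepresented sequences/ { in_module = 1; next }
        in_module && /^>>END_MODULE/   { in_module = 0; next }
        in_module && /^#/              { next }           # skip the column header
        in_module && ($3 + 0) > (min + 0) { print $1 }    # column 3 is Percentage
    ' "$file"
)
```

To get the input file, something like unzip -p sample_fastqc.zip sample_fastqc/fastqc_data.txt > fastqc_data.txt should work, where sample stands for the name of your FASTQ file.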
Create adapterfile.fa in your local data folder:
Open a bash terminal and move to the folder that holds the data you want to mount in the pipeline (the FASTQ files), for example:
cd /data
Create an adapter file customized to your dataset:
nano adapterfile.fa
Inside the nano editor, edit the file as follows, replacing each placeholder with a sequence copied from the FastQC report:
>Sequence1
<overrepresented sequence 1>
>Sequence2
<overrepresented sequence 2>
Press Ctrl-O and then Enter to save, then Ctrl-X to quit the editor.
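As a non-interactive alternative to editing in nano, the file can be written from the shell. A minimal sketch, assuming you pass the copied sequences as arguments; write_adapterfile is a hypothetical helper. It numbers the headers Sequence1, Sequence2, ... as in the layout above and never writes blank lines.

```shell
# Sketch: write_adapterfile OUTFILE SEQ1 [SEQ2 ...]
# Writes one ">SequenceN" header per sequence, each followed by the sequence itself.
write_adapterfile() (
    out="$1"
    shift
    : > "$out"                       # start from an empty file
    n=0
    for seq in "$@"; do
        [ -n "$seq" ] || continue    # skip empty arguments instead of writing blank lines
        n=$((n + 1))
        printf '>Sequence%d\n%s\n' "$n" "$seq" >> "$out"
    done
    echo "wrote $n sequence(s) to $out"
)
```

For example, write_adapterfile adapterfile.fa followed by the sequences you copied from the report, each as a separate argument.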
Note
Do not put empty lines in the text file; otherwise BBDuk might report an error about not finding the adapters.fa file.
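Empty lines are easy to miss in an editor, so a quick check before rerunning can help. This sketch assumes the strict two-line layout shown above (each ">" header followed by exactly one sequence line), which is stricter than FASTA in general; check_adapterfile is a hypothetical helper name.

```shell
# Sketch: check an adapter file before rerunning the pipeline.
# Problems are reported on stderr; the awk exit status is the function's status.
check_adapterfile() (
    file="${1:-adapterfile.fa}"
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        exit 1
    fi
    awk '
        /^[ \t\r]*$/          { printf "line %d is empty\n", NR; bad = 1; next }
        NR % 2 == 1 && !/^>/  { printf "line %d should be a >header\n", NR; bad = 1 }
        NR % 2 == 0 && /^>/   { printf "line %d should be a sequence\n", NR; bad = 1 }
        END {
            if (NR == 0)     { print "the file is empty"; bad = 1 }
            if (NR % 2 == 1) { print "the last header has no sequence"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' "$file" >&2
)
```

If it reports empty lines, delete them in nano, or run grep -v '^[[:space:]]*$' adapterfile.fa > tmp.fa && mv tmp.fa adapterfile.fa.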
Run the container again; it will automatically look for adapterfile.fa in the data folder.
Troubleshooting
When running the container, especially for the first time after rebooting your PC, this warning may pop up:
Gtk-WARNING **: cannot open display: :0
On Linux, one solution is to type the following command in the terminal:
xhost +
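Note that xhost + turns off X access control for every host. A narrower option is xhost +local:, which only admits local, non-network connections, and xhost -local: revokes it again. Whether local: is enough depends on your X server setup, so treat this as something to try before falling back to xhost +. The two helpers below are a hypothetical sketch for a Linux host.

```shell
# Sketch (Linux host): allow local clients such as containers to use the X server.
allow_local_x() {
    if [ -z "$DISPLAY" ]; then
        echo "DISPLAY is not set; start a graphical session first" >&2
        return 1
    fi
    command -v xhost >/dev/null 2>&1 || { echo "xhost not found" >&2; return 1; }
    xhost +local:          # grants access to local (non-network) connections only
}

# Undo allow_local_x when you are done with the pipeline.
revoke_local_x() {
    xhost -local:
}
```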