Using Singularity on the UCD FARM Cluster

Why use Singularity?

Singularity is a container platform, and it is the only one installed on the FARM cluster because it is more secure than Docker, the other (more famous) container platform. To get a taste of the benefits of containers in bioinformatic analyses, I played around with Singularity using some tutorial examples.

I would say there are far more tutorials for Docker than for Singularity. In addition, there are many more existing bioinformatic container images on Docker Hub, a Docker registry. For example, I found no container image for HISAT2 (a popular aligner for RNA-seq) in Sylabs Cloud, the Singularity registry (this post was written in Oct 2020), but there were tons of HISAT2 container images on Docker Hub.

Fortunately, Singularity can import Docker container images. That means you can still use the images on Docker Hub! For example, try:

singularity run docker://godlovedc/lolcow

You may see some warning messages, but it still works, and the cow tells a weird joke on your screen.


Run Singularity on FARM: Basics

Here is a brief introduction to the basic Singularity commands. You will see me use most of them below when running the example provided by BioContainers.

Create a Singularity image

Well, you cannot do this on FARM because building an image requires sudo, which is reserved for system administrators. Either skip it or build the image on your own computer.

Pull images

It is good practice to create a directory for the images you are interested in.

mkdir Singularity

Then enter the directory.

cd Singularity

Now you can use singularity pull to download both Singularity and Docker images to your system.
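As a minimal sketch of the pull step, the following downloads the lolcow image used earlier from Docker Hub. The output file name lolcow_latest.sif is my choice for illustration, and the check around the command is only there so the snippet fails gently on a machine where Singularity is not available.

```shell
# Image to pull: the lolcow example on Docker Hub (note the docker:// prefix).
IMAGE_URI="docker://godlovedc/lolcow"
# Output file name (an illustrative choice, not required by Singularity).
IMAGE_FILE="lolcow_latest.sif"

if command -v singularity >/dev/null 2>&1; then
    # Syntax: singularity pull [output file] <URI>
    singularity pull "$IMAGE_FILE" "$IMAGE_URI"
else
    echo "singularity is not available in this shell"
fi
```

Pulling from Docker Hub converts the Docker layers into a single image file, so the result can be moved or copied like any other file.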

Interact with Singularity images

There are 3 ways to interact with Singularity images: shell, exec, and run.

  1. singularity shell: Starts a new shell inside your container, as if you entered the container and worked inside a small computer.
  2. singularity exec: Runs a command of your choice inside the container.
  3. singularity run: Runs the runscript created by the image author.

You may want to experience these three different flavors by running the lolcow example in the Singularity documentation.
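Here is a rough sketch of the three flavors side by side. It assumes that you have already pulled the lolcow image into the current directory as lolcow_latest.sif (for example with singularity pull docker://godlovedc/lolcow) and that cowsay is on the container's PATH; both are assumptions on my side.

```shell
# Assumed image file in the current directory (see the assumptions above).
IMAGE="lolcow_latest.sif"

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # run: execute the runscript written by the image author
    singularity run "$IMAGE"
    # exec: execute a command of your own choice inside the container
    singularity exec "$IMAGE" cowsay moo
    # shell: opens an interactive shell, so it is left commented out here
    # singularity shell "$IMAGE"
else
    echo "this sketch needs singularity and $IMAGE"
fi
```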


Run Singularity on FARM: with the BLAST example from BioContainers

Even though it is exciting to find the bioinformatic container you want in a registry, another problem arises: how do you choose a container when there are so many redundant ones (e.g., about 10 containers labeled “HISAT2” on Docker Hub)? You may also wonder whether those containers are still maintained and properly updated by their authors. Another concern is that, according to the Singularity documentation, “Pulling Docker images reduces reproducibility. If you were to pull a Docker image today and then wait six months and pull again, you are not guaranteed to get the same image.” That suggests it is better to use a reliable Singularity registry, when you can, than to just import a Docker image.

I found that the BioContainers project is a good solution to these problems, and I will write another post about BioContainers. For now, let’s go through the BLAST example from the “Running first container” page of the BioContainers documentation, with some modifications. The goal of this demonstration is to use BLAST to find out whether the zebrafish has a protein similar to the human prion protein.

Human prion protein sequence -(BLAST)-> Zebrafish protein database

Download BLAST image

First, it seems that directly converting docker pull biocontainers/blast:2.2.31 to singularity pull biocontainers/blast:2.2.31 does not work. It could be because the URI does not work anymore, or because Singularity needs the docker:// prefix (as in the lolcow example above) to look for an image on Docker Hub. Therefore, I checked the BioContainers registry directly.

There I found the suggested command for downloading the Singularity image:

singularity pull https://depot.galaxyproject.org/singularity/blast:2.9.0--pl526he19e7b1_7

To print the help page of BLAST:

singularity run blast:2.9.0--pl526he19e7b1_7 blastp -help

The blastp -help part after singularity run tells the container to print the BLAST help page.

Download the necessary data for BLAST

Interestingly, even though the documentation suggested using

docker run biocontainers/blast:2.2.31 curl https://www.uniprot.org/uniprot/P04156.fasta >> P04156.fasta

to download the data required for the trial run, I could not use something like

singularity run biocontainers/blast:2.2.31 curl https://www.uniprot.org/uniprot/P04156.fasta >> P04156.fasta

to do so.

Therefore, just download the necessary data into the directory containing the BLAST image:

Download the human prion sequence:

curl https://www.uniprot.org/uniprot/P04156.fasta >> P04156.fasta

Download the zebrafish database:

curl -O ftp://ftp.ncbi.nih.gov/refseq/D_rerio/mRNA_Prot/zebrafish.1.protein.faa.gz

Unzip it:

gunzip zebrafish.1.protein.faa.gz

Now the directory should contain at least the BLAST image, P04156.fasta, and zebrafish.1.protein.faa.
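Because the curl command above appends with >>, running it twice would leave two copies of the prion sequence in P04156.fasta. A quick sanity check, sketched below, is to count the FASTA header lines in each downloaded file. The helper name count_fasta_records is made up for this post.

```shell
# Count FASTA records in a file: every record starts with a ">" header line.
# (count_fasta_records is a hypothetical helper name.)
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report the number of sequences in each file downloaded above.
for f in P04156.fasta zebrafish.1.protein.faa; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fasta_records "$f") sequence(s)"
    else
        echo "$f: not found"
    fi
done
```

P04156.fasta should report exactly one sequence; if it reports more, delete the file and download it again.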

Run the BLAST

If you only run:

singularity run blast:2.9.0--pl526he19e7b1_7

you will open a new shell (the prompt changes to “Singularity>”), just as if you had run:

singularity shell blast:2.9.0--pl526he19e7b1_7

I would guess that is because the image author wrote the runscript to behave this way.

Anyway, let’s use singularity shell blast:2.9.0--pl526he19e7b1_7 and enter the container. Then, in the Singularity shell:

  1. Prepare the database:

    makeblastdb -in zebrafish.1.protein.faa -dbtype prot

    That will create the files:

      • zebrafish.1.protein.faa.pin
      • zebrafish.1.protein.faa.phr
      • zebrafish.1.protein.faa.psq

  2. Run the BLAST:

    blastp -query P04156.fasta -db zebrafish.1.protein.faa -out results.txt

    The directory now also contains results.txt.

  3. Try to use nano to check the result in the container:
    You will see the error bash: nano: command not found. That is normal, because nano is not installed in the container, even though it is installed on FARM.

  4. Leave the container:

    exit

    You will see that the prompt is back to the original one.

  5. Check the result in the original shell:

    nano results.txt
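The interactive steps above can also be run without entering the container by using singularity exec, which may be handier inside a script. This is only a sketch: it assumes the image and both input files sit in the current directory under the names used in this post, and it simply prints the first lines of the result instead of opening nano.

```shell
# File names as used in this post; all are assumed to be in the current directory.
IMAGE="blast:2.9.0--pl526he19e7b1_7"
QUERY="P04156.fasta"
DB="zebrafish.1.protein.faa"

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ] && [ -f "$QUERY" ] && [ -f "$DB" ]; then
    # Build the protein database inside the container.
    singularity exec "$IMAGE" makeblastdb -in "$DB" -dbtype prot
    # Search the human prion protein against the zebrafish proteins.
    singularity exec "$IMAGE" blastp -query "$QUERY" -db "$DB" -out results.txt
    # results.txt is written to the host directory, so host tools can read it.
    head -n 20 results.txt
else
    echo "this sketch needs singularity, $IMAGE, $QUERY, and $DB"
fi
```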


References and helpful tutorials:

  1. Singularity official website
    Documentation: The Singularity version on FARM was singularity/3.5.2 in Oct 2020.
  2. Nextflow training: Check “8.2. Singularity”. Just enough Singularity for working on FARM.
  3. BioContainers documentation: I used the example in the “Getting started with Docker” section.