Running Conda/Bioconda on UCD FARM Cluster (I): Install and Set Up Conda/Bioconda

What is Conda?

Package, dependency and environment management for any language—Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, FORTRAN, and more.

This introduction from the official website clearly describes 2 important features of Conda:

  1. Version management
  2. Environment control system

With these features, Conda can:

  1. Give the user full control of the software and its dependencies
  2. Reduce the reliance on administrator privileges, especially on a computer cluster
  3. Improve reproducibility across different machines

Why use Conda on the FARM cluster?

Although FARM already provides a list of pre-installed software, you may still find that the tool you need is not available. It may be so new that the cluster managers have not added it yet, or so old that it is no longer offered even though you need it to reproduce published results. Of course, you could email the admins and request it, but sometimes they will just ask you to install it yourself. I ran into this situation when I was playing with GATK in 2018: FARM only had GATK3.6, but GATK4 had been out for a while and came with a full set of best-practice workflows! However, installing software is not fun for me: packages written in different languages use different installation methods, and checking them one by one is tedious. Also, regular users cannot use sudo (which requires administrator privileges) on FARM, and that can be quite frustrating when I want to use a package immediately. Furthermore, if a single analysis needs more than one package, going through all the documentation and READMEs to set up the pipeline can be very time-consuming.

Fortunately, Conda can solve all of the problems above! It standardizes the way programs from different platforms are downloaded, and installation does not require sudo. It also lets you create isolated environments and manage the software inside them. These features greatly reduce the effort needed to maintain tools on FARM.
(Check the Bioconda article for an inspiring introduction!)

Relationships between Anaconda, Miniconda, Conda, and Bioconda

Several related things have “conda” in their names, which can be confusing for newcomers who are not familiar with this family. Here is a brief introduction to each:

  1. Anaconda: A distribution of the Python and R languages. It is a large data science toolkit that ships with many packages. It includes a package management system, Conda, that lets the user easily install and manage more packages.

  2. Miniconda: A minimal bootstrap version of Anaconda that includes fewer packages, but Conda is still included!

  3. Conda: The package management system in Anaconda/Miniconda. You can set up “channels” from which to download the software you are interested in.

  4. Bioconda: A channel for Conda that specializes in bioinformatics software.

    A longer introduction to Conda channels: Here

Here is the relationship between them:
[Figure: diagram of how Anaconda, Miniconda, Conda, and Bioconda relate]

Conda installation

What I want is Conda with the Bioconda channel, so I installed Miniconda rather than Anaconda to save disk space on FARM. You can decide whether you need Anaconda or Miniconda by checking this page.

Procedure:

Choose the Miniconda installer you want on the website. I chose Miniconda3 Linux 64-bit, which includes Python 3.8, to match the Linux system on FARM.

Download:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
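
Before running the installer, it can be worth checking that the download is intact. The following is only a sketch: verify_sha256 is a hypothetical helper I made up for illustration, and the expected hash is something you would copy yourself from the installer listing on the Anaconda site.

```shell
# Sketch: compare a file's SHA-256 hash with an expected value.
# verify_sha256 is a hypothetical helper, not part of conda.
# Usage: verify_sha256 <file> <expected_hash>
verify_sha256() {
    actual_hash="$(sha256sum "$1" | awk '{print $1}')"
    if [ "$actual_hash" = "$2" ]; then
        echo "OK: checksum matches"
    else
        echo "MISMATCH: got $actual_hash" >&2
        return 1
    fi
}

# Example call (EXPECTED_HASH is a placeholder you must fill in yourself):
# verify_sha256 Miniconda3-latest-Linux-x86_64.sh "$EXPECTED_HASH"
```

If the function reports a mismatch, download the installer again before going further.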

Install:

bash Miniconda3-latest-Linux-x86_64.sh

The installer asks a few questions along the way:
→ accept the license terms [yes]
→ install in [The_Directory_you_like]
→ Do you wish the installer to initialize Miniconda3 by running conda init? [yes]
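
If you would rather skip the prompts, the installer also has a batch mode. This is a minimal sketch, assuming the installer sits in the current directory and that $HOME/miniconda3 is an acceptable location (both are my assumptions, so adjust them to your setup). As far as I know, batch mode does not edit your shell startup files, so conda init is run separately here.

```shell
# Sketch of a non-interactive install; both values below are assumptions.
INSTALLER="Miniconda3-latest-Linux-x86_64.sh"
PREFIX="$HOME/miniconda3"

if [ -f "$INSTALLER" ]; then
    # -b: batch mode (no prompts, assumes you agree to the license terms)
    # -p: directory to install into
    bash "$INSTALLER" -b -p "$PREFIX"
    # Set up the bash startup file so that conda is available at login
    "$PREFIX/bin/conda" init bash
else
    echo "Installer not found: $INSTALLER" >&2
fi
```

Using batch mode means you accept the license terms without seeing the prompt, so read them first.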

Now you have Miniconda with Conda inside. Here is a glimpse of the installation. Notice that it modified the bashrc file (see the discussion below).
[Screenshot: installer output]

You can check the details of your Conda installation with:

conda info

[Screenshot: output of conda info]

Set up for Bioconda

Bioconda is one of the channels from which Conda downloads packages. In addition to Bioconda, you can add more channels if you want; some can improve download speed. Here I introduce the basic setup suggested on the official website. It is important to add the channels in this order so that the priority is set correctly: each newly added channel goes to the top of the list, so conda-forge ends up with the highest priority.

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Check the channels:

conda config --get channels
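
Once the channels are in place, a quick way to check the setup is to install something from Bioconda into a throwaway environment. This is a sketch under a few assumptions: the environment name gatk4-test is hypothetical, the package name gatk4 is the Bioconda recipe name as far as I know, and turning on strict channel priority is an optional extra step that I believe the Bioconda documentation currently also suggests.

```shell
ENV_NAME="gatk4-test"   # hypothetical environment name, pick your own

if command -v conda >/dev/null 2>&1; then
    # Strict priority: if a package exists in a higher-priority channel,
    # the copies in lower-priority channels are not considered
    conda config --set channel_priority strict
    # Create an isolated environment and install GATK4 into it
    conda create -y -n "$ENV_NAME" gatk4
    # List the packages that ended up in the environment
    conda list -n "$ENV_NAME"
else
    echo "conda is not on PATH; open a new shell or run conda init first" >&2
fi
```

If you only wanted the test, the environment can be removed afterwards with conda env remove -n gatk4-test.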

Some notes

After installation, you may notice “(base)” appearing in front of your terminal prompt. That is because Miniconda modifies the bashrc file during installation and adds some lines to it.

Go to the home directory and check the bashrc file:
[Screenshot: the conda section of the bashrc file]

You can see that it tells the shell to launch conda at login, and that it also adds the bin directory under the Miniconda directory to PATH.
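
To look at only the part that conda manages, you can print the lines between the two marker comments that conda init writes. This is a small sketch: show_conda_block is a hypothetical helper name, and it assumes the markers read ">>> conda initialize >>>" and "<<< conda initialize <<<" as they do in the installations I have seen.

```shell
# Sketch: print the block between the conda init markers in a startup file.
# show_conda_block is a hypothetical helper, not a conda command.
show_conda_block() {
    sed -n '/>>> conda initialize >>>/,/<<< conda initialize <<</p' "$1"
}

# Show the block from the bash startup file, if that file exists
if [ -f "$HOME/.bashrc" ]; then
    show_conda_block "$HOME/.bashrc"
fi
```

If nothing is printed, conda init has probably not been run for bash yet.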

You can get rid of the “(base)” in front of your terminal prompt simply by:

conda deactivate

[Screenshot: terminal prompt after conda deactivate]

But it will appear again the next time you log in (as set up in the bashrc file). For me that is fine, so I just let it stay there. However, if you find it annoying and want it gone permanently, modify some files as described in this post:
command line - Why does “(base)” appear in front of my terminal prompt? - Ask Ubuntu
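
One common way to do this without editing files by hand is conda's auto_activate_base setting. This is a sketch, assuming a conda version that still accepts this setting name (newer releases may use a different one) and that your user configuration lives in ~/.condarc.

```shell
if command -v conda >/dev/null 2>&1; then
    # Stop conda from activating the base environment in every new shell
    conda config --set auto_activate_base false
    # The setting is stored in the user configuration file
    if [ -f "$HOME/.condarc" ]; then
        cat "$HOME/.condarc"
    fi
else
    echo "conda is not on PATH; nothing was changed" >&2
fi
```

Conda itself stays available after this change; you just run conda activate yourself when you need an environment.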


References:

  1. Conda — Conda documentation

  2. News — Bioconda documentation

  3. Bioconda: sustainable and comprehensive software distribution for the life sciences | Nature Methods

  4. Miniconda — Conda documentation

  5. Installing on Linux — conda 4.9.0.post9+ff618c12 documentation