Running Conda/Bioconda on UCD FARM Cluster (II): Installing Packages Manually by `conda install`

Conda standardizes the installation of a wide variety of programs and offers two ways to install them:
(a) installing packages manually by conda install
(b) installing from a YAML file.

In this post, I cover the first method.

What is a Conda environment and why use it?

Before diving into how to use Conda to install packages, let’s talk about some aspects of the “environment”.

As mentioned in the previous post, Conda has two roles: (1) a package manager that handles program installation and version control, and (2) an environment manager. Environment management is especially important for developers because they have to be confident that their applications run smoothly in the different environments (e.g. different dependencies, software versions, or operating systems) used by their colleagues or clients. So you might ask: does the environment really matter to me if I am just a user of packages?

Well, a good understanding of environments will help you avoid some pitfalls when using Conda, and Conda environments are a great tool for increasing the reproducibility of your analyses. So first, let’s take a look at how Conda acts as an environment manager.

After installation is complete, the installer, either Anaconda or Miniconda, creates a “base” environment, which contains a set of packages. All of these packages are ready to use in the base environment, and that is quite handy most of the time. However, if you are going to add more tools to Conda, especially very old or very new ones, the setup of the base environment may not meet their requirements. Conda provides a simple way to create an isolated environment that you can run without affecting other environments. You can imagine the base environment as the bench in your lab, where you can do most of your experiments; for some special analyses, say radioactive experiments, you have to go to a dedicated room with particular equipment.

Even for bioinformatics programs that you consider regularly used and up to date, it is still not a good idea to put all of them in the base environment, because dependency conflicts can easily undermine reproducibility. For example, newly installed software may update some of the existing packages in your base environment, and it may turn out that this change alters the behavior of the existing programs so that you can no longer reproduce the results you got a month ago. It would be really painful to have to figure this out when your PI asks what caused the difference… Besides, the official Conda documentation mentions that packages with similar filenames that serve similar purposes may cause problems if they are installed in the same environment. Therefore, create isolated environments for your bioinformatics tools! This also provides an additional benefit: you can export the environment of a specific pipeline for a colleague so that they can run the pipeline without worrying about the setup. This will save both of you a lot of trouble.
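
The export idea mentioned above can be sketched as a small shell helper. This is only a minimal sketch: the function name export_env and the default output file name are hypothetical, and it assumes an environment such as aligners (created later in this post) already exists. The conda env export subcommand is what writes the package list as YAML.

```shell
# Hypothetical helper: write the package list of a Conda environment to a YAML file.
# $1 = environment name, $2 = output file (assumed default: <name>.yml)
export_env() {
    env_name="$1"
    out_file="${2:-${env_name}.yml}"
    conda env export -n "$env_name" > "$out_file"
}

# Example call (assumes the aligners environment exists):
# export_env aligners aligners.yml
```

A colleague should then be able to rebuild the environment from that file with conda env create -f aligners.yml.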

Here is a very good article for anyone interested in more details of Conda environment management. The author uses very vivid comparisons to explain how Conda works.

Get your tools manually by conda install

Goal: Create a new environment called “aligners”, and install Hisat2 and STAR, two aligners used in my project, in the aligners environment.

1. Create an environment for your task: conda create

As I mentioned in the previous section, remember to create an isolated environment for the packages you install yourself!

conda create -n aligners

-n: name of the environment

Conda will tell you the location of the environment. This is an interactive process, so remember to confirm with [yes] (type y) when prompted. Also, you can see that at the end of the process, Conda shows how to activate/deactivate the environment.

2. Search for a package: conda search

conda search hisat2

Many versions of Hisat2 are available, and they are listed from the oldest (at the top) to the newest (at the bottom).
Let’s use a simple bash command to show only the latest version:

conda search hisat2 | tail -n 1
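
If you only need the version string rather than the whole line, the same pipeline can be extended with awk. This is a minimal sketch: the helper name latest_version is hypothetical, and it assumes the version sits in the second column of the conda search output, as in the listing described above.

```shell
# Hypothetical helper: print only the version string of the newest build
# listed by conda search (assumes the version is the second column).
latest_version() {
    conda search "$1" | tail -n 1 | awk '{ print $2 }'
}

# Example call (needs conda and a channel that provides the package):
# latest_version hisat2
```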


3. Install packages: conda install

conda install -n aligners hisat2 star

-n: name of the environment. Note that if you omit this option, Conda will install the packages in the currently active environment (the base environment if you have not activated any other).

At the beginning of the installation process, Conda lets you know which packages will be downloaded/installed and where they will be installed. In addition to STAR and Hisat2, lots of dependencies are installed. You can also see which channel each package is downloaded from. For example, STAR and Hisat2 came from Bioconda, and the others came from conda-forge.

4. Check the environment: conda list

conda list -n aligners

-n: name of the environment

Again, you can see detailed information about each package. By the way, if you do not specify the environment with -n, the packages in the currently active environment are shown.
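
To look up a single package instead of reading the whole list, the output can be filtered. Here is a minimal sketch; the helper name pkg_version is hypothetical, and it assumes the package name and version are the first two columns of the conda list output.

```shell
# Hypothetical helper: print the installed version of one package in one environment.
# $1 = environment name, $2 = exact package name
pkg_version() {
    conda list -n "$1" | awk -v pkg="$2" '$1 == pkg { print $2 }'
}

# Example call (assumes the aligners environment from this post):
# pkg_version aligners hisat2
```

The helper prints nothing when the package is not installed in that environment.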

5. Activate the environment and use the packages: conda activate

conda activate aligners

Here is a small test: I called Hisat2 before and after activating the aligners environment.
When I was in the base environment, the system told me that the command hisat2 was not found. In contrast, after I activated the aligners environment, typing hisat2 returned the corresponding help page, indicating that the shell found the program. Note that the terminal prompt changes when the environment is activated.
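
The before/after test can also be done without launching the program, by asking the shell where it would find the command. This is a minimal sketch with a hypothetical helper name, check_tool; it relies only on the standard command -v builtin.

```shell
# Hypothetical helper: report whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

# Example session (assumes the aligners environment from this post):
# check_tool hisat2          # in base, where hisat2 is not installed
# conda activate aligners
# check_tool hisat2          # now resolves inside the aligners environment
```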

6. Leave the environment: conda deactivate

conda deactivate

7. List all your environments: conda env list

conda env list

This command lists all the environments and their directories. The active environment is marked with an asterisk (*).
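
The same listing can be used in a script to check whether an environment already exists before creating it. A minimal sketch follows; the helper name env_exists is hypothetical, and it assumes the environment name is the first column of each line.

```shell
# Hypothetical helper: succeed if an environment with the given name exists.
# Assumes the name is the first column of the conda env list output.
env_exists() {
    conda env list | awk '{ print $1 }' | grep -Fqx -- "$1"
}

# Example: create the environment only when it is missing
# env_exists aligners || conda create -y -n aligners
```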

Set up at once

Here is the help page of conda create. Below are some of the commonly used options.

conda create --help

Commonly used options:
-y: answer yes to all the interactive prompts
-c: specify the channel for downloading the packages
-n: name of the environment

Note that conda create also accepts package names as arguments. Therefore, you can create an environment and install all the packages at once, without answering [yes] during the process:

conda create -n aligners2 hisat2 star -y

Alternatively, you can do something similar with conda install if the environment already exists:

conda create -n aligners3 -y
conda install -n aligners3 hisat2 star -y
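
The -c option listed above is not used in these commands, so here is a minimal sketch of a one-step setup that names the channels explicitly. The helper name create_env is hypothetical, and the channel order (conda-forge first, then bioconda) is my assumption; adjust it if your channels are already configured differently.

```shell
# Hypothetical helper: create an environment and install packages in one step.
# $1 = environment name, remaining arguments = package names
# The channel order (conda-forge first, then bioconda) is an assumption.
create_env() {
    env_name="$1"
    shift
    conda create -y -n "$env_name" -c conda-forge -c bioconda "$@"
}

# Example call:
# create_env aligners2 hisat2 star
```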

Playing around with the aligners examples

Goal: Test if the environments are independent of each other.

Logic flow: (a) install a different version of STAR in the base environment, and then (b) call STAR after activating each environment and check the versions.

1. Check the released versions of STAR and install an older version in the base environment:

conda search star

I chose an earlier version: 2.6.1a.

conda install star=2.6.1a

2. Examine whether the environments are isolated

[Screenshot: version of STAR in the base environment]

[Screenshot: version of STAR in the aligners environment]

In my test, different versions of STAR were launched in the two environments.
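
The comparison can also be scripted so that you do not have to activate each environment by hand. This is a minimal sketch under two assumptions: that your Conda is recent enough to provide the conda run subcommand, and that STAR prints its version with --version. The helper name star_versions is hypothetical.

```shell
# Hypothetical helper: print the STAR version seen from each named environment.
# Assumes conda run is available and that STAR accepts --version.
star_versions() {
    for env_name in "$@"; do
        printf '%s: ' "$env_name"
        conda run -n "$env_name" STAR --version
    done
}

# Example call (assumes STAR is installed in both environments):
# star_versions base aligners
```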

Tips

  1. If possible, install all packages at once because Conda can then resolve all dependencies at the same time.

  2. If possible, always install tools into your environment with conda install rather than by the classic manual installation method.

  3. For more information on a package, check the package index on the Bioconda website. It also provides Docker images.

Reference

Why You Need Python Environments and How to Manage Them with Conda

Conda — conda 4.9.0.post12+a9f5f25d documentation

GCB 2020 Tutorial — Bioconda documentation