Conda standardizes the installation of a wide variety of programs and offers two ways to install them:
(a) installing packages manually with conda install
(b) installing through a YAML file.
In this post, I cover the first method.
What is a Conda environment and why use it?
Before diving into how to use Conda to install packages, let’s talk about some aspects of the “environment”.
As mentioned in the previous post, Conda has two features: (1) a package manager for installing programs and controlling their versions and (2) an environment manager. Environment management is especially important for developers because they have to be confident that their applications run smoothly in the different environments (e.g. different dependencies, software versions, or operating systems) used by their colleagues or clients. So you might ask: does the environment really matter to me if I am just a user of packages?
Well, a good grasp of environments will help you avoid some pitfalls when using Conda, and Conda environments will become a great tool for increasing the reproducibility of your analyses. So first, let’s take a look at how Conda acts as an environment manager.
After the installation is complete, the installer, either Anaconda or Miniconda, creates a “base” environment, which contains a set of packages. All of these packages are ready to use in the base environment, which is quite handy most of the time. However, if you are going to add more tools through Conda, especially very old or very new ones, the setup of the base environment may not meet their requirements. Conda provides a simple way to create an isolated environment that you can use without affecting other environments. You can think of the base environment as the bench in your lab, where you can do most of your experiments; but for some special analyses, say, radioactive experiments, you have to go to a dedicated room with particular equipment.
Even for bioinformatics programs that you consider regularly used and up to date, it is still not a good idea to put all of them in the base environment, because dependency conflicts can easily undermine reproducibility. For example, newly installed software may update some of the existing packages in your base environment, and it may turn out that this change alters the behavior of the existing programs so that you can no longer reproduce the results you got a month ago. It would be really painful to have to figure this out when your PI asks you what caused the difference… Besides, the official Conda documentation mentions that packages that have similar filenames and serve similar purposes may cause problems if they are installed in the same environment. Therefore, create isolated environments for your bioinformatics tools! This also provides an additional benefit: you can export the environment of a specific pipeline for a colleague so that they can run the pipeline without worrying about the setup. This will save both your life and your colleague’s.
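To make the export idea a bit more concrete, here is a minimal sketch. It assumes an environment named aligners and a file named aligners.yml, both placeholders, and wraps the two Conda commands in hypothetical helper functions so that nothing runs until you call them.

```shell
# Sketch only: the function names, "aligners", and "aligners.yml" are placeholders.

# Write the package list of an environment ($1) to a YAML file ($2).
export_env() {
    conda env export -n "$1" > "$2"
}

# Recreate an environment on another machine from that YAML file ($1).
recreate_env() {
    conda env create -f "$1"
}

# Usage:
#   export_env aligners aligners.yml    # on your machine
#   recreate_env aligners.yml           # on your colleague's machine
```

The YAML route is the second installation method mentioned at the top of this post, so this is only a preview of it.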
Here is a very good article for anyone interested in more details of Conda environment management. The author uses very vivid comparisons to explain how Conda works.
Get your tools manually by conda install
Goal: Create a new environment called “aligners”, and install Hisat2 and STAR, two aligners used in my project, in the aligners environment.
1. Create an environment for your task: conda create
As I mentioned in the previous section, remember to create an isolated environment for the packages you install yourself!
conda create -n aligners
-n: name of the new environment
Conda will tell you the location of the environment. It’s an interactive process, so remember to confirm with [yes]. Also, at the end of the process, Conda shows how to activate/deactivate the environment.
2. Search for the packages/Check the version of the target packages: conda search
conda search hisat2
There are lots of versions of Hisat2 available, and they are ordered from the oldest (at the top) to the newest (at the bottom).
Let’s use a simple bash command to check the latest version:
conda search hisat2 | tail -n 1
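If you only want the version number itself, for example to reuse it in a script, the same pipeline can be extended with awk. This is a rough sketch: latest_version is a hypothetical helper name, and it assumes the usual column order of the conda search output (name, version, build, channel).

```shell
# Print only the version column of the last (newest) row of `conda search`.
# Assumes the output columns are: name, version, build, channel.
latest_version() {
    conda search "$1" | tail -n 1 | awk '{print $2}'
}

# Usage:
#   latest_version hisat2
```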
3. Install packages: conda install
conda install -n aligners hisat2 star
-n: name of the target environment. Note that if you omit this option, Conda will install the packages in the currently active environment (the base environment if you have not activated any).
At the beginning of the installation process, Conda will let you know which packages will be downloaded/installed and where they are going to be installed. As you can see, in addition to STAR and Hisat2, lots of dependencies were installed. Besides, you can find which channel was used for downloading the packages. For example, STAR and Hisat2 were from Bioconda, and others were from conda-forge.
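If your Conda setup does not already have these channels configured, you can name them on the command line with -c. The sketch below wraps this in a hypothetical helper, install_with_channels; listing conda-forge before bioconda is my assumption here, based on the order Bioconda usually recommends.

```shell
# Install packages into an environment ($1) with explicit channels.
# The channel order (conda-forge first, then bioconda) is an assumption.
install_with_channels() {
    env_name="$1"
    shift
    conda install -n "$env_name" -c conda-forge -c bioconda "$@"
}

# Usage:
#   install_with_channels aligners hisat2 star
```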
4. Check the environment: conda list
conda list -n aligners
-n: name of the environment
Again, you can see the detailed information of each package. By the way, if you do not specify an environment with -n, the packages in the currently active environment will be shown.
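When the list is long, it can help to pull out a single package. Here is a small sketch, with installed_version as a hypothetical helper name, that assumes the first two columns of the conda list output are the package name and its version.

```shell
# Print the installed version of a package ($2) in an environment ($1).
# Assumes `conda list` prints the name in column 1 and the version in column 2.
installed_version() {
    conda list -n "$1" | awk -v pkg="$2" '$1 == pkg {print $2}'
}

# Usage:
#   installed_version aligners star
```

Nothing is printed if the package is not installed in that environment.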
5. Activate the environment and use the packages: conda activate
conda activate aligners
Here is a small test: I called Hisat2 before and after activating the aligners environment.
When I was in the base environment, the system told me that the command hisat2 was not found. In contrast, after activating the aligners environment, typing hisat2 returned the corresponding help page, indicating that the computer found the program. Note that the terminal prompt changed upon activation of the environment.
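You can repeat this small test without reading through a help page by asking the shell whether it can find the command. A minimal sketch, where tool_status is a hypothetical helper built on the standard command -v:

```shell
# Report whether a command ($1) can be found on the PATH of the current shell.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Usage (assuming hisat2 is installed only in the aligners environment):
#   tool_status hisat2          # run in base
#   conda activate aligners
#   tool_status hisat2          # run again in aligners
```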
6. Leave the environment
conda deactivate
7. List all environments you have:
conda env list
This command will list all the environments and their directories. The active environment is marked with an asterisk (*).
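The same output can be checked from a script. The sketch below uses two hypothetical helpers, env_exists and active_env, and assumes the layout described above: the environment name in the first column and the asterisk in the second column of the active row.

```shell
# Succeed (exit status 0) if an environment with the given name ($1) is listed.
env_exists() {
    conda env list | awk -v name="$1" '$1 == name {found = 1} END {exit !found}'
}

# Print the name of the environment marked with an asterisk.
active_env() {
    conda env list | awk '$2 == "*" {print $1}'
}

# Usage:
#   env_exists aligners && echo "aligners is already there"
#   active_env
```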
Set up at once
Here is the help page of conda create, and I labeled some of the commonly used options.
conda create --help
(red boxes)
-y: answer yes to all the interactive prompts
-c: specify the channel for downloading the packages
-n: name of the environment
Note that there is also an option for installing packages (the green box). Therefore, you can create an environment and install all the packages at once without answering [yes] during the process:
conda create -n aligners2 hisat2 star -y
Alternatively, you can create the environment first and then, once it exists, add the packages with conda install:
conda create -n aligners3 -y
conda install -n aligners3 hisat2 star -y
Playing around with the aligners examples
Goal: Test if the environments are independent of each other.
Logic flow: (a) install a different version of STAR in the base environment, and then (b) call STAR with each environment activated and check the versions.
1. Check the released versions of STAR and install an older version in the base environment:
conda search star
I chose the previous version: 2.6.1a (red box).
conda install star=2.6.1a
2. Examine if the environments are isolated
Here is the version of STAR in the base environment.
Here is the version of STAR in the aligners environment.
As you can see, different versions of STAR were launched in the different environments.
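If you would rather not switch back and forth with conda activate, the same comparison can be sketched with conda run, which executes a command inside a named environment. version_in_env is a hypothetical helper, and it assumes a Conda version recent enough to provide conda run.

```shell
# Run "<tool> --version" inside a named environment without activating it.
# $1 is the environment name, $2 is the tool to call.
version_in_env() {
    conda run -n "$1" "$2" --version
}

# Usage:
#   version_in_env base STAR
#   version_in_env aligners STAR
```

If the environments are isolated, the two calls should report different versions, just as in the test above.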
Tip
If possible, install all packages at once because Conda can then take care of all dependencies at the same time.
Always use conda install in your environment if possible, rather than the classic installation method.
For more information on a package, you can check the package index on the Bioconda website. It also provides Docker images.
Reference
Why You Need Python Environments and How to Manage Them with Conda