Conda standardizes the installation procedure for a wide variety of programs and provides two ways to install them:
(a) installing packages manually with conda install
(b) installing through a YAML file.
In this post, I cover the first method.
What is a Conda environment and why use it?
Before diving into how to use Conda to install packages, let’s talk about some aspects of the “environment”.
As mentioned in the previous post, Conda plays two roles: (1) a package manager that installs programs and controls their versions, and (2) an environment manager. Environment management is especially important for developers, because they have to be confident that their applications run smoothly in the different environments (e.g. different dependencies, software versions, or operating systems) used by their collaborators or clients. So you might ask: does the environment really matter to me if I am just a user of packages?
Well, a good understanding of environments will help you avoid some nasty pitfalls when using Conda, and Conda environments are a great tool for increasing the reproducibility of your analyses. So first, let’s look at how Conda works as an environment manager.
Once installation is complete, the installer (either Anaconda or Miniconda) creates a “base” environment containing a set of packages. All of these packages are ready to use in the base environment, which is quite handy most of the time. However, if you add more tools to Conda, especially very old or very new ones, the setup of the base environment may not meet their requirements. Conda provides a simple way to create an isolated environment that you can work in without affecting any other environment. Think of the base environment as the bench in your lab: you can do most experiments on it, but for some special work, say, radioactive experiments, you have to go to a dedicated room with specialized equipment.
Even for bioinformatics programs that you consider widely used and up to date, it is still not a good idea to put all of them in the base environment, because dependency conflicts can easily undermine reproducibility. For example, newly installed software may update some of the existing packages in your base environment, and that change can alter the behavior of your existing programs so that you can no longer reproduce the results you got a month ago. It would be really painful to have to figure this out when your PI asks what caused the difference… Besides, the official Conda documentation notes that packages with similar filenames that serve similar purposes may cause problems if they are installed in the same environment. Therefore, create isolated environments for your bioinformatics tools! This brings an additional benefit: you can export the environment of a specific pipeline for a collaborator, so that they can run the pipeline without worrying about the setup. This will be a lifesaver for both of you.
Here is a very good article for anyone interested in more detail on Conda environment management. Its author uses vivid comparisons to explain how Conda works.
Get your tools manually with conda install
Goal: create a new environment called “aligners” and install HISAT2 and STAR, two aligners used in my project, in the aligners environment.