Differences

This shows you the differences between two versions of the page.

--- hpc:applications_and_libraries [2025/01/15 09:39] – [FOSS toolchain] Yann Sagon
+++ hpc:applications_and_libraries [2025/06/11 12:27] (current) – external edit 127.0.0.1
@@ Line 313: / Line 313: @@
 ===== Conda =====
+==== How to Create a Conda Environment in a Container ====
-Use it
-<code>
+Using **Conda** directly on HPC systems or shared servers can cause performance issues and storage overload because Conda environments create thousands of small files. This often results in:
-module load Anaconda3
+  * Slow job startup times
+  * Filesystem limitations being hit
+  * High I/O load on the cluster
+  * Complex environment management
+A better solution is to **encapsulate Conda environments inside a container**. This way, the entire environment is packaged into a single file (such as a ''.sif'' image used by Apptainer/Singularity), which can be easily deployed, shared, and reused without cluttering the cluster filesystem.
+=== Benefits ===
+Using this method offers multiple advantages:
+  - ✅ **Fewer files**: Your environment is stored in a single ''.sif'' file
+  - ✅ **Portability**: Share the container easily with collaborators
+  - ✅ **Reproducibility**: Your environment stays consistent across systems
+  - ✅ **Isolation**: No risk of interfering with other environments
+  - ✅ **Stability**: Cluster updates will not break your environment
+=== Limitations ===
+  - ⚠️ The container is static; to update packages, you need to rebuild the image
+This guide explains how to build such a container using [[https://hpc-community.unige.ch/t/new-software-installed-cotainr-version-2025-3-0/3917|cotainr]], a tool that simplifies container creation.
+=== Step 1 – Define the Conda Environment ===
+Create a file ''env.yml'' that contains the definition of your environment (this example uses ''bioenv.yml''):
+<code yaml>
+name: bioenv
+channels:
+  - bioconda
+  - conda-forge
+  - defaults
+dependencies:
+  - blast=2.16.0
+  - diamond=2.1.11
+  - exonerate=2.4.0
+  - spades=4.1.0
+  - mafft=7.525
+  - trimal=1.5.0
+  - numpy
+  - joblib
+  - scipy
+[...]
+prefix: /home/users/a/alberta/env     # <==== To delete/edit
 </code>
+You can generate this file using the following commands:
+<code bash>
+# 1. (Optional) Create the environment; skip this step if you already have one
+$ conda create -n bioenv -c bioconda -c conda-forge spades exonerate diamond blast mafft trimal numpy joblib scipy -y
+# 2. Activate the environment
+$ conda activate bioenv
+# 3. Export the environment definition
+$ conda env export > bioenv.yml
+# 4. Manually remove the `prefix:` line at the bottom of bioenv.yml before using it with cotainr
+</code>
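Removing the ''prefix:'' line by hand can also be scripted. The following is a minimal sketch, not part of the documented procedure: ''strip_prefix'' is a hypothetical helper name, and the output file name is chosen by the caller.

```shell
# Hypothetical helper: copy an exported environment file while dropping
# the top-level "prefix:" line, which is tied to the exporting machine.
#   $1 = exported file (e.g. bioenv.yml)
#   $2 = cleaned copy to write (any name you choose)
strip_prefix() {
    sed '/^prefix:/d' "$1" > "$2"
}
```

For example, ''strip_prefix bioenv.yml bioenv.clean.yml'' writes a cleaned copy and leaves the original file untouched.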
+=== Step 2 – Build the Container ===
+Now use ''cotainr'' to create the image:
+<code bash>
+$ module load GCCcore/13.3.0 cotainr
+# Ex: cotainr build <env.sif> --base-image=docker://<WhatEverYouWant>:latest --accept-licenses --conda-env=<env.yml>
+$ cotainr build bioenv.sif --base-image=docker://ubuntu:latest --accept-licenses --conda-env=bioenv.yml
+</code>
+You can replace ''ubuntu:latest'' with any other base image, such as ''rockylinux:latest''.
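Because a leftover ''prefix:'' line is easy to miss, the build can be wrapped in a small pre-flight check. This is only a sketch under assumptions: ''build_conda_sif'' is a hypothetical function name, and it calls ''cotainr build'' with exactly the options used in this step.

```shell
# Hypothetical wrapper around the cotainr build command.
#   $1 = Conda environment file, $2 = image to create,
#   $3 = optional base image (defaults to the one used in this guide)
build_conda_sif() {
    env_file=$1
    image=$2
    base=${3:-docker://ubuntu:latest}

    # Refuse to build if the environment file is missing.
    [ -f "$env_file" ] || { echo "missing file: $env_file" >&2; return 1; }

    # Refuse to build if the machine-specific prefix line is still present.
    if grep -q '^prefix:' "$env_file"; then
        echo "remove the prefix: line from $env_file first" >&2
        return 1
    fi

    cotainr build "$image" --base-image="$base" --accept-licenses --conda-env="$env_file"
}
```

After ''module load GCCcore/13.3.0 cotainr'', calling ''build_conda_sif bioenv.yml bioenv.sif'' runs the same build as above once both checks pass.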
+=== Step 3 – Use the Container ===
+You can now run commands inside the container as follows:
+<code bash>
+$ apptainer exec bioenv.sif python3 -c "import numpy; print(numpy.__version__)"
+</code>
+Or launch any program inside the container just like you would in a normal environment.
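To make this more convenient, a small shell function can forward any command to the container. This is a sketch only: ''in_bioenv'' and the ''BIOENV_SIF'' variable are hypothetical names, and the image is assumed to be ''bioenv.sif'' in the current directory unless the variable is set.

```shell
# Assumed location of the image built in Step 2; override by setting BIOENV_SIF.
BIOENV_SIF="${BIOENV_SIF:-$PWD/bioenv.sif}"

# Hypothetical helper: run the given command line inside the container.
in_bioenv() {
    apptainer exec "$BIOENV_SIF" "$@"
}
```

With this defined, ''in_bioenv python3 -c "import numpy; print(numpy.__version__)"'' is equivalent to the ''apptainer exec'' command shown in this step.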
 ==== Conda environment management ====
+Use it
+<code>
+module load Anaconda3
+</code>
 Create
@@ Line 1056: / Line 1150: @@
 With the Baobab upgrade to CentOS 7 (cf. https://hpc-community.unige.ch/t/baobab-migration-from-centos6-to-centos7/361 ) we no longer provide a central RStudio.
-Instead, you can download the upstream Open Source binary RStudio Desktop version (cf. https://rstudio.com/products/rstudio/download/ ) and use it directly. Here are the instructions:
+Instead, we provide RStudio on [[hpc:how_to_use_openondemand|OpenOnDemand]].
-  - install it in your ''${HOME}'' folder: <code console>
-capello@login2:~$ mkdir Downloads
-capello@login2:~$ cd Downloads
-capello@login2:~/Downloads$ wget ${URL_FOR_rstudio-${VERSION}-x86_64-fedora.tar.gz}
-[...]
-capello@login2:~/Downloads$ tar axvf rstudio-${VERSION}-x86_64-fedora.tar.gz
-[...]
-capello@login2:~/Downloads$
-</code>
-  - launch an interactive graphical job:
-    - connect to the cluster using [[hpc:access_the_hpc_clusters#gui_accessdesktop_with_x2go|GUI access / Desktop with X2Go]] or using ''ssh -Y'' from a machine with an X server such as [[hpc:access_the_hpc_clusters#from_linux_and_mac_os|Linux or Mac]]. \\
-    - start an interactive session on a node (see [[hpc/slurm#interactive_jobs|Interactive Slurm jobs]]): <code console>
-capello@login2:~$ salloc -p debug-cpu -n 1 -c 16 --x11
-salloc: Pending job allocation 39085914
-salloc: job 39085914 queued and waiting for resources
-salloc: job 39085914 has been allocated resources
-salloc: Granted job allocation 39085914
-capello@node001:~$
-</code> Doing so, you will have 16 cores on one node of the partition ''debug-cpu'' for a max time of 15 minutes. Specify the appropriate duration time, partition, etc. like you would do for a normal job.
-    - load one of the R versions supported by RStudio, for example:<code console>
-capello@node001:~$ module spider R/3.6.0
-----------------------------------------------------------------------------------
-  R: R/3.6.0
-----------------------------------------------------------------------------------
-    Description:
-      R is a free software environment for statistical computing and
-      graphics.
-    You will need to load all module(s) on any one of the lines below
-    before the "R/3.6.0" module is available to load.
-      GCC/8.2.0-2.31.1  OpenMPI/3.1.3
-[...]
-capello@node001:~$ module load GCC/8.2.0-2.31.1  OpenMPI/3.1.3
-capello@node001:~$ module load PostgreSQL/11.3-Python-3.7.2
-capello@node001:~$ module load R/3.6.0
-capello@node001:~$
-</code>
-    - run RStudio : <code console>
-capello@node001:~$ ~/Downloads/rstudio-${VERSION}/bin/rstudio
-</code>
-<note important>The latest version of RStudio needs an additional dependency loaded:
-<code>
-module load PostgreSQL/11.3-Python-3.7.2
-</code>
-</note>
 ==== R packages ====