hpc:applications_and_libraries
Differences
This shows you the differences between two versions of the page.
hpc:applications_and_libraries [2025/01/15 09:39] – [FOSS toolchain] Yann Sagon | hpc:applications_and_libraries [2025/06/11 12:27] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 313: | Line 313: | ||
===== Conda ===== | ===== Conda ===== | ||
+ | ==== How to Create a Conda Environment in a Container ==== | ||
- | Use it | ||
- | < | + | Using **Conda** directly on HPC systems or shared servers can cause performance issues and storage overload because Conda environments create thousands of small files. This often results in: |
- | module load Anaconda3 | + | |
+ | * Slow job startup times | ||
+ | * Filesystem limitations being hit | ||
+ | * High I/O load on the cluster | ||
+ | * Complex environment management | ||
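To see how many files an existing Conda environment actually holds, a small helper such as the following can be used. This is a minimal sketch: the function name `count_files` and the example path are assumptions for illustration, not part of the cluster setup.

```shell
# Hypothetical helper: print the number of regular files under a directory.
count_files() {
    # find lists every regular file below "$1", wc -l counts the lines,
    # and tr strips the padding that some wc implementations add.
    find "$1" -type f | wc -l | tr -d '[:space:]'
}

# Usage sketch (the path is an assumption; adjust it to where your environments live):
#   count_files "$HOME/.conda/envs/bioenv"
```

File names containing newlines would be counted more than once; for a rough estimate of the file count this is usually acceptable.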
+ | |||
+ | A better solution is to **encapsulate Conda environments inside a container**. This way, the entire environment is packaged into a single file (such as a `.sif` image used by Apptainer/ | ||
+ | |||
+ | |||
+ | === Benefits === | ||
+ | Using this method offers multiple advantages: | ||
+ | - ✅ **Fewer files**: Your environment is stored in a single `.sif` file | ||
+ | - ✅ **Portability**: | ||
+ | - ✅ **Reproducibility**: | ||
+ | - ✅ **Isolation**: | ||
+ | - ✅ **Stability**: | ||
+ | |||
+ | === Limitations === | ||
+ | - ⚠️ The container is static; to update packages, you need to rebuild the image | ||
+ | |||
+ | |||
+ | This guide explains how to build such a container using [[https:// | ||
+ | |||
+ | |||
+ | === Step 1 – Define the Conda Environment === | ||
+ | Create a file '' | ||
+ | (As an example, we will use '' | ||
+ | |||
+ | < | ||
+ | name: bioenv | ||
+ | channels: | ||
+ | - bioconda | ||
+ | - conda-forge | ||
+ | - defaults | ||
+ | dependencies: | ||
+ | - blast=2.16.0 | ||
+ | - diamond=2.1.11 | ||
+ | - exonerate=2.4.0 | ||
+ | - spades=4.1.0 | ||
+ | - mafft=7.525 | ||
+ | - trimal=1.5.0 | ||
+ | - numpy | ||
+ | - joblib | ||
+ | - scipy | ||
+ | [...] | ||
+ | |||
+ | prefix:/ | ||
</ | </ | ||
+ | |||
+ | You can generate this file using the following commands: | ||
+ | |||
+ | <code bash> | ||
+ | # 1. (Optional) Create the environment; skip this step if you already have one | ||
+ | $ conda create -n bioenv -c bioconda -c conda-forge spades exonerate diamond blast mafft trimal numpy joblib scipy -y | ||
+ | |||
+ | # 2. Activate your environment | ||
+ | $ conda activate bioenv | ||
+ | |||
+ | # 3. Export the settings of your environment | ||
+ | # It’s recommended to manually remove the `prefix:` line at the bottom of the file before using it with cotainr. | ||
+ | $ conda env export > bioenv.yml | ||
+ | </ | ||
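Removing the `prefix:` line by hand can also be scripted. The sketch below assumes the exported file contains the line at the start of a row, as in the example above; the helper name `strip_prefix` and the output file name are hypothetical.

```shell
# Hypothetical helper: print an exported environment file without its "prefix:" line.
strip_prefix() {
    # grep -v drops every line that starts with "prefix:";
    # all other lines pass through unchanged.
    grep -v '^prefix:' "$1"
}

# Usage sketch (file names are illustrative):
#   conda env export > bioenv.yml
#   strip_prefix bioenv.yml > bioenv.clean.yml
```

Writing to a second file keeps the original export intact, which makes it easy to compare the two before building the container.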
+ | |||
+ | |||
+ | |||
+ | === Step 2 – Build the Container === | ||
+ | Now use '' | ||
+ | |||
+ | <code bash> | ||
+ | $ module load GCCcore/ | ||
+ | # Ex: cotainr build < | ||
+ | $ cotainr build bioenv.sif --base-image=docker:// | ||
+ | </ | ||
+ | |||
+ | You can replace '' | ||
+ | |||
+ | === Step 3 – Use the Container === | ||
+ | You can now run commands inside the container as follows: | ||
+ | |||
+ | <code bash> | ||
+ | |||
+ | $ apptainer exec bioenv.sif python3 -c " | ||
+ | </ | ||
+ | |||
+ | Or launch any program inside the container just like you would in a normal environment. | ||
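If you run many commands through the container, a small wrapper can save typing. This is only a sketch: the function name `run_in_container` and the `SIF` variable are assumptions introduced here, and the image is assumed to be the `bioenv.sif` file built in Step 2.

```shell
# Path to the container image; set SIF beforehand to use another image
# (illustrative variable, not defined by the cluster).
SIF="${SIF:-bioenv.sif}"

# Hypothetical wrapper: run any command inside the container image.
run_in_container() {
    # "$@" forwards the command and all its arguments unchanged.
    apptainer exec "$SIF" "$@"
}

# Usage sketch:
#   run_in_container blastn -version
#   run_in_container python3 -c "import numpy; print(numpy.__version__)"
```

The wrapper adds nothing beyond `apptainer exec`; it only keeps the image path in one place so that scripts stay readable.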
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
==== Conda environment management ==== | ==== Conda environment management ==== | ||
+ | |||
+ | Use it | ||
+ | |||
+ | < | ||
+ | module load Anaconda3 | ||
+ | </ | ||
Create | Create | ||
Line 1056: | Line 1150: | ||
With the Baobab upgrade to CentOS 7 (cf. https:// | With the Baobab upgrade to CentOS 7 (cf. https:// | ||
- | Instead, | + | Instead, |
- | + | ||
- | - install it in your '' | + | |
- | capello@login2: | + | |
- | capello@login2: | + | |
- | capello@login2: | + | |
- | [...] | + | |
- | capello@login2: | + | |
- | [...] | + | |
- | capello@login2: | + | |
- | </ | + | |
- | - launch an interactive graphical job: | + | |
- | - connect to the cluster using [[hpc:access_the_hpc_clusters# | + | |
- | - start an interactive session on a node (see [[hpc/ | + | |
- | capello@login2: | + | |
- | salloc: Pending job allocation 39085914 | + | |
- | salloc: job 39085914 queued and waiting for resources | + | |
- | salloc: job 39085914 has been allocated resources | + | |
- | salloc: Granted job allocation 39085914 | + | |
- | capello@node001: | + | |
- | </ | + | |
- | - load one of the R versions supported by RStudio, for example:< | +
- | capello@node001: | + | |
- | + | ||
- | ---------------------------------------------------------------------------------- | + | |
- | R: R/3.6.0 | + | |
- | ---------------------------------------------------------------------------------- | + | |
- | Description: | + | |
- | R is a free software environment for statistical computing and | + | |
- | graphics. | + | |
- | + | ||
- | + | ||
- | You will need to load all module(s) on any one of the lines below | + | |
- | before the " | + | |
- | + | ||
- | GCC/ | + | |
- | [...] | + | |
- | capello@node001: | + | |
- | capello@node001: | + | |
- | capello@node001: | + | |
- | capello@node001: | + | |
- | </ | + | |
- | - run RStudio : <code console> | + | |
- | capello@node001: | + | |
- | </ | + | |
- | + | ||
- | <note important> | + | |
- | < | + | |
- | module load PostgreSQL/ | + | |
- | </ | + | |
- | </ | + | |
==== R packages ==== | ==== R packages ==== | ||