{ "cells": [ { "cell_type": "markdown", "id": "f2c37699", "metadata": {}, "source": [ "# Tutorial" ] }, { "cell_type": "code", "execution_count": 2, "id": "eecb2061", "metadata": {}, "outputs": [], "source": [ "import plotly.io as pio\n", "pio.renderers.default = \"notebook_connected\" # \"notebook_connected\" loads plotly.js from a CDN; use \"notebook\" for offline use" ] }, { "cell_type": "markdown", "id": "c54f1308", "metadata": {}, "source": [ "*oddSNP* is an open-source framework for calculating SNP-Information Content (SNP-IC), a quantitative metric derived from unpooled pilot single-cell RNA sequencing (scRNA-seq) data that accurately predicts the success of genotype-based demultiplexing, as well as its counterpart for genotype-free approaches, cell-paired SNP-Information Content (cpSNP-IC).\n", "\n", "Details on these metrics and the tool's implementation are available in the manuscript entitled *OddSNP: a predictive framework for optimizing multiplexed single-cell RNA-seq experiments*." ] }, { "cell_type": "markdown", "id": "36c873ff", "metadata": {}, "source": [ "## Installation\n", "\n", "The recommended way to install *oddSNP* is with a virtual-environment manager such as [Conda](https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html). Even when installing from source, we create the corresponding environment before installation:" ] }, { "cell_type": "markdown", "id": "e7a2702c", "metadata": { "vscode": { "languageId": "shellscript" } }, "source": [ "### Using Bioconda:\n", "\n", "We create a new conda environment and install *oddSNP* directly from the Bioconda channel.\n", "\n", "```bash\n", ":~$ conda create --name oddsnp python=3.12\n", ":~$ conda activate oddsnp\n", "(oddsnp):~$ conda install -c bioconda oddsnp\n", "(oddsnp):~$\n", "```" ] }, { "cell_type": "markdown", "id": "c4499be1", "metadata": {}, "source": [ "### Using PyPI:\n", "\n", "We still recommend installing *oddSNP* inside a virtual environment. 
In this case, we need to make sure to also install `pip` in the created environment, so that the installation does not interfere with system libraries.\n", "\n", "```bash\n", ":~$ conda create --name oddsnp python=3.12\n", ":~$ conda activate oddsnp\n", "(oddsnp):~$ conda install pip\n", "(oddsnp):~$ pip install oddsnp\n", "```" ] }, { "cell_type": "markdown", "id": "0f36c641", "metadata": {}, "source": [ "### From source\n", "\n", "It is also possible to install *oddSNP* directly from source into a previously created environment.\n", "\n", "For this, we first clone the contents of the GitHub repository to a local folder (e.g. `path/to/oddsnp`). Then, to install *oddSNP* as a command-line tool in the current environment, we run the following:\n", "\n", "```bash\n", ":~$ conda create --name oddsnp python=3.12\n", ":~$ conda activate oddsnp\n", "(oddsnp):~$ cd path/to/oddsnp/\n", "(oddsnp):~/path/to/oddsnp$ pip install .\n", "```\n", "\n", "In this case, make sure that the `pip` version used for installation is the one associated with the environment and not a system version (i.e. `$ which pip` should point to an environment directory and not to your system's pip).\n", "\n", "Note that running this command also installs all of the tool's dependencies into the currently active environment." ] }, { "cell_type": "markdown", "id": "22f12eb2", "metadata": {}, "source": [ "### After installation\n", "\n", "An installation of *[cellsnp-lite](https://cellsnp-lite.readthedocs.io/en/latest/index.html)* is required to perform pileup calculations within oddSNP. To install it, use the following command inside your activated conda environment:\n", "\n", "```bash\n", "(oddsnp):~$ conda install -c bioconda cellsnp-lite\n", "```\n", "\n", "**NOTE** Other installation methods for `cellsnp-lite` are described on the [cellsnp-lite website](https://cellsnp-lite.readthedocs.io/en/latest/index.html)." 
] }, { "cell_type": "markdown", "id": "4c4a6ca8", "metadata": {}, "source": [ "To check that the installation finished properly, run *oddSNP* from the command line without any sub-commands. The output should look as follows:\n", "\n", "```bash\n", "(oddsnp):~$ oddSNP\n", "Usage: oddSNP [OPTIONS] COMMAND [ARGS]...\n", "\n", "Options:\n", "  --help  Show this message and exit.\n", "\n", "Commands:\n", "  cpsnpic\n", "  downsample\n", "  genotype\n", "  snpic\n", "  utils\n", "```\n", "\n", "**NOTE** The rest of this tutorial uses *oddSNP* through its API. All functionality is also available through the corresponding combination of command-line instructions." ] }, { "cell_type": "markdown", "id": "97f2ef83", "metadata": {}, "source": [ "## Using oddSNP" ] }, { "cell_type": "markdown", "id": "e674bc81", "metadata": {}, "source": [ "### 1. Sample data preparation\n", "\n", "First, we check the current working directory:" ] }, { "cell_type": "code", "execution_count": 2, "id": "2bf5ee7b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Current working directory: /mnt/storage4/rallendes/oddSNP/notebooks\n" ] } ], "source": [ "import os\n", "\n", "current_directory = os.getcwd()\n", "print(f\"Current working directory: {current_directory}\")" ] }, { "cell_type": "markdown", "id": "602a71a9", "metadata": {}, "source": [ "A *snpIC* workflow requires at least the following input files:\n", "- an scRNA-seq BAM file (together with a list of cell barcodes), and\n", "- a SNP reference file, typically in VCF format (with or without minor allele frequency information)\n", "\n", "Because of their size, the sample files used in this tutorial are not included. However, they are publicly available for download from [1k PBMCs from a Healthy Donor (v3 chemistry) BAM from 10XGenomics](https://www.10xgenomics.com/datasets/1-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0) and [Call set from 
1000 Genomes Project sequence against GRCh38 (SNV and indels)](https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz), respectively.\n", "\n", "The following commands automatically download the files and place them in the required folders. Alternatively, you can download the files manually and place them in the *{current_directory}/sample_data/* folder.\n", "\n", "To download the BAM file (~5 GB) and its corresponding matrix file:" ] }, { "cell_type": "code", "execution_count": 5, "id": "db16086d", "metadata": {}, "outputs": [], "source": [ "import os\n", "import requests\n", "import tarfile\n", "\n", "# create a sample_data directory if it doesn't exist\n", "os.makedirs(f\"{current_directory}/sample_data\", exist_ok=True)\n", "\n", "# bam file (~5 GB): stream it to disk in chunks instead of holding it in memory\n", "url = \"https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_possorted_genome_bam.bam\"\n", "save_path = f\"{current_directory}/sample_data/pbmc_1k_v3_possorted_genome_bam.bam\"\n", "with requests.get(url, stream=True) as response:\n", "    response.raise_for_status()  # stop early if the download failed\n", "    with open(save_path, \"wb\") as f:\n", "        for chunk in response.iter_content(chunk_size=1024 * 1024):\n", "            f.write(chunk)\n", "\n", "# matrix file (we use the list of barcodes from the filtered matrix generated by Cell Ranger)\n", "url = \"https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_filtered_feature_bc_matrix.tar.gz\"\n", "save_path = f\"{current_directory}/sample_data/pbmc_1k_v3_filtered_feature_bc_matrix.tar.gz\"\n", "response = requests.get(url)\n", "response.raise_for_status()  # stop early if the download failed\n", "with open(save_path, \"wb\") as f:\n", "    f.write(response.content)\n", "\n", "# untar the matrix file (the with-block closes the archive automatically)\n", "with tarfile.open(save_path, 'r:gz') as file:\n", "    file.extractall(f\"{current_directory}/sample_data/\", filter='data')" ] }, { "cell_type": "markdown", "id": "7da10d53", "metadata": {}, "source": [ "
|  | chrom | pos | id | ref | alt | qual | filter | info |\n",
"|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 1 | 10416 | . | CCCTAA | C | . | PASS | AC=240;AN=5096;DP=365460;AF=0.05;EAS_AF=0.06;E... |\n",
"| 1 | 1 | 16103 | . | T | G | . | PASS | AC=118;AN=5096;DP=29994;AF=0.02;EAS_AF=0;EUR_A... |\n",
"| 2 | 1 | 17496 | . | AC | A | . | PASS | AC=25;AN=5096;DP=189765;AF=0;EAS_AF=0;EUR_AF=0... |\n",
"| 3 | 1 | 51479 | . | T | A | . | PASS | AC=531;AN=5096;DP=17461;AF=0.1;EAS_AF=0;EUR_AF... |\n",
"| 4 | 1 | 51898 | . | C | A | . | PASS | AC=426;AN=5096;DP=15331;AF=0.08;EAS_AF=0.05;EU... |\n",
"| ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
"| 78229213 | X | 156029373 | . | T | C | . | PASS | AC=181;AN=5096;DP=3661;AF=0.04;EAS_AF=0.05;EUR... |\n",
"| 78229214 | X | 156029383 | . | C | G | . | PASS | AC=288;AN=5096;DP=4109;AF=0.06;EAS_AF=0.05;EUR... |\n",
"| 78229215 | X | 156030556 | . | A | AG | . | PASS | AC=4;AN=5096;DP=428827;AF=0;EAS_AF=0;EUR_AF=0;... |\n",
"| 78229216 | X | 156030574 | . | A | AG | . | PASS | AC=5;AN=5096;DP=462796;AF=0;EAS_AF=0;EUR_AF=0;... |\n",
"| 78229217 | X | 156030592 | . | A | AG | . | PASS | AC=4;AN=5096;DP=510923;AF=0;EAS_AF=0;EUR_AF=0;... |\n",
"\n",
"78229218 rows × 8 columns
\n",
"\n",
"|  | cell1 | cell2 | min_sum |\n",
"|---|---|---|---|\n",
"| 0 | TTACGTTTCTCGCTTG-1 | TTGGGCGGTCGGAAAC-1 | 88.0 |\n",
"| 1 | TTACGTTTCTCGCTTG-1 | TTGGGTAGTGCTAGCC-1 | 196.0 |\n",
"| 2 | TTACGTTTCTCGCTTG-1 | TTGGGTATCACCGACG-1 | 78.0 |\n",
"| 3 | TTACGTTTCTCGCTTG-1 | TTGGTTTCACTGGATT-1 | 181.0 |\n",
"| 4 | TTACGTTTCTCGCTTG-1 | TTGTGGATCTAAGAAG-1 | 186.0 |\n",
"| ... | ... | ... | ... |\n",
"| 2522 | TTTGATCTCTTTGGAG-1 | TTTGGTTGTAGAATAC-1 | 140.0 |\n",
"| 2523 | TTTGATCTCTTTGGAG-1 | TTTGTTGCAATTAGGA-1 | 91.0 |\n",
"| 2524 | TTTGGTTAGTAACCTC-1 | TTTGGTTGTAGAATAC-1 | 97.0 |\n",
"| 2525 | TTTGGTTAGTAACCTC-1 | TTTGTTGCAATTAGGA-1 | 79.0 |\n",
"| 2526 | TTTGGTTGTAGAATAC-1 | TTTGTTGCAATTAGGA-1 | 158.0 |\n",
"\n",
"2527 rows × 3 columns
\n", "