downsampling

This module provides functions for downsampling input data, which is useful for testing and benchmarking. It exposes both a Click CLI group and Python-callable wrappers.

CLI Commands

oddSNP downsample

Usage

oddSNP downsample [OPTIONS] COMMAND [ARGS]...
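When the CLI is driven from a script, it can help to assemble the argument vector programmatically. The following is a minimal sketch, not part of oddSNP: the helper name `build_downsample_argv` and the file names are hypothetical, and it only uses the `--nproc` and `--force` options that all three subcommands document.

```python
import shlex
from typing import List, Optional, Sequence


def build_downsample_argv(
    command: str,
    args: Sequence[str],
    nproc: Optional[int] = None,
    force: bool = False,
) -> List[str]:
    """Build an argv list for `oddSNP downsample COMMAND` (hypothetical helper)."""
    argv = ["oddSNP", "downsample", command]
    if nproc is not None:
        argv += ["--nproc", str(nproc)]
    if force:
        argv.append("--force")
    # Positional arguments come last, in the order shown in each Usage line.
    argv += list(args)
    return argv


# File names below are placeholders for illustration only.
argv = build_downsample_argv(
    "vcf-downsampling", ["ref.vcf", "regions.txt", "out/"], nproc=4, force=True
)
print(shlex.join(argv))
# To execute the command: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.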

barcode-downsampling

Downsample a BAM file by extracting selected barcodes. Given a list of cell barcodes, filter a BAM file so that it only includes reads associated with those barcodes.

Arguments:

BAM: Path to the BAM file to filter.

BARCODES: A file with the list of cell barcodes to include.

OUPATH: The path to the directory where to store the filtered BAM file.

Usage

oddSNP downsample barcode-downsampling [OPTIONS] BAM BARCODES OUPATH

Options

--celltag <celltag>

Tag used inside the BAM file for cell barcodes

--nproc <nproc>

Number of processes to use

--force

Override previous results

Arguments

BAM

Required argument

BARCODES

Required argument

OUPATH

Required argument
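The filtering idea behind this command can be sketched on toy data. This is an illustration of the concept only, not oddSNP's implementation: the `Read` class, the one-barcode-per-line file layout, and the `CB` default tag are assumptions made for the example, since this page does not state the default of `--celltag` or the format of the BARCODES file.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Set


@dataclass
class Read:
    """Toy stand-in for an aligned read (hypothetical, for illustration)."""
    name: str
    tags: Dict[str, str] = field(default_factory=dict)


def load_barcodes(lines: Iterable[str]) -> Set[str]:
    """Collect barcodes, assuming one per line; blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}


def filter_by_barcode(
    reads: Iterable[Read], barcodes: Set[str], celltag: str = "CB"
) -> Iterator[Read]:
    """Yield only reads whose cell tag is in the requested barcode set."""
    for read in reads:
        # Reads without the tag return None here and are dropped.
        if read.tags.get(celltag) in barcodes:
            yield read


reads = [
    Read("r1", {"CB": "AAAC-1"}),
    Read("r2", {"CB": "TTTG-1"}),
    Read("r3"),  # no cell barcode tag
    Read("r4", {"CB": "AAAC-1"}),
]
wanted = load_barcodes(["AAAC-1\n", "\n", "GGGG-1\n"])
kept = [read.name for read in filter_by_barcode(reads, wanted)]
print(kept)
```

Holding the barcodes in a set keeps each membership check constant-time, which matters when a BAM file contains many millions of reads.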

reads-downsampling

Downsample a BAM file by keeping a given fraction of its reads. Both the filtered BAM file and a TSV file with the cell barcodes are saved to the output directory.

Arguments:

BAM: The input BAM file

READS: The fraction of reads to keep (between 0 and 1)

OUPATH: The path to the directory where to store the filtered BAM file

Usage

oddSNP downsample reads-downsampling [OPTIONS] BAM READS OUPATH

Options

--celltag <celltag>

Tag used inside the BAM file for cell barcodes

--seed <seed>

A seed to make the selection of reads deterministic

--nproc <nproc>

Number of processes to use

--force

Override previous results

Arguments

BAM

Required argument

READS

Required argument

OUPATH

Required argument
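The role of `--seed` can be illustrated with a small sketch of seeded subsampling. It assumes each read is kept independently with probability READS. This page does not say which sampling scheme oddSNP uses, so treat the function below (a hypothetical `sample_fraction`) as one possible approach rather than the tool's behavior.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_fraction(
    items: Iterable[T], fraction: float, seed: Optional[int] = None
) -> List[T]:
    """Keep each item independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # A dedicated generator keeps the result reproducible for a given seed
    # without touching the global random state.
    rng = random.Random(seed)
    return [item for item in items if rng.random() < fraction]


reads = [f"read_{i}" for i in range(1000)]
first = sample_fraction(reads, 0.25, seed=42)
second = sample_fraction(reads, 0.25, seed=42)
print(first == second)  # same seed, same selection
```

With this scheme the number of reads kept is only approximately `fraction * len(items)`. An exact count would need a different method, such as `random.Random.sample`.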

vcf-downsampling

Filter a reference VCF file to only include variants in given regions.

Arguments:

VCF: Path to the VCF file to filter.

REGIONS: A file with the list of regions to include.

OUT: The path to the directory where to store the filtered VCF file.

Usage

oddSNP downsample vcf-downsampling [OPTIONS] VCF REGIONS OUT

Options

--nproc <nproc>

Number of processes to use

--force

Override previous results

Arguments

VCF

Required argument

REGIONS

Required argument

OUT

Required argument
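Region-based filtering can be sketched as follows. The format of the REGIONS file is not specified on this page, so the example assumes regions given as `(chromosome, start, end)` with 1-based inclusive coordinates and represents variants as `(chromosome, position)` pairs. All names are hypothetical and the code is not taken from oddSNP.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]   # (chrom, start, end), assumed 1-based inclusive
Variant = Tuple[str, int]       # (chrom, pos)


def index_regions(regions: Iterable[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group intervals by chromosome for quick lookup."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    return by_chrom


def filter_variants(
    variants: Iterable[Variant], regions: Iterable[Region]
) -> List[Variant]:
    """Keep variants whose position falls inside at least one region."""
    by_chrom = index_regions(regions)
    return [
        (chrom, pos)
        for chrom, pos in variants
        if any(start <= pos <= end for start, end in by_chrom.get(chrom, []))
    ]


regions = [("chr1", 100, 200), ("chr2", 50, 60)]
variants = [
    ("chr1", 99), ("chr1", 100), ("chr1", 200),
    ("chr1", 201), ("chr2", 55), ("chr3", 55),
]
kept = filter_variants(variants, regions)
print(kept)
```

The linear scan over intervals is adequate for a handful of regions. For large region lists, an interval tree or sorted intervals with binary search would scale better.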

Python API

oddSNP.downsampling.call_barcode_downsampling(bam, barcodes, oupath, celltag, nproc, force)[source]

Python wrapper for barcode_downsampling().

Parameters:
  • bam – Path to the BAM file to filter.

  • barcodes – A file with the list of cell barcodes to include.

  • oupath – The path to the directory where to store the filtered BAM file.

  • celltag – The tag used inside the BAM file for cell barcodes

  • nproc – Number of processes to use

  • force – If True override previous results

oddSNP.downsampling.call_reads_downsampling(bam, reads, oupath, celltag, seed, nproc, force)[source]

Python wrapper for reads_downsampling().

Parameters:
  • bam – The input BAM file

  • reads – The fraction of reads to keep (between 0 and 1)

  • oupath – The path to the directory where to store the filtered BAM file

  • celltag – The tag used inside the BAM file for cell barcodes

  • seed – A seed to make the selection of reads deterministic

  • nproc – Number of processes to use

  • force – If True, override previous results

Returns:

The path to the filtered BAM file
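A call to this wrapper might look like the sketch below. The signature above lists no default values, so the example passes all seven parameters explicitly. The helper names, the file paths, and the values chosen for `celltag`, `seed`, and `nproc` are assumptions for illustration. The import is deferred so that the snippet can be loaded without oddSNP installed.

```python
from typing import Any, Dict


def reads_downsampling_kwargs(
    bam: str,
    reads: float,
    oupath: str,
    celltag: str = "CB",   # assumed tag name, not a documented default
    seed: int = 0,         # assumed value for reproducibility
    nproc: int = 1,
    force: bool = False,
) -> Dict[str, Any]:
    """Validate inputs and collect the documented parameters in order."""
    if not 0.0 <= reads <= 1.0:
        raise ValueError("reads must be a fraction between 0 and 1")
    return {
        "bam": bam, "reads": reads, "oupath": oupath, "celltag": celltag,
        "seed": seed, "nproc": nproc, "force": force,
    }


def downsample_reads(**kwargs: Any) -> Any:
    """Call the oddSNP wrapper; the import happens only when invoked."""
    from oddSNP.downsampling import call_reads_downsampling
    # Per the documentation above, this returns the path to the filtered BAM.
    return call_reads_downsampling(**kwargs)


# Placeholder paths for illustration only.
kwargs = reads_downsampling_kwargs("sample.bam", 0.1, "downsampled/")
print(kwargs)
# filtered_bam = downsample_reads(**kwargs)  # requires oddSNP and a real BAM
```

Validating the fraction before the call surfaces an out-of-range value early, instead of after a long-running job has started.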

oddSNP.downsampling.call_vcf_downsampling(vcf, regions, out, nproc, force)[source]

Python wrapper for vcf_downsampling().

Parameters:
  • vcf – Path to the VCF file to filter.

  • regions – A file with the list of regions to include.

  • out – The path to the directory where to store the filtered VCF file.

  • nproc – Number of processes to use

  • force – If True override previous results