downsampling
This module contains functions for downsampling input data (BAM and VCF files), which can be useful for testing and benchmarking. It exposes both a Click CLI group and Python-callable wrappers.
CLI Commands
oddSNP downsample
Usage
oddSNP downsample [OPTIONS] COMMAND [ARGS]...
barcode-downsampling
Downsample a BAM file by extracting selected barcodes. Given a list of cell barcodes, filter a BAM file so that it only includes reads associated with those barcodes.
Arguments:
BAM: Path to the BAM file to filter.
BARCODES: A file with the list of cell barcodes to include.
OUPATH: Path to the directory where the filtered BAM file is stored.
Usage
oddSNP downsample barcode-downsampling [OPTIONS] BAM BARCODES OUPATH
Options
- --celltag <celltag>
Tag used inside the BAM file for cell barcodes.
- --nproc <nproc>
Number of processes to use.
- --force
Overwrite previous results.
Arguments
- BAM
Required argument
- BARCODES
Required argument
- OUPATH
Required argument
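The filtering that barcode-downsampling performs can be pictured with the pure-Python sketch below. It is not oddSNP's implementation: reads are modelled as hypothetical `(name, tags)` pairs instead of BAM records, the barcode file is assumed to hold one barcode per line, and `"CB"` is assumed as the cell tag (the common 10x Genomics convention) because the default of `--celltag` is not documented here. With pysam, `AlignedSegment.has_tag()` and `AlignedSegment.get_tag()` would supply the tag value of a real record.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical read representation: (read name, {tag: value}).
Read = Tuple[str, Dict[str, str]]


def load_barcodes(lines: Iterable[str]) -> Set[str]:
    """Parse a barcode list, assuming one barcode per line; blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}


def filter_by_barcode(
    reads: Iterable[Read], barcodes: Set[str], celltag: str = "CB"
) -> Iterator[Read]:
    """Yield only reads whose cell-barcode tag is in `barcodes`.

    Reads without the tag are dropped. The "CB" default is an assumption.
    """
    for name, tags in reads:
        if tags.get(celltag) in barcodes:
            yield name, tags


if __name__ == "__main__":
    wanted = load_barcodes(["AAACCTGA-1\n", "TTTGGTCA-1\n", "\n"])
    reads = [
        ("r1", {"CB": "AAACCTGA-1"}),
        ("r2", {"CB": "GGGCCCAA-1"}),
        ("r3", {}),  # no cell barcode at all
        ("r4", {"CB": "TTTGGTCA-1"}),
    ]
    print([name for name, _ in filter_by_barcode(reads, wanted)])  # ['r1', 'r4']
```

Using a set for the barcodes keeps each membership test constant-time, which matters when a BAM file holds hundreds of millions of reads.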
reads-downsampling
Downsample a BAM file by keeping a given fraction of reads. Both the filtered BAM file and a TSV file with the cell barcodes are saved to the output directory.
Arguments:
BAM: The input BAM file.
READS: The fraction of reads to keep (between 0 and 1).
OUPATH: Path to the directory where the filtered BAM file is stored.
Usage
oddSNP downsample reads-downsampling [OPTIONS] BAM READS OUPATH
Options
- --celltag <celltag>
Tag used inside the BAM file for cell barcodes.
- --seed <seed>
A seed that makes the selection of reads deterministic.
- --nproc <nproc>
Number of processes to use.
- --force
Overwrite previous results.
Arguments
- BAM
Required argument
- READS
Required argument
- OUPATH
Required argument
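One way a fraction and a seed can yield a deterministic subset of reads is shown in the sketch below. It illustrates the idea and is not oddSNP's documented method: the seed and each read name are hashed together, the hash is mapped to a number in [0, 1), and the read is kept when that number falls below READS. Keying on the read name means both mates of a pair receive the same decision, similar in spirit to the name-hashing subsampling of `samtools view -s`. The helper names are hypothetical.

```python
import hashlib
from typing import Iterable, List


def keep_read(read_name: str, fraction: float, seed: int = 0) -> bool:
    """Decide deterministically whether to keep a read.

    The seed and read name are hashed together; 53 bits of the digest are
    mapped to an exactly representable float in [0, 1) and compared with
    `fraction`. Mates share a read name, so they are kept or dropped together.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # uniform in [0, 1)
    return u < fraction


def subsample(read_names: Iterable[str], fraction: float, seed: int = 0) -> List[str]:
    """Return the read names selected for a given fraction and seed."""
    return [name for name in read_names if keep_read(name, fraction, seed)]


if __name__ == "__main__":
    names = [f"read{i}" for i in range(10_000)]
    kept = subsample(names, 0.25, seed=7)
    print(len(kept) / len(names))  # close to 0.25
    print(kept == subsample(names, 0.25, seed=7))  # same seed, same reads
```

A side effect of comparing one fixed hash value per read against the threshold is that, for a given seed, the reads kept at a smaller fraction are always a subset of those kept at a larger one, which makes nested downsampling series easy to compare.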
vcf-downsampling
Filter a reference VCF file to only include variants in the given regions.
Arguments:
VCF: Path to the VCF file to filter.
REGIONS: A file with the list of regions to include.
OUT: Path to the directory where the filtered VCF file is stored.
Usage
oddSNP downsample vcf-downsampling [OPTIONS] VCF REGIONS OUT
Options
- --nproc <nproc>
Number of processes to use.
- --force
Overwrite previous results.
Arguments
- VCF
Required argument
- REGIONS
Required argument
- OUT
Required argument
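The region filtering behind vcf-downsampling can be sketched as follows. The format of REGIONS is not specified on this page, so the sketch assumes a BED-like file (chromosome, 0-based start, exclusive end). VCF `POS` is 1-based, so a variant lies in a region when `start <= POS - 1 < end`. Header lines (starting with `#`) are passed through. The function names are hypothetical, and a real implementation on large files would more likely rely on an indexed VCF reader.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Regions = Dict[str, List[Tuple[int, int]]]


def load_regions(lines: Iterable[str]) -> Regions:
    """Parse BED-like lines (chrom, 0-based start, exclusive end).

    Intervals are sorted, and overlapping or touching ones merged, per chromosome.
    """
    raw: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged: Regions = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(regions: Regions, chrom: str, pos: int) -> bool:
    """True if the 1-based VCF position `pos` lies in one of the intervals."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= zero_based.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def filter_vcf(vcf_lines: Iterable[str], regions: Regions) -> Iterator[str]:
    """Yield header lines unchanged and only the records inside `regions`."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos, _ = line.split("\t", 2)
        if in_regions(regions, chrom, int(pos)):
            yield line


if __name__ == "__main__":
    regions = load_regions(["chr1\t100\t200\n", "chr2\t0\t50\n"])
    vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tG\t.\t.\t.\n",  # 0-based 99: outside [100, 200)
        "chr1\t101\t.\tC\tT\t.\t.\t.\n",  # 0-based 100: inside
        "chr2\t10\t.\tA\tC\t.\t.\t.\n",  # inside [0, 50)
        "chr3\t10\t.\tA\tC\t.\t.\t.\n",  # no regions on chr3
    ]
    print("".join(filter_vcf(vcf, regions)), end="")
```

Merging the intervals up front guarantees they do not overlap, so a single binary search per variant is enough to decide membership.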
Python API
- oddSNP.downsampling.call_barcode_downsampling(bam, barcodes, oupath, celltag, nproc, force)
Python wrapper for barcode_downsampling().
- Parameters:
bam – Path to the BAM file to filter.
barcodes – A file with the list of cell barcodes to include.
oupath – Path to the directory where the filtered BAM file is stored.
celltag – The tag used inside the BAM file for cell barcodes.
nproc – Number of processes to use.
force – If True, overwrite previous results.
- oddSNP.downsampling.call_reads_downsampling(bam, reads, oupath, celltag, seed, nproc, force)
Python wrapper for reads_downsampling().
- Parameters:
bam – The input BAM file.
reads – The fraction of reads to keep (between 0 and 1).
oupath – Path to the directory where the filtered BAM file is stored.
celltag – The tag used inside the BAM file for cell barcodes.
seed – A seed that makes the selection of reads deterministic.
nproc – Number of processes to use.
force – If True, overwrite previous results.
- Returns:
The path to the filtered BAM file.
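Because call_reads_downsampling() returns the path to the filtered BAM file, it can be called in a loop to build a series of downsampled files for benchmarking. The sketch below follows the positional order of the signature given for call_reads_downsampling(). Several details are assumptions made for this example: one output directory per fraction (a layout chosen here, not prescribed by oddSNP), `"CB"` as the cell tag, and 42 as the seed. The hypothetical `downsample` parameter exists only so the loop can be exercised without oddSNP installed; by default the real wrapper is imported.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def titrate_reads(
    bam: str,
    fractions: Iterable[float],
    outroot: str,
    celltag: str = "CB",  # assumption: 10x Genomics-style cell barcode tag
    seed: int = 42,  # arbitrary default chosen for this sketch
    nproc: int = 1,
    force: bool = False,
    downsample: Optional[Callable[..., str]] = None,
) -> Dict[float, str]:
    """Downsample `bam` once per fraction; return {fraction: filtered BAM path}."""
    if downsample is None:
        # Imported lazily so the helper can be tested without oddSNP installed.
        from oddSNP.downsampling import call_reads_downsampling as downsample
    results = {}
    for fraction in fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction {fraction} is not between 0 and 1")
        # One output directory per fraction, e.g. <outroot>/reads_0.25 (hypothetical layout).
        outdir = Path(outroot) / f"reads_{fraction:g}"
        outdir.mkdir(parents=True, exist_ok=True)
        # Positional order: (bam, reads, oupath, celltag, seed, nproc, force)
        results[fraction] = downsample(
            bam, fraction, str(outdir), celltag, seed, nproc, force
        )
    return results
```

For example, `titrate_reads("sample.bam", [0.1, 0.25, 0.5], "downsampled")` would request three filtered BAM files with the same seed; `sample.bam` and `downsampled` are placeholder paths.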
- oddSNP.downsampling.call_vcf_downsampling(vcf, regions, out, nproc, force)
Python wrapper for vcf_downsampling().
- Parameters:
vcf – Path to the VCF file to filter.
regions – A file with the list of regions to include.
out – Path to the directory where the filtered VCF file is stored.
nproc – Number of processes to use.
force – If True, overwrite previous results.