Package 'hahmmr' reference manual

Title:	Haplotype-Aware Hidden Markov Model for RNA
Description:	Haplotype-aware Hidden Markov Model for RNA (HaHMMR) is a method for detecting copy number variations (CNVs) from bulk RNA-seq data. Additional examples, documentations, and details on the method are available at <https://github.com/kharchenkolab/hahmmr/>.
Authors:	Teng Gao [aut, cre] , Evan Biederstedt [aut], Peter Kharchenko [aut]
Maintainer:	Teng Gao <[email protected]>
License:	MIT + file LICENSE
Version:	1.0.0
Built:	2025-02-22 04:55:43 UTC
Source:	https://github.com/cran/hahmmr

centromere regions (hg19)

Description

centromere regions (hg19)

Usage

acen_hg19
acen_hg19

Format

An object of class tbl_df (inherits from tbl, data.frame) with 22 rows and 3 columns.

centromere regions (hg38)

Description

centromere regions (hg38)

Usage

acen_hg38
acen_hg38

Format

An object of class tbl_df (inherits from tbl, data.frame) with 22 rows and 3 columns.

Analyze allele profile

Description

Analyze allele profile

Usage

analyze_allele(
  bulk,
  t = 1e-05,
  theta_min = 0.08,
  gamma = 20,
  nu = 0.5,
  r = 0.015,
  hmm = "S5",
  fit_theta = FALSE,
  fit_gamma = FALSE,
  theta_start = 0.05,
  verbose = TRUE
)
analyze_allele(
  bulk,
  t = 1e-05,
  theta_min = 0.08,
  gamma = 20,
  nu = 0.5,
  r = 0.015,
  hmm = "S5",
  fit_theta = FALSE,
  fit_gamma = FALSE,
  theta_start = 0.05,
  verbose = TRUE
)

Arguments

`bulk`	dataframe Bulk allele profile
`t`	numeric Transition probability
`theta_min`	numeric Minimum allele fraction
`gamma`	numeric Overdispersion parameter
`nu`	numeric Phase switch rate
`r`	numeric Alternative allele count bias
`hmm`	character HMM model to use (S3 or S5)
`fit_theta`	logical Whether to fit theta_min
`fit_gamma`	logical Whether to fit gamma
`theta_start`	numeric Starting value for theta_min
`verbose`	logical Whether to print progress

Value

dataframe Bulk allele profile with CNV states

Examples

bulk_example = analyze_allele(bulk_example, hmm = 'S5')
bulk_example = analyze_allele(bulk_example, hmm = 'S5')

Analyze allele and expression profile

Description

Analyze allele and expression profile

Usage

analyze_joint(
  bulk,
  t = 1e-05,
  gamma = 20,
  theta_min = 0.08,
  logphi_min = 0.25,
  hmm = "S15",
  nu = 1,
  min_genes = 10,
  r = 0.015,
  theta_start = 0.05,
  exclude_neu = TRUE,
  fit_gamma = FALSE,
  fit_theta = FALSE,
  verbose = TRUE
)
analyze_joint(
  bulk,
  t = 1e-05,
  gamma = 20,
  theta_min = 0.08,
  logphi_min = 0.25,
  hmm = "S15",
  nu = 1,
  min_genes = 10,
  r = 0.015,
  theta_start = 0.05,
  exclude_neu = TRUE,
  fit_gamma = FALSE,
  fit_theta = FALSE,
  verbose = TRUE
)

Arguments

`bulk`	dataframe Bulk allele and expression profile
`t`	numeric Transition probability
`gamma`	numeric Overdispersion parameter
`theta_min`	numeric Minimum allele fraction
`logphi_min`	numeric Minimum log2 fold change
`hmm`	character HMM model to use (S7 or S15)
`nu`	numeric Phase switch rate
`min_genes`	integer Minimum number of genes per segment
`r`	numeric Alternative allele count bias
`theta_start`	numeric Starting value for theta_min
`exclude_neu`	logical Whether to exclude neutral segments in retest
`fit_gamma`	logical Whether to fit gamma
`fit_theta`	logical Whether to fit theta_min
`verbose`	logical Whether to print progress

Value

dataframe Bulk allele and expression profile with CNV states

Examples

bulk_example = analyze_joint(bulk_example, hmm = 'S15')
bulk_example = analyze_joint(bulk_example, hmm = 'S15')

example pseudobulk dataframe

Description

example pseudobulk dataframe

Usage

bulk_example
bulk_example

Format

An object of class tbl_df (inherits from tbl, data.frame) with 10321 rows and 58 columns.

chromosome sizes (hg19)

Description

chromosome sizes (hg19)

Usage

chrom_sizes_hg19
chrom_sizes_hg19

Format

An object of class data.table (inherits from data.frame) with 22 rows and 2 columns.

chromosome sizes (hg38)

Description

chromosome sizes (hg38)

Usage

chrom_sizes_hg38
chrom_sizes_hg38

Format

An object of class data.table (inherits from data.frame) with 22 rows and 2 columns.

Beta-binomial distribution density function A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters alpha > 0 and beta > 0 For more details, see extraDistr::dbbinom

Description

Beta-binomial distribution density function A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters alpha > 0 and beta > 0 For more details, see extraDistr::dbbinom

Usage

dbbinom(x, size, alpha = 1, beta = 1, log = FALSE)
dbbinom(x, size, alpha = 1, beta = 1, log = FALSE)

Arguments

`x`	vector of quantiles
`size`	number of trials (zero or more)
`alpha`	numeric (default=1)
`beta`	numeric (default=1)
`log`	boolean (default=FALSE)

Value

numeric Probability density values

Examples

dbbinom(1, 1, 1, 1)
dbbinom(1, 1, 1, 1)

example allele count dataframe

Description

example allele count dataframe

Usage

df_allele_example
df_allele_example

Format

An object of class data.table (inherits from data.frame) with 9957 rows and 11 columns.

Returns the density for the Poisson lognormal distribution with parameters mu and sig

Description

Returns the density for the Poisson lognormal distribution with parameters mu and sig

Usage

dpoilog(x, mu, sig, log = FALSE)
dpoilog(x, mu, sig, log = FALSE)

Arguments

`x`	vector of integers, the observations
`mu`	mean of lognormal distribution
`sig`	standard deviation of lognormal distribution
`log`	boolean Return the log density if TRUE (default=FALSE)

Value

numeric Probability density values

Examples

p = dpoilog(1, 1, 1)
p = dpoilog(1, 1, 1)

Fit MLE of log-normal Poisson model

Description

Fit MLE of log-normal Poisson model

Usage

fit_lnpois_cpp(Y_obs, lambda_ref, d)
fit_lnpois_cpp(Y_obs, lambda_ref, d)

Arguments

`Y_obs`	Vector of observed counts
`lambda_ref`	Vector of reference rates
`d`	integer Total depth

Value

NumericVector MLE estimates of mu and sigma

Forward-backward algorithm for allele HMM

Description

Forward-backward algorithm for allele HMM

Usage

forward_back_allele(hmm)
forward_back_allele(hmm)

Arguments

hmm

HMM object; expect variables x (allele depth), d (total depth), logPi (log transition prob matrix), delta (prior for each state), alpha (alpha for each state), beta (beta for each state), states (states), p_s (phase switch probs)

Value

numeric matrix; posterior probabilities

Examples

forward_back_allele(pre_likelihood_hmm)
forward_back_allele(pre_likelihood_hmm)

genome gap regions (hg19)

Description

genome gap regions (hg19)

Usage

gaps_hg19
gaps_hg19

Format

An object of class data.table (inherits from data.frame) with 28 rows and 3 columns.

genome gap regions (hg38)

Description

genome gap regions (hg38)

Usage

gaps_hg38
gaps_hg38

Format

An object of class data.table (inherits from data.frame) with 30 rows and 3 columns.

example gene expression counts matrix

Description

example gene expression counts matrix

Usage

gene_counts_example
gene_counts_example

Format

An object of class matrix (inherits from array) with 1758 rows and 1 columns.

Aggregate into pseudobulk alelle profile

Description

Aggregate into pseudobulk alelle profile

Usage

get_allele_bulk(df_allele, gtf, genetic_map = NULL, nu = 0.5, min_depth = 0)
get_allele_bulk(df_allele, gtf, genetic_map = NULL, nu = 0.5, min_depth = 0)

Arguments

`df_allele`	dataframe Single-cell allele counts
`gtf`	dataframe Transcript gtf
`genetic_map`	dataframe Genetic map
`nu`	numeric Phase switch rate
`min_depth`	integer Minimum coverage to filter SNPs

Value

dataframe Pseudobulk allele profile

Examples

bulk_example = get_allele_bulk(
    df_allele = df_allele_example,
    gtf = gtf_hg38)
bulk_example = get_allele_bulk(
    df_allele = df_allele_example,
    gtf = gtf_hg38)

Produce combined bulk expression and allele profile

Description

Produce combined bulk expression and allele profile

Usage

get_bulk(
  count_mat,
  lambdas_ref,
  df_allele,
  gtf,
  genetic_map = NULL,
  min_depth = 0,
  nu = 1,
  verbose = TRUE
)
get_bulk(
  count_mat,
  lambdas_ref,
  df_allele,
  gtf,
  genetic_map = NULL,
  min_depth = 0,
  nu = 1,
  verbose = TRUE
)

Arguments

`count_mat`	matrix Gene expression counts
`lambdas_ref`	matrix Reference expression profiles
`df_allele`	dataframe Allele counts
`gtf`	dataframe Transcript gtf
`genetic_map`	dataframe Genetic map
`min_depth`	integer Minimum coverage to filter SNPs
`nu`	numeric Phase switch rate
`verbose`	logical Whether to print progress

Value

dataframe Pseudobulk gene expression and allele profile

Examples

bulk_example = get_bulk(
    count_mat = gene_counts_example,
    lambdas_ref = ref_hca,
    df_allele = df_allele_example,
    gtf = gtf_hg38)
bulk_example = get_bulk(
    count_mat = gene_counts_example,
    lambdas_ref = ref_hca,
    df_allele = df_allele_example,
    gtf = gtf_hg38)

gene model (hg19)

Description

gene model (hg19)

Usage

gtf_hg19
gtf_hg19

Format

An object of class data.table (inherits from data.frame) with 26841 rows and 5 columns.

gene model (hg38)

Description

gene model (hg38)

Usage

gtf_hg38
gtf_hg38

Format

An object of class data.table (inherits from data.frame) with 26807 rows and 5 columns.

gene model (mm10)

Description

gene model (mm10)

Usage

gtf_mm10
gtf_mm10

Format

An object of class data.table (inherits from data.frame) with 30336 rows and 5 columns.

calculate joint likelihood of allele data

Description

calculate joint likelihood of allele data

Usage

l_bbinom(AD, DP, alpha, beta)
l_bbinom(AD, DP, alpha, beta)

Arguments

`AD`	numeric vector Variant allele depth
`DP`	numeric vector Total allele depth
`alpha`	numeric Alpha parameter of Beta-Binomial distribution
`beta`	numeric Beta parameter of Beta-Binomial distribution

Value

numeric Joint log likelihood

Examples

l_bbinom(c(1, 2), c(1, 2), 1, 1)
l_bbinom(c(1, 2), c(1, 2), 1, 1)

calculate joint likelihood of a PLN model

Description

calculate joint likelihood of a PLN model

Usage

l_lnpois(Y_obs, lambda_ref, d, mu, sig, phi = 1)
l_lnpois(Y_obs, lambda_ref, d, mu, sig, phi = 1)

Arguments

`Y_obs`	numeric vector Gene expression counts
`lambda_ref`	numeric vector Reference expression levels
`d`	numeric Total library size
`mu`	numeric Global mean expression
`sig`	numeric Global standard deviation of expression
`phi`	numeric Fold change of expression

Value

numeric Joint log likelihood

Examples

l_lnpois(c(1, 2), c(1, 2), 1, 1, 1)
l_lnpois(c(1, 2), c(1, 2), 1, 1, 1)

Only compute total log likelihood from an allele HMM

Description

Only compute total log likelihood from an allele HMM

Usage

likelihood_allele(hmm)
likelihood_allele(hmm)

Arguments

hmm

Value

numeric; total log likelihood

Examples

likelihood_allele(pre_likelihood_hmm)
likelihood_allele(pre_likelihood_hmm)

logSumExp function

Description

logSumExp function

Usage

logSumExp(x)
logSumExp(x)

Arguments

`x`	NumericVector

Value

double logSumExp of x

Plot a group of pseudobulk HMM profiles

Description

Plot a group of pseudobulk HMM profiles

Usage

plot_bulks(bulks, ..., ncol = 1, title = TRUE, title_size = 8)
plot_bulks(bulks, ..., ncol = 1, title = TRUE, title_size = 8)

Arguments

`bulks`	dataframe Pseudobulk profiles annotated with "sample" column
`...`	additional parameters passed to plot_psbulk()
`ncol`	integer Number of columns
`title`	logical Whether to add titles to individual plots
`title_size`	numeric Size of titles

Value

a ggplot object

Examples

p = plot_bulks(bulk_example)
p = plot_bulks(bulk_example)

Plot a pseudobulk HMM profile

Description

Plot a pseudobulk HMM profile

Usage

plot_psbulk(
  bulk,
  use_pos = TRUE,
  allele_only = FALSE,
  min_LLR = 5,
  min_depth = 8,
  exp_limit = 2,
  phi_mle = TRUE,
  theta_roll = FALSE,
  dot_size = 0.8,
  dot_alpha = 0.5,
  legend = TRUE,
  exclude_gap = TRUE,
  genome = "hg38",
  text_size = 10,
  raster = FALSE
)
plot_psbulk(
  bulk,
  use_pos = TRUE,
  allele_only = FALSE,
  min_LLR = 5,
  min_depth = 8,
  exp_limit = 2,
  phi_mle = TRUE,
  theta_roll = FALSE,
  dot_size = 0.8,
  dot_alpha = 0.5,
  legend = TRUE,
  exclude_gap = TRUE,
  genome = "hg38",
  text_size = 10,
  raster = FALSE
)

Arguments

`bulk`	dataframe Pseudobulk profile
`use_pos`	logical Use marker position instead of index as x coordinate
`allele_only`	logical Only plot alleles
`min_LLR`	numeric LLR threshold for event filtering
`min_depth`	numeric Minimum coverage depth for a SNP to be plotted
`exp_limit`	numeric Expression logFC axis limit
`phi_mle`	logical Whether to plot estimates of segmental expression fold change
`theta_roll`	logical Whether to plot rolling estimates of allele imbalance
`dot_size`	numeric Size of marker dots
`dot_alpha`	numeric Transparency of the marker dots
`legend`	logical Whether to show legend
`exclude_gap`	logical Whether to mark gap regions and centromeres
`genome`	character Genome build, either 'hg38' or 'hg19'
`text_size`	numeric Size of text in the plot
`raster`	logical Whether to raster images

Value

ggplot Plot of pseudobulk HMM profile

Examples

p = plot_psbulk(bulk_example)
p = plot_psbulk(bulk_example)

HMM object for unit tests

Description

HMM object for unit tests

Usage

pre_likelihood_hmm
pre_likelihood_hmm

Format

An object of class list of length 10.

reference expression magnitudes from HCA

Description

reference expression magnitudes from HCA

Usage

ref_hca
ref_hca

Format

An object of class matrix (inherits from array) with 24756 rows and 12 columns.

reference expression counts from HCA

Description

reference expression counts from HCA

Usage

ref_hca_counts
ref_hca_counts

Format

An object of class matrix (inherits from array) with 24857 rows and 12 columns.

Run a 5-state allele-only HMM - two theta levels

Description

Run a 5-state allele-only HMM - two theta levels

Usage

run_allele_hmm_s5(
  pAD,
  DP,
  p_s,
  t = 1e-05,
  theta_min = 0.08,
  gamma = 20,
  prior = NULL,
  ...
)
run_allele_hmm_s5(
  pAD,
  DP,
  p_s,
  t = 1e-05,
  theta_min = 0.08,
  gamma = 20,
  prior = NULL,
  ...
)

Arguments

`pAD`	integer vector Paternal allele counts
`DP`	integer vector Total alelle counts
`p_s`	numeric vector Phase switch probabilities
`t`	numeric Transition probability between copy number states
`theta_min`	numeric Minimum haplotype frequency deviation threshold
`gamma`	numeric Overdispersion in the allele-specific expression
`prior`	numeric vector Prior probabilities for each state
`...`	Additional parameters

Value

character vector Decoded states

Examples

with(bulk_example, {
    run_allele_hmm_s5(pAD = pAD, DP = DP, R = R, p_s = p_s, theta_min = 0.08, gamma = 30)
})
with(bulk_example, {
    run_allele_hmm_s5(pAD = pAD, DP = DP, R = R, p_s = p_s, theta_min = 0.08, gamma = 30)
})

Run 15-state joint HMM on a pseudobulk profile

Description

Run 15-state joint HMM on a pseudobulk profile

Usage

run_joint_hmm_s15(
  pAD,
  DP,
  p_s,
  Y_obs = 0,
  lambda_ref = 0,
  d_total = 0,
  theta_min = 0.08,
  theta_neu = 0,
  bal_cnv = TRUE,
  phi_del = 2^(-0.25),
  phi_amp = 2^(0.25),
  phi_bamp = phi_amp,
  phi_bdel = phi_del,
  mu = 0,
  sig = 1,
  r = 0.015,
  t = 1e-05,
  gamma = 18,
  prior = NULL,
  exp_only = FALSE,
  allele_only = FALSE,
  classify_allele = FALSE,
  debug = FALSE,
  ...
)
run_joint_hmm_s15(
  pAD,
  DP,
  p_s,
  Y_obs = 0,
  lambda_ref = 0,
  d_total = 0,
  theta_min = 0.08,
  theta_neu = 0,
  bal_cnv = TRUE,
  phi_del = 2^(-0.25),
  phi_amp = 2^(0.25),
  phi_bamp = phi_amp,
  phi_bdel = phi_del,
  mu = 0,
  sig = 1,
  r = 0.015,
  t = 1e-05,
  gamma = 18,
  prior = NULL,
  exp_only = FALSE,
  allele_only = FALSE,
  classify_allele = FALSE,
  debug = FALSE,
  ...
)

Arguments

`pAD`	integer vector Paternal allele counts
`DP`	integer vector Total alelle counts
`p_s`	numeric vector Phase switch probabilities
`Y_obs`	numeric vector Observed gene counts
`lambda_ref`	numeric vector Reference expression rates
`d_total`	integer Total library size for expression counts
`theta_min`	numeric Minimum haplotype imbalance threshold
`theta_neu`	numeric Haplotype imbalance threshold for neutral state
`bal_cnv`	logical Whether to include balanced CNV states
`phi_del`	numeric Expected fold change for deletion
`phi_amp`	numeric Expected fold change for amplification
`phi_bamp`	numeric Expected fold change for balanced amplification
`phi_bdel`	numeric Expected fold change for balanced deletion
`mu`	numeric Global expression bias
`sig`	numeric Global expression variance
`r`	numeric Variant mapping bias
`t`	numeric Transition probability between copy number states
`gamma`	numeric Overdispersion in the allele-specific expression
`prior`	numeric vector Prior probabilities for each state
`exp_only`	logical Whether to only use expression data
`allele_only`	logical Whether to only use allele data
`classify_allele`	logical Whether to classify allele states
`debug`	logical Whether to print debug messages
`...`	Additional parameters

Value

character vector Decoded states

Examples

with(bulk_example, {
    run_joint_hmm_s15(pAD = pAD, DP = DP, p_s = p_s, Y_obs = Y_obs, lambda_ref = lambda_ref, 
    d_total = na.omit(unique(d_obs)), mu = mu, sig = sig, t = 1e-5, gamma = 30, theta_min = 0.08)
})
with(bulk_example, {
    run_joint_hmm_s15(pAD = pAD, DP = DP, p_s = p_s, Y_obs = Y_obs, lambda_ref = lambda_ref, 
    d_total = na.omit(unique(d_obs)), mu = mu, sig = sig, t = 1e-5, gamma = 30, theta_min = 0.08)
})

example CNV segments dataframe

Description

example CNV segments dataframe

Usage

segs_example
segs_example

Format

An object of class data.table (inherits from data.frame) with 27 rows and 30 columns.

example VCF header

Description

example VCF header

Usage

vcf_meta
vcf_meta

Format

An object of class character of length 65.

Package 'hahmmr'

Help Index

centromere regions (hg19)

Description

Usage

Format

centromere regions (hg38)

Description

Usage

Format

Analyze allele profile

Description

Usage

Arguments

Value

Examples

Analyze allele and expression profile

Description

Usage

Arguments

Value

Examples

example pseudobulk dataframe

Description

Usage

Format

chromosome sizes (hg19)

Description

Usage

Format

chromosome sizes (hg38)

Description

Usage

Format

Beta-binomial distribution density function A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters alpha > 0 and beta > 0 For more details, see extraDistr::dbbinom

Description

Usage

Arguments

Value

Examples

example allele count dataframe

Description

Usage

Format

Returns the density for the Poisson lognormal distribution with parameters mu and sig

Description

Usage

Arguments

Value

Examples

Fit MLE of log-normal Poisson model

Description

Usage

Arguments

Value

Forward-backward algorithm for allele HMM

Description

Usage

Arguments

Value

Examples

genome gap regions (hg19)

Description

Usage

Format

genome gap regions (hg38)

Description

Usage

Format

example gene expression counts matrix

Description

Usage

Format

Aggregate into pseudobulk alelle profile

Description

Usage

Arguments

Value

Examples

Produce combined bulk expression and allele profile