| Title: | Haplotype-Aware Hidden Markov Model for RNA |
|---|---|
| Description: | Haplotype-aware Hidden Markov Model for RNA (HaHMMR) is a method for detecting copy number variations (CNVs) from bulk RNA-seq data. Additional examples, documentations, and details on the method are available at <https://github.com/kharchenkolab/hahmmr/>. |
| Authors: | Teng Gao [aut, cre] |
| Maintainer: | Teng Gao <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-31 09:36:59 UTC |
| Source: | https://github.com/cran/hahmmr |
centromere regions (hg19)
acen_hg19acen_hg19
An object of class tbl_df (inherits from tbl, data.frame) with 22 rows and 3 columns.
centromere regions (hg38)
acen_hg38acen_hg38
An object of class tbl_df (inherits from tbl, data.frame) with 22 rows and 3 columns.
Analyze allele profile
analyze_allele( bulk, t = 1e-05, theta_min = 0.08, gamma = 20, nu = 0.5, r = 0.015, hmm = "S5", fit_theta = FALSE, fit_gamma = FALSE, theta_start = 0.05, verbose = TRUE )analyze_allele( bulk, t = 1e-05, theta_min = 0.08, gamma = 20, nu = 0.5, r = 0.015, hmm = "S5", fit_theta = FALSE, fit_gamma = FALSE, theta_start = 0.05, verbose = TRUE )
bulk |
dataframe Bulk allele profile |
t |
numeric Transition probability |
theta_min |
numeric Minimum allele fraction |
gamma |
numeric Overdispersion parameter |
nu |
numeric Phase switch rate |
r |
numeric Alternative allele count bias |
hmm |
character HMM model to use (S3 or S5) |
fit_theta |
logical Whether to fit theta_min |
fit_gamma |
logical Whether to fit gamma |
theta_start |
numeric Starting value for theta_min |
verbose |
logical Whether to print progress |
dataframe Bulk allele profile with CNV states
bulk_example = analyze_allele(bulk_example, hmm = 'S5')bulk_example = analyze_allele(bulk_example, hmm = 'S5')
Analyze allele and expression profile
analyze_joint( bulk, t = 1e-05, gamma = 20, theta_min = 0.08, logphi_min = 0.25, hmm = "S15", nu = 1, min_genes = 10, r = 0.015, theta_start = 0.05, exclude_neu = TRUE, fit_gamma = FALSE, fit_theta = FALSE, verbose = TRUE )analyze_joint( bulk, t = 1e-05, gamma = 20, theta_min = 0.08, logphi_min = 0.25, hmm = "S15", nu = 1, min_genes = 10, r = 0.015, theta_start = 0.05, exclude_neu = TRUE, fit_gamma = FALSE, fit_theta = FALSE, verbose = TRUE )
bulk |
dataframe Bulk allele and expression profile |
t |
numeric Transition probability |
gamma |
numeric Overdispersion parameter |
theta_min |
numeric Minimum allele fraction |
logphi_min |
numeric Minimum log2 fold change |
hmm |
character HMM model to use (S7 or S15) |
nu |
numeric Phase switch rate |
min_genes |
integer Minimum number of genes per segment |
r |
numeric Alternative allele count bias |
theta_start |
numeric Starting value for theta_min |
exclude_neu |
logical Whether to exclude neutral segments in retest |
fit_gamma |
logical Whether to fit gamma |
fit_theta |
logical Whether to fit theta_min |
verbose |
logical Whether to print progress |
dataframe Bulk allele and expression profile with CNV states
bulk_example = analyze_joint(bulk_example, hmm = 'S15')bulk_example = analyze_joint(bulk_example, hmm = 'S15')
example pseudobulk dataframe
bulk_examplebulk_example
An object of class tbl_df (inherits from tbl, data.frame) with 10321 rows and 58 columns.
chromosome sizes (hg19)
chrom_sizes_hg19chrom_sizes_hg19
An object of class data.table (inherits from data.frame) with 22 rows and 2 columns.
chromosome sizes (hg38)
chrom_sizes_hg38chrom_sizes_hg38
An object of class data.table (inherits from data.frame) with 22 rows and 2 columns.
Beta-binomial distribution density function A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters alpha > 0 and beta > 0 For more details, see extraDistr::dbbinom
dbbinom(x, size, alpha = 1, beta = 1, log = FALSE)dbbinom(x, size, alpha = 1, beta = 1, log = FALSE)
x |
vector of quantiles |
size |
number of trials (zero or more) |
alpha |
numeric (default=1) |
beta |
numeric (default=1) |
log |
boolean (default=FALSE) |
numeric Probability density values
dbbinom(1, 1, 1, 1)dbbinom(1, 1, 1, 1)
example allele count dataframe
df_allele_exampledf_allele_example
An object of class data.table (inherits from data.frame) with 9957 rows and 11 columns.
Returns the density for the Poisson lognormal distribution with parameters mu and sig
dpoilog(x, mu, sig, log = FALSE)dpoilog(x, mu, sig, log = FALSE)
x |
vector of integers, the observations |
mu |
mean of lognormal distribution |
sig |
standard deviation of lognormal distribution |
log |
boolean Return the log density if TRUE (default=FALSE) |
numeric Probability density values
p = dpoilog(1, 1, 1)p = dpoilog(1, 1, 1)
Fit MLE of log-normal Poisson model
fit_lnpois_cpp(Y_obs, lambda_ref, d)fit_lnpois_cpp(Y_obs, lambda_ref, d)
Y_obs |
Vector of observed counts |
lambda_ref |
Vector of reference rates |
d |
integer Total depth |
NumericVector MLE estimates of mu and sigma
Forward-backward algorithm for allele HMM
forward_back_allele(hmm)forward_back_allele(hmm)
hmm |
HMM object; expect variables x (allele depth), d (total depth), logPi (log transition prob matrix), delta (prior for each state), alpha (alpha for each state), beta (beta for each state), states (states), p_s (phase switch probs) |
numeric matrix; posterior probabilities
forward_back_allele(pre_likelihood_hmm)forward_back_allele(pre_likelihood_hmm)
genome gap regions (hg19)
gaps_hg19gaps_hg19
An object of class data.table (inherits from data.frame) with 28 rows and 3 columns.
genome gap regions (hg38)
gaps_hg38gaps_hg38
An object of class data.table (inherits from data.frame) with 30 rows and 3 columns.
example gene expression counts matrix
gene_counts_examplegene_counts_example
An object of class matrix (inherits from array) with 1758 rows and 1 columns.
Aggregate into pseudobulk alelle profile
get_allele_bulk(df_allele, gtf, genetic_map = NULL, nu = 0.5, min_depth = 0)get_allele_bulk(df_allele, gtf, genetic_map = NULL, nu = 0.5, min_depth = 0)
df_allele |
dataframe Single-cell allele counts |
gtf |
dataframe Transcript gtf |
genetic_map |
dataframe Genetic map |
nu |
numeric Phase switch rate |
min_depth |
integer Minimum coverage to filter SNPs |
dataframe Pseudobulk allele profile
bulk_example = get_allele_bulk( df_allele = df_allele_example, gtf = gtf_hg38)bulk_example = get_allele_bulk( df_allele = df_allele_example, gtf = gtf_hg38)
Produce combined bulk expression and allele profile
get_bulk( count_mat, lambdas_ref, df_allele, gtf, genetic_map = NULL, min_depth = 0, nu = 1, verbose = TRUE )get_bulk( count_mat, lambdas_ref, df_allele, gtf, genetic_map = NULL, min_depth = 0, nu = 1, verbose = TRUE )
count_mat |
matrix Gene expression counts |
lambdas_ref |
matrix Reference expression profiles |
df_allele |
dataframe Allele counts |
gtf |
dataframe Transcript gtf |
genetic_map |
dataframe Genetic map |
min_depth |
integer Minimum coverage to filter SNPs |
nu |
numeric Phase switch rate |
verbose |
logical Whether to print progress |
dataframe Pseudobulk gene expression and allele profile
bulk_example = get_bulk( count_mat = gene_counts_example, lambdas_ref = ref_hca, df_allele = df_allele_example, gtf = gtf_hg38)bulk_example = get_bulk( count_mat = gene_counts_example, lambdas_ref = ref_hca, df_allele = df_allele_example, gtf = gtf_hg38)
gene model (hg19)
gtf_hg19gtf_hg19
An object of class data.table (inherits from data.frame) with 26841 rows and 5 columns.
gene model (hg38)
gtf_hg38gtf_hg38
An object of class data.table (inherits from data.frame) with 26807 rows and 5 columns.
gene model (mm10)
gtf_mm10gtf_mm10
An object of class data.table (inherits from data.frame) with 30336 rows and 5 columns.
calculate joint likelihood of allele data
l_bbinom(AD, DP, alpha, beta)l_bbinom(AD, DP, alpha, beta)
AD |
numeric vector Variant allele depth |
DP |
numeric vector Total allele depth |
alpha |
numeric Alpha parameter of Beta-Binomial distribution |
beta |
numeric Beta parameter of Beta-Binomial distribution |
numeric Joint log likelihood
l_bbinom(c(1, 2), c(1, 2), 1, 1)l_bbinom(c(1, 2), c(1, 2), 1, 1)
calculate joint likelihood of a PLN model
l_lnpois(Y_obs, lambda_ref, d, mu, sig, phi = 1)l_lnpois(Y_obs, lambda_ref, d, mu, sig, phi = 1)
Y_obs |
numeric vector Gene expression counts |
lambda_ref |
numeric vector Reference expression levels |
d |
numeric Total library size |
mu |
numeric Global mean expression |
sig |
numeric Global standard deviation of expression |
phi |
numeric Fold change of expression |
numeric Joint log likelihood
l_lnpois(c(1, 2), c(1, 2), 1, 1, 1)l_lnpois(c(1, 2), c(1, 2), 1, 1, 1)
Only compute total log likelihood from an allele HMM
likelihood_allele(hmm)likelihood_allele(hmm)
hmm |
HMM object; expect variables x (allele depth), d (total depth), logPi (log transition prob matrix), delta (prior for each state), alpha (alpha for each state), beta (beta for each state), states (states), p_s (phase switch probs) |
numeric; total log likelihood
likelihood_allele(pre_likelihood_hmm)likelihood_allele(pre_likelihood_hmm)
logSumExp function
logSumExp(x)logSumExp(x)
x |
NumericVector |
double logSumExp of x
Plot a group of pseudobulk HMM profiles
plot_bulks(bulks, ..., ncol = 1, title = TRUE, title_size = 8)plot_bulks(bulks, ..., ncol = 1, title = TRUE, title_size = 8)
bulks |
dataframe Pseudobulk profiles annotated with "sample" column |
... |
additional parameters passed to plot_psbulk() |
ncol |
integer Number of columns |
title |
logical Whether to add titles to individual plots |
title_size |
numeric Size of titles |
a ggplot object
p = plot_bulks(bulk_example)p = plot_bulks(bulk_example)
Plot a pseudobulk HMM profile
plot_psbulk( bulk, use_pos = TRUE, allele_only = FALSE, min_LLR = 5, min_depth = 8, exp_limit = 2, phi_mle = TRUE, theta_roll = FALSE, dot_size = 0.8, dot_alpha = 0.5, legend = TRUE, exclude_gap = TRUE, genome = "hg38", text_size = 10, raster = FALSE )plot_psbulk( bulk, use_pos = TRUE, allele_only = FALSE, min_LLR = 5, min_depth = 8, exp_limit = 2, phi_mle = TRUE, theta_roll = FALSE, dot_size = 0.8, dot_alpha = 0.5, legend = TRUE, exclude_gap = TRUE, genome = "hg38", text_size = 10, raster = FALSE )
bulk |
dataframe Pseudobulk profile |
use_pos |
logical Use marker position instead of index as x coordinate |
allele_only |
logical Only plot alleles |
min_LLR |
numeric LLR threshold for event filtering |
min_depth |
numeric Minimum coverage depth for a SNP to be plotted |
exp_limit |
numeric Expression logFC axis limit |
phi_mle |
logical Whether to plot estimates of segmental expression fold change |
theta_roll |
logical Whether to plot rolling estimates of allele imbalance |
dot_size |
numeric Size of marker dots |
dot_alpha |
numeric Transparency of the marker dots |
legend |
logical Whether to show legend |
exclude_gap |
logical Whether to mark gap regions and centromeres |
genome |
character Genome build, either 'hg38' or 'hg19' |
text_size |
numeric Size of text in the plot |
raster |
logical Whether to raster images |
ggplot Plot of pseudobulk HMM profile
p = plot_psbulk(bulk_example)p = plot_psbulk(bulk_example)
HMM object for unit tests
pre_likelihood_hmmpre_likelihood_hmm
An object of class list of length 10.
reference expression magnitudes from HCA
ref_hcaref_hca
An object of class matrix (inherits from array) with 24756 rows and 12 columns.
reference expression counts from HCA
ref_hca_countsref_hca_counts
An object of class matrix (inherits from array) with 24857 rows and 12 columns.
Run a 5-state allele-only HMM - two theta levels
run_allele_hmm_s5( pAD, DP, p_s, t = 1e-05, theta_min = 0.08, gamma = 20, prior = NULL, ... )run_allele_hmm_s5( pAD, DP, p_s, t = 1e-05, theta_min = 0.08, gamma = 20, prior = NULL, ... )
pAD |
integer vector Paternal allele counts |
DP |
integer vector Total alelle counts |
p_s |
numeric vector Phase switch probabilities |
t |
numeric Transition probability between copy number states |
theta_min |
numeric Minimum haplotype frequency deviation threshold |
gamma |
numeric Overdispersion in the allele-specific expression |
prior |
numeric vector Prior probabilities for each state |
... |
Additional parameters |
character vector Decoded states
with(bulk_example, { run_allele_hmm_s5(pAD = pAD, DP = DP, R = R, p_s = p_s, theta_min = 0.08, gamma = 30) })with(bulk_example, { run_allele_hmm_s5(pAD = pAD, DP = DP, R = R, p_s = p_s, theta_min = 0.08, gamma = 30) })
Run 15-state joint HMM on a pseudobulk profile
run_joint_hmm_s15( pAD, DP, p_s, Y_obs = 0, lambda_ref = 0, d_total = 0, theta_min = 0.08, theta_neu = 0, bal_cnv = TRUE, phi_del = 2^(-0.25), phi_amp = 2^(0.25), phi_bamp = phi_amp, phi_bdel = phi_del, mu = 0, sig = 1, r = 0.015, t = 1e-05, gamma = 18, prior = NULL, exp_only = FALSE, allele_only = FALSE, classify_allele = FALSE, debug = FALSE, ... )run_joint_hmm_s15( pAD, DP, p_s, Y_obs = 0, lambda_ref = 0, d_total = 0, theta_min = 0.08, theta_neu = 0, bal_cnv = TRUE, phi_del = 2^(-0.25), phi_amp = 2^(0.25), phi_bamp = phi_amp, phi_bdel = phi_del, mu = 0, sig = 1, r = 0.015, t = 1e-05, gamma = 18, prior = NULL, exp_only = FALSE, allele_only = FALSE, classify_allele = FALSE, debug = FALSE, ... )
pAD |
integer vector Paternal allele counts |
DP |
integer vector Total alelle counts |
p_s |
numeric vector Phase switch probabilities |
Y_obs |
numeric vector Observed gene counts |
lambda_ref |
numeric vector Reference expression rates |
d_total |
integer Total library size for expression counts |
theta_min |
numeric Minimum haplotype imbalance threshold |
theta_neu |
numeric Haplotype imbalance threshold for neutral state |
bal_cnv |
logical Whether to include balanced CNV states |
phi_del |
numeric Expected fold change for deletion |
phi_amp |
numeric Expected fold change for amplification |
phi_bamp |
numeric Expected fold change for balanced amplification |
phi_bdel |
numeric Expected fold change for balanced deletion |
mu |
numeric Global expression bias |
sig |
numeric Global expression variance |
r |
numeric Variant mapping bias |
t |
numeric Transition probability between copy number states |
gamma |
numeric Overdispersion in the allele-specific expression |
prior |
numeric vector Prior probabilities for each state |
exp_only |
logical Whether to only use expression data |
allele_only |
logical Whether to only use allele data |
classify_allele |
logical Whether to classify allele states |
debug |
logical Whether to print debug messages |
... |
Additional parameters |
character vector Decoded states
with(bulk_example, { run_joint_hmm_s15(pAD = pAD, DP = DP, p_s = p_s, Y_obs = Y_obs, lambda_ref = lambda_ref, d_total = na.omit(unique(d_obs)), mu = mu, sig = sig, t = 1e-5, gamma = 30, theta_min = 0.08) })with(bulk_example, { run_joint_hmm_s15(pAD = pAD, DP = DP, p_s = p_s, Y_obs = Y_obs, lambda_ref = lambda_ref, d_total = na.omit(unique(d_obs)), mu = mu, sig = sig, t = 1e-5, gamma = 30, theta_min = 0.08) })
example CNV segments dataframe
segs_examplesegs_example
An object of class data.table (inherits from data.frame) with 27 rows and 30 columns.
example VCF header
vcf_metavcf_meta
An object of class character of length 65.