Title: | Haplotype-Aware Hidden Markov Model for RNA |
---|---|
Description: | Haplotype-aware Hidden Markov Model for RNA (HaHMMR) is a method for detecting copy number variations (CNVs) from bulk RNA-seq data. Additional examples, documentations, and details on the method are available at <https://github.com/kharchenkolab/hahmmr/>. |
Authors: | Teng Gao [aut, cre] , Evan Biederstedt [aut], Peter Kharchenko [aut] |
Maintainer: | Teng Gao <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2024-10-25 05:17:48 UTC |
Source: | https://github.com/cran/hahmmr |
centromere regions (hg19)
acen_hg19
acen_hg19
An object of class tbl_df
(inherits from tbl
, data.frame
) with 22 rows and 3 columns.
centromere regions (hg38)
acen_hg38
acen_hg38
An object of class tbl_df
(inherits from tbl
, data.frame
) with 22 rows and 3 columns.
Analyze allele profile
analyze_allele( bulk, t = 1e-05, theta_min = 0.08, gamma = 20, nu = 0.5, r = 0.015, hmm = "S5", fit_theta = FALSE, fit_gamma = FALSE, theta_start = 0.05, verbose = TRUE )
analyze_allele( bulk, t = 1e-05, theta_min = 0.08, gamma = 20, nu = 0.5, r = 0.015, hmm = "S5", fit_theta = FALSE, fit_gamma = FALSE, theta_start = 0.05, verbose = TRUE )
bulk |
dataframe Bulk allele profile |
t |
numeric Transition probability |
theta_min |
numeric Minimum allele fraction |
gamma |
numeric Overdispersion parameter |
nu |
numeric Phase switch rate |
r |
numeric Alternative allele count bias |
hmm |
character HMM model to use (S3 or S5) |
fit_theta |
logical Whether to fit theta_min |
fit_gamma |
logical Whether to fit gamma |
theta_start |
numeric Starting value for theta_min |
verbose |
logical Whether to print progress |
dataframe Bulk allele profile with CNV states
bulk_example = analyze_allele(bulk_example, hmm = 'S5')
bulk_example = analyze_allele(bulk_example, hmm = 'S5')
Analyze allele and expression profile
analyze_joint( bulk, t = 1e-05, gamma = 20, theta_min = 0.08, logphi_min = 0.25, hmm = "S15", nu = 1, min_genes = 10, r = 0.015, theta_start = 0.05, exclude_neu = TRUE, fit_gamma = FALSE, fit_theta = FALSE, verbose = TRUE )
analyze_joint( bulk, t = 1e-05, gamma = 20, theta_min = 0.08, logphi_min = 0.25, hmm = "S15", nu = 1, min_genes = 10, r = 0.015, theta_start = 0.05, exclude_neu = TRUE, fit_gamma = FALSE, fit_theta = FALSE, verbose = TRUE )
bulk |
dataframe Bulk allele and expression profile |
t |
numeric Transition probability |
gamma |
numeric Overdispersion parameter |
theta_min |
numeric Minimum allele fraction |
logphi_min |
numeric Minimum log2 fold change |
hmm |
character HMM model to use (S7 or S15) |
nu |
numeric Phase switch rate |
min_genes |
integer Minimum number of genes per segment |
r |
numeric Alternative allele count bias |
theta_start |
numeric Starting value for theta_min |
exclude_neu |
logical Whether to exclude neutral segments in retest |
fit_gamma |
logical Whether to fit gamma |
fit_theta |
logical Whether to fit theta_min |
verbose |
logical Whether to print progress |
dataframe Bulk allele and expression profile with CNV states
bulk_example = analyze_joint(bulk_example, hmm = 'S15')
bulk_example = analyze_joint(bulk_example, hmm = 'S15')
example pseudobulk dataframe
bulk_example
bulk_example
An object of class tbl_df
(inherits from tbl
, data.frame
) with 10321 rows and 58 columns.
chromosome sizes (hg19)
chrom_sizes_hg19
chrom_sizes_hg19
An object of class data.table
(inherits from data.frame
) with 22 rows and 2 columns.
chromosome sizes (hg38)
chrom_sizes_hg38
chrom_sizes_hg38
An object of class data.table
(inherits from data.frame
) with 22 rows and 2 columns.
Beta-binomial distribution density function A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters alpha > 0 and beta > 0 For more details, see extraDistr::dbbinom
dbbinom(x, size, alpha = 1, beta = 1, log = FALSE)
dbbinom(x, size, alpha = 1, beta = 1, log = FALSE)
x |
vector of quantiles |
size |
number of trials (zero or more) |
alpha |
numeric (default=1) |
beta |
numeric (default=1) |
log |
boolean (default=FALSE) |
numeric Probability density values
dbbinom(1, 1, 1, 1)
dbbinom(1, 1, 1, 1)
example allele count dataframe
df_allele_example
df_allele_example
An object of class data.table
(inherits from data.frame
) with 9957 rows and 11 columns.
Returns the density for the Poisson lognormal distribution with parameters mu and sig
dpoilog(x, mu, sig, log = FALSE)
dpoilog(x, mu, sig, log = FALSE)
x |
vector of integers, the observations |
mu |
mean of lognormal distribution |
sig |
standard deviation of lognormal distribution |
log |
boolean Return the log density if TRUE (default=FALSE) |
numeric Probability density values
p = dpoilog(1, 1, 1)
p = dpoilog(1, 1, 1)
Fit MLE of log-normal Poisson model
fit_lnpois_cpp(Y_obs, lambda_ref, d)
fit_lnpois_cpp(Y_obs, lambda_ref, d)
Y_obs |
Vector of observed counts |
lambda_ref |
Vector of reference rates |
d |
integer Total depth |
NumericVector MLE estimates of mu and sigma
Forward-backward algorithm for allele HMM
forward_back_allele(hmm)
forward_back_allele(hmm)
hmm |
HMM object; expect variables x (allele depth), d (total depth), logPi (log transition prob matrix), delta (prior for each state), alpha (alpha for each state), beta (beta for each state), states (states), p_s (phase switch probs) |
numeric matrix; posterior probabilities
forward_back_allele(pre_likelihood_hmm)
forward_back_allele(pre_likelihood_hmm)
genome gap regions (hg19)
gaps_hg19
gaps_hg19
An object of class data.table
(inherits from data.frame
) with 28 rows and 3 columns.
genome gap regions (hg38)
gaps_hg38
gaps_hg38
An object of class data.table
(inherits from data.frame
) with 30 rows and 3 columns.
example gene expression counts matrix
gene_counts_example
gene_counts_example
An object of class matrix
(inherits from array
) with 1758 rows and 1 columns.
Aggregate into pseudobulk alelle profile
get_allele_bulk(df_allele, gtf, genetic_map = NULL, nu = 0.5, min_depth = 0)
get_allele_bulk(df_allele, gtf, genetic_map = NULL, nu = 0.5, min_depth = 0)
df_allele |
dataframe Single-cell allele counts |
gtf |
dataframe Transcript gtf |
genetic_map |
dataframe Genetic map |
nu |
numeric Phase switch rate |
min_depth |
integer Minimum coverage to filter SNPs |
dataframe Pseudobulk allele profile
bulk_example = get_allele_bulk( df_allele = df_allele_example, gtf = gtf_hg38)
bulk_example = get_allele_bulk( df_allele = df_allele_example, gtf = gtf_hg38)
Produce combined bulk expression and allele profile
get_bulk( count_mat, lambdas_ref, df_allele, gtf, genetic_map = NULL, min_depth = 0, nu = 1, verbose = TRUE )
get_bulk( count_mat, lambdas_ref, df_allele, gtf, genetic_map = NULL, min_depth = 0, nu = 1, verbose = TRUE )
count_mat |
matrix Gene expression counts |
lambdas_ref |
matrix Reference expression profiles |
df_allele |
dataframe Allele counts |
gtf |
dataframe Transcript gtf |
genetic_map |
dataframe Genetic map |
min_depth |
integer Minimum coverage to filter SNPs |
nu |
numeric Phase switch rate |
verbose |
logical Whether to print progress |
dataframe Pseudobulk gene expression and allele profile
bulk_example = get_bulk( count_mat = gene_counts_example, lambdas_ref = ref_hca, df_allele = df_allele_example, gtf = gtf_hg38)
bulk_example = get_bulk( count_mat = gene_counts_example, lambdas_ref = ref_hca, df_allele = df_allele_example, gtf = gtf_hg38)
gene model (hg19)
gtf_hg19
gtf_hg19
An object of class data.table
(inherits from data.frame
) with 26841 rows and 5 columns.
gene model (hg38)
gtf_hg38
gtf_hg38
An object of class data.table
(inherits from data.frame
) with 26807 rows and 5 columns.
gene model (mm10)
gtf_mm10
gtf_mm10
An object of class data.table
(inherits from data.frame
) with 30336 rows and 5 columns.
calculate joint likelihood of allele data
l_bbinom(AD, DP, alpha, beta)
l_bbinom(AD, DP, alpha, beta)
AD |
numeric vector Variant allele depth |
DP |
numeric vector Total allele depth |
alpha |
numeric Alpha parameter of Beta-Binomial distribution |
beta |
numeric Beta parameter of Beta-Binomial distribution |
numeric Joint log likelihood
l_bbinom(c(1, 2), c(1, 2), 1, 1)
l_bbinom(c(1, 2), c(1, 2), 1, 1)
calculate joint likelihood of a PLN model
l_lnpois(Y_obs, lambda_ref, d, mu, sig, phi = 1)
l_lnpois(Y_obs, lambda_ref, d, mu, sig, phi = 1)
Y_obs |
numeric vector Gene expression counts |
lambda_ref |
numeric vector Reference expression levels |
d |
numeric Total library size |
mu |
numeric Global mean expression |
sig |
numeric Global standard deviation of expression |
phi |
numeric Fold change of expression |
numeric Joint log likelihood
l_lnpois(c(1, 2), c(1, 2), 1, 1, 1)
l_lnpois(c(1, 2), c(1, 2), 1, 1, 1)
Only compute total log likelihood from an allele HMM
likelihood_allele(hmm)
likelihood_allele(hmm)
hmm |
HMM object; expect variables x (allele depth), d (total depth), logPi (log transition prob matrix), delta (prior for each state), alpha (alpha for each state), beta (beta for each state), states (states), p_s (phase switch probs) |
numeric; total log likelihood
likelihood_allele(pre_likelihood_hmm)
likelihood_allele(pre_likelihood_hmm)
logSumExp function
logSumExp(x)
logSumExp(x)
x |
NumericVector |
double logSumExp of x
Plot a group of pseudobulk HMM profiles
plot_bulks(bulks, ..., ncol = 1, title = TRUE, title_size = 8)
plot_bulks(bulks, ..., ncol = 1, title = TRUE, title_size = 8)
bulks |
dataframe Pseudobulk profiles annotated with "sample" column |
... |
additional parameters passed to plot_psbulk() |
ncol |
integer Number of columns |
title |
logical Whether to add titles to individual plots |
title_size |
numeric Size of titles |
a ggplot object
p = plot_bulks(bulk_example)
p = plot_bulks(bulk_example)
Plot a pseudobulk HMM profile
plot_psbulk( bulk, use_pos = TRUE, allele_only = FALSE, min_LLR = 5, min_depth = 8, exp_limit = 2, phi_mle = TRUE, theta_roll = FALSE, dot_size = 0.8, dot_alpha = 0.5, legend = TRUE, exclude_gap = TRUE, genome = "hg38", text_size = 10, raster = FALSE )
plot_psbulk( bulk, use_pos = TRUE, allele_only = FALSE, min_LLR = 5, min_depth = 8, exp_limit = 2, phi_mle = TRUE, theta_roll = FALSE, dot_size = 0.8, dot_alpha = 0.5, legend = TRUE, exclude_gap = TRUE, genome = "hg38", text_size = 10, raster = FALSE )
bulk |
dataframe Pseudobulk profile |
use_pos |
logical Use marker position instead of index as x coordinate |
allele_only |
logical Only plot alleles |
min_LLR |
numeric LLR threshold for event filtering |
min_depth |
numeric Minimum coverage depth for a SNP to be plotted |
exp_limit |
numeric Expression logFC axis limit |
phi_mle |
logical Whether to plot estimates of segmental expression fold change |
theta_roll |
logical Whether to plot rolling estimates of allele imbalance |
dot_size |
numeric Size of marker dots |
dot_alpha |
numeric Transparency of the marker dots |
legend |
logical Whether to show legend |
exclude_gap |
logical Whether to mark gap regions and centromeres |
genome |
character Genome build, either 'hg38' or 'hg19' |
text_size |
numeric Size of text in the plot |
raster |
logical Whether to raster images |
ggplot Plot of pseudobulk HMM profile
p = plot_psbulk(bulk_example)
p = plot_psbulk(bulk_example)
HMM object for unit tests
pre_likelihood_hmm
pre_likelihood_hmm
An object of class list
of length 10.
reference expression magnitudes from HCA
ref_hca
ref_hca
An object of class matrix
(inherits from array
) with 24756 rows and 12 columns.
reference expression counts from HCA
ref_hca_counts
ref_hca_counts
An object of class matrix
(inherits from array
) with 24857 rows and 12 columns.
Run a 5-state allele-only HMM - two theta levels
run_allele_hmm_s5( pAD, DP, p_s, t = 1e-05, theta_min = 0.08, gamma = 20, prior = NULL, ... )
run_allele_hmm_s5( pAD, DP, p_s, t = 1e-05, theta_min = 0.08, gamma = 20, prior = NULL, ... )
pAD |
integer vector Paternal allele counts |
DP |
integer vector Total alelle counts |
p_s |
numeric vector Phase switch probabilities |
t |
numeric Transition probability between copy number states |
theta_min |
numeric Minimum haplotype frequency deviation threshold |
gamma |
numeric Overdispersion in the allele-specific expression |
prior |
numeric vector Prior probabilities for each state |
... |
Additional parameters |
character vector Decoded states
with(bulk_example, { run_allele_hmm_s5(pAD = pAD, DP = DP, R = R, p_s = p_s, theta_min = 0.08, gamma = 30) })
with(bulk_example, { run_allele_hmm_s5(pAD = pAD, DP = DP, R = R, p_s = p_s, theta_min = 0.08, gamma = 30) })
Run 15-state joint HMM on a pseudobulk profile
run_joint_hmm_s15( pAD, DP, p_s, Y_obs = 0, lambda_ref = 0, d_total = 0, theta_min = 0.08, theta_neu = 0, bal_cnv = TRUE, phi_del = 2^(-0.25), phi_amp = 2^(0.25), phi_bamp = phi_amp, phi_bdel = phi_del, mu = 0, sig = 1, r = 0.015, t = 1e-05, gamma = 18, prior = NULL, exp_only = FALSE, allele_only = FALSE, classify_allele = FALSE, debug = FALSE, ... )
run_joint_hmm_s15( pAD, DP, p_s, Y_obs = 0, lambda_ref = 0, d_total = 0, theta_min = 0.08, theta_neu = 0, bal_cnv = TRUE, phi_del = 2^(-0.25), phi_amp = 2^(0.25), phi_bamp = phi_amp, phi_bdel = phi_del, mu = 0, sig = 1, r = 0.015, t = 1e-05, gamma = 18, prior = NULL, exp_only = FALSE, allele_only = FALSE, classify_allele = FALSE, debug = FALSE, ... )
pAD |
integer vector Paternal allele counts |
DP |
integer vector Total alelle counts |
p_s |
numeric vector Phase switch probabilities |
Y_obs |
numeric vector Observed gene counts |
lambda_ref |
numeric vector Reference expression rates |
d_total |
integer Total library size for expression counts |
theta_min |
numeric Minimum haplotype imbalance threshold |
theta_neu |
numeric Haplotype imbalance threshold for neutral state |
bal_cnv |
logical Whether to include balanced CNV states |
phi_del |
numeric Expected fold change for deletion |
phi_amp |
numeric Expected fold change for amplification |
phi_bamp |
numeric Expected fold change for balanced amplification |
phi_bdel |
numeric Expected fold change for balanced deletion |
mu |
numeric Global expression bias |
sig |
numeric Global expression variance |
r |
numeric Variant mapping bias |
t |
numeric Transition probability between copy number states |
gamma |
numeric Overdispersion in the allele-specific expression |
prior |
numeric vector Prior probabilities for each state |
exp_only |
logical Whether to only use expression data |
allele_only |
logical Whether to only use allele data |
classify_allele |
logical Whether to classify allele states |
debug |
logical Whether to print debug messages |
... |
Additional parameters |
character vector Decoded states
with(bulk_example, { run_joint_hmm_s15(pAD = pAD, DP = DP, p_s = p_s, Y_obs = Y_obs, lambda_ref = lambda_ref, d_total = na.omit(unique(d_obs)), mu = mu, sig = sig, t = 1e-5, gamma = 30, theta_min = 0.08) })
with(bulk_example, { run_joint_hmm_s15(pAD = pAD, DP = DP, p_s = p_s, Y_obs = Y_obs, lambda_ref = lambda_ref, d_total = na.omit(unique(d_obs)), mu = mu, sig = sig, t = 1e-5, gamma = 30, theta_min = 0.08) })
example CNV segments dataframe
segs_example
segs_example
An object of class data.table
(inherits from data.frame
) with 27 rows and 30 columns.
example VCF header
vcf_meta
vcf_meta
An object of class character
of length 65.