Results 1 - 20 of 7,233
1.
Am J Hum Genet ; 111(8): 1770-1781, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39047729

ABSTRACT

Allele-specific expression plays a crucial role in unraveling various biological mechanisms, including genomic imprinting and gene expression controlled by cis-regulatory variants. However, existing methods for quantification from RNA-sequencing (RNA-seq) reads do not adequately and efficiently remove various allele-specific read mapping biases, such as reference bias arising from reads containing the alternative allele that do not map to the reference transcriptome or ambiguous mapping bias caused by reads containing the reference allele that map differently from reads containing the alternative allele. We present Ornaments, a computational tool for rapid and accurate estimation of allele-specific transcript expression at unphased heterozygous loci from RNA-seq reads while correcting for allele-specific read mapping biases. Ornaments removes reference bias by mapping reads to a personalized transcriptome and ambiguous mapping bias by probabilistically assigning reads to multiple transcripts and variant loci they map to. Ornaments is a lightweight extension of kallisto, a popular tool for fast RNA-seq quantification, that improves the efficiency and accuracy of WASP, a popular tool for bias correction in allele-specific read mapping. In experiments with simulated and human lymphoblastoid cell-line RNA-seq reads with the genomes of the 1000 Genomes Project, we demonstrate that Ornaments improves the accuracy of WASP and kallisto, is nearly as efficient as kallisto, and is an order of magnitude faster than WASP per sample, with the additional cost of constructing a personalized index for multiple samples. Additionally, we show that Ornaments finds imprinted transcripts with higher sensitivity than WASP, which detects imprinted signals only at gene level.


Subject(s)
Alleles , Humans , Transcriptome/genetics , Genomic Imprinting , Sequence Analysis, RNA/methods , Software , Gene Expression Profiling/methods
2.
Genet Epidemiol ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38533840

ABSTRACT

Copy number variants (CNVs) are prevalent in the human genome and are found to have a profound effect on genomic organization and human diseases. Discovering disease-associated CNVs is critical for understanding the pathogenesis of diseases and aiding their diagnosis and treatment. However, traditional methods for assessing the association between CNVs and disease risks adopt a two-stage strategy conducting quantitative CNV measurements first and then testing for association, which may lead to biased association estimation and low statistical power, serving as a major barrier in routine genome-wide assessment of such variation. In this article, we developed One-Stage CNV-disease Association Analysis (OSCAA), a flexible algorithm to discover disease-associated CNVs for both quantitative and qualitative traits. OSCAA employs a two-dimensional Gaussian mixture model that is built upon the PCs from copy number intensities, accounting for technical biases in CNV detection while simultaneously testing for their effect on outcome traits. In OSCAA, CNVs are identified and their associations with disease risk are evaluated simultaneously in a single step, taking into account the uncertainty of CNV identification in the statistical model. Our simulations demonstrated that OSCAA outperformed the existing one-stage method and traditional two-stage methods by yielding a more accurate estimate of the CNV-disease association, especially for short CNVs or CNVs with weak signals. In conclusion, OSCAA is a powerful and flexible approach for CNV association testing with high sensitivity and specificity, which can be easily applied to different traits and clinical risk predictions.

3.
Biostatistics ; 25(4): 1254-1272, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-38649751

ABSTRACT

CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present considerable statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens, "thresholded regression," exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV ("GLM-based errors-in-variables"), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g. Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several new insights.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , CRISPR-Cas Systems/genetics , Models, Statistical
4.
Biostatistics ; 25(2): 354-384, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-36881693

ABSTRACT

Naive estimates of incidence and infection fatality rates (IFR) of coronavirus disease 2019 suffer from a variety of biases, many of which relate to preferential testing. This has motivated epidemiologists from around the globe to conduct serosurveys that measure the immunity of individuals by testing for the presence of SARS-CoV-2 antibodies in the blood. These quantitative measures (titer values) are then used as a proxy for previous or current infection. However, statistical methods that use this data to its full potential have yet to be developed. Previous researchers have discretized these continuous values, discarding potentially useful information. In this article, we demonstrate how multivariate mixture models can be used in combination with post-stratification to estimate cumulative incidence and IFR in an approximate Bayesian framework without discretization. In doing so, we account for uncertainty from both the estimated number of infections and incomplete deaths data to provide estimates of IFR. This method is demonstrated using data from the Action to Beat Coronavirus serosurvey in Canada.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Bayes Theorem , Incidence , SARS-CoV-2
5.
Biostatistics ; 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39002144

ABSTRACT

High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former mixture achieves facet assignment for gene features and the latter mixture determines cluster assignment of samples. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The result captures multifacet clustering structures associated with critical clinical variables and provides intriguing biological insights for further hypothesis generation and discovery.

6.
Biostatistics ; 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38637995

ABSTRACT

Computed tomography (CT) has been a powerful diagnostic tool since its emergence in the 1970s. Using CT data, 3D structures of human internal organs and tissues, such as blood vessels, can be reconstructed using professional software. This 3D reconstruction is crucial for surgical operations and can serve as a vivid medical teaching example. However, traditional 3D reconstruction heavily relies on manual operations, which are time-consuming, subjective, and require substantial experience. To address this problem, we develop a novel semiparametric Gaussian mixture model tailored for the 3D reconstruction of blood vessels. This model extends the classical Gaussian mixture model by enabling nonparametric variations in the component-wise parameters of interest according to voxel positions. We develop a kernel-based expectation-maximization algorithm for estimating the model parameters, accompanied by a supporting asymptotic theory. Furthermore, we propose a novel regression method for optimal bandwidth selection. The regression method outperforms the conventional cross-validation-based (CV) method in terms of computational and statistical efficiency. In application, this methodology facilitates the fully automated reconstruction of 3D blood vessel structures with remarkable accuracy.

7.
Biostatistics ; 25(3): 666-680, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38141227

ABSTRACT

With rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging data play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate high-density longitudinal data and are heterogeneous across both imaging sources and subjects, which lead to various statistical and computational challenges. In this article, we propose a group-based method to cluster a collection of multivariate high-density longitudinal data via a Bayesian mixture of smoothing splines. Our method assumes each multivariate high-density longitudinal trajectory is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via logistic weights of a mixture-of-experts model. We formulate this approach under a fully Bayesian framework using Gibbs sampling where the number of components is selected based on a deviance information criterion. The proposed method is compared to existing methods via simulation studies and is applied to a study on functional near-infrared spectroscopy, which aims to understand infant emotional reactivity and recovery from stress. The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.


Subject(s)
Bayes Theorem , Humans , Longitudinal Studies , Brain/physiology , Brain/diagnostic imaging , Spectroscopy, Near-Infrared/methods , Data Interpretation, Statistical , Models, Statistical , Infant , Multivariate Analysis , Biostatistics/methods
8.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36653899

ABSTRACT

Gene regulatory networks govern complex gene expression programs in various biological phenomena, including embryonic development, cell fate decisions and oncogenesis. Single-cell techniques are increasingly being used to study gene expression, providing higher resolution than traditional approaches. However, inferring a comprehensive gene regulatory network across different cell types remains a challenge. Here, we propose to construct context-dependent gene regulatory networks (CDGRNs) from single-cell RNA sequencing data utilizing both spliced and unspliced transcript expression levels. A gene regulatory network is decomposed into subnetworks corresponding to different transcriptomic contexts. Each subnetwork comprises the consensus active regulation pairs of transcription factors and their target genes shared by a group of cells, inferred by a Gaussian mixture model. We find that the union of gene regulation pairs in all contexts is sufficient to reconstruct differentiation trajectories. Functions specific to the cell cycle, cell differentiation or tissue-specific functions are enriched throughout the developmental process in each context. Surprisingly, we also observe that the network entropy of CDGRNs decreases along differentiation trajectories, indicating directionality in differentiation. Overall, CDGRN allows us to establish the connection between gene regulation at the molecular level and cell differentiation at the macroscopic level.


Subject(s)
Embryonic Development , Gene Regulatory Networks , Cell Differentiation/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Gene Expression Profiling
9.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36592058

ABSTRACT

The progress of single-cell RNA sequencing (scRNA-seq) has led to a large volume of scRNA-seq data, which are widely used in biomedical research. The noise in the raw data and the tens of thousands of genes make it challenging to capture the real structure and effective information of scRNA-seq data. Most of the existing single-cell analysis methods assume that the low-dimensional embedding of the raw data belongs to a Gaussian distribution or a low-dimensional nonlinear space without any prior information, which limits the flexibility and controllability of the model to a great extent. In addition, many existing methods incur high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoders (scGMAAE), assuming that the low-dimensional embedding of different types of cells follows different Gaussian distributions, integrating Bayesian variational inference and adversarial training, so as to give the interpretable latent representation of complex data and discover the statistical distribution of different types of cells. The scGMAAE is provided with good controllability, interpretability and scalability. Therefore, it can process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several ways, including dimensionality reduction visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to generate the best results.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Normal Distribution , Bayes Theorem , Single-Cell Analysis/methods , Cluster Analysis
10.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37080761

ABSTRACT

Advances in spatially resolved transcriptomics (ST) technologies help biologists comprehensively understand organ function and tissue microenvironment. Accurate spatial domain identification is the foundation for delineating genome heterogeneity and cellular interaction. Motivated by this perspective, a graph deep learning (GDL) based spatial clustering approach is constructed in this paper. First, the deep graph infomax module embedded with residual gated graph convolutional neural network is leveraged to address the gene expression profiles and spatial positions in ST. Then, the Bayesian Gaussian mixture model is applied to handle the latent embeddings to generate spatial domains. Designed experiments certify that the presented method is superior to other state-of-the-art GDL-enabled techniques on multiple ST datasets. The code and dataset used in this manuscript are available at https://github.com/narutoten520/SCGDL.


Subject(s)
Deep Learning , Transcriptome , Bayes Theorem , Gene Expression Profiling , Cell Communication
11.
Syst Biol ; 73(2): 375-391, 2024 Jul 27.
Article in English | MEDLINE | ID: mdl-38421146

ABSTRACT

Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting (ILS), introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call mixtures across sites and trees (MAST). This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of ILS in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of 4 Platyrrhine species for which standard concatenated maximum likelihood (ML) and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e., the largest proportion of sites) to the tree also supported by gene tree approaches. These results suggest that the MAST model is able to analyze a concatenated alignment using ML while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.


Subject(s)
Classification , Phylogeny , Classification/methods , Models, Genetic , Computer Simulation , Software , Animals
12.
Cereb Cortex ; 34(10)2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39385613

ABSTRACT

Visual working memory (VWM) is a core cognitive function wherein visual information is stored and manipulated over short periods. Response errors in VWM tasks arise from the imprecise memory of target items, swaps between targets and nontargets, and random guesses. However, it remains unclear whether these types of errors are underpinned by distinct neural networks. To answer this question, we recruited 80 healthy adults to perform delayed estimation tasks and acquired their resting-state functional magnetic resonance imaging scans. The tasks required participants to reproduce the memorized visual feature along continuous scales, which, combined with mixture distribution modeling, allowed us to estimate the measures of memory precision, swap errors, and random guesses. Intrinsic functional connectivity within and between different networks, identified using a hierarchical clustering approach, was estimated for each participant. Our analyses revealed that higher memory precision was associated with increased connectivity within a frontal-opercular network, as well as between the dorsal attention network and an angular-gyrus-cerebellar network. We also found that coupling between the frontoparietal control network and the cingulo-opercular network contributes to both memory precision and random guesses. Our findings demonstrate that distinct sources of variability in VWM performance are underpinned by different yet partially overlapping intrinsic functional networks.


Subject(s)
Magnetic Resonance Imaging , Memory, Short-Term , Nerve Net , Visual Perception , Humans , Memory, Short-Term/physiology , Female , Male , Adult , Young Adult , Nerve Net/physiology , Nerve Net/diagnostic imaging , Visual Perception/physiology , Brain/physiology , Brain/diagnostic imaging , Brain Mapping/methods , Neural Pathways/physiology
13.
Mol Cell Proteomics ; 22(12): 100658, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37806340

ABSTRACT

Label-free proteomics is a fast-growing methodology to infer abundances in mass spectrometry proteomics. Extensive research has focused on spectral quantification and peptide identification. However, research toward modeling and understanding quantitative proteomics data is scarce. Here we propose a Bayesian hierarchical decision model (Baldur) to test for differences in means between conditions for proteins, peptides, and post-translational modifications. We developed a Bayesian regression model to characterize local mean-variance trends in data, to estimate measurement uncertainty and hyperparameters for the decision model. A key contribution is the development of a new gamma regression model that describes the mean-variance dependency as a mixture of a common and a latent trend, allowing for localized trend estimates. We then evaluate the performance of Baldur, limma-trend, and t test on six benchmark datasets: five total proteomics and one post-translational modification dataset. We find that Baldur drastically improves the decision in noisier post-translational modification data over limma-trend and t test. In addition, we see significant improvements using Baldur over the other methods in the total proteomics datasets. Finally, we analyzed Baldur's performance when increasing the number of replicates and found that the method always increases precision with sample size, while showing robust control of the false positive rate. We conclude that our model vastly improves over popular data analysis methods (limma-trend and t test) in several spike-in datasets by achieving a high true positive detection rate, while greatly reducing the false-positive rate.


Subject(s)
Proteins , Proteomics , Proteomics/methods , Bayes Theorem , Proteins/chemistry , Peptides/metabolism , Mass Spectrometry/methods
14.
Nano Lett ; 24(40): 12560-12567, 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39331415

ABSTRACT

Two-dimensional materials have enormous development prospects in the bulk photovoltaic effect (BPVE). The enhancement and manipulation of the BPVE are key to its various applications. Through a simplified Hamiltonian model, this work shows that a substantial band mixture between occupied and unoccupied states could produce a large optical absorption rate with trivial topological features, resulting in a significantly enhanced shift current generation. Furthermore, this mechanism is illustrated in a realistic C₃B/C₃N bilayer material based on density functional theory calculation and tight-binding model. As each layer of C₃B/C₃N is centrosymmetric, the in-plane shift current arises from the interfacial interaction. The electron transfer between the layers gives a controllable band mixture, which offers a giant shift current reaching over ∼1500 µA/V². In addition, we propose that interlayer sliding could reverse the in-plane shift current. Our work suggests a feasible approach for giant and switchable nonlinear optical processes.

15.
BMC Bioinformatics ; 25(1): 90, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429687

ABSTRACT

RNA sequencing of time-course experiments results in three-way count data where the dimensions are the genes, the time points and the biological units. Clustering RNA-seq data allows groups of co-expressed genes to be extracted over time. After standardisation, the normalised counts of individual genes across time points and biological units have properties similar to those of compositional data. We propose the following procedure to suitably cluster three-way RNA-seq data: (1) pre-process the RNA-seq data by calculating the normalised expression profiles, (2) transform the data using the additive log ratio transform to map the composition in the D-part Aitchison simplex to a (D − 1)-dimensional Euclidean vector, (3) cluster the transformed RNA-seq data using matrix-variate Gaussian mixture models and (4) assess the quality of the overall cluster solution and of individual clusters based on cluster separation in the transformed space using density-based silhouette information and on compactness of the cluster in the original space using cluster maps as a suitable visualisation. The proposed procedure is illustrated on RNA-seq data from fission yeast and results are also compared to an analogous two-way approach after flattening out the biological units.


Subject(s)
RNA , RNA/genetics , Sequence Analysis, RNA/methods , RNA-Seq , Base Sequence , Cluster Analysis
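
Step (2) of the procedure in record 15, the additive log ratio (ALR) transform, can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and using the last part as the reference is an assumption made here for illustration (the abstract does not say which part serves as the reference).

```python
import math


def additive_log_ratio(composition):
    """Map a D-part composition to a (D - 1)-dimensional real vector.

    Assumption for this sketch: the last part is the reference part.
    """
    if len(composition) < 2:
        raise ValueError("a composition needs at least two parts")
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    reference = composition[-1]
    return [math.log(part / reference) for part in composition[:-1]]


def inverse_additive_log_ratio(vector):
    """Map a (D - 1)-dimensional vector back to a D-part composition summing to 1."""
    exponentials = [math.exp(value) for value in vector] + [1.0]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Hypothetical normalised expression profile of one gene over three time points.
profile = [0.2, 0.3, 0.5]
transformed = additive_log_ratio(profile)
recovered = inverse_additive_log_ratio(transformed)
```

The transformed vectors live in ordinary Euclidean space, which is what makes Gaussian mixture clustering in step (3) applicable; zero counts would need separate handling before taking logarithms.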
16.
J Proteome Res ; 23(1): 430-448, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38127799

ABSTRACT

NMR-based metabolomics aims at recovering biological information by comparing spectral data from samples of biological interest and appropriate controls. Any statistical analysis performed on the data matrix relies on the proper peak alignment to produce meaningful results. Through the last decades, several peak alignment algorithms have been proposed, as well as alternatives like spectral binning or strategies for annotation and quantification, the latter depending on reference databases. Most of the alignment algorithms, mainly based on segmentation of the spectra, present limitations for regions with peak overlap or cases of frequency order exchange. Here, we present our multiplet-assisted peak alignment algorithm, a new methodology that consists of aligning peaks by matching multiplet profiles of f1 traces from J-resolved spectra. A correspondence matrix with the linked f1 traces is built, and multivariate data analysis can be performed on it to obtain useful information from the data, overcoming the issues of peak overlap and frequency crossovers. Statistical total correlation spectroscopy can be applied on the matrix as well, toward a better identification of molecules of interest. The results can be queried on one-dimensional (1D) 1H databases or can be directly coupled to our previously published Chemical Shift Multiplet Database.


Subject(s)
Magnetic Resonance Imaging , Metabolomics , Proton Magnetic Resonance Spectroscopy , Metabolomics/methods , Magnetic Resonance Spectroscopy/methods , Algorithms
17.
Pflugers Arch ; 476(11): 1761-1775, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39210062

ABSTRACT

Taste buds contain 2 types of GABA-producing cells: sour-responsive Type III cells and glial-like Type I cells. The physiological role of GABA released by Type III cells is not fully understood. Here, we investigated the role of GABA released from Type III cells using transgenic mice lacking the expression of GAD67 in taste bud cells (Gad67-cKO mice). Immunohistochemical experiments confirmed the absence of GAD67 in Type III cells of Gad67-cKO mice. Furthermore, no difference was observed in the expression and localization of cell type markers, ectonucleoside triphosphate diphosphohydrolase 2 (ENTPD2), gustducin, and carbonic anhydrase 4 (CA4) in taste buds between wild-type (WT) and Gad67-cKO mice. Short-term lick tests demonstrated that both WT and Gad67-cKO mice exhibited normal licking behaviors to each of the five basic tastants. Gustatory nerve recordings from the chorda tympani nerve demonstrated that both WT and Gad67-cKO mice similarly responded to five basic tastants when they were applied individually. However, gustatory nerve responses to sweet-sour mixtures were significantly smaller than the sum of responses to each tastant in WT mice but not in Gad67-cKO mice. In summary, elimination of GABA signalling by sour-responsive Type III taste cells eliminates the inhibitory cell-cell interactions seen with application of sour-sweet mixtures.


Subject(s)
Glutamate Decarboxylase , Taste Buds , Taste , gamma-Aminobutyric Acid , Animals , Taste Buds/metabolism , Taste Buds/physiology , gamma-Aminobutyric Acid/metabolism , Mice , Glutamate Decarboxylase/metabolism , Glutamate Decarboxylase/genetics , Taste/physiology , Signal Transduction/physiology , Mice, Knockout , Mice, Inbred C57BL , Chorda Tympani Nerve/physiology
18.
BMC Genomics ; 25(1): 25, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38166601

ABSTRACT

BACKGROUND: Copy number alteration (CNA) is one of the major genomic variations that frequently occur in cancers, and accurate inference of CNAs is essential for unmasking intra-tumor heterogeneity (ITH) and tumor evolutionary history. Single-cell DNA sequencing (scDNA-seq) makes it convenient to profile CNAs at single-cell resolution, and thus aids in better characterization of ITH. Although several computational methods have been proposed to decipher single-cell CNAs, their performance is limited in either breakpoint detection or copy number estimation due to the high dimensionality and noisy nature of read count data. RESULTS: By treating breakpoint detection as a process to segment high dimensional read count sequence, we develop a novel method called DeepCNA for cross-cell segmentation of read count sequence and per-cell inference of CNAs. To cope with the difficulty of segmentation, an autoencoder (AE) network is employed in DeepCNA to project the original data into a low-dimensional space, where the breakpoints can be efficiently detected along each latent dimension and further merged to obtain the final breakpoints. Unlike the existing methods that manually calculate certain statistics of read counts to find breakpoints, the AE model makes it convenient to automatically learn the representations. Based on the inferred breakpoints, we employ a mixture model to predict copy numbers of segments for each cell, and leverage an expectation-maximization algorithm to efficiently estimate cell ploidy by exploring the most abundant copy number state. Benchmarking results on simulated and real data demonstrate our method is able to accurately infer breakpoints as well as absolute copy numbers and surpasses the existing methods under different test conditions. DeepCNA can be accessed at: https://github.com/zhyu-lab/deepcna
CONCLUSIONS: Profiling single-cell CNAs based on deep learning is becoming a new paradigm of scDNA-seq data analysis, and DeepCNA is an enhancement to the current arsenal of computational methods for investigating cancer genomics.


Subject(s)
DNA Copy Number Variations , Neoplasms , Humans , Algorithms , Genomics/methods , Sequence Analysis, DNA , Neoplasms/genetics
19.
Neurobiol Dis ; 190: 106373, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38072165

ABSTRACT

In Alzheimer's disease (AD) research, cerebrospinal fluid (CSF) Amyloid beta (Aβ), Tau and pTau are the most accepted and well validated biomarkers. Several methods and platforms exist to measure those biomarkers, leading to challenges in combining data across studies. Thus, there is a need to identify methods that harmonize and standardize these values. We used a Z-score based approach to harmonize CSF and amyloid imaging data from multiple cohorts and compared GWAS results using this approach with currently accepted methods. We also used a generalized mixture model to calculate the threshold for biomarker-positivity. Based on our findings, our normalization approach performed as well as meta-analysis and did not lead to any spurious results. In terms of dichotomization, cutoffs calculated with this approach were very similar to those reported previously. These findings show that the Z-score based harmonization approach can be applied to heterogeneous platforms and provides biomarker cut-offs consistent with the classical approaches without requiring any additional data.


Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/genetics , Alzheimer Disease/cerebrospinal fluid , Amyloid beta-Peptides/cerebrospinal fluid , tau Proteins/genetics , tau Proteins/cerebrospinal fluid , Positron-Emission Tomography , Biomarkers/cerebrospinal fluid , Peptide Fragments/cerebrospinal fluid
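
The Z-score harmonization idea in record 19 can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: the abstract does not state how the Z-scores are computed, so standardizing each cohort against its own mean and sample standard deviation is a guess made here, and the function name, cohort labels, and values are all hypothetical.

```python
from statistics import mean, stdev


def zscore_by_cohort(values_by_cohort):
    """Standardize biomarker values within each cohort.

    Assumption for this sketch: every cohort is scaled by its own mean and
    sample standard deviation, so values measured on different platforms
    end up on a common, unitless scale.
    """
    harmonized = {}
    for cohort, values in values_by_cohort.items():
        if len(values) < 2:
            raise ValueError(f"cohort {cohort!r} needs at least two values")
        center = mean(values)
        spread = stdev(values)
        if spread == 0:
            raise ValueError(f"cohort {cohort!r} has zero variance")
        harmonized[cohort] = [(value - center) / spread for value in values]
    return harmonized


# Made-up measurements from two platforms that report on different scales.
raw = {
    "cohort_A": [1.0, 2.0, 3.0],
    "cohort_B": [10.0, 20.0, 30.0],
}
harmonized = zscore_by_cohort(raw)
```

After this step the two hypothetical cohorts are directly comparable even though their raw units differ by a factor of ten; any log transformation or covariate adjustment applied before standardization would be an additional modelling choice.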
20.
Am J Epidemiol ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38973734

ABSTRACT

Telomere length is associated with chronic diseases and in younger populations, may represent a biomarker of disease susceptibility. As growing evidence suggests that environmental factors, including metals, may impact telomere length, we investigated the association between 17 metals measured in toenail samples and leukocyte relative telomere length (RTL), among 472 five- to seven-year-old children enrolled in the Bangladesh Environmental Research in Children's Health (BiRCH) cohort. In single exposure linear regression models, a doubling of arsenic (As) and mercury (Hg) (µg/g) were associated with a -0.021 (95%CI: -0.032, -0.010; p=0.0005) and -0.017 (95%CI: -0.029, -0.004; p=0.006) difference in RTL, respectively. In Bayesian Kernel Machine Regression (BKMR) mixture models, the overall metal mixture was inversely associated with RTL (P-for-trend <0.001). Negative associations with RTL were observed with both log2-As and log2-Hg, while an inverted U-shaped association was observed for log2-zinc (Zn) with RTL. We found little evidence of interaction among metals. Sex-stratification identified stronger associations of the overall mixture and log2-As with RTL among females, compared to males. Our study suggests that As and Hg may independently influence RTL in mid-childhood. Further studies are needed to investigate potential long-term impacts of metal-associated telomere shortening in childhood on health outcomes in adult life.
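
The "per doubling" interpretation used in record 20 follows from regressing the outcome on the base-2 logarithm of the exposure: the slope is then the expected change in outcome for each doubling of exposure. The sketch below shows this with an unadjusted least-squares fit on synthetic numbers. It is not the study's model, which presumably included covariates; the function name and all data are invented for illustration.

```python
import math


def slope_per_doubling(exposures, outcomes):
    """Ordinary least-squares slope of outcome on log2(exposure).

    Because log2(2 * x) = log2(x) + 1, the slope equals the expected
    difference in outcome associated with a doubling of exposure.
    """
    if len(exposures) != len(outcomes):
        raise ValueError("exposures and outcomes must have equal length")
    log_exposures = [math.log2(value) for value in exposures]
    n = len(log_exposures)
    x_mean = sum(log_exposures) / n
    y_mean = sum(outcomes) / n
    sxx = sum((x - x_mean) ** 2 for x in log_exposures)
    if sxx == 0:
        raise ValueError("exposures must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(log_exposures, outcomes))
    return sxy / sxx


# Synthetic, noise-free data built so each doubling lowers the outcome by 0.02.
exposures = [0.5, 1.0, 2.0, 4.0, 8.0]
outcomes = [1.0 - 0.02 * math.log2(value) for value in exposures]
beta = slope_per_doubling(exposures, outcomes)
```

With real data the slope would carry sampling error, which is what the confidence intervals in the abstract describe.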
