Results 1 - 7 of 7
1.
Mol Biol Evol ; 39(7)2022 07 02.
Article in English | MEDLINE | ID: mdl-35749590

ABSTRACT

Understanding intratumor heterogeneity is critical for studying tumorigenesis and designing personalized treatments. To decompose the mixed cell population in a tumor, subclones are inferred computationally based on variant allele frequency (VAF) from bulk sequencing data. In this study, we showed that sequencing depth, mean VAF, and variance of VAF of a subclone are confounded. Without considering this effect, current methods require deep-sequencing data (>300× depth) to reliably infer subclones. Here, we present a novel algorithm that incorporates depth-variance and mean-variance dependencies in a clustering error model and successfully identifies subclones in tumors sequenced at depths as low as 30×. We implemented the algorithm as a model-based adaptive grouping of subclones (MAGOS) method. Analyses of computer-simulated data and empirical sequencing data showed that MAGOS outperformed existing methods on minimum sequencing depth, decomposition accuracy, and computational efficiency. The most prominent improvements were observed in analyzing tumors sequenced at depths between 30× and 200×, whereas the performance was comparable between MAGOS and existing methods on deeply sequenced tumors. MAGOS supports analysis of single-nucleotide variants and copy number variants from a single sample or multiple samples of a tumor. We applied MAGOS to whole-exome data of late-stage liver cancers and discovered that high subclone count in a tumor was a significant risk factor for poor prognosis. Lastly, our analysis suggested that sequencing multiple samples of the same tumor at standard depth is more cost-effective and robust for subclone characterization than deep sequencing a single sample. MAGOS is available on GitHub (https://github.com/liliulab/magos).
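
The confounding of depth, mean VAF, and VAF variance described above can be illustrated with a simple sampling argument. The sketch below assumes that variant reads follow a binomial distribution, which is an illustrative assumption and not necessarily the error model used in MAGOS; the function name `vaf_sd` is hypothetical. Under that assumption, the standard deviation of the observed VAF is sqrt(p(1 - p) / n) for true frequency p and depth n, so it depends on both the mean and the depth.

```python
import math


def vaf_sd(mean_vaf, depth):
    """Standard deviation of observed VAF under binomial read sampling.

    Assumption (not taken from the MAGOS paper): the number of variant
    reads at a site is Binomial(depth, mean_vaf).
    """
    if not 0.0 <= mean_vaf <= 1.0:
        raise ValueError("mean_vaf must lie in [0, 1]")
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(mean_vaf * (1.0 - mean_vaf) / depth)


# Spread of a subclone with mean VAF 0.25 at three sequencing depths.
for depth in (30, 100, 300):
    print(depth, round(vaf_sd(0.25, depth), 4))
```

In this toy model a subclone's spread shrinks as depth grows and widens as its mean VAF approaches 0.5, which is why a clustering method that assumes a fixed variance may merge or split subclones at low depth.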


Subject(s)
High-Throughput Nucleotide Sequencing , Neoplasms , DNA Copy Number Variations , Exome , Genome , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/genetics , Polymorphism, Single Nucleotide
2.
Bioinformatics ; 36(6): 1712-1717, 2020 03 01.
Article in English | MEDLINE | ID: mdl-32176769

ABSTRACT

MOTIVATION: Functions of cancer driver genes vary substantially across tissues and organs. Distinguishing passenger genes, oncogenes (OGs) and tumor-suppressor genes (TSGs) for each cancer type is critical for understanding tumor biology and identifying clinically actionable targets. Although many computational tools are available to predict putative cancer driver genes, resources for context-aware classifications of OGs and TSGs are limited. RESULTS: We show that the direction and magnitude of somatic selection of protein-coding mutations are significantly different for passenger genes, OGs and TSGs. Based on these patterns, we develop a new method, genes under selection in tumors (GUST), to discover OGs and TSGs in a cancer-type-specific manner. GUST shows high accuracy (92%) when evaluated via strict cross-validation. Its application to 10,172 tumor exomes found known and novel cancer drivers with high tissue specificity. In 11 out of 13 OGs shared among multiple cancer types, we found functional domains selectively engaged in different cancers, suggesting differences in disease mechanisms. AVAILABILITY AND IMPLEMENTATION: An R implementation of the GUST algorithm is available at https://github.com/liliulab/gust. A database with pre-computed results is available at https://liliulab.shinyapps.io/gust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genes, Tumor Suppressor , Neoplasms/genetics , Algorithms , Humans , Mutation , Oncogenes
3.
Comput Struct Biotechnol J ; 23: 679-687, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38292477

ABSTRACT

Gene transcription is an essential process involved in all aspects of cellular functions with significant impact on biological traits and diseases. This process is tightly regulated by multiple elements that co-operate to jointly modulate the transcription levels of target genes. To decipher the complicated regulatory network, we present a novel multi-view attention-based deep neural network that models the relationship between genetic, epigenetic, and transcriptional patterns and identifies co-operative regulatory elements (COREs). We applied this new method, named DeepCORE, to predict transcriptomes in various tissues and cell lines, where it outperformed state-of-the-art algorithms. Furthermore, DeepCORE contains an interpreter that extracts the attention values embedded in the deep neural network, maps the attended regions to putative regulatory elements, and infers COREs based on correlated attentions. The identified COREs are significantly enriched with known promoters and enhancers. Novel regulatory elements discovered by DeepCORE showed epigenetic signatures consistent with the status of histone modification marks.

4.
PeerJ Comput Sci ; 9: e1545, 2023.
Article in English | MEDLINE | ID: mdl-37705621

ABSTRACT

Background: Clustering analysis discovers hidden structures in a data set by partitioning it into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems of clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficulties of interpretation. In this study, we present a novel accuracy measure, J-score, to address these issues. Methods: Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching to identify the correspondence between true classes and hypothetical clusters based on the Jaccard index. It then computes two weighted sums of Jaccard indices measuring the reconciliation from classes to clusters and vice versa. The final J-score is the harmonic mean of the two weighted sums. Results: Through simulation studies and analyses of real data sets, we evaluated the performance of J-score and compared it with existing measures. Our results show that J-score is effective in distinguishing partition structures that differ only by unmatched clusters, rewarding correct inference of class numbers, addressing biases towards excessive clusters, and having a relatively stable baseline. The simplicity of its calculation makes the interpretation straightforward. It is a valuable tool complementary to other accuracy measures. We released an R/jScore package implementing the algorithm.
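
The three steps in the Methods summary (bidirectional matching by Jaccard index, two weighted sums, harmonic mean) can be sketched in a few lines. This is a minimal illustration, not the released R/jScore package: the abstract does not state the weights, so the sketch assumes each class or cluster is weighted by its share of the data points, and the names `j_score`, `groups`, and `jaccard` are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard index of two sets of item indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def groups(labels):
    """Map a label sequence to a list of index sets, one per label."""
    members = defaultdict(set)
    for index, label in enumerate(labels):
        members[label].add(index)
    return list(members.values())


def j_score(true_labels, cluster_labels):
    """Harmonic mean of class-to-cluster and cluster-to-class matching.

    Assumption: weights are relative sizes (|set| / N); the abstract
    only says "weighted sums".
    """
    if len(true_labels) != len(cluster_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(true_labels)
    classes = groups(true_labels)
    clusters = groups(cluster_labels)
    # Classes to clusters: best-matching cluster for every true class.
    recall = sum(len(t) / n * max(jaccard(t, k) for k in clusters)
                 for t in classes)
    # Clusters to classes: best-matching class for every cluster.
    precision = sum(len(k) / n * max(jaccard(t, k) for t in classes)
                    for k in clusters)
    return 2 * recall * precision / (recall + precision)


# Label names need not agree; only the partition structure matters.
print(j_score([0, 0, 1, 1], ["a", "a", "b", "b"]))
```

Because every class and every cluster contributes its own best match, an unmatched or superfluous cluster lowers one of the two sums and therefore the harmonic mean.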

5.
bioRxiv ; 2023 Apr 19.
Article in English | MEDLINE | ID: mdl-37131697

ABSTRACT

Gene transcription is an essential process involved in all aspects of cellular functions with significant impact on biological traits and diseases. This process is tightly regulated by multiple elements that co-operate to jointly modulate the transcription levels of target genes. To decipher the complicated regulatory network, we present a novel multi-view attention-based deep neural network that models the relationship between genetic, epigenetic, and transcriptional patterns and identifies co-operative regulatory elements (COREs). We applied this new method, named DeepCORE, to predict transcriptomes in 25 different cell lines, where it outperformed state-of-the-art algorithms. Furthermore, DeepCORE translates the attention values embedded in the neural network into interpretable information, including locations of putative regulatory elements and their correlations, which collectively imply COREs. These COREs are significantly enriched with known promoters and enhancers. Novel regulatory elements discovered by DeepCORE showed epigenetic signatures consistent with the status of histone modification marks.

6.
PeerJ ; 10: e13227, 2022.
Article in English | MEDLINE | ID: mdl-35547187

ABSTRACT

COVID-19 can be life-threatening to individuals with chronic diseases. To prevent severe outcomes, it is critical that we comprehend pre-existing molecular abnormalities found in common health conditions that predispose patients to poor prognoses. In this study, we focused on 14 pre-existing health conditions for which increased hazard ratios of COVID-19 mortality have been documented. We hypothesized that dysregulated gene expression in these pre-existing health conditions was a risk factor for COVID-19-related death, and that the magnitude of dysregulation (measured by fold change) was correlated with the severity of COVID-19 outcome (measured by hazard ratio). To test this hypothesis, we analyzed transcriptomics data sets archived before the pandemic in which no sample had COVID-19. For a given pre-existing health condition, we identified differentially expressed genes by comparing individuals affected by this health condition with those unaffected. Among genes differentially expressed in multiple health conditions, the fold changes of 70 upregulated genes and 181 downregulated genes were correlated with hazard ratios of COVID-19 mortality. These pre-existing dysregulations were molecular risk factors for severe COVID-19 outcomes. These genes were enriched for endoplasmic reticulum and mitochondrial function, proinflammatory reaction, interferon production, and programmed cell death, processes that participate in viral replication and innate immune responses to viral infections. Our results suggest that impaired innate immunity in pre-existing health conditions is associated with increased hazard of COVID-19 mortality. The discovered molecular risk factors are potential prognostic biomarkers and targets for therapeutic intervention.


Subject(s)
COVID-19 , Humans , COVID-19/genetics , Cross-Sectional Studies , Multimorbidity , Immunity, Innate/genetics , Risk Factors
7.
Sci Adv ; 6(11): eaaz6162, 2020 03.
Article in English | MEDLINE | ID: mdl-32195353

ABSTRACT

Non-small cell lung cancer (NSCLC) is the most commonly diagnosed cancer and the leading cause of cancer death worldwide. More than half of patients with NSCLC die after developing distant metastases, so rapid, minimally invasive prognostic biomarkers are needed to reduce mortality. We used proteomics to identify proteins differentially expressed on extracellular vesicles (EVs) of nonmetastatic 393P and metastatic 344SQ NSCLC cell lines and found that tetraspanin-8 (Tspan8) was selectively enriched on 344SQ EVs. NSCLC cell lines treated with EVs overexpressing Tspan8 also exhibited increased Matrigel invasion. Elevated Tspan8 expression on serum EVs of individuals with stage III premetastatic NSCLC tumors was also associated with reduced distant metastasis-free survival, suggesting that Tspan8 levels on serum EVs may predict future metastasis. This result suggests that a minimally invasive blood test to analyze EV expression of Tspan8 may be of potential value to guide therapeutic decisions for patients with NSCLC and merits further study.


Subject(s)
Carcinoma, Non-Small-Cell Lung/metabolism , Extracellular Vesicles/metabolism , Gene Expression Regulation, Neoplastic , Lung Neoplasms/metabolism , Neoplasm Proteins/biosynthesis , Tetraspanins/biosynthesis , Animals , Carcinoma, Non-Small-Cell Lung/pathology , Cell Line, Tumor , Extracellular Vesicles/pathology , Humans , Lung Neoplasms/pathology , Mice , Neoplasm Metastasis