Results 1 - 20 of 50
1.
Genome Res ; 34(1): 85-93, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38290978

ABSTRACT

The availability of single-cell sequencing (SCS) enables us to assess intra-tumor heterogeneity and identify cellular subclones without the confounding effect of mixed cells. Copy number aberrations (CNAs) have been commonly used to identify subclones in SCS data using various clustering methods, as cells comprising a subpopulation are found to share a genetic profile. However, currently available methods may generate spurious results (e.g., falsely identified variants) in the procedure of CNA detection, thereby diminishing the accuracy of subclone identification within a large, complex cell population. In this study, we developed a subclone clustering method based on a fused lasso model, referred to as FLCNA, which can simultaneously detect CNAs in single-cell DNA sequencing (scDNA-seq) data. Spike-in simulations were conducted to evaluate the clustering and CNA detection performance of FLCNA, benchmarking it against existing copy number estimation methods (SCOPE, HMMcopy) in combination with commonly used clustering methods. Application of FLCNA to a scDNA-seq data set of breast cancer revealed different genomic variation patterns in neoadjuvant chemotherapy-treated samples and pretreated samples. We show that FLCNA is a practical and powerful method for subclone identification and CNA detection with scDNA-seq data.


Subjects
DNA Copy Number Variations , Sequence Analysis, DNA/methods , Base Sequence , Cluster Analysis
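
The abstract of record 1 names a fused lasso model but does not give its objective function. As a rough illustration only, the sketch below evaluates the generic one-dimensional fused lasso objective (squared error plus a sparsity penalty and a penalty on differences between neighbouring coefficients). The function name, the toy data and the tuning values `lam1` and `lam2` are assumptions for illustration and are not taken from FLCNA.

```python
def fused_lasso_objective(y, beta, lam1, lam2):
    """Generic 1-D fused lasso objective (illustrative, not the FLCNA objective).

    y    : observed values along the genome (e.g. normalised read depth per bin)
    beta : fitted piecewise-constant signal of the same length
    lam1 : weight of the sparsity penalty sum(|beta_i|)
    lam2 : weight of the fusion penalty sum(|beta_i - beta_{i-1}|)
    """
    if len(y) != len(beta):
        raise ValueError("y and beta must have the same length")
    loss = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    sparsity = lam1 * sum(abs(b) for b in beta)
    fusion = lam2 * sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))
    return loss + sparsity + fusion


# Hypothetical toy signal with one elevated segment (bins 2 to 4).
y = [0.1, -0.1, 1.1, 0.9, 1.0, 0.0]
flat_fit = [0.0] * 6
step_fit = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]

flat_value = fused_lasso_objective(y, flat_fit, lam1=0.1, lam2=0.2)
step_value = fused_lasso_objective(y, step_fit, lam1=0.1, lam2=0.2)
```

With these toy values the step-shaped fit has the lower objective, because the fusion penalty is paid only at the two boundaries of the elevated segment. That is why this kind of penalty favours piecewise-constant copy number profiles.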
2.
Genet Epidemiol ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38533840

ABSTRACT

Copy number variants (CNVs) are prevalent in the human genome and are found to have a profound effect on genomic organization and human diseases. Discovering disease-associated CNVs is critical for understanding the pathogenesis of diseases and aiding their diagnosis and treatment. However, traditional methods for assessing the association between CNVs and disease risks adopt a two-stage strategy conducting quantitative CNV measurements first and then testing for association, which may lead to biased association estimation and low statistical power, serving as a major barrier in routine genome-wide assessment of such variation. In this article, we developed One-Stage CNV-disease Association Analysis (OSCAA), a flexible algorithm to discover disease-associated CNVs for both quantitative and qualitative traits. OSCAA employs a two-dimensional Gaussian mixture model that is built upon the PCs from copy number intensities, accounting for technical biases in CNV detection while simultaneously testing for their effect on outcome traits. In OSCAA, CNVs are identified and their associations with disease risk are evaluated simultaneously in a single step, taking into account the uncertainty of CNV identification in the statistical model. Our simulations demonstrated that OSCAA outperformed the existing one-stage method and traditional two-stage methods by yielding a more accurate estimate of the CNV-disease association, especially for short CNVs or CNVs with weak signals. In conclusion, OSCAA is a powerful and flexible approach for CNV association testing with high sensitivity and specificity, which can be easily applied to different traits and clinical risk predictions.

3.
FASEB J ; 38(1): e23324, 2024 01.
Article in English | MEDLINE | ID: mdl-38019188

ABSTRACT

As an independent risk factor for atrial fibrillation (AF), hypertension (HTN) can induce atrial fibrosis through cyclic stretch and hydrostatic pressure. The mechanism by which high hydrostatic pressure promotes atrial fibrosis remains unclear. p300 and p53/Smad3 play important roles in the process of atrial fibrosis. This study investigated whether high hydrostatic pressure promotes atrial fibrosis by activating the p300/p53/Smad3 pathway. Biochemical experiments were used to study the expression of the p300/p53/Smad3 pathway in left atrial appendage (LAA) tissues of patients with sinus rhythm (SR), AF, or AF + HTN, in C57/BL6 mice and hypertensive C57/BL6 mice, and in mouse atrial fibroblasts. To investigate the roles of p300 and p53 in the process of atrial fibrosis, p300 and p53 were knocked in or knocked down in mouse atrial fibroblasts. The expression of p300/p53/Smad3 and fibrotic factors was higher in patients with AF and AF + HTN than in those with SR only. The expression of p300/p53/Smad3 and fibrotic factors increased in hypertensive mice. Curcumin (Cur) and knockdown of p300 reversed the expression of these factors. Hydrostatic pressure of 40 mmHg or overexpression of p300 upregulated the expression of p300/p53/Smad3 and fibrotic factors in mouse LAA fibroblasts, whereas Cur or knockdown of p300 reversed these changes. Knockdown or overexpression of p53 correspondingly decreased or increased the expression of p53/Smad3 and fibrotic factors. High hydrostatic pressure promotes atrial fibrosis by activating the p300/p53/Smad3 pathway, which further increases susceptibility to AF.


Subjects
Atrial Fibrillation , Hypertension , Animals , Humans , Mice , Atrial Fibrillation/etiology , Curcumin , Fibrosis , Heart Atria , Hydrostatic Pressure , Tumor Suppressor Protein p53/genetics
4.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36326081

ABSTRACT

Gene expression in mammalian cells is inherently stochastic and mRNAs are synthesized in discrete bursts. Single-cell transcriptomics provides an unprecedented opportunity to explore the transcriptome-wide kinetics of transcriptional bursting. However, current analysis methods provide limited accuracy in bursting inference due to substantial noise inherent to single-cell transcriptomic data. In this study, we developed BISC, a Bayesian method for inferring bursting parameters from single-cell transcriptomic data. Based on a beta-gamma-Poisson model, BISC modeled the mean-variance dependency to achieve accurate estimation of bursting parameters from noisy data. Evaluation based on both simulation and real intron sequential RNA fluorescence in situ hybridization data showed improved accuracy and reliability of BISC over existing methods, especially for genes with low expression values. Further application of BISC found that bursting frequency but not bursting size was strongly associated with gene expression regulation. Moreover, our analysis provided new mechanistic insights into the functional roles of enhancers and superenhancers in modulating both bursting frequency and size. BISC also formulated a downstream framework to identify differential bursting (in frequency and size separately) genes in samples under different conditions. Applied to multiple datasets (a mouse embryonic cell and fibroblast dataset, a human immune cell dataset and a human pancreatic cell dataset), BISC identified known cell-type signature genes that were missed by differential expression analysis, providing additional insights into the cell-specific stochastic gene transcription. Applied to datasets of human lung and colon cancers, BISC successfully detected tumor signature genes based on alterations in bursting kinetics, which illustrates its value in understanding disease development regarding transcriptional bursting. Collectively, BISC provides a new tool for accurately inferring bursting kinetics and detecting differential bursting genes. This study also produced new insights into the role of transcriptional bursting in regulating gene expression, cell identity and tumor progression.


Subjects
Neoplasms , Transcriptome , Animals , Humans , Mice , In Situ Hybridization, Fluorescence , Reproducibility of Results , Bayes Theorem , Kinetics , Transcription, Genetic , Mammals/genetics
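
The abstract of record 4 does not spell out BISC's beta-gamma-Poisson model. To give a feel for why bursting models produce overdispersed counts, the sketch below simulates the simpler and widely used beta-Poisson formulation instead. In it, each cell draws an active fraction from a Beta(`k_on`, `k_off`) distribution and then a Poisson count whose rate is that fraction times a scale. All names and parameter values are assumptions for illustration and do not reproduce BISC.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_bursting_counts(k_on, k_off, burst_scale, n_cells, seed=0):
    """Simulate per-cell mRNA counts under a generic beta-Poisson bursting model.

    k_on, k_off : shape parameters of the Beta distribution for the active fraction
    burst_scale : Poisson rate when the gene is fully active (hypothetical units)
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        active_fraction = rng.betavariate(k_on, k_off)
        counts.append(sample_poisson(burst_scale * active_fraction, rng))
    return counts


counts = simulate_bursting_counts(k_on=1.0, k_off=3.0, burst_scale=20.0, n_cells=5000)
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
```

Under these assumptions the theoretical mean is `burst_scale * k_on / (k_on + k_off)`, which is 5 here. The variance is well above the mean, which is the mean-variance dependency that bursting models aim to capture.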
5.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34114005

ABSTRACT

Copy number variation has been identified as a major source of genomic variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES) technology, massive WES data have been generated, allowing for the identification of copy number variants (CNVs) in the protein-coding regions with direct functional interpretation. We have previously shown evidence of the genomic correlation structure in array data and developed a novel chromosomal breakpoint detection algorithm, LDcnv, which showed significantly improved detection power through integrating the correlation structure in a systematic modeling manner. However, it remains unexplored whether the genomic correlation exists in WES data and how such correlation structure integration can improve the CNV detection accuracy. In this study, we first explored the correlation structure of the WES data using the 1000 Genomes Project data. Both real raw read depth and median-normalized data showed strong evidence of the correlation structure. Motivated by this fact, we proposed a correlation-based method, CORRseq, as a novel release of the LDcnv algorithm in profiling WES data. The performance of CORRseq was evaluated in extensive simulation studies and real data analysis from the 1000 Genomes Project. CORRseq outperformed the existing methods in detecting medium and large CNVs. In conclusion, it would be more advantageous to model genomic correlation structure in detecting relatively long CNVs. This study provides great insights for methodology development of CNV detection with NGS data.


Subjects
DNA Copy Number Variations , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Testing , Genomics/methods , Algorithms , Computational Biology/methods , Genetic Association Studies/methods , Genetic Testing/methods , Humans , Software , Exome Sequencing , Workflow
6.
Bioinformatics ; 38(5): 1304-1311, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34874992

ABSTRACT

MOTIVATION: Recent advancements in single-cell RNA sequencing (scRNA-seq) have enabled time-efficient transcriptome profiling in individual cells. To optimize sequencing protocols and develop reliable analysis methods for various application scenarios, solid simulation methods for scRNA-seq data are required. However, due to the noisy nature of scRNA-seq data, currently available simulation methods cannot sufficiently capture and simulate important properties of real data, especially the biological variation. In this study, we developed scRNA-seq information producer (SCRIP), a novel simulator for scRNA-seq that is accurate and enables simulation of bursting kinetics. RESULTS: Compared to existing simulators, SCRIP showed a significantly higher accuracy of simulating key data features, including mean-variance dependency in all experiments. SCRIP also outperformed other methods in recovering cell-cell distances. The application of SCRIP in evaluating differential expression analysis methods showed that edgeR outperformed other examined methods in differential expression analyses, and ZINB-WaVE improved the AUC at high dropout rates. Collectively, this study provides the research community with a rigorous tool for scRNA-seq data simulation. AVAILABILITY AND IMPLEMENTATION: https://CRAN.R-project.org/package=SCRIP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Single-Cell Analysis , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , RNA
7.
Stat Med ; 42(28): 5266-5284, 2023 12 10.
Article in English | MEDLINE | ID: mdl-37715500

ABSTRACT

In recent years, comprehensive cancer genomics platforms, such as The Cancer Genome Atlas (TCGA), provide access to an enormous amount of high throughput genomic datasets for each patient, including gene expression, DNA copy number alterations, DNA methylation, and somatic mutation. While the integration of these multi-omics datasets has the potential to provide novel insights that can lead to personalized medicine, most existing approaches only focus on gene-level analysis and lack the ability to facilitate biological findings at the pathway-level. In this article, we propose Bayes-InGRiD (Bayesian Integrative Genomics Robust iDentification of cancer subgroups), a novel pathway-guided Bayesian sparse latent factor model for the simultaneous identification of cancer patient subgroups (clustering) and key molecular features (variable selection) within a unified framework, based on the joint analysis of continuous, binary, and count data. By utilizing pathway (gene set) information, Bayes-InGRiD does not only enhance the accuracy and robustness of cancer patient subgroup and key molecular feature identification, but also promotes biological understanding and interpretation. Finally, to facilitate an efficient posterior sampling, an alternative Gibbs sampler for logistic and negative binomial models is proposed using Pólya-Gamma mixtures of normal to represent latent variables for binary and count data, which yields a conditionally Gaussian representation of the posterior. The R package "INGRID" implementing the proposed approach is currently available in our research group GitHub webpage (https://dongjunchung.github.io/INGRID/).


Subjects
Genomics , Neoplasms , Humans , Bayes Theorem , Neoplasms/genetics , Models, Statistical , DNA Methylation
8.
BMC Pediatr ; 23(1): 120, 2023 03 16.
Article in English | MEDLINE | ID: mdl-36927328

ABSTRACT

BACKGROUND: Fibroblast growth factor 19 (FGF19) takes part in maintaining the balance of glycolipids and may be involved in complications of type 1 diabetes (T1D) in children. This study aimed to evaluate the relationship among the serum levels of FGF19, vascular endothelial growth factor (VEGF) and soluble klotho protein (sklotho) in children with type 1 diabetes. METHODS: In a cross-sectional, single-center study, samples were obtained from 96 subjects: 66 children with T1D and 30 healthy children. Serum FGF19, VEGF and sklotho concentrations were measured by ELISA. The 66 participants with T1D were divided into two groups according to T1D duration or three groups according to HbA1c, and the serum levels of FGF19, VEGF and sklotho were compared between groups. RESULTS: The concentration of FGF19 was lower in T1D than in the controls (226.52 ± 20.86 pg/mu vs. 240.08 ± 23.53 pg/L, p = 0.03), and sklotho was also lower in T1D than in the controls (2448.67 ± 791.92 pg/mL vs. 3083.55 ± 1113.47 pg/mL, p = 0.011). In contrast, VEGF levels were higher in diabetic patients than in controls (227.95 ± 48.65 pg/mL vs. 205.92 ± 28.27 pg/mL, p = 0.016). In T1D, FGF19, VEGF and sklotho were not correlated with the duration of diabetes. FGF19, VEGF and sklotho were correlated with HbA1c (r = -0.349, p = 0.004; r = 0.302, p = 0.014; and r = -0.342, p = 0.005, respectively), but not with blood glucose or lipids. Among subjects in the T1D group, concentrations of FGF19, VEGF and sklotho differed between the groups defined by HbA1c level (P < 0.005). Furthermore, there was a positive correlation between serum FGF19 concentration and sklotho level (r = 0.247, p = 0.045), and a negative correlation between serum FGF19 concentration and VEGF level (r = -0.335, P = 0.006). CONCLUSIONS: Serum FGF19 levels are closely related to serum VEGF and sklotho levels among T1D subjects. FGF19 may be involved in the development of complications in children with type 1 diabetes through interaction with VEGF and sklotho.


Subjects
Diabetes Mellitus, Type 1 , Vascular Endothelial Growth Factor A , Humans , Child , Glucuronidase , Glycated Hemoglobin , Vascular Endothelial Growth Factors , Fibroblast Growth Factors
10.
Bioinformatics ; 37(3): 312-317, 2021 04 20.
Article in English | MEDLINE | ID: mdl-32805016

ABSTRACT

MOTIVATION: Copy number variation plays important roles in human complex diseases. The detection of copy number variants (CNVs) is identifying mean shift in genetic intensities to locate chromosomal breakpoints, the step of which is referred to as chromosomal segmentation. Many segmentation algorithms have been developed with a strong assumption of independent observations in the genetic loci, and they assume each locus has an equal chance to be a breakpoint (i.e. boundary of CNVs). However, this assumption is violated in the genetics perspective due to the existence of correlation among genomic positions, such as linkage disequilibrium (LD). Our study showed that the LD structure is related to the location distribution of CNVs, which indeed presents a non-random pattern on the genome. To generate more accurate CNVs, we proposed a novel algorithm, LDcnv, that models the CNV data with its biological characteristics relating to genetic dependence structure (i.e. LD). RESULTS: We theoretically demonstrated the correlation structure of CNV data in SNP array, which further supports the necessity of integrating biological structure in statistical methods for CNV detection. Therefore, we developed the LDcnv that integrated the genomic correlation structure with a local search strategy into statistical modeling of the CNV intensities. To evaluate the performance of LDcnv, we conducted extensive simulations and analyzed large-scale HapMap datasets. We showed that LDcnv presented high accuracy, stability and robustness in CNV detection and higher precision in detecting short CNVs compared to existing methods. This new segmentation algorithm has a wide scope of potential application with data from various high-throughput technology platforms. AVAILABILITY AND IMPLEMENTATION: https://github.com/FeifeiXiaoUSC/LDcnv. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Copy Number Variations , Genomics , Algorithms , Genome, Human , Humans , Models, Statistical , Polymorphism, Single Nucleotide
11.
Hum Brain Mapp ; 42(6): 1682-1698, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33377592

ABSTRACT

Recent studies have combined multiple neuroimaging modalities to gain further understanding of the neurobiological substrates of aphasia. Following this line of work, the current study uses machine learning approaches to predict aphasia severity and specific language measures based on a multimodal neuroimaging dataset. A total of 116 individuals with chronic left-hemisphere stroke were included in the study. Neuroimaging data included task-based functional magnetic resonance imaging (fMRI), diffusion-based fractional anisotropy (FA)-values, cerebral blood flow (CBF), and lesion-load data. The Western Aphasia Battery was used to measure aphasia severity and specific language functions. As a primary analysis, we constructed support vector regression (SVR) models predicting language measures based on (i) each neuroimaging modality separately, (ii) lesion volume alone, and (iii) a combination of all modalities. Prediction accuracy across models was subsequently statistically compared. Prediction accuracy across modalities and language measures varied substantially (predicted vs. empirical correlation range: r = .00-.67). The multimodal prediction model yielded the most accurate prediction in all cases (r = .53-.67). Statistical superiority in favor of the multimodal model was achieved in 28/30 model comparisons (p-value range: <.001-.046). Our results indicate that different neuroimaging modalities carry complementary information that can be integrated to more accurately depict how brain damage and remaining functionality of intact brain tissue translate into language function in aphasia.


Subjects
Aphasia/diagnosis , Magnetic Resonance Imaging , Neuroimaging , Support Vector Machine , Adult , Aged , Aged, 80 and over , Aphasia/etiology , Aphasia/pathology , Aphasia/physiopathology , Cerebrovascular Circulation/physiology , Chronic Disease , Diffusion Tensor Imaging , Female , Functional Neuroimaging , Humans , Language Tests , Magnetic Resonance Imaging/methods , Male , Middle Aged , Multimodal Imaging , Neuroimaging/methods , Outcome Assessment, Health Care , Severity of Illness Index , Stroke/complications
12.
Int J Mol Sci ; 22(4), 2021 Feb 03.
Article in English | MEDLINE | ID: mdl-33546390

ABSTRACT

Cancer remains the second leading cause of death all over the world. Aberrant expression of miRNA has shown diagnostic and prognostic value in many kinds of cancer. This study aims to provide a novel strategy to identify reliable miRNA signatures and develop improved cancer prognostic models from reported cancer-associated miRNAs. We proposed a new cluster-based approach to identify distinct cluster(s) of cancers and corresponding miRNAs. Further, with samples from TCGA and other independent studies, we identified prognostic markers and validated their prognostic value in prediction models. We also performed KEGG pathway analysis to investigate the functions of miRNAs associated with the cancer cluster of interest. A distinct cluster with 28 cancers and 146 associated miRNAs was identified. This cluster was enriched by digestive system cancers. Further, we screened out 8 prognostic miRNA signatures for STAD, 5 for READ, 18 for PAAD, 24 for LIHC, 12 for ESCA and 18 for COAD. These identified miRNA signatures demonstrated strong abilities in discriminating the overall survival time between high-risk group and low-risk group (p-value < 0.05) in both TCGA training and test datasets, as well as four independent Gene Expression Omnibus (GEO) validation datasets. We also demonstrated that these cluster-based miRNA signatures are superior to signatures identified in single cancers for prognosis. Our study identified significant miRNA signatures with improved prognosis accuracy in digestive system cancers. It also provides a novel method/strategy for cancer prognostic marker selection and offers valuable methodological directions to similar research topics.


Subjects
Digestive System Neoplasms/genetics , Digestive System Neoplasms/mortality , Gene Expression Profiling , MicroRNAs/genetics , Transcriptome , Biomarkers, Tumor , Cluster Analysis , Computational Biology/methods , Digestive System Neoplasms/diagnosis , Gene Expression Regulation, Neoplastic , Humans , Kaplan-Meier Estimate , Prognosis , RNA Interference , ROC Curve
13.
Int J Mol Sci ; 22(17), 2021 Aug 26.
Article in English | MEDLINE | ID: mdl-34502134

ABSTRACT

The current spreading coronavirus SARS-CoV-2 is highly infectious and pathogenic. In this study, we screened the gene expression of three host receptors (ACE2, DC-SIGN and L-SIGN) of SARS coronaviruses and dendritic cells (DCs) status in bulk and single cell transcriptomic datasets of upper airway, lung or blood of COVID-19 patients and healthy controls. In COVID-19 patients, DC-SIGN gene expression was interestingly decreased in lung DCs but increased in blood DCs. Within DCs, conventional DCs (cDCs) were depleted while plasmacytoid DCs (pDCs) were augmented in the lungs of mild COVID-19. In severe cases, we identified augmented types of immature DCs (CD22+ or ANXA1+ DCs) with MHCII downregulation. In this study, our observation indicates that DCs in severe cases stimulate innate immune responses but fail to specifically present SARS-CoV-2. It provides insights into the profound modulation of DC function in severe COVID-19.


Subjects
COVID-19/immunology , Cell Adhesion Molecules/genetics , Dendritic Cells/immunology , Gene Expression Regulation/immunology , Lectins, C-Type/genetics , Receptors, Cell Surface/genetics , SARS-CoV-2/immunology , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/metabolism , COVID-19/diagnosis , COVID-19/pathology , COVID-19/virology , Cell Adhesion Molecules/metabolism , Datasets as Topic , Dendritic Cells/metabolism , Genome-Wide Association Study , Host-Pathogen Interactions/genetics , Host-Pathogen Interactions/immunology , Humans , Immunity, Innate , Lectins, C-Type/metabolism , Lung/immunology , Lung/pathology , Lung/virology , Mendelian Randomization Analysis , Nasopharynx/immunology , Nasopharynx/pathology , Nasopharynx/virology , RNA-Seq , Receptors, Cell Surface/metabolism , Severity of Illness Index , Single-Cell Analysis
14.
Hum Genet ; 139(8): 1107-1117, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32270270

ABSTRACT

Extensive studies have been conducted on the analysis of genome function, especially on the expression quantitative trait loci (eQTL). These studies offered promising results for characterization of the functional sequencing variation and understanding of the basic processes of gene regulation. Parent of origin effect (POE) is an important epigenetic phenomenon describing that the expression of certain genes depends on their allelic parent-of-origin and it is known to play important roles in human complex diseases. However, traditional eQTL mapping approaches do not allow for the detection of imprinting, or they focus on modeling the additive genetic effect thereby ignoring the estimation of the dominance genetic effect. In this study, we proposed a statistical framework to test the additive and dominance genetic effects of the candidate eQTLs along with detection of the POE with a functional model and an orthogonal model for RNA-seq data. We demonstrated the desirable power and preserved Type I errors of the methods in most scenarios, especially the orthogonal model with un-biased estimation of the genetic effects and over-dispersion of the RNA-seq data. The application to a HapMap project trio dataset validated existing imprinting genes and discovered two novel imprinting genes with potential dominance genetic effect and RB1 and IGF1R genes. This study provides new insights into the next generation statistical modeling of eQTL mapping for better understanding of the genetic architecture underlying the mechanisms of gene expression regulation.


Subjects
Gene Expression Regulation/genetics , Genomic Imprinting/genetics , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Alleles , Child , Computer Simulation , Family , Female , Genes, Dominant/genetics , Genotype , HapMap Project , Humans , Male , Parents , RNA-Seq
15.
Cancer Immunol Immunother ; 69(9): 1881-1890, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32372138

ABSTRACT

BACKGROUND: Lung adenocarcinoma (LUAD) has become the most frequent histologic type of lung cancer in the past several decades. Recent successes with immune checkpoint blockade therapy have demonstrated that the manipulation of the immune system is a very potent treatment for LUAD. This study aims to explore the role of immune-related genes in the development of LUAD and establish a signature that can predict overall survival for LUAD patients. METHODS: To identify the differential expression genes (DEGs) between normal and tumor tissues, we developed an analysis strategy to combine an independent-sample design and a paired-sample design using RNA-seq transcriptomic profiling data of The Cancer Genome Atlas LUAD samples. Further, we selected prognostic markers from DEGs and evaluated their prognostic value in a prediction model. RESULTS: We identified and validated PD1, PDL1 and CTLA4 genes as prognostic markers, which are well-known immune checkpoints, and revealed two new potential prognostic immune checkpoints for LUAD, HHLA2 (logFC = 2.55, FDR = 1.89 × 10^-6) and VTCN1 (logFC = -2.86, FDR = 1.72 × 10^-11). Furthermore, we identified an 18-gene LUAD prognostic biomarker panel and observed that the classified high-risk group presented a significantly shorter overall survival time (HR = 3.57, p value = 4.07 × 10^-10). The prediction model was validated in five independent high-throughput gene expression datasets. CONCLUSIONS: The identified DEG features may serve as potential biomarkers for prognosis prediction of LUAD patients and immunotherapy. Based on that assumption, we identified a gene expression-based immune signature for lung adenocarcinoma prognosis.


Subjects
Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/immunology , Lung Neoplasms/genetics , Lung Neoplasms/immunology , Transcriptome/genetics , Transcriptome/immunology , Aged , Biomarkers, Tumor/immunology , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Gene Expression Regulation, Neoplastic/immunology , Humans , Male , Prognosis
16.
Bioinformatics ; 35(17): 2891-2898, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30649252

ABSTRACT

MOTIVATION: Integration of multiple genetic sources for copy number variation detection (CNV) is a powerful approach to improve the identification of variants associated with complex traits. Although it has been shown that the widely used change point based methods can increase statistical power to identify variants, it remains challenging to effectively detect CNVs with weak signals due to the noisy nature of genotyping intensity data. We previously developed modSaRa, a normal mean-based model on a screening and ranking algorithm for copy number variation identification which presented desirable sensitivity with high computational efficiency. To boost statistical power for the identification of variants, here we present a novel improvement that integrates the relative allelic intensity with external information from empirical statistics with modeling, which we called modSaRa2. RESULTS: Simulation studies illustrated that modSaRa2 markedly improved both sensitivity and specificity over existing methods for analyzing array-based data. The improvement in weak CNV signal detection is the most substantial, while it also simultaneously improves stability when CNV size varies. The application of the new method to a whole genome melanoma dataset identified novel candidate melanoma risk associated deletions on chromosome bands 1p22.2 and duplications on 6p22, 6q25 and 19p13 regions, which may facilitate the understanding of the possible roles of germline copy number variants in the etiology of melanoma. AVAILABILITY AND IMPLEMENTATION: http://c2s2.yale.edu/software/modSaRa2 or https://github.com/FeifeiXiaoUSC/modSaRa2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , DNA Copy Number Variations , Genome-Wide Association Study , Alleles , Data Interpretation, Statistical , Polymorphism, Single Nucleotide , Sensitivity and Specificity , Software
17.
Bioinformatics ; 33(15): 2384-2385, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28453611

ABSTRACT

SUMMARY: Chromosomal copy number variation (CNV) refers to a polymorphism that a DNA segment presents deletion or duplication in the population. The computational algorithms developed to identify this type of variation are usually of high computational complexity. Here we present a user-friendly R package, modSaRa, designed to perform copy number variants identification. The package is developed based on a change-point based method with optimal computational complexity and desirable accuracy. The current version of modSaRa package is a comprehensive tool with integration of preprocessing steps and main CNV calling steps. AVAILABILITY AND IMPLEMENTATION: modSaRa is an R package written in R, C++ and Rcpp and is now freely available for download at http://c2s2.yale.edu/software/modSaRa. CONTACT: heping.zhang@yale.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Copy Number Variations , Sequence Analysis, DNA/methods , Software , Algorithms , Genomics/methods , Humans
18.
BMC Bioinformatics ; 18(1): 364, 2017 Aug 09.
Article in English | MEDLINE | ID: mdl-28793860

ABSTRACT

BACKGROUND: Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts for each base position in a gene. Therefore, a base-level strategy is required to model the non-uniformity. Also, the properties of sequencing read counts can be leveraged to achieve a more precise estimation of the mean and variance of measurement. RESULTS: In this study, we aimed to unveil the effects on RNA-seq accuracy from multiple factors and develop accurate modeling of RNA-seq reads in comparison. We found that the overdispersion rate decreased when sequencing depth increased on the base level. Moreover, the influence of local sequence(s) on the overdispersion rate was notable but no longer significant after adjusting the effect from sequencing depth. Based on these findings, we propose a desirable beta-binomial model with a dynamic overdispersion rate on the base-level proportion of sequencing read counts from two samples. CONCLUSIONS: The current study provides thorough insights into the impact of overdispersion at the position level and especially into its relationship with sequencing depth, local sequence, and preparation protocol. These properties of RNA-seq will aid in improvement of the quality control procedure and development of statistical methods for RNA-seq downstream analyses.


Subjects
RNA/chemistry , Sequence Analysis, RNA , Cell Line , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Models, Theoretical , RNA/metabolism , Real-Time Polymerase Chain Reaction
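
The abstract of record 18 proposes a beta-binomial model with an overdispersion rate but gives no formula. The sketch below shows the standard beta-binomial probability mass function, reparameterised by a mean proportion `mu` and an overdispersion `rho`. This is a common textbook parameterisation and is assumed here for illustration; the paper's own parameterisation and its depth-dependent ("dynamic") overdispersion are not reproduced.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, mu, rho):
    """P(K = k) when K of n reads come from sample 1.

    mu  : mean proportion of reads from sample 1, 0 < mu < 1
    rho : overdispersion, 0 < rho < 1 (rho -> 0 approaches the binomial)
    """
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


# Hypothetical base position covered by 30 reads across the two samples.
n, mu, rho = 30, 0.4, 0.1
pmf = [beta_binomial_pmf(k, n, mu, rho) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
```

In this parameterisation the variance is `n * mu * (1 - mu) * (1 + (n - 1) * rho)`. A larger `rho` therefore inflates the variance relative to a plain binomial with the same mean.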
19.
Bioinformatics ; 31(9): 1341-8, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25542927

ABSTRACT

MOTIVATION: Copy number variation (CNV) is a type of structural variation, usually defined as genomic segments that are 1 kb or larger, which present variable copy numbers when compared with a reference genome. The screening and ranking algorithm (SaRa) was recently proposed as an efficient approach for multiple change-points detection, which can be applied to CNV detection. However, some practical issues arise from application of SaRa to single nucleotide polymorphism data. RESULTS: In this study, we propose a modified SaRa on CNV detection to address these issues. First, we use the quantile normalization on the original intensities to guarantee that the normal mean model-based SaRa is a robust method. Second, a novel normal mixture model coupled with a modified Bayesian information criterion is proposed for candidate change-point selection and further clustering the potential CNV segments to copy number states. Simulations revealed that the modified SaRa became a robust method for identifying change-points and achieved better performance than the circular binary segmentation (CBS) method. By applying the modified SaRa to real data from the HapMap project, we illustrated its performance on detecting CNV segments. In conclusion, our modified SaRa method improves SaRa theoretically and numerically, for identifying CNVs with high-throughput genotyping data. AVAILABILITY AND IMPLEMENTATION: The modSaRa package is implemented in R program and freely available at http://c2s2.yale.edu/software/modSaRa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , DNA Copy Number Variations , Sequence Analysis, DNA/methods , Bayes Theorem , Cluster Analysis , Genome, Human , Genomics , Genotyping Techniques , HapMap Project , Humans , Polymorphism, Single Nucleotide
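
The abstract of record 19 describes the screening and ranking algorithm (SaRa) only as an approach to detecting multiple change-points. The sketch below shows one common way to implement the screening idea. At each position it compares the mean of the `h` points to the right with the mean of the `h` points to the left, then keeps positions where the absolute difference is a local maximum and exceeds a threshold. The function names, window size and threshold are assumptions for illustration. The quantile normalization, mixture model and modified Bayesian information criterion of modSaRa are not shown.

```python
def local_diagnostic(y, h):
    """Return {i: mean(y[i:i+h]) - mean(y[i-h:i])} for every i with two full windows."""
    n = len(y)
    return {i: sum(y[i:i + h]) / h - sum(y[i - h:i]) / h for i in range(h, n - h + 1)}


def screen_change_points(y, h, threshold):
    """Keep positions whose |diagnostic| is a local maximum within +/- h and above threshold."""
    d = local_diagnostic(y, h)
    candidates = []
    for i, value in d.items():
        neighbours = [abs(d[j]) for j in range(i - h, i + h + 1) if j in d]
        if abs(value) > threshold and abs(value) == max(neighbours):
            candidates.append(i)
    return candidates


# Hypothetical noise-free intensity profile with a gain between positions 20 and 29.
signal = [0.0] * 20 + [2.0] * 10 + [0.0] * 20
change_points = screen_change_points(signal, h=5, threshold=1.0)
```

On this toy profile the screening step returns the two boundaries of the elevated segment. With real, noisy intensities the choice of `h` and the threshold matters considerably, and a ranking or model-selection step would normally follow.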