Results 1 - 20 of 1,879
1.
Am J Hum Genet ; 111(6): 1006-1017, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38703768

ABSTRACT

We present shaPRS, a method that leverages widespread pleiotropy between traits or shared genetic effects across ancestries, to improve the accuracy of polygenic scores. The method uses genome-wide summary statistics from two diseases or ancestries to improve the genetic effect estimate and standard error at SNPs where there is homogeneity of effect between the two datasets. When there is significant evidence of heterogeneity, the genetic effect from the disease or population closest to the target population is maintained. We show via simulation and a series of real-world examples that shaPRS substantially enhances the accuracy of polygenic risk scores (PRSs) for complex diseases and greatly improves PRS performance across ancestries. shaPRS is a PRS pre-processing method that is agnostic to the actual PRS generation method, and as a result, it can be integrated into existing PRS generation pipelines and continue to be applied as more performant PRS methods are developed over time.


Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study, Multifactorial Inheritance, Single Nucleotide Polymorphism, Multifactorial Inheritance/genetics, Humans, Genetic Models, Computer Simulation, Genetic Pleiotropy, Phenotype
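
The blending idea described in the shaPRS abstract above can be sketched roughly as follows. This is a simplified illustration, not the published algorithm: it assumes two independent studies, uses a hard Cochran's Q threshold (`alpha`) to choose between a fixed-effect inverse-variance combination and the target study's own estimate, and every name (`blend_effect`, `alpha`) is hypothetical. The actual method may weight the two estimates more smoothly.

```python
import math

def blend_effect(beta_target, se_target, beta_other, se_other, alpha=0.05):
    """Return (beta, se) for one SNP from two sets of summary statistics.

    Assumption: the two studies are independent, so the variance of the
    difference is the sum of the two squared standard errors.
    """
    # Cochran's Q for two estimates has 1 degree of freedom.
    q = (beta_target - beta_other) ** 2 / (se_target ** 2 + se_other ** 2)
    # Survival function of a chi-square(1) variable: P(X > q) = erfc(sqrt(q / 2)).
    p_het = math.erfc(math.sqrt(q / 2.0))
    if p_het < alpha:
        # Evidence of heterogeneity: keep the target study's estimate unchanged.
        return beta_target, se_target
    # Homogeneous effect: fixed-effect inverse-variance weighted combination.
    w_t = 1.0 / se_target ** 2
    w_o = 1.0 / se_other ** 2
    beta = (w_t * beta_target + w_o * beta_other) / (w_t + w_o)
    se = math.sqrt(1.0 / (w_t + w_o))
    return beta, se

# Toy inputs (invented numbers, not from the paper).
homogeneous = blend_effect(0.10, 0.02, 0.12, 0.02)
heterogeneous = blend_effect(0.10, 0.02, -0.10, 0.02)
```

In the homogeneous case the combined standard error is smaller than either input, which is the sense in which shared effects can sharpen the estimate; in the heterogeneous case the target estimate passes through untouched.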
2.
Development ; 151(3)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38345326

ABSTRACT

Morphogen gradients provide essential positional information to gene networks through their spatially heterogeneous distribution, yet how they form is still hotly contested, with multiple models proposed for different systems. Here, we focus on the transcription factor Bicoid (Bcd), a morphogen that forms an exponential gradient across the anterior-posterior (AP) axis of the early Drosophila embryo. Using fluorescence correlation spectroscopy we find there are spatial differences in Bcd diffusivity along the AP axis, with Bcd diffusing more rapidly in the posterior. We establish that such spatially varying differences in Bcd dynamics are sufficient to explain how Bcd can have a steep exponential gradient in the anterior half of the embryo and yet still have an observable fraction of Bcd near the posterior pole. In the nucleus, we demonstrate that Bcd dynamics are impacted by binding to DNA. Addition of the Bcd homeodomain to eGFP::NLS qualitatively replicates the Bcd concentration profile, suggesting this domain regulates Bcd dynamics. Our results reveal how a long-range gradient can form while retaining a steep profile through much of its range.


Subjects
Drosophila Proteins, Homeodomain Proteins, Animals, Body Patterning/genetics, Drosophila melanogaster/genetics, Drosophila melanogaster/metabolism, Drosophila Proteins/genetics, Drosophila Proteins/metabolism, Nonmammalian Embryo/metabolism, Homeodomain Proteins/genetics, Homeodomain Proteins/metabolism, Trans-Activators/genetics, Trans-Activators/metabolism
3.
Proc Natl Acad Sci U S A ; 121(14): e2305297121, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38551842

ABSTRACT

The causal connectivity of a network is often inferred to understand network function. It is arguably acknowledged that the inferred causal connectivity relies on the causality measure one applies, and it may differ from the network's underlying structural connectivity. However, the interpretation of causal connectivity remains to be fully clarified, in particular, how causal connectivity depends on causality measures and how causal connectivity relates to structural connectivity. Here, we focus on nonlinear networks with pulse signals as measured output, e.g., neural networks with spike output, and address the above issues based on four commonly utilized causality measures, i.e., time-delayed correlation coefficient, time-delayed mutual information, Granger causality, and transfer entropy. We theoretically show how these causality measures are related to one another when applied to pulse signals. Taking a simulated Hodgkin-Huxley network and a real mouse brain network as two illustrative examples, we further verify the quantitative relations among the four causality measures and demonstrate that the causal connectivity inferred by any of the four coincides well with the underlying network structural connectivity, therefore illustrating a direct link between the causal and structural connectivity. We stress that the structural connectivity of pulse-output networks can be reconstructed pairwise without conditioning on the global information of all other nodes in a network, thus circumventing the curse of dimensionality. Our framework provides a practical and effective approach for pulse-output network reconstruction.
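
As a rough illustration of the simplest of the four measures named in this abstract, the sketch below computes a time-delayed correlation coefficient between two binary pulse (spike) trains. The function name, the toy data, and the delay convention (`x` leads `y` by `delay` bins) are assumptions made for illustration; nothing here is taken from the authors' code.

```python
import math

def delayed_correlation(x, y, delay):
    """Pearson correlation between x[t] and y[t + delay], for delay >= 0."""
    if len(x) != len(y) or delay < 0 or delay >= len(x):
        raise ValueError("x and y must have equal length and 0 <= delay < len(x)")
    xs = x[:len(x) - delay]   # earlier samples of the putative driver
    ys = y[delay:]            # later samples of the putative target
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    if vx == 0 or vy == 0:
        return 0.0            # a constant train carries no correlation
    return cov / math.sqrt(vx * vy)

# Toy pulse trains: y repeats x two time bins later.
x = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y = [0, 0] + x[:-2]

corr_at_true_delay = delayed_correlation(x, y, 2)
corr_at_zero_delay = delayed_correlation(x, y, 0)
```

Because the measure is computed pairwise, it needs only the two trains involved, which matches the abstract's point that no conditioning on the rest of the network is required.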

4.
Proc Natl Acad Sci U S A ; 121(28): e2321193121, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38954549

ABSTRACT

Iron antimonide (FeSb2) has been investigated for decades due to its puzzling electronic properties. It undergoes a temperature-controlled transition from an insulator to an ill-defined metal, with a cross-over from diamagnetism to paramagnetism. Extensive efforts have been made to uncover the underlying mechanism, but a consensus has yet to be reached. While macroscopic transport and magnetic measurements can be explained by different theoretical proposals, the essential spectroscopic evidence required to distinguish the physical origin is missing. In this paper, through the use of X-ray absorption spectroscopy and atomic multiplet simulations, we have observed the mixed spin states of the 3d⁶ configuration in FeSb2. Furthermore, we reveal that the enhancement of the conductivity, whether induced by temperature or doping, is characterized by populating the high-spin state from the low-spin state. Our work constitutes vital spectroscopic evidence that the electrical/magnetic transition in FeSb2 is directly associated with the spin-state excitation.

5.
Proc Natl Acad Sci U S A ; 121(32): e2409676121, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39074273

ABSTRACT

Fragment correlation mass spectrometry correlates ion pairs generated from the same fragmentation pathway, achieved by covariance mapping of tandem mass spectra generated with an unmodified linear ion trap without preseparation. We enable the identification of different precursors at different charge states in a complex mixture from a large isolation window, empowering an analytical approach for data-independent acquisition. The method resolves and matches isobaric fragments, internal ions, and disulfide bond fragments. We suggest that this method represents a major advance for analyzing structures of biopolymers in mixtures.
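
The covariance mapping mentioned in this abstract can be pictured with a small sketch: given repeated scans binned on a common m/z axis, the covariance between two bins is large when their intensities fluctuate together from scan to scan, as would be expected for fragments produced by the same fragmentation event. The function name and the toy intensities below are assumptions for illustration only; the authors' processing pipeline is not described here and may differ substantially.

```python
def covariance_map(spectra):
    """Sample covariance between every pair of m/z bins across scans.

    `spectra` is a list of scans; each scan is a list of intensities on
    the same m/z bins.
    """
    n = len(spectra)
    m = len(spectra[0])
    means = [sum(scan[j] for scan in spectra) / n for j in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            c = sum((scan[i] - means[i]) * (scan[j] - means[j])
                    for scan in spectra) / (n - 1)
            cov[i][j] = cov[j][i] = c   # the map is symmetric
    return cov

# Toy data: 4 scans, 3 bins. Bins 0 and 1 rise and fall together;
# bin 2 fluctuates independently of them.
scans = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 4.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 5.0],
]
cmap = covariance_map(scans)
```

In this toy map the off-diagonal entry for bins 0 and 1 is strongly positive while the entry for bins 0 and 2 vanishes, which is the kind of contrast used to pair correlated fragments.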

6.
Proc Natl Acad Sci U S A ; 121(31): e2401162121, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39042671

ABSTRACT

Nonequilibrium states in soft condensed matter require a systematic approach to characterize and model materials, enhancing predictability and applications. Among the tools, X-ray photon correlation spectroscopy (XPCS) provides exceptional temporal and spatial resolution to extract dynamic insight into the properties of the material. However, existing models might overlook intricate details. We introduce an approach for extracting the transport coefficient, denoted as [Formula: see text], from the XPCS studies. This coefficient is a fundamental parameter in nonequilibrium statistical mechanics and is crucial for characterizing transport processes within a system. Our method unifies the Green-Kubo formulas associated with various transport coefficients, including gradient flows, particle-particle interactions, friction matrices, and continuous noise. We achieve this by integrating the collective influence of random and systematic forces acting on the particles within the framework of a Markov chain. We initially validated this method using molecular dynamics simulations of a system subjected to changes in temperatures over time. Subsequently, we conducted further verification using experimental systems reported in the literature and known for their complex nonequilibrium characteristics. The results, including the derived [Formula: see text] and other relevant physical parameters, align with the previous observations and reveal detailed dynamical information in nonequilibrium states. This approach represents an advancement in XPCS analysis, addressing the growing demand to extract intricate nonequilibrium dynamics. Further, the methods presented are agnostic to the nature of the material system and can be potentially expanded to hard condensed matter systems.

7.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38856173

ABSTRACT

Multivariate analysis is becoming central in studies investigating high-throughput molecular data, yet some important features of these data are seldom explored. Here, we present MANOCCA (Multivariate Analysis of Conditional CovAriance), a powerful method to test for the effect of a predictor on the covariance matrix of a multivariate outcome. The proposed test is by construction orthogonal to tests based on the mean and variance and is able to capture effects that are missed by both approaches. We first compare the performance of MANOCCA with existing correlation-based methods and show that MANOCCA is the only test correctly calibrated in simulations mimicking omics data. We then investigate the impact of reducing the dimensionality of the data using principal component analysis when the sample size is smaller than the number of pairwise covariance terms analysed. We show that, in many realistic scenarios, the maximum power can be achieved with a limited number of components. Finally, we apply MANOCCA to 1000 healthy individuals from the Milieu Interieur cohort, to assess the effect of health, lifestyle and genetic factors on the covariance of two sets of phenotypes, blood biomarkers and flow cytometry-based immune phenotypes. Our analyses identify significant associations between multiple factors and the covariance of both omics data.


Subjects
Principal Component Analysis, Humans, Multivariate Analysis, Computational Biology/methods, Phenotype, Algorithms, Genomics/methods, Biomarkers/blood, Computer Simulation
8.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38888456

ABSTRACT

MOTIVATION: The advent of multimodal omics data has provided an unprecedented opportunity to systematically investigate underlying biological mechanisms from distinct yet complementary angles. However, the joint analysis of multi-omics data remains challenging because it requires modeling interactions between multiple sets of high-throughput variables. Furthermore, these interaction patterns may vary across different clinical groups, reflecting disease-related biological processes. RESULTS: We propose a novel approach called Differential Canonical Correlation Analysis (dCCA) to capture differential covariation patterns between two multivariate vectors across clinical groups. Unlike classical Canonical Correlation Analysis, which maximizes the correlation between two multivariate vectors, dCCA aims to maximally recover differentially expressed multivariate-to-multivariate covariation patterns between groups. We have developed computational algorithms and a toolkit to sparsely select paired subsets of variables from two sets of multivariate variables while maximizing the differential covariation. Extensive simulation analyses demonstrate the superior performance of dCCA in selecting variables of interest and recovering differential correlations. We applied dCCA to the Pan-Kidney cohort from the Cancer Genome Atlas Program database and identified differentially expressed covariations between noncoding RNAs and gene expressions. AVAILABILITY AND IMPLEMENTATION: The R package that implements dCCA is available at https://github.com/hwiyoungstat/dCCA.


Subjects
Algorithms, Humans, Computational Biology/methods, Genomics/methods, Gene Expression Profiling/methods, Multivariate Analysis
9.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38856170

ABSTRACT

In the application of genomic prediction, a situation often faced is that there are multiple populations in which genomic prediction (GP) needs to be conducted. A common way to handle the multi-population GP is simply to combine the multiple populations into a single population. However, since these populations may be subject to different environments, there may exist genotype-environment interactions which may affect the accuracy of genomic prediction. In this study, we demonstrated that multi-trait genomic best linear unbiased prediction (MTGBLUP) can be used for multi-population genomic prediction, whereby the performances of a trait in different populations are regarded as different traits, and thus multi-population prediction is regarded as multi-trait prediction by employing the between-population genetic correlation. Using real datasets, we proved that MTGBLUP outperformed the conventional multi-population model that simply combines different populations together. We further proposed that MTGBLUP can be improved by partitioning the global between-population genetic correlation into local genetic correlations (LGC). We suggested two LGC models, LGC-model-1 and LGC-model-2, which partition the genome into regions with and without significant LGC (LGC-model-1) or regions with and without strong LGC (LGC-model-2). In analysis of real datasets, we demonstrated that the LGC models could universally increase the prediction accuracy, and the relative improvement over MTGBLUP reached up to 163.86% (25.64% on average).


Subjects
Genomics, Genetic Models, Genomics/methods, Population Genetics/methods, Quantitative Trait Loci, Humans, Algorithms, Genotype
10.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38436559

ABSTRACT

A wide range of approaches can be used to detect micro RNA (miRNA)-target gene pairs (mTPs) from expression data, differing in the ways the gene and miRNA expression profiles are calculated, combined and correlated. However, there is no clear consensus on which is the best approach across all datasets. Here, we have implemented multiple strategies and applied them to three distinct rare disease datasets that comprise smallRNA-Seq and RNA-Seq data obtained from the same samples, obtaining mTPs related to the disease pathology. All datasets were preprocessed using a standardized, freely available computational workflow, DEG_workflow. This workflow includes coRmiT, a method to compare multiple strategies for mTP detection. We used it to investigate the overlap of the detected mTPs with predicted and validated mTPs from 11 different databases. Results show that there is no clear best strategy for mTP detection applicable to all situations. We therefore propose the integration of the results of the different strategies by selecting the one with the highest odds ratio for each miRNA, as the optimal way to integrate the results. We applied this selection-integration method to the datasets and showed it to be robust to changes in the predicted and validated mTP databases. Our findings have important implications for miRNA analysis. coRmiT is implemented as part of the ExpHunterSuite Bioconductor package available from https://bioconductor.org/packages/ExpHunterSuite.


Subjects
MicroRNAs, Consensus, Factual Databases, MicroRNAs/genetics, Odds Ratio, RNA-Seq
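
The selection step proposed in the coRmiT abstract above (for each miRNA, keep the strategy with the highest odds ratio) might look roughly like the sketch below. It is not the ExpHunterSuite implementation: the 2x2 table layout, the 0.5 continuity correction, the strategy labels, and the counts are all assumptions introduced for illustration.

```python
def odds_ratio(hit_known, hit_unknown, miss_known, miss_unknown):
    """Odds ratio of a 2x2 table.

    hit_*  : pairs detected by the strategy
    miss_* : pairs not detected by the strategy
    *_known / *_unknown : present / absent in the reference databases
    A 0.5 continuity correction (an assumption) avoids division by zero.
    """
    a, b, c, d = (v + 0.5 for v in (hit_known, hit_unknown,
                                    miss_known, miss_unknown))
    return (a * d) / (b * c)

def best_strategy_per_mirna(tables):
    """Map each miRNA to the strategy whose 2x2 table has the highest odds ratio."""
    best = {}
    for mirna, by_strategy in tables.items():
        best[mirna] = max(by_strategy,
                          key=lambda s: odds_ratio(*by_strategy[s]))
    return best

# Hypothetical counts for two miRNAs and two correlation strategies.
tables = {
    "miR-A": {"pearson": (8, 2, 4, 86), "spearman": (5, 5, 7, 83)},
    "miR-B": {"pearson": (1, 9, 6, 84), "spearman": (6, 4, 1, 89)},
}
selected = best_strategy_per_mirna(tables)
```

Choosing per miRNA rather than globally reflects the abstract's finding that no single strategy is best in all situations.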
11.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38487851

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular heterogeneity through high-throughput analysis of individual cells. Nevertheless, challenges arise from prevalent sequencing dropout events and noise effects, impacting subsequent analyses. Here, we introduce a novel algorithm, Single-cell Gene Importance Ranking (scGIR), which utilizes a single-cell gene correlation network to evaluate gene importance. The algorithm transforms single-cell sequencing data into a robust gene correlation network through statistical independence, with correlation edges weighted by gene expression levels. We then constructed a random walk model on the resulting weighted gene correlation network to rank the importance of genes. Our analysis of gene importance using the PageRank algorithm across nine authentic scRNA-seq datasets indicates that scGIR can effectively surmount technical noise, enabling the identification of cell types and inference of developmental trajectories. We demonstrated that the edges of gene correlation, weighted by expression, play a critical role in enhancing the algorithm's performance. Our findings emphasize that scGIR outperforms in enhancing the clustering of cell subtypes, reverse identifying differentially expressed marker genes, and uncovering genes with potential differential importance. Overall, we proposed a promising method capable of extracting more information from single-cell RNA sequencing datasets, potentially shedding new light on cellular processes and disease mechanisms.


Subjects
Gene Regulatory Networks, Single-Cell Analysis, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Cluster Analysis, Gene Expression Profiling/methods
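
The random-walk ranking step described in the scGIR abstract above can be illustrated with a generic weighted PageRank computed by power iteration. This sketch covers only the ranking on an already-built network; how scGIR constructs and weights its gene correlation network is not reproduced, and the function name, damping factor, and toy weights are assumptions.

```python
def weighted_pagerank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank scores for a graph given as a square matrix of edge weights.

    weights[i][j] is the weight of the edge from node i to node j
    (symmetric for an undirected correlation network).
    """
    n = len(weights)
    out_strength = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_strength[i] == 0:
                # Isolated node: spread its score uniformly.
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                for j in range(n):
                    if weights[i][j]:
                        new_rank[j] += (damping * rank[i]
                                        * weights[i][j] / out_strength[i])
        converged = sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return rank

# Toy network of 4 genes: gene 0 is a hub linked to genes 1, 2, 3
# with decreasing edge weights.
W = [
    [0, 3, 2, 1],
    [3, 0, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = weighted_pagerank(W)
```

The hub gene receives the highest score, and the remaining genes are ordered by the weight of their edge to the hub, which shows how edge weights feed directly into the ranking.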
12.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581416

ABSTRACT

The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.


Subjects
Gene Regulatory Networks, Liver Neoplasms, Humans, Systems Biology/methods, Transcriptome, Algorithms, Computational Biology/methods
13.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38904542

ABSTRACT

The inherent heterogeneity of cancer contributes to highly variable responses to any anticancer treatments. This underscores the need to first identify precise biomarkers through complex multi-omics datasets that are now available. Although much research has focused on this aspect, identifying biomarkers associated with distinct drug responders still remains a major challenge. Here, we develop MOMLIN, a multi-modal and -omics machine learning integration framework, to enhance drug-response prediction. MOMLIN jointly utilizes sparse correlation algorithms and class-specific feature selection algorithms, which identify multi-modal and -omics-associated interpretable components. MOMLIN was applied to breast cancer datasets from 147 patients (clinical, mutation, gene expression, tumor microenvironment cells and molecular pathways) to analyze drug-response class predictions for non-responders and variable responders. Notably, MOMLIN achieves an average AUC of 0.989, which is at least 10% greater than current state-of-the-art methods (data integration analysis for biomarker discovery using latent components, multi-omics factor analysis, sparse canonical correlation analysis). Moreover, MOMLIN not only detects known individual biomarkers such as genes at the mutation/expression level but, most importantly, also correlates multi-modal and -omics network biomarkers for each response class. For example, an interaction between ER-negative-HMCN1-COL5A1 mutations-FBXO2-CSF3R expression-CD8 emerges as a multimodal biomarker for responders, potentially affecting antimicrobial peptides and FLT3 signaling pathways. In contrast, for resistance cases, a distinct combination of lymph node-TP53 mutation-PON3-ENSG00000261116 lncRNA expression-HLA-E-T-cell exclusions emerged as multimodal biomarkers, possibly impacting the neurotransmitter release cycle pathway. MOMLIN, therefore, is expected to advance precision medicine, for example by detecting context-specific multi-omics network biomarkers and better predicting drug-response classifications.


Subjects
Breast Neoplasms, Machine Learning, Humans, Breast Neoplasms/genetics, Breast Neoplasms/drug therapy, Breast Neoplasms/metabolism, Female, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Algorithms, Antineoplastic Agents/therapeutic use, Antineoplastic Agents/pharmacology, Computational Biology/methods, Genomics/methods
14.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38348747

ABSTRACT

Integrating and analyzing multiple omics data sets, including genomics, proteomics and radiomics, can significantly advance researchers' comprehensive understanding of Alzheimer's disease (AD). However, current methodologies primarily focus on the main effects of genetic variation and protein, overlooking non-additive effects such as genotype-protein interaction (GPI) and correlation patterns in brain imaging genetics studies. Importantly, these non-additive effects could contribute to intermediate imaging phenotypes, finally leading to disease occurrence. In general, the interaction between genetic variations and proteins, and their correlations are two distinct biological effects, and thus disentangling the two effects for heritable imaging phenotypes is of great interest and need. Unfortunately, this issue has been largely unexploited. In this paper, to fill this gap, we propose the Multi-Task Genotype-Protein Interaction and Correlation disentangling method (MT-GPIC) to identify GPI and extract correlation patterns between them. To ensure stability and interpretability, we use novel and off-the-shelf penalties to identify meaningful genetic risk factors, as well as exploit the interconnectedness of different brain regions. Additionally, since computing GPI poses a high computational burden, we develop a fast optimization strategy for solving MT-GPIC, which is guaranteed to converge. Experimental results on the Alzheimer's Disease Neuroimaging Initiative data set show that MT-GPIC achieves higher correlation coefficients and classification accuracy than state-of-the-art methods. Moreover, our approach could effectively identify interpretable phenotype-related GPI and correlation patterns in high-dimensional omics data sets. These findings not only enhance the diagnostic accuracy but also contribute valuable insights into the underlying pathogenic mechanisms of AD.


Subjects
Alzheimer Disease, Humans, Alzheimer Disease/diagnostic imaging, Alzheimer Disease/genetics, Alzheimer Disease/pathology, Multiomics, Genotype, Neuroimaging/methods, Phenotype, Brain/diagnostic imaging, Brain/pathology
15.
Mol Cell Proteomics ; 23(4): 100744, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38417630

ABSTRACT

The NF-κB pathway is involved in inflammation; however, recent data show its role also in cancer development and progression, including metastasis. To understand the role of NF-κB interactome dynamics in cancer, we study the complexity of the breast cancer interactome in a luminal A breast cancer model and its rearrangement associated with NF-κB modulation. Liquid chromatography-mass spectrometry measurement of 160 size-exclusion chromatography fractions identifies 5460 protein groups. A total of 7568 interactions among these proteins have been reconstructed by the PrInCE algorithm, of which 2564 have been validated in independent datasets. NF-κB modulation leads to rearrangement of protein complexes involved in NF-κB signaling and immune response, cell cycle regulation, and DNA replication. Central NF-κB transcription regulator RELA co-elutes with interactors of NF-κB activator PRMT5, and these complexes are confirmed by AlphaPulldown prediction. A complementary immunoprecipitation experiment recapitulates RELA interactions with other NF-κB factors, associating NF-κB inhibition with lower binding of NF-κB activators to RELA. This study describes a network of pro-tumorigenic protein interactions and their rearrangement upon NF-κB inhibition with potential therapeutic implications in tumors with high NF-κB activity.


Subjects
Breast Neoplasms, NF-kappa B, Protein Interaction Maps, Transcription Factor RelA, Humans, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Female, NF-kappa B/metabolism, Transcription Factor RelA/metabolism, Protein Interaction Mapping, Signal Transduction, Tumor Cell Line, Protein Binding, Protein-Arginine N-Methyltransferases/metabolism, Carcinogenesis/metabolism
16.
Genet Epidemiol ; 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080969

ABSTRACT

Observational studies are rarely representative of their target population because there are known and unknown factors that affect an individual's choice to participate (the selection mechanism). Selection can cause bias in a given analysis if the outcome is related to selection (conditional on the other variables in the model). Detecting and adjusting for selection bias in practice typically requires access to data on nonselected individuals. Here, we propose methods to detect selection bias in genetic studies by comparing correlations among genetic variants in the selected sample to those expected under no selection. We examine the use of four hypothesis tests to identify induced associations between genetic variants in the selected sample. We evaluate these approaches in Monte Carlo simulations. Finally, we use these approaches in an applied example using data from the UK Biobank (UKBB). The proposed tests suggested an association between alcohol consumption and selection into UKBB. Hence, UKBB analyses with alcohol consumption as the exposure or outcome may be biased by this selection.
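
The core idea in this abstract, that selection can induce correlation between variants that are independent in the full population, can be illustrated with a generic Fisher z-test on the correlation between two genotype vectors. This is not claimed to be one of the four tests the authors evaluate; the selection rule, the genotype coding (0/1/2), and all names below are assumptions chosen to make the effect visible in a deterministic toy data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_test(x, y):
    """Return (r, two-sided p-value) for H0: correlation = 0.

    Uses the Fisher transformation; assumes |r| < 1 and len(x) > 3.
    """
    r = pearson_r(x, y)
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return r, p

# Toy population: every (g1, g2) genotype combination occurs equally often,
# so the two variants are exactly uncorrelated before selection.
n = 900
g1 = [i % 3 for i in range(n)]
g2 = [(i // 3) % 3 for i in range(n)]
r_all, p_all = fisher_z_test(g1, g2)

# Hypothetical selection rule: participation depends on a trait that both
# variants raise, mimicked here by keeping individuals with g1 + g2 >= 2.
keep = [i for i in range(n) if g1[i] + g2[i] >= 2]
r_sel, p_sel = fisher_z_test([g1[i] for i in keep], [g2[i] for i in keep])
```

In the selected subsample the two variants become negatively correlated even though they are independent in the full toy population, which is the signature the abstract proposes to look for.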

17.
Genet Epidemiol ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38751238

ABSTRACT

Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer, by regulating gene expressions that drive critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing "gene component scores" and test its interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of ASAH1 gene trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data especially in cancer genomics.

18.
Biostatistics ; 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39083810

ABSTRACT

This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g. proteins and gene pathways) when only lower-level measurements are directly observed (e.g. peptides and individual genes). Existing methods typically aggregate lower-level data into higher-level variables and then estimate correlations based on the aggregated data. However, different data aggregation methods can yield varying correlation estimates as they target different higher-level quantities. Our solution is a latent factor model that directly estimates these higher-level correlations from lower-level data without the need for data aggregation. We further introduce a shrinkage estimator to ensure the positive definiteness and improve the accuracy of the estimated correlation matrix. Furthermore, we establish the asymptotic normality of our estimator, enabling efficient computation of P-values for the identification of significant correlations. The effectiveness of our approach is demonstrated through comprehensive simulations and the analysis of proteomics and gene expression datasets. We develop the R package highcor for implementing our method.

19.
Brief Bioinform ; 25(1)2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38113079

ABSTRACT

Millions of RNA sequencing samples have been deposited into public databases, providing a rich resource for biological research. These datasets encompass tens of thousands of experiments and offer comprehensive insights into human cellular regulation. However, a major challenge is how to integrate these experiments, which were acquired under different conditions. We propose a new statistical tool based on beta-binomial distributions that can construct a robust gene co-regulation network (CoRegNet) across tens of thousands of experiments. Our analysis of over 12 000 experiments involving human tissues and cells shows that CoRegNet significantly outperforms existing gene co-expression-based methods. Although the majority of the genes are linearly co-regulated, we did discover an interesting set of genes that are non-linearly co-regulated; half of the time they change in the same direction and the other half they change in the opposite direction. Additionally, we identified a set of gene pairs that follow Simpson's paradox. By utilizing public domain data, CoRegNet offers a powerful approach for identifying functionally related gene pairs, thereby revealing new biological insights.


Subjects
Gene Regulatory Networks, Statistical Models, Humans, RNA-Seq, RNA Sequence Analysis/methods, Gene Expression Profiling/methods
20.
Cereb Cortex ; 34(1)2024 Jan 14.
Article in English | MEDLINE | ID: mdl-38100334

ABSTRACT

The functional connectome has shown remarkable potential in the diagnosis of neurological disorders, e.g. autism spectrum disorder. However, existing studies have primarily focused on a single connectivity pattern, such as full correlation, partial correlation, or causality. Such an approach fails to discover the potential complementary topology information of FCNs at different connection patterns, resulting in lower diagnostic performance. Consequently, toward an accurate autism spectrum disorder diagnosis, a straightforward ambition is to combine the multiple connectivity patterns for the diagnosis of neurological disorders. To this end, we use functional magnetic resonance imaging data to construct multiple brain networks with different connectivity patterns and employ kernel combination techniques to fuse information from different brain connectivity patterns for autism diagnosis. To verify the effectiveness of our approach, we assess the performance of the proposed method on the Autism Brain Imaging Data Exchange dataset for diagnosing autism spectrum disorder. The experimental findings demonstrate that our method achieves precise autism spectrum disorder diagnosis with exceptional accuracy (91.30%), sensitivity (91.48%), and specificity (91.11%).


Subjects
Autism Spectrum Disorder, Connectome, Nervous System Diseases, Humans, Connectome/methods, Autism Spectrum Disorder/diagnostic imaging, Magnetic Resonance Imaging/methods, Brain/diagnostic imaging, Brain Mapping/methods