Results 1 - 9 of 9
1.
BioData Min ; 17(1): 10, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627770

ABSTRACT

BACKGROUND: Gene network information is believed to be beneficial for disease module and pathway identification, but has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF where the network information is summarized into a sampling probability of predictor variables which is further used in the construction of the RF. RESULTS: Our simulation results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form module(s), network-guided RF identifies them more accurately. In addition, when disease status is independent of the genes in the given network, spurious gene selection results can occur when using network information, especially on hub genes. Our empirical analysis on two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better connected module of identified genes. CONCLUSIONS: Gene networks can provide additional information to aid gene expression analysis for disease module and pathway identification. However, they need to be used with caution, and the results need to be validated to guard against spurious gene selection. More robust approaches to incorporating such information into RF construction also warrant further study.
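The sampling step described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each gene's sampling probability is proportional to its network degree plus a smoothing constant; the abstract does not state how the network is summarized, and the function names `degree_sampling_probs` and `sample_candidate_genes` are hypothetical.

```python
import random


def degree_sampling_probs(edges, genes, smoothing=1.0):
    """Turn an undirected gene network into per-gene sampling probabilities.

    Assumption (not from the abstract): probability is proportional to
    (degree + smoothing), so isolated genes can still be drawn.
    """
    degree = {g: 0 for g in genes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    weights = [degree[g] + smoothing for g in genes]
    total = sum(weights)
    return [w / total for w in weights]


def sample_candidate_genes(genes, probs, mtry, rng):
    """Draw mtry distinct genes, weighted by probs, as split candidates."""
    pool = list(zip(genes, probs))
    chosen = []
    for _ in range(min(mtry, len(pool))):
        weights = [p for _, p in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx)[0])
    return chosen


# Toy network: G1 is a hub, G5 is isolated.
genes = ["G1", "G2", "G3", "G4", "G5"]
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
probs = degree_sampling_probs(edges, genes)

rng = random.Random(0)
candidates = sample_candidate_genes(genes, probs, mtry=2, rng=rng)
```

Under this weighting the hub gene G1 is offered as a split candidate most often, which is consistent with the abstract's caution that hub genes can be selected spuriously when disease status is unrelated to the network.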

2.
J Biophotonics ; 17(5): e202300510, 2024 May.
Article in English | MEDLINE | ID: mdl-38302112

ABSTRACT

Marine bacteria are considered important participants in the various carbon/sulfur/nitrogen cycles of the marine ecosystem. Accurately identifying rare marine bacteria without a culture process is therefore significant and valuable. In this work, we constructed a single-cell Raman spectra dataset from living spores of five bacteria and used a convolutional neural network (CNN) to identify bacterial spores rapidly, accurately, and nondestructively. The optimal CNN architecture provides a prediction accuracy as high as 94.93% ± 1.78% for the five bacterial spores. To evaluate the classification weight of the extracted spectral features, we proposed a novel algorithm that occludes fingerprint Raman bands. Ranked by relative classification weight from largest to smallest, four Raman bands located at 1518, 1397, 1666, and 1017 cm⁻¹ contribute most to this high prediction accuracy. LTRS combined with a CNN approach is expected to have great potential for identifying marine bacteria that cannot be cultured under normal conditions.


Subjects
Deep Learning; Optical Tweezers; Single-Cell Analysis; Spectrum Analysis, Raman; Spores, Bacterial; Spores, Bacterial/isolation & purification; Time Factors; Aquatic Organisms
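The band-occlusion idea in the abstract above can be sketched generically: mask one spectral band at a time and record how much classification accuracy drops. The sketch below is an assumption-laden illustration, not the authors' algorithm. The classifier `toy_predict` is a hypothetical stand-in for a trained CNN, and the band names and index ranges are invented.

```python
def occlusion_importance(predict, spectra, labels, bands, fill=0.0):
    """Accuracy drop when each band (start, stop index range) is occluded."""

    def accuracy(data):
        hits = sum(predict(x) == y for x, y in zip(data, labels))
        return hits / len(labels)

    baseline = accuracy(spectra)
    drops = {}
    for name, (start, stop) in bands.items():
        # Replace the channels in [start, stop) with a constant fill value.
        occluded = [
            x[:start] + [fill] * (stop - start) + x[stop:] for x in spectra
        ]
        drops[name] = baseline - accuracy(occluded)
    return drops


# Hypothetical stand-in for a trained CNN: class 1 if channel 2 is strong.
def toy_predict(spectrum):
    return 1 if spectrum[2] > 0.5 else 0


spectra = [
    [0.1, 0.2, 0.9, 0.3],
    [0.2, 0.1, 0.8, 0.4],
    [0.3, 0.2, 0.1, 0.9],
    [0.1, 0.3, 0.2, 0.8],
]
labels = [1, 1, 0, 0]
bands = {"band_a": (0, 2), "band_b": (2, 3), "band_c": (3, 4)}

drops = occlusion_importance(toy_predict, spectra, labels, bands)
ranked = sorted(drops, key=drops.get, reverse=True)
```

Sorting bands by accuracy drop gives a ranking analogous to the "relative classification weight arranged from large to small" mentioned in the abstract. In this toy example only `band_b` matters, because the stand-in classifier reads nothing else.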
3.
Brief Bioinform ; 24(2), 2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36653905

ABSTRACT

In longitudinal studies, variables are measured repeatedly over time, leading to clustered and correlated observations. If the goal of the study is to develop prediction models, machine learning approaches such as the powerful random forest (RF) are often promising alternatives to standard statistical methods, especially in the context of high-dimensional data. In this paper, we review extensions of the standard RF method for the purpose of longitudinal data analysis. Extension methods are categorized according to the data structures for which they are designed. We consider both univariate and multivariate response longitudinal data and further categorize the repeated measurements according to whether the time effect is relevant. Even though most extensions are proposed for low-dimensional data, some can be applied to high-dimensional data. Information on available software implementations of the reviewed extensions is also given. We conclude with discussions on the limitations of our review and some future research directions.


Subjects
Random Forest; Software; Longitudinal Studies; Data Analysis
4.
BMC Bioinformatics ; 23(1): 243, 2022 Jun 21.
Article in English | MEDLINE | ID: mdl-35729515

ABSTRACT

BACKGROUND: Microbial communities in the human body, also known as human microbiota, impact human health, including diseases such as colorectal cancer (CRC). However, the different roles that microbial communities play in healthy and diseased hosts remain largely unknown. The microbial communities are typically recorded through the taxa counts of operational taxonomic units (OTUs). The sparsity and high correlations among OTUs pose major challenges for understanding the microbiota-disease relation. Furthermore, the taxa data are structured in the sense that OTUs are related evolutionarily by a hierarchical structure. RESULTS: In this study, we borrow the idea of super-variant from statistical genetics, and propose a new concept called super-taxon to exploit the hierarchical structure of taxa for microbiome studies, which is essentially a combination of taxonomic units. Specifically, we model a genus which consists of a set of OTUs at low hierarchy and is designed to reflect both marginal and joint effects of OTUs associated with the risk of CRC to address these issues. We first demonstrate the power of super-taxon in detecting highly correlated OTUs. Then, we identify CRC-associated OTUs in two publicly available datasets via a discovery-validation procedure. Specifically, four species of two genera are found to be associated with CRC: Parvimonas micra, Parvimonas sp., Peptostreptococcus stomatis, and Peptostreptococcus anaerobius. More importantly, for the first time, we report the joint effect of Parvimonas micra and Parvimonas sp. (p = 0.0084) as well as that of Peptostreptococcus stomatis and Peptostreptococcus anaerobius (p = 8.21e-06) on CRC. The proposed approach provides a novel and useful tool for identifying disease-related microbes by taking the hierarchical structure of taxa into account and further sheds new light on their potential joint effects as a community in disease development. CONCLUSIONS: Our work shows that the proposed approaches are effective for studying the microbiota-disease relation while taking into account the sparsity, hierarchy, and correlation structure among microbes.


Subjects
Colorectal Neoplasms; Microbiota; Colorectal Neoplasms/genetics; Firmicutes; Humans; Microbiota/genetics; Peptostreptococcus
5.
Opt Lett ; 47(5): 1033-1036, 2022 Mar 01.
Article in English | MEDLINE | ID: mdl-35230283

ABSTRACT

We measure the molecular alignment induced in a gas using molecular rotational echo spectroscopy. Our results show that the echo intensity and the time interval between the local extrema of the echo responses depend sensitively on the pump intensities and the initial molecular rotational temperature, respectively. This allows us to accurately extract these experimental parameters from the echo signals and then further determine the molecular alignment in experiments. The accuracy of our method has been verified by comparing the simulation with the extracted parameters from the molecular alignment experiment performed with a femtosecond pump pulse.

6.
Hum Genomics ; 15(1): 10, 2021 Feb 03.
Article in English | MEDLINE | ID: mdl-33536081

ABSTRACT

BACKGROUND: The severity of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is highly heterogeneous. Studies have reported that males and some ethnic groups are at increased risk of death from COVID-19, which implies that individual risk of death might be influenced by host genetic factors. METHODS: In this project, we consider the mortality as the trait of interest and perform a genome-wide association study (GWAS) of data for 1778 infected cases (445 deaths, 25.03%) distributed by the UK Biobank. Traditional GWAS fails to identify any genome-wide significant genetic variants from this dataset. To enhance the power of GWAS and account for possible multi-loci interactions, we adopt the concept of super variant for the detection of genetic factors. A discovery-validation procedure is used for verifying the potential associations. RESULTS: We find 8 super variants that are consistently identified across multiple replications as susceptibility loci for COVID-19 mortality. The identified risk factors on chromosomes 2, 6, 7, 8, 10, 16, and 17 contain genetic variants and genes related to cilia dysfunctions (DNAH7 and CLUAP1), cardiovascular diseases (DES and SPEG), thromboembolic disease (STXBP5), mitochondrial dysfunctions (TOMM7), and innate immune system (WSB1). It is noteworthy that DNAH7 has been reported recently as the most downregulated gene after infecting human bronchial epithelial cells with SARS-CoV-2. CONCLUSIONS: Eight genetic variants are identified to significantly increase the risk of COVID-19 mortality among the patients with white British ancestry. These findings may provide timely clues and potential directions for better understanding the molecular pathogenesis of COVID-19 and the genetic basis of heterogeneous susceptibility, with potential impact on new therapeutic options.


Subjects
Biological Specimen Banks; COVID-19/mortality; Genetic Variation; SARS-CoV-2/genetics; Alleles; COVID-19/genetics; COVID-19/virology; Female; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Male; Polymorphism, Single Nucleotide; Risk Factors; United Kingdom/epidemiology
7.
Hum Brain Mapp ; 42(5): 1304-1312, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33236465

ABSTRACT

Identifying genetic biomarkers for brain connectivity helps us understand genetic effects on brain function. The unique and important challenge in detecting associations between brain connectivity and genetic variants is that the phenotype is a matrix rather than a scalar. We study a new concept of super-variant for genetic association detection. Similar to but different from the classic concept of a gene, a super-variant is a combination of alleles at multiple loci, but the contributing loci can be anywhere in the genome. We hypothesize that super-variants are easier to detect and more reliable to reproduce in their associations with brain connectivity. By applying a novel ranking and aggregation method to the UK Biobank databases, we discovered and verified several replicable super-variants. Specifically, we investigate a discovery set with 16,421 subjects and a verification set with 2,882 subjects, which are formed according to release date, and the verification set is used to validate the genetic associations from the discovery phase. We identified 12 replicable super-variants on Chromosomes 1, 3, 7, 8, 9, 10, 12, 15, 16, 18, and 19. These verified super-variants contain single nucleotide polymorphisms located in 14 genes which have been reported in the literature to be associated with brain structure and function, and/or neurodevelopmental and neurodegenerative disorders. We also identified novel loci in genes RSPO2 and TMEM74 which may be upregulated in brain issues. These findings demonstrate the validity of super-variants and their capability of unifying existing results as well as discovering novel and replicable results.


Subjects
Brain; Connectome; Genetic Association Studies; Nerve Net; Adult; Brain/anatomy & histology; Brain/diagnostic imaging; Brain/physiology; Connectome/methods; Databases, Factual; Datasets as Topic; Genetic Association Studies/methods; Humans; Nerve Net/anatomy & histology; Nerve Net/diagnostic imaging; Nerve Net/physiology; Polymorphism, Single Nucleotide
8.
medRxiv ; 2020 Nov 09.
Article in English | MEDLINE | ID: mdl-33200144

ABSTRACT

Background: The severity of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is highly heterogeneous. Studies have reported that males and some ethnic groups are at increased risk of death from COVID-19, which implies that individual risk of death might be influenced by host genetic factors. Methods: In this project, we consider the mortality as the trait of interest and perform a genome-wide association study (GWAS) of data for 1,778 infected cases (445 deaths, 25.03%) distributed by the UK Biobank. Traditional GWAS failed to identify any genome-wide significant genetic variants from this dataset. To enhance the power of GWAS and account for possible multi-loci interactions, we adopt the concept of super-variant for the detection of genetic factors. A discovery-validation procedure is used for verifying the potential associations. Results: We find 8 super-variants that are consistently identified across multiple replications as susceptibility loci for COVID-19 mortality. The identified risk factors on Chromosomes 2, 6, 7, 8, 10, 16, and 17 contain genetic variants and genes related to cilia dysfunctions (DNAH7 and CLUAP1), cardiovascular diseases (DES and SPEG), thromboembolic disease (STXBP5), mitochondrial dysfunctions (TOMM7), and innate immune system (WSB1). It is noteworthy that DNAH7 has been reported recently as the most downregulated gene after infecting human bronchial epithelial cells with SARS-CoV-2. Conclusions: Eight genetic variants are identified to significantly increase the risk of COVID-19 mortality among the patients with white British ancestry. These findings may provide timely evidence and clues for better understanding the molecular pathogenesis of COVID-19 and the genetic basis of heterogeneous susceptibility, with potential impact on new therapeutic options.

9.
Genet Epidemiol ; 44(8): 934-947, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32808324

ABSTRACT

In genome-wide association studies, signals associated with rare variants and interactions between genes are hard to detect even when the sample size is in the tens of thousands. To overcome these problems, we examine the concept of supervariant. Like the classic concept of the gene, a supervariant is a combination of alleles in multiple loci, but the contributing loci can be anywhere in the genome. We hypothesize that supervariants are easy to detect and that the aggregated signals are more stable in their associations with the disease than those from a single-nucleotide polymorphism. Using the UK Biobank databases, we develop a ranking and aggregation method for identifying supervariants. Specifically, we examine 9,377 breast cancer cases with 46,861 controls matched by sex and age. In our simulations, the use of supervariants outperforms the single-nucleotide polymorphism-based association method in detecting rare variants and signals with interactive structure. In real data analysis, we identify supervariants on Chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 16, and 22 which cover previously reported loci that have associations with breast or other cancers, and several novel loci on Chromosomes 2, 5, 9, and 12. These findings demonstrate the validity of supervariants and their potential for discovering replicable and novel results for complex disease.


Subjects
Breast Neoplasms/genetics; Genetic Predisposition to Disease; Genetic Variation; Alleles; Computer Simulation; Databases, Genetic; Female; Genome-Wide Association Study; Humans; Linkage Disequilibrium/genetics; Models, Genetic; Polymorphism, Single Nucleotide/genetics
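The notion of a supervariant as "a combination of alleles in multiple loci" can be made concrete with a small sketch. It assumes one simple aggregation rule, namely that a subject carries the supervariant when at least one contributing locus has a minor allele. The abstract does not specify the aggregation or the ranking step, so this rule and the helper names `supervariant` and `carrier_rate` are illustrative assumptions only.

```python
def supervariant(genotypes, loci):
    """Binary indicator per subject: 1 if any contributing locus has a minor allele.

    genotypes: per-subject lists of minor-allele counts (0, 1, or 2).
    loci: indices of contributing loci, which need not be adjacent.
    Assumption: an "any minor allele" rule; other aggregations are possible.
    """
    return [int(any(subject[j] > 0 for j in loci)) for subject in genotypes]


def carrier_rate(indicator, status, group):
    """Fraction of subjects in the given group who carry the supervariant."""
    members = [v for v, s in zip(indicator, status) if s == group]
    return sum(members) / len(members)


# Toy data: 6 subjects x 4 loci; status 1 = case, 0 = control.
genotypes = [
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
status = [1, 1, 1, 0, 0, 0]
loci = [1, 3]  # hypothetical contributing loci

sv = supervariant(genotypes, loci)
case_rate = carrier_rate(sv, status, 1)
control_rate = carrier_rate(sv, status, 0)
```

In this toy data neither locus alone separates cases from controls strongly, while the aggregated indicator is carried by two of three cases and no controls. That illustrates why an aggregated signal might be easier to detect than a single rare variant, although a real analysis would need a formal association test and an independent validation set.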