Results 1 - 20 of 74,866
1.
Cell ; 184(9): 2372-2383.e9, 2021 04 29.
Article in English | MEDLINE | ID: mdl-33743213

ABSTRACT

Vaccination elicits immune responses capable of potently neutralizing SARS-CoV-2. However, ongoing surveillance has revealed the emergence of variants harboring mutations in spike, the main target of neutralizing antibodies. To understand the impact of these variants, we evaluated the neutralization potency of 99 individuals that received one or two doses of either BNT162b2 or mRNA-1273 vaccines against pseudoviruses representing 10 globally circulating strains of SARS-CoV-2. Five of the 10 pseudoviruses, harboring receptor-binding domain mutations, including K417N/T, E484K, and N501Y, were highly resistant to neutralization. Cross-neutralization of B.1.351 variants was comparable to SARS-CoV and bat-derived WIV1-CoV, suggesting that a relatively small number of mutations can mediate potent escape from vaccine responses. While the clinical impact of neutralization resistance remains uncertain, these results highlight the potential for variants to escape from neutralizing humoral immunity and emphasize the need to develop broadly protective interventions against the evolving pandemic.


Subjects
Antibodies, Neutralizing/immunology , Antibodies, Viral/immunology , COVID-19 Vaccines/immunology , Immunity, Humoral , SARS-CoV-2/immunology , BNT162 Vaccine , COVID-19/blood , COVID-19/immunology , COVID-19/virology , HEK293 Cells , Humans , Mutation/genetics , ROC Curve , SARS-CoV-2/genetics
2.
Cell ; 182(2): 317-328.e10, 2020 07 23.
Article in English | MEDLINE | ID: mdl-32526205

ABSTRACT

Hepatocellular carcinoma (HCC) is an aggressive malignancy with its global incidence and mortality rate continuing to rise, although early detection and surveillance are suboptimal. We performed serological profiling of the viral infection history in 899 individuals from an NCI-UMD case-control study using a synthetic human virome, VirScan. We developed a viral exposure signature and validated the results in a longitudinal cohort with 173 at-risk patients who had long-term follow-up for HCC development. Our viral exposure signature significantly associated with HCC status among at-risk individuals in the validation cohort (area under the curve: 0.91 [95% CI 0.87-0.96] at baseline and 0.98 [95% CI 0.97-1] at diagnosis). The signature identified cancer patients prior to a clinical diagnosis and was superior to alpha-fetoprotein. In summary, we established a viral exposure signature that can predict HCC among at-risk patients prior to a clinical diagnosis, which may be useful in HCC surveillance.


Subjects
Carcinoma, Hepatocellular/pathology , Liver Neoplasms/pathology , Virus Diseases/pathology , Adult , Aged , Area Under Curve , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/metabolism , Case-Control Studies , Cohort Studies , Databases, Genetic , Female , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Male , Middle Aged , Polymorphism, Single Nucleotide , ROC Curve , Risk Factors , Virus Diseases/complications , Young Adult , alpha-Fetoproteins/analysis
3.
Cell ; 178(2): 447-457.e5, 2019 07 11.
Article in English | MEDLINE | ID: mdl-31257030

ABSTRACT

Neurons in cortical circuits are often coactivated as ensembles, yet it is unclear whether ensembles play a functional role in behavior. Some ensemble neurons have pattern completion properties, triggering the entire ensemble when activated. Using two-photon holographic optogenetics in mouse primary visual cortex, we tested whether recalling ensembles by activating pattern completion neurons alters behavioral performance in a visual task. Disruption of behaviorally relevant ensembles by activation of non-selective neurons decreased performance, whereas activation of only two pattern completion neurons from behaviorally relevant ensembles improved performance, by reliably recalling the whole ensemble. Also, inappropriate behavioral choices were evoked by the mistaken activation of behaviorally relevant ensembles. Finally, in absence of visual stimuli, optogenetic activation of two pattern completion neurons could trigger behaviorally relevant ensembles and correct behavioral responses. Our results demonstrate a causal role of neuronal ensembles in a visually guided behavior and suggest that ensembles implement internal representations of perceptual states.


Subjects
Behavior, Animal , Visual Cortex/physiology , Animals , Area Under Curve , Calcium/metabolism , Holography , Image Processing, Computer-Assisted , Male , Mice , Mice, Inbred C57BL , Neurons/metabolism , Optogenetics/methods , Photic Stimulation , Photons , ROC Curve
4.
Cell ; 174(6): 1361-1372.e10, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30193110

ABSTRACT

A key aspect of genomic medicine is to make individualized clinical decisions from personal genomes. We developed a machine-learning framework to integrate personal genomes and electronic health record (EHR) data and used this framework to study abdominal aortic aneurysm (AAA), a prevalent irreversible cardiovascular disease with unclear etiology. Performing whole-genome sequencing on AAA patients and controls, we demonstrated its predictive precision solely from personal genomes. By modeling personal genomes with EHRs, this framework quantitatively assessed the effectiveness of adjusting personal lifestyles given personal genome baselines, demonstrating its utility as a personal health management tool. We showed that this new framework agnostically identified genetic components involved in AAA, which were subsequently validated in human aortic tissues and in murine models. Our study presents a new framework for disease genome analysis, which can be used for both health management and understanding the biological architecture of complex diseases. VIDEO ABSTRACT.


Subjects
Aortic Aneurysm, Abdominal/pathology , Genomics , Animals , Aortic Aneurysm, Abdominal/genetics , Area Under Curve , Disease Models, Animal , Gene Expression Regulation , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Machine Learning , Mice , Polymorphism, Single Nucleotide , Protein Interaction Maps , ROC Curve , Whole Genome Sequencing
5.
Cell ; 172(5): 1122-1131.e9, 2018 02 22.
Article in English | MEDLINE | ID: mdl-29474911

ABSTRACT

The implementation of clinical-decision support algorithms for medical imaging faces challenges with reliability and interpretability. Here, we establish a diagnostic tool based on a deep-learning framework for the screening of patients with common treatable blinding retinal diseases. Our framework utilizes transfer learning, which trains a neural network with a fraction of the data of conventional approaches. Applying this approach to a dataset of optical coherence tomography images, we demonstrate performance comparable to that of human experts in classifying age-related macular degeneration and diabetic macular edema. We also provide a more transparent and interpretable diagnosis by highlighting the regions recognized by the neural network. We further demonstrate the general applicability of our AI system for diagnosis of pediatric pneumonia using chest X-ray images. This tool may ultimately aid in expediting the diagnosis and referral of these treatable conditions, thereby facilitating earlier treatment, resulting in improved clinical outcomes. VIDEO ABSTRACT.


Subjects
Deep Learning , Diagnostic Imaging , Pneumonia/diagnosis , Child , Humans , Neural Networks, Computer , Pneumonia/diagnostic imaging , ROC Curve , Reproducibility of Results , Tomography, Optical Coherence
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385875

ABSTRACT

Metabolomics and foodomics shed light on the molecular processes within living organisms and the complex food composition by leveraging sophisticated analytical techniques to systematically analyze the vast array of molecular features. The traditional feature-picking method often results in arbitrary selections of the model, feature ranking, and cut-off, which may lead to suboptimal results. Thus, a Multiple and Optimal Screening Subset (MOSS) approach was developed in this study to achieve a balance between a minimal number of predictors and high predictive accuracy during statistical model setup. The MOSS approach compares five commonly used models in the context of food matrix analysis, specifically bourbons. These models include Student's t-test, receiver operating characteristic curve, partial least squares-discriminant analysis (PLS-DA), random forests, and support vector machines. The approach employs cross-validation to identify promising subset feature candidates that contribute to food characteristic classification. It then determines the optimal subset size by comparing it to the corresponding top-ranked features. Finally, it selects the optimal feature subset by traversing all possible feature candidate combinations. Using the MOSS approach to analyze 1406 mass spectral features from a collection of 122 bourbon samples, we were able to generate a subset of features for bourbon age prediction with 88% accuracy. Additionally, MOSS increased the area under the curve performance of sweetness prediction to 0.898 with only four predictors, compared with 0.681 for the top-ranked four features based on the PLS-DA model. Overall, we demonstrated that MOSS provides an efficient and effective approach for selecting optimal features compared with other frequently utilized methods.
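
The final step described in this abstract, traversing all combinations of candidate features and keeping the best-scoring subset, can be illustrated with a minimal sketch. This is not the MOSS implementation: the function name `best_subset`, the toy `score` function, and the feature names are hypothetical, and in practice the score would be something like a cross-validated accuracy or AUC.

```python
from itertools import combinations


def best_subset(candidates, score, max_size):
    """Exhaustively score every subset of `candidates` up to `max_size` features.

    `score` is any callable mapping a tuple of features to a number
    (higher is better). Returns the best subset and its score.
    """
    best, best_score = (), float("-inf")
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            current = score(subset)
            if current > best_score:
                best, best_score = subset, current
    return best, best_score


# Toy score (an assumption for illustration): reward per-feature usefulness,
# penalise subset size so that smaller subsets are preferred.
usefulness = {"a": 1.0, "b": 0.2, "c": 0.9}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.5 * len(subset)


subset, value = best_subset(["a", "b", "c"], toy_score, max_size=3)
print(subset, round(value, 3))
```

Exhaustive traversal grows combinatorially with the number of candidates, which is presumably why the abstract describes narrowing the candidate list first.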


Subjects
Metabolomics , Research Design , Discriminant Analysis , Models, Statistical , ROC Curve
7.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38770720

ABSTRACT

The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular methods used for normalization are the trimmed mean of M values (TMM) and DESeq. TMM trims away extreme log fold changes of the data to normalize the raw read counts based on the remaining non-differentially expressed genes. However, the major problem with TMM is that the values of the trimming factor M are heuristic. This paper estimates an adaptive value of M in TMM based on Jaeckel's Estimator, and each sample acts as a reference to find the scale factor of each sample. The presented approach is validated on SEQC, MAQC2, MAQC3, PICKRELL and two simulated datasets with two-group and three-group conditions by varying the percentage of differential expression and the number of replicates. The performance of the present approach is compared with various state-of-the-art methods, and it is better in terms of area under the receiver operating characteristic curve and differential expression.
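
The trimming idea summarized in this abstract can be sketched as follows. This is a deliberately simplified, unweighted illustration and not the adaptive method of the paper: the published TMM procedure also trims on average expression and uses precision weights, both omitted here. The function name `tmm_scale_factor` and the fixed `trim` fraction are assumptions for illustration.

```python
import math


def tmm_scale_factor(sample, reference, trim=0.3):
    """Simplified trimmed mean of M values for one sample against a reference.

    M is the log2 ratio of library-size-scaled counts per gene. The lowest and
    highest `trim` fraction of M values are discarded, and the mean of the rest
    is converted back to a linear scale factor.
    """
    n_sample, n_reference = sum(sample), sum(reference)
    # Genes with a zero count in either library have no finite log ratio.
    m_values = sorted(
        math.log2((s / n_sample) / (r / n_reference))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    k = int(len(m_values) * trim)  # number of genes dropped from each tail
    kept = m_values[k:len(m_values) - k]
    if not kept:
        raise ValueError("trim fraction removes every gene")
    return 2 ** (sum(kept) / len(kept))


# Toy data: nine genes are doubled uniformly, one gene is strongly up-regulated.
reference = [10] * 10
sample = [20] * 9 + [2000]
factor = tmm_scale_factor(sample, reference)
print(factor)
```

In this toy case the single highly expressed gene inflates the library size, and trimming keeps it from distorting the scale factor estimated from the remaining genes.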


Subjects
RNA-Seq , RNA-Seq/methods , Humans , Algorithms , Sequence Analysis, RNA/methods , Computational Biology/methods , Gene Expression Profiling/methods , ROC Curve , Software
8.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426324

ABSTRACT

Emerging clinical evidence suggests that sophisticated associations between circular ribonucleic acids (circRNAs) and microRNAs (miRNAs) are a critical regulatory factor of various pathological processes and play a critical role in most intricate human diseases. Nonetheless, establishing these correlations via wet experiments is error-prone and labor-intensive, and the underlying novel circRNA-miRNA association (CMA) has been validated by numerous existing computational methods that rely only on single correlation data. Considering the inadequacy of existing machine learning models, we propose a new model named BGF-CMAP, which combines the gradient boosting decision tree with natural language processing and graph embedding methods to infer associations between circRNAs and miRNAs. Specifically, BGF-CMAP extracts sequence attribute features by Word2vec and interaction behavior features by two homogeneous graph embedding algorithms, large-scale information network embedding and graph factorization. Comprehensive experimental analyses revealed that BGF-CMAP successfully predicted the complex relationship between circRNAs and miRNAs with an accuracy of 82.90% and an area under the receiver operating characteristic curve of 0.9075. Furthermore, 23 of the top 30 miRNA-associated circRNAs of the studies on data were confirmed in relevant experiences, showing that the BGF-CMAP model is superior to others. BGF-CMAP can serve as a helpful model to provide a scientific theoretical basis for the study of CMA prediction.


Subjects
MicroRNAs , Humans , MicroRNAs/genetics , RNA, Circular/genetics , ROC Curve , Machine Learning , Algorithms , Computational Biology/methods
9.
Nature ; 587(7834): 448-454, 2020 11.
Article in English | MEDLINE | ID: mdl-33149306

ABSTRACT

Low concordance between studies that examine the role of microbiota in human diseases is a pervasive challenge that limits the capacity to identify causal relationships between host-associated microorganisms and pathology. The risk of obtaining false positives is exacerbated by wide interindividual heterogeneity in microbiota composition [1], probably due to population-wide differences in human lifestyle and physiological variables [2] that exert differential effects on the microbiota. Here we infer the greatest, generalized sources of heterogeneity in human gut microbiota profiles and also identify human lifestyle and physiological characteristics that, if not evenly matched between cases and controls, confound microbiota analyses to produce spurious microbial associations with human diseases. We identify alcohol consumption frequency and bowel movement quality as unexpectedly strong sources of gut microbiota variance that differ in distribution between healthy participants and participants with a disease and that can confound study designs. We demonstrate that for numerous prevalent, high-burden human diseases, matching cases and controls for confounding variables reduces observed differences in the microbiota and the incidence of spurious associations. On this basis, we present a list of host variables that we recommend should be captured in human microbiota studies for the purpose of matching comparison groups, which we anticipate will increase robustness and reproducibility in resolving the members of the gut microbiota that are truly associated with human disease.


Subjects
Confounding Factors, Epidemiologic , Data Analysis , Diet , Disease , Gastrointestinal Microbiome/physiology , Life Style , Machine Learning , Adult , Aged , Aged, 80 and over , Alcohol Drinking , Area Under Curve , Body Mass Index , Case-Control Studies , Diabetes Mellitus, Type 2 , Feces/microbiology , Female , Gastrointestinal Motility , Humans , Male , Middle Aged , RNA, Ribosomal, 16S/genetics , ROC Curve , Residence Characteristics , Young Adult
10.
Am J Hum Genet ; 109(2): 195-209, 2022 02 03.
Article in English | MEDLINE | ID: mdl-35032432

ABSTRACT

Whole-genome sequencing resolves many clinical cases where standard diagnostic methods have failed. However, at least half of these cases remain unresolved after whole-genome sequencing. Structural variants (SVs; genomic variants larger than 50 base pairs) of uncertain significance are the genetic cause of a portion of these unresolved cases. As sequencing methods using long or linked reads become more accessible and SV detection algorithms improve, clinicians and researchers are gaining access to thousands of reliable SVs of unknown disease relevance. Methods to predict the pathogenicity of these SVs are required to realize the full diagnostic potential of long-read sequencing. To address this emerging need, we developed StrVCTVRE to distinguish pathogenic SVs from benign SVs that overlap exons. In a random forest classifier, we integrated features that capture gene importance, coding region, conservation, expression, and exon structure. We found that features such as expression and conservation are important but are absent from SV classification guidelines. We leveraged multiple resources to construct a size-matched training set of rare, putatively benign and pathogenic SVs. StrVCTVRE performs accurately across a wide SV size range on independent test sets, which will allow clinicians and researchers to eliminate about half of SVs from consideration while retaining a 90% sensitivity. We anticipate clinicians and researchers will use StrVCTVRE to prioritize SVs in probands where no SV is immediately compelling, empowering deeper investigation into novel SVs to resolve cases and understand new mechanisms of disease. StrVCTVRE runs rapidly and is publicly available.


Subjects
Algorithms , Genome, Human , Genomic Structural Variation , Software , Supervised Machine Learning , Datasets as Topic , Exons , Genomics/methods , Humans , ROC Curve , Whole Genome Sequencing/statistics & numerical data
11.
Gastroenterology ; 167(3): 591-603.e9, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38583724

ABSTRACT

BACKGROUND & AIMS: Benign ulcerative colorectal diseases (UCDs) such as ulcerative colitis, Crohn's disease, ischemic colitis, and intestinal tuberculosis share similar phenotypes with different etiologies and treatment strategies. To accurately diagnose closely related diseases like UCDs, we hypothesize that contextual learning is critical in enhancing the ability of the artificial intelligence models to differentiate the subtle differences in lesions amidst the vastly divergent spatial contexts. METHODS: White-light colonoscopy datasets of patients with confirmed UCDs and healthy controls were retrospectively collected. We developed a Multiclass Contextual Classification (MCC) model that can differentiate among the mentioned UCDs and healthy controls by incorporating the tissue object contexts surrounding the individual lesion region in a scene and spatial information from other endoscopic frames (video-level) into a unified framework. Internal and external datasets were used to validate the model's performance. RESULTS: Training datasets included 762 patients, and the internal and external testing cohorts included 257 patients and 293 patients, respectively. Our MCC model provided a rapid reference diagnosis on internal test sets with a high averaged area under the receiver operating characteristic curve (image-level: 0.950 and video-level: 0.973) and balanced accuracy (image-level: 76.1% and video-level: 80.8%), which was superior to junior endoscopists (accuracy: 71.8%, P < .0001) and similar to experts (accuracy: 79.7%, P = .732). The MCC model achieved an area under the receiver operating characteristic curve of 0.988 and balanced accuracy of 85.8% using external testing datasets. CONCLUSIONS: These results enable this model to fit in the routine endoscopic workflow, and the contextual framework to be adopted for diagnosing other closely related diseases.


Subjects
Artificial Intelligence , Colitis, Ulcerative , Colonoscopy , Humans , Colitis, Ulcerative/diagnosis , Retrospective Studies , Female , Male , Middle Aged , Adult , Image Interpretation, Computer-Assisted/methods , ROC Curve , Aged , Reproducibility of Results , Colon/pathology , Colon/diagnostic imaging , Predictive Value of Tests , Diagnosis, Differential , Video Recording , Machine Learning , Case-Control Studies
12.
Gastroenterology ; 167(2): 357-367.e9, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38513745

ABSTRACT

BACKGROUND & AIMS: There is an unmet need for noninvasive tests to improve case-finding and aid primary care professionals in referring patients at high risk of liver disease. METHODS: A metabolic dysfunction-associated fibrosis (MAF-5) score was developed and externally validated in a total of 21,797 individuals with metabolic dysfunction in population-based (National Health and Nutrition Examination Survey 2017-2020, National Health and Nutrition Examination Survey III, and Rotterdam Study) and hospital-based (from Antwerp and Bogota) cohorts. Fibrosis was defined as liver stiffness ≥8.0 kPa. Diagnostic accuracy was compared with FIB-4, nonalcoholic fatty liver disease fibrosis score (NFS), LiverRisk score and steatosis-associated fibrosis estimator (SAFE). MAF-5 was externally validated with liver stiffness measurement ≥8.0 kPa, with shear-wave elastography ≥7.5 kPa, and biopsy-proven steatotic liver disease according to Metavir and Nonalcoholic Steatohepatitis Clinical Research Network scores, and was tested for prognostic performance (all-cause mortality). RESULTS: The MAF-5 score comprised waist circumference, body mass index (calculated as kg/m²), diabetes, aspartate aminotransferase, and platelets. With this score, 60.9% was predicted at low, 14.1% at intermediate, and 24.9% at high risk of fibrosis. The observed prevalence was 3.3%, 7.9%, and 28.1%, respectively. The area under the receiver operator curve of MAF-5 (0.81) was significantly higher than FIB-4 (0.61), and outperformed the FIB-4 among young people (negative predictive value [NPV], 99%; area under the curve [AUC], 0.86 vs NPV, 94%; AUC, 0.51) and older adults (NPV, 94%; AUC, 0.75 vs NPV, 88%; AUC, 0.55). MAF-5 showed excellent performance to detect liver stiffness measurement ≥12 kPa (AUC, 0.86 training; AUC, 0.85 validation) and good performance in detecting liver stiffness and biopsy-proven liver fibrosis among the external validation cohorts. MAF-5 score >1 was associated with increased risk of all-cause mortality in (un)adjusted models (adjusted hazard ratio, 1.59; 95% CI, 1.47-1.73). CONCLUSIONS: The MAF-5 score is a validated, age-independent, inexpensive referral tool to identify individuals at high risk of liver fibrosis and all-cause mortality in primary care populations, using simple variables.


Subjects
Elasticity Imaging Techniques , Liver Cirrhosis , Predictive Value of Tests , Humans , Male , Female , Liver Cirrhosis/diagnosis , Liver Cirrhosis/epidemiology , Liver Cirrhosis/pathology , Liver Cirrhosis/etiology , Middle Aged , Risk Assessment , Aged , Prognosis , Body Mass Index , Risk Factors , Waist Circumference , Nutrition Surveys , Non-alcoholic Fatty Liver Disease/epidemiology , Non-alcoholic Fatty Liver Disease/diagnosis , Non-alcoholic Fatty Liver Disease/pathology , Adult , Aspartate Aminotransferases/blood , Platelet Count , Liver/pathology , Liver/diagnostic imaging , Netherlands/epidemiology , Biopsy , ROC Curve , Reproducibility of Results
13.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37039696

ABSTRACT

The ability to identify B-cell epitopes is an essential step in vaccine design, immunodiagnostic tests and antibody production. Several computational approaches have been proposed to identify, from an antigen protein or peptide sequence, which residues are more likely to be part of an epitope, but have limited performance on relatively homogeneous data sets and lack interpretability, limiting biological insights that could otherwise be obtained. To address these limitations, we have developed epitope1D, an explainable machine learning method capable of accurately identifying linear B-cell epitopes, leveraging two new descriptors: a graph-based signature representation of protein sequences, based on our well-established Cutoff Scanning Matrix algorithm and Organism Ontology information. Our model achieved Areas Under the ROC curve of up to 0.935 on cross-validation and blind tests, demonstrating robust performance. A comprehensive comparison to alternative methods using distinct benchmark data sets was also employed, with our model outperforming state-of-the-art tools. epitope1D represents not only a significant advance in predictive performance, but also allows biologically meaningful features to be combined and used for model interpretation. epitope1D has been made available as a user-friendly web server interface and application programming interface at https://biosig.lab.uq.edu.au/epitope1d/.


Subjects
Algorithms , Epitopes, B-Lymphocyte , Amino Acid Sequence , ROC Curve
14.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37369639

ABSTRACT

DNA methylation plays a crucial role in transcriptional regulation. Reduced representation bisulfite sequencing (RRBS) is a technique of increasing use for analyzing genome-wide methylation profiles. Many computational tools such as Metilene, MethylKit, BiSeq and DMRfinder have been developed to use RRBS data for the detection of the differentially methylated regions (DMRs) potentially involved in epigenetic regulations of gene expression. For DMR detection tools, as for countless other medical applications, P-values and their adjustments are among the most standard reporting statistics used to assess the statistical significance of biological findings. However, P-values are coming under increasing criticism relating to their questionable accuracy and relatively high levels of false positive or negative indications. Here, we propose a method to calculate E-values, as likelihood ratios falling into the null hypothesis over the entire parameter space, for DMR detection in RRBS data. We also provide the R package 'metevalue' as a user-friendly interface to implement E-value calculations into various DMR detection tools. To evaluate the performance of E-values, we generated various RRBS benchmarking datasets using our simulator 'RRBSsim' with eight samples in each experimental group. Our comprehensive benchmarking analyses showed that using E-values not only significantly improved accuracy, area under ROC curve and power, over that of P-values or adjusted P-values, but also reduced false discovery rates and type I errors. In applications using real RRBS data of CRL rats and a clinical trial on low-salt diet, the use of E-values detected biologically more relevant DMRs and also improved the negative association between DNA methylation and gene expression.


Subjects
DNA Methylation , Animals , Rats , Sequence Analysis, DNA/methods , ROC Curve , CpG Islands
15.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37013942

ABSTRACT

Identifying protein-protein interaction (PPI) sites is an important step in understanding biological activity, apprehending pathological mechanisms and designing novel drugs. Developing reliable computational methods for predicting PPI sites as screening tools helps reduce the time and cost of conventional experiments, but improving their accuracy remains challenging. We propose a PPI site predictor, called Augmented Graph Attention Network Protein-Protein Interacting Site (AGAT-PPIS), based on AGAT with initial residual and identity mapping, in which eight AGAT layers are connected to mine node embedding representation deeply. AGAT is our augmented version of the graph attention network, with added edge features. In addition, extra node features and edge features are introduced to provide more structural information and increase the translation and rotation invariance of the model. On the benchmark test set, AGAT-PPIS significantly surpasses the state-of-the-art method by 8% in Accuracy, 17.1% in Precision, 11.8% in F1-score, 15.1% in Matthews Correlation Coefficient (MCC), 8.1% in Area Under the Receiver Operating Characteristic curve (AUROC) and 14.5% in Area Under the Precision-Recall curve (AUPRC).
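
Several of the evaluation metrics named in this abstract can be computed from binary labels and prediction scores as in the sketch below. The standard textbook definitions are used; the function names and the toy data are assumptions for illustration and are unrelated to the AGAT-PPIS code or benchmark.

```python
import math


def precision_f1_mcc(y_true, y_pred):
    """Precision, F1-score and Matthews correlation coefficient for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, f1, mcc


def auroc(y_true, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: scores thresholded at 0.5 to obtain hard predictions.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
precision, f1, mcc = precision_f1_mcc(y_true, y_pred)
auc = auroc(y_true, scores)
print(precision, f1, mcc, auc)
```

The pairwise AUROC shown here is quadratic in the number of samples; it is chosen for readability, and rank-based formulations give the same value more efficiently.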


Subjects
Protein Interaction Mapping , Proton Pump Inhibitors , Protein Interaction Mapping/methods , Proteins/chemistry , Area Under Curve , ROC Curve
16.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36790856

ABSTRACT

Potential miRNA-disease associations (MDA) play an important role in the discovery of complex human disease etiology. Therefore, MDA prediction is an attractive research topic in the field of biomedical machine learning. Recently, several models have been proposed for this task, but their performance is limited by over-reliance on relevant network information with noisy graph structure connections. However, the application of self-supervised graph structure learning to MDA tasks remains unexplored. Our study is the first to use multi-view self-supervised contrastive learning (MSGCL) for MDA prediction. Specifically, we generated a learner view without association labels of miRNAs and diseases as input, and utilized the known association network to generate an anchor view that provides guiding signals for the learner view. The graph structure was optimized by designing a contrastive loss to maximize the consistency between the anchor and learner views. Our model is similar to a pre-trained model that continuously optimizes upstream tasks for high-quality association graph topology, thereby enhancing the latent representation of association predictions. The experimental results show that our proposed method outperforms state-of-the-art methods by 2.79% and 3.20% in area under the receiver operating characteristic curve (AUC) and area under the precision/recall curve (AUPR), respectively.


Subjects
Machine Learning , MicroRNAs , Humans , Area Under Curve , MicroRNAs/genetics , ROC Curve
17.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37328701

ABSTRACT

Circular RNA (circRNA) is closely associated with human diseases. Accordingly, identifying the associations between human diseases and circRNA can help in disease prevention, diagnosis and treatment. Traditional methods are time consuming and laborious. Meanwhile, computational models can effectively predict potential circRNA-disease associations (CDAs), but are restricted by limited data, resulting in data with high dimension and imbalance. In this study, we propose a model based on automatically selected meta-path and contrastive learning, called the MPCLCDA model. First, the model constructs a new heterogeneous network based on circRNA similarity, disease similarity and known association, via automatically selected meta-path and obtains the low-dimensional fusion features of nodes via graph convolutional networks. Then, contrastive learning is used to optimize the fusion features further, and obtain the node features that make the distinction between positive and negative samples more evident. Finally, circRNA-disease scores are predicted through a multilayer perceptron. The proposed method is compared with advanced methods on four datasets. The average area under the receiver operating characteristic curve, area under the precision-recall curve and F1 score under 5-fold cross-validation reached 0.9752, 0.9831 and 0.9745, respectively. Simultaneously, case studies on human diseases further prove the predictive ability and application value of this method.


Subjects
Neural Networks, Computer , RNA, Circular , Humans , RNA, Circular/genetics , ROC Curve , Computational Biology/methods , Algorithms
18.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38243694

ABSTRACT

The correct prediction of disease-associated miRNAs plays an essential role in disease prevention and treatment. Current computational methods to predict disease-associated miRNAs construct different miRNA views and disease views based on various miRNA properties and disease properties and then integrate the multiviews to predict the relationship between miRNAs and diseases. However, most existing methods ignore the information interaction among the views and the consistency of miRNA features (disease features) across multiple views. This study proposes a computational method based on multiple hypergraph contrastive learning (MHCLMDA) to predict miRNA-disease associations. MHCLMDA first constructs multiple miRNA hypergraphs and disease hypergraphs based on various miRNA similarities and disease similarities and performs hypergraph convolution on each hypergraph to capture higher order interactions between nodes, followed by hypergraph contrastive learning to learn the consistent miRNA feature representation and disease feature representation under different views. Then, a variational auto-encoder is employed to extract the miRNA and disease features in known miRNA-disease association relationships. Finally, MHCLMDA fuses the miRNA and disease features from different views to predict miRNA-disease associations. The parameters of the model are optimized in an end-to-end way. We applied MHCLMDA to the prediction of human miRNA-disease association. The experimental results show that our method performs better than several other state-of-the-art methods in terms of the area under the receiver operating characteristic curve and the area under the precision-recall curve.


Subjects
MicroRNAs , Humans , MicroRNAs/genetics , Algorithms , Computational Biology/methods , ROC Curve
19.
Brief Bioinform ; 24(1), 2023 01 19.
Article in English | MEDLINE | ID: mdl-36567255

ABSTRACT

Underlying medical conditions, such as cancer, kidney disease and heart failure, are associated with a higher risk for severe COVID-19. Accurate classification of COVID-19 patients with underlying medical conditions is critical for personalized treatment decisions and prognosis estimation. In this study, we propose an interpretable artificial intelligence model termed VDJMiner to mine the underlying medical conditions and predict the prognosis of COVID-19 patients according to their immune repertoires. In a cohort of more than 1400 COVID-19 patients, VDJMiner accurately identifies multiple underlying medical conditions, including cancers, chronic kidney disease, autoimmune disease, diabetes, congestive heart failure, coronary artery disease, asthma and chronic obstructive pulmonary disease, with an average area under the receiver operating characteristic curve (AUC) of 0.961. In the same cohort, VDJMiner achieves an AUC of 0.922 in predicting severe COVID-19. Moreover, VDJMiner achieves an accuracy of 0.857 in predicting the response of COVID-19 patients to tocilizumab treatment in a leave-one-out test. Additionally, VDJMiner interpretively mines and scores V(D)J gene segments of the T-cell receptors that are associated with the disease. The identified associations between single-cell V(D)J gene segments and COVID-19 are highly consistent with previous studies. The source code of VDJMiner is publicly accessible at https://github.com/TencentAILabHealthcare/VDJMiner. The web server of VDJMiner is available at https://gene.ai.tencent.com/VDJMiner/.


Subjects
Asthma , COVID-19 , Humans , Artificial Intelligence , ROC Curve , Software
20.
Brief Bioinform ; 24(1), 2023 01 19.
Article in English | MEDLINE | ID: mdl-36416116

ABSTRACT

DNA-binding proteins (DBPs) play crucial roles in numerous cellular processes including nucleotide recognition, transcriptional control and the regulation of gene expression. The majority of existing computational techniques for identifying DBPs are mainly applicable to human and mouse datasets. Even though some models have been tested on Arabidopsis, they produce poor accuracy when applied to other plant species. Therefore, it is imperative to develop an effective computational model for predicting plant DBPs. In this study, we developed a comprehensive computational model for plant-specific DBP identification. Five shallow learning and six deep learning models were initially used for prediction, and the shallow learning methods outperformed the deep learning algorithms. In particular, the support vector machine achieved the highest repeated 5-fold cross-validation performance, with 94.0% area under the receiver operating characteristic curve (AUC-ROC) and 93.5% area under the precision-recall curve (AUC-PR). On an independent dataset, the developed approach achieved 93.8% AUC-ROC and 94.6% AUC-PR. When compared with existing state-of-the-art tools on an independent dataset, the proposed model achieved much higher accuracy. Overall, the results suggest that the developed computational model is more efficient and reliable than existing models for the prediction of DBPs in plants. For the convenience of experimental scientists, the developed prediction server PlDBPred is publicly accessible at https://iasri-sg.icar.gov.in/pldbpred/. The source code is also provided at https://iasri-sg.icar.gov.in/pldbpred/source_code.php for prediction using a large-size dataset.
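The abstract reports AUC-PR without saying how the area was computed. One common estimator is average precision, sketched below under that assumption; the function name `average_precision` and the toy data are hypothetical and do not come from PlDBPred. Average precision is the mean of the precision values measured at the rank of each true positive when predictions are sorted by decreasing score.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the PR curve."""
    total_positives = sum(1 for y in labels if y == 1)
    if total_positives == 0:
        raise ValueError("need at least one positive sample")
    # Rank predictions from highest to lowest score.
    # Tied scores keep their input order, which is an arbitrary choice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` predictions.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Toy predictions (made-up numbers): positives sit at ranks 1 and 3.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Unlike ROC AUC, this quantity depends on the proportion of positives in the data, which is why both figures are often reported together for imbalanced problems.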


Subjects
Arabidopsis , DNA-Binding Proteins , Algorithms , Arabidopsis/genetics , Arabidopsis/metabolism , Computational Biology/methods , Computer Simulation , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , ROC Curve , Software