Results 1 - 20 of 42
1.
Front Public Health ; 11: 1086771, 2023.
Article in English | MEDLINE | ID: mdl-37089491

ABSTRACT

Introduction: The triglyceride-glucose (TyG)-driven indices, incorporating obesity indices, have been proposed as reliable markers of insulin resistance and related comorbidities such as diabetes. This study evaluated the effectiveness of these indices in detecting prediabetes in normal-weight individuals from a Middle Eastern population. Methods: Using the data of 5,996 adult Qatari participants from the Qatar Biobank cohort, we employed adjusted logistic regression to assess the ability of various obesity and triglyceride-related indices to detect prediabetes in normal-weight (18.5 ≤ BMI < 25 kg/m²) adults (≥18 years). Results: Of the normal-weight adults, 13.62% had prediabetes. TyG-waist-to-height ratio (TyG-WHTR) was significantly associated with prediabetes among normal-weight men [OR per 1-SD 2.68; 95% CI (1.67-4.32)] and women [OR per 1-SD 2.82; 95% CI (1.61-4.94)]. Compared with other indices, TyG-WHTR had the highest area under the curve (AUC) value for prediabetes in men [AUC: 0.76, 95% CI (0.70-0.81)] and women [AUC: 0.73, 95% CI (0.66-0.80)], and performed significantly better than other indices (p < 0.05) in detecting prediabetes in men. TyG-WHTR had diagnostic value similar to that of fasting plasma glucose (FPG). Discussion: Our findings suggest that the TyG-WHTR index could be a better indicator of prediabetes for general clinical usage in normal-weight Qatari adult men than other obesity and TyG-related indices. TyG-WHTR can help identify a person's risk for developing prediabetes in both men and women when combined with FPG results.
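
The abstract does not spell out how TyG-WHTR is computed. The sketch below assumes the definition commonly used in the literature, TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), multiplied by the waist-to-height ratio; the function and parameter names are illustrative and not taken from the paper.

```python
import math


def tyg_index(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float) -> float:
    """TyG index under the assumed definition ln(TG * FPG / 2), both in mg/dL."""
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("concentrations must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2.0)


def tyg_whtr(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float,
             waist_cm: float, height_cm: float) -> float:
    """TyG multiplied by the waist-to-height ratio (same length unit for both)."""
    if waist_cm <= 0 or height_cm <= 0:
        raise ValueError("waist and height must be positive")
    return tyg_index(triglycerides_mg_dl, fasting_glucose_mg_dl) * (waist_cm / height_cm)


if __name__ == "__main__":
    # Hypothetical participant, for illustration only.
    print(round(tyg_whtr(120.0, 95.0, 80.0, 170.0), 3))
```

Any cut-off applied to the resulting value would have to come from the study itself; none is assumed here.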


Subjects
Prediabetic State , Male , Humans , Adult , Female , Prediabetic State/diagnosis , Glucose , Cross-Sectional Studies , Triglycerides , Obesity/diagnosis , Obesity/epidemiology
2.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36945891

ABSTRACT

MOTIVATION: Finding outliers in RNA-sequencing (RNA-Seq) gene expression (GE) can help in identifying genes that are aberrant and cause Mendelian disorders. Recently developed models for this task rely on modeling RNA-Seq GE data using the negative binomial distribution (NBD). However, some of those models rely on unbiased procedures for inferring the NBD's parameters that are computationally demanding and thus make confounder control challenging, while others rely on less computationally demanding but biased procedures and convoluted confounder-control approaches that hinder interpretability. RESULTS: In this article, we present OutSingle (Outlier detection using Singular Value Decomposition), an almost instantaneous way of detecting outliers in RNA-Seq GE data. It uses a simple log-normal approach for count modeling. For confounder control, it uses the recently discovered optimal hard threshold (OHT) method for noise detection, which itself is based on singular value decomposition (SVD). Due to its SVD/OHT utilization, OutSingle's model is straightforward to understand and interpret. We then show that our novel method, when used on RNA-Seq GE data with real biological outliers masked by confounders, outcompetes the previous state-of-the-art model based on an ad hoc denoising autoencoder. Additionally, OutSingle can be used to inject artificial outliers masked by confounders, which is difficult to achieve with previous approaches. We describe a way of using OutSingle for outlier injection and proceed to show how OutSingle outperforms its competition on 16 out of 18 datasets that were generated from three real datasets using OutSingle's injection procedure with different outlier types and magnitudes. Our methods are applicable to other types of similar problems involving finding outliers in matrices in the presence of confounders. AVAILABILITY AND IMPLEMENTATION: The code for OutSingle is available at https://github.com/esalkovic/outsingle.
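
As a rough illustration of the log-normal count-modeling idea only, the sketch below log-transforms one gene's counts across samples, standardizes them, and converts each z-score to a two-sided normal p-value. It is not OutSingle's code: the pseudocount of 1, the per-gene standardization, and all function names are assumptions, and the SVD/OHT confounder-control step described in the abstract is deliberately left out.

```python
import math
from statistics import mean, stdev


def lognormal_zscores(counts):
    """Z-scores of log2(count + 1) for one gene across samples (needs >= 2 samples)."""
    logs = [math.log2(c + 1) for c in counts]
    mu, sigma = mean(logs), stdev(logs)
    if sigma == 0:
        # A gene with identical counts in every sample has no outliers.
        return [0.0] * len(logs)
    return [(x - mu) / sigma for x in logs]


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    gene_counts = [100, 110, 90, 105, 95, 5000]  # hypothetical counts, last one aberrant
    for count, z in zip(gene_counts, lognormal_zscores(gene_counts)):
        print(count, round(z, 2), round(two_sided_p(z), 4))
```

With few samples a single extreme value inflates the standard deviation, so z-scores from such a toy example are bounded and should not be read as calibrated significance.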


Subjects
RNA , Base Sequence , Exome Sequencing , RNA/metabolism , RNA-Seq , Sequence Analysis, RNA/methods , Gene Expression/genetics
3.
J Infect Public Health ; 16(5): 799-807, 2023 May.
Article in English | MEDLINE | ID: mdl-36966703

ABSTRACT

Monkeypox virus (MPXV) was confirmed in May 2022 and designated a global health emergency by WHO in July 2022. MPXV virions are large, enveloped, and brick-shaped, and contain a linear, double-stranded DNA genome as well as enzymes. MPXV particles bind to the host cell membrane via a variety of viral-host protein interactions. As a result, the wrapped structure is a potential therapeutic target. DeepRepurpose, an artificial intelligence-based compound-viral protein interaction framework, was used in a transfer learning setting to prioritize a set of FDA-approved and investigational drugs that can potentially inhibit MPXV viral proteins. To filter and narrow down the lead compounds from curated collections of pharmaceutical compounds, we used a rigorous computational framework that included homology modeling, molecular docking, dynamic simulations, binding free energy calculations, and binding pose metadynamics. Using this comprehensive pipeline, we identified Elvitegravir as a potential inhibitor of MPXV.


Subjects
Drug Repositioning , Monkeypox virus , Humans , Monkeypox virus/genetics , Artificial Intelligence , Molecular Docking Simulation , Viral Proteins/genetics
4.
Nat Commun ; 14(1): 724, 2023 02 09.
Article in English | MEDLINE | ID: mdl-36759620

ABSTRACT

The PML::RARA fusion protein is the hallmark driver of Acute Promyelocytic Leukemia (APL) and disrupts retinoic acid signaling, leading to wide-scale gene expression changes and uncontrolled proliferation of myeloid precursor cells. While known to be recruited to binding sites across the genome, its impact on gene regulation and expression is under-explored. Using integrated multi-omics datasets, we characterize the influence of PML::RARA binding on gene expression and regulation in an inducible PML::RARA cell line model and APL patient ex vivo samples. We find that genes whose regulatory elements recruit PML::RARA are not uniformly transcriptionally repressed, as commonly suggested, but also may be upregulated or remain unchanged. We develop a computational machine learning implementation called Regulatory Element Behavior Extraction Learning to deconvolute the complex, local transcription factor binding site environment at PML::RARA bound positions to reveal distinct signatures that modulate how PML::RARA directs the transcriptional response.


Subjects
Leukemia, Promyelocytic, Acute , Humans , Cell Line , Gene Expression Regulation , Leukemia, Promyelocytic, Acute/genetics , Leukemia, Promyelocytic, Acute/metabolism , Multiomics , Oncogene Proteins, Fusion/genetics , Oncogene Proteins, Fusion/metabolism , Tretinoin/pharmacology
5.
Vet Sci ; 9(11)2022 Nov 08.
Article in English | MEDLINE | ID: mdl-36356097

ABSTRACT

Great advances have been made in human health care in the application of radiomics and artificial intelligence (AI) in a variety of areas, ranging from hospital management and virtual assistants to remote patient monitoring and medical diagnostics and imaging. To improve accuracy and reproducibility, there has been a recent move to integrate radiomics and AI as tools to assist clinical decision making and to incorporate them into routine clinical workflows and diagnosis. Although lagging behind human medicine, the use of radiomics and AI in veterinary diagnostic imaging is becoming more frequent, with an increasing number of reported applications. The goal of this paper is to provide an overview of current radiomic and AI applications in veterinary diagnostic imaging.

6.
Nutrients ; 14(2)2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35057526

ABSTRACT

Children are prescribed second-generation antipsychotic (SGA) medications, such as olanzapine (OLZ) for FDA-approved and "off-label" indications. The long-term impact of early-life SGA medication exposure is unclear. Olanzapine and other SGA medications are known to cause excessive weight gain in young and adult patients, suggesting the possibility of long-term complications associated with the use of these drugs, such as obesity, diabetes, and heart disease. Further, the weight gain effects of OLZ have previously been shown to depend on the presence of gut bacteria and treatment with OLZ, which shifts gut bacteria toward an "obesogenic" profile. The purpose of the current study was to evaluate changes in gut bacteria in adult mice following early life treatment with OLZ and being fed either a high-fat diet or a high-fat diet supplemented with fish oil, which has previously been shown to counteract gut dysbiosis, weight gain, and inflammation produced by a high-fat diet. Female and male C57Bl/6J mice were fed a high fat diet without (HF) or with the supplementation of fish oil (HF-FO) and treated with OLZ from postnatal day (PND) 37-65 resulting in four groups of mice: mice fed a HF diet and treated with OLZ (HF-OLZ), mice fed a HF diet and treated with vehicle (HF), mice fed a HF-FO diet and treated with OLZ (HF-FO-OLZ), and mice fed a HF-FO diet and treated with vehicle (HF-FO). Following euthanasia at approximately 164 days of age, we determined changes in gut bacteria populations and serum LPS binding protein, an established marker of gut inflammation and dysbiosis. Our results showed that male HF-FO and HF-FO-OLZ mice had lower body weights, at sacrifice, compared to the HF group, with a comparable body weight across groups in female mice. HF-FO and HF-FO-OLZ male groups also exhibited lower serum LPS binding protein levels compared to the HF group, with no differences across groups in female mice. 
Gut microbiota profiles were also different among the four groups; the Bacteroidetes-to-Firmicutes (B/F) ratio had the lowest value of 0.51 in the HF group compared to 0.6 in HF-OLZ, 0.9 in HF-FO, and 1.1 in HF-FO-OLZ, with no differences in female mice. In conclusion, FO reduced dietary obesity and its associated inflammation and increased the B/F ratio in male mice but did not benefit the female mice. Although the weight-lowering effects of OLZ were unexpected, FO effects persisted in the presence of olanzapine, demonstrating its potential protective effects in male subjects using antipsychotic drugs.


Subjects
Fish Oils/administration & dosage , Gastrointestinal Microbiome/drug effects , Obesity/therapy , Olanzapine/adverse effects , Sex Characteristics , Animals , Body Weight , Diet, High-Fat/adverse effects , Dietary Supplements , Female , Male , Mice , Mice, Inbred C57BL , Mice, Obese , Obesity/etiology , Weight Gain/drug effects
7.
Biology (Basel) ; 10(6)2021 May 24.
Article in English | MEDLINE | ID: mdl-34073810

ABSTRACT

Epidemiological modeling supports the evaluation of various disease management activities. The value of epidemiological models lies in their ability to study various scenarios and to provide governments with a priori knowledge of the consequences of disease incursions and the impact of preventive strategies. A prevalent method of modeling the spread of pandemics is to categorize individuals in the population as belonging to one of several distinct compartments, which represent their health status with regard to the pandemic. In this work, a modified SIR epidemic model is proposed and analyzed with respect to the identification of its parameters and initial values based on stated or recorded case data from public health sources, in order to estimate the unreported cases and the effectiveness of public health policies such as social distancing in slowing the spread of the epidemic. The analysis aims to highlight the importance of unreported cases for correcting the underestimated basic reproduction number. In many epidemic outbreaks, the number of reported infections is likely much lower than the actual number of infections, which can be calculated from the model's parameters derived from reported case data. The analysis is applied to the COVID-19 pandemic for several countries in the Gulf region and Europe.
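
To make the compartmental idea concrete, here is a minimal sketch of a textbook SIR model integrated with forward Euler steps, in which only a fixed fraction of infections is assumed to be reported. The abstract does not give the structure of the paper's modified model, so the `report_fraction` parameter, the Euler scheme, and all names below are illustrative assumptions.

```python
def basic_reproduction_number(beta, gamma):
    """R0 = beta / gamma for the standard SIR model."""
    return beta / gamma


def simulate_sir(population, initial_infected, beta, gamma,
                 report_fraction, days, dt=0.1):
    """Return one (day, S, I, R, cumulative reported, cumulative total) tuple per day."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    cumulative_total = float(initial_infected)
    steps_per_day = round(1 / dt)
    history = []
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
            cumulative_total += new_infections
        # Assumption: a constant share of all infections shows up in case reports.
        history.append((day, s, i, r, report_fraction * cumulative_total, cumulative_total))
    return history


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    print("R0 =", basic_reproduction_number(0.3, 0.1))
    for row in simulate_sir(1_000_000, 10, 0.3, 0.1, 0.25, 100)[::20]:
        print(row[0], round(row[4]), round(row[5]))
```

In this toy setup the reported curve is a scaled copy of the true cumulative curve, which illustrates why fitting only reported cases can understate the size of an outbreak.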

8.
Brief Bioinform ; 22(2): 2126-2140, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32363397

ABSTRACT

Promoters are short consensus sequences of DNA, which are responsible for transcription activation or the repression of all genes. There are many types of promoters in bacteria with important roles in initiating gene transcription. Therefore, solving promoter-identification problems has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for identifying both promoters and their respective classification. SELECTOR combines the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features, and uses five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests using benchmark datasets and independent tests using the newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in both general and specific types of promoter prediction in Escherichia coli. Furthermore, this novel framework provides essential interpretations that aid understanding of model success by leveraging the powerful Shapley Additive exPlanation algorithm, thereby highlighting the most important features relevant for predicting both general and specific types of promoters and overcoming the limitations of existing 'black-box' approaches that are unable to reveal causal relationships from large amounts of initially encoded features.
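
One of the feature encodings named above, the composition of k-spaced nucleic acid pairs, can be sketched as follows: for a gap of k positions, count every pair of bases separated by exactly k other bases and normalize the 16 pair counts to frequencies. This is a generic illustration of the encoding, not SELECTOR's code; the function name and the choice to skip pairs containing non-ACGT characters are assumptions.

```python
from itertools import product

# All 16 ordered base pairs in a fixed order: AA, AC, AG, AT, CA, ...
PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]


def k_spaced_pair_composition(sequence, k):
    """Frequencies of base pairs separated by exactly k positions (16 values)."""
    seq = sequence.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # assumption: pairs with ambiguous bases are skipped
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


if __name__ == "__main__":
    features = k_spaced_pair_composition("TTGACAATTAATCATCGGCTCGTATAATG", 2)
    print(dict(zip(PAIRS, (round(f, 3) for f in features))))
```

Concatenating the vectors for several values of k gives a fixed-length numeric representation that tree-based learners can consume directly.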


Subjects
Escherichia coli/genetics , Machine Learning , Promoter Regions, Genetic , Datasets as Topic , Genes, Bacterial , Reproducibility of Results
9.
J Diabetes Investig ; 12(6): 988-997, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33075216

ABSTRACT

AIMS/INTRODUCTION: The progression from prediabetes to type 2 diabetes is preventable by lifestyle intervention and/or pharmacotherapy in a large fraction of individuals with prediabetes. Our objective was to develop a risk score to screen for prediabetes in the Middle East, where diabetes prevalence is one of the highest in the world. MATERIALS AND METHODS: In this cross-sectional, case-control study, we used data of 4,895 controls and 2,373 prediabetic adults obtained from the Qatar Biobank cohort. Significant risk factors were identified by logistic regression and other machine learning methods. The receiver operating characteristic was used to calculate the area under curve, cut-off point, sensitivity, specificity, positive and negative predictive values. The prediabetes risk score was developed from data of Qatari citizens, as well as long-term (≥15 years) residents. RESULTS: The significant risk factors for the Prediabetes Risk Score in Qatar were age, sex, body mass index, waist circumference and blood pressure. The risk score ranges from 0 to 45. The area under the curve of the score was 80% (95% confidence interval 78-83%), and the cut-off point of 16 yielded sensitivity and specificity of 86.2% (95% confidence interval 82.7-89.2%) and 57.9% (95% confidence interval 65.5-71.4%), respectively. Prediabetes Risk Score in Qatar performed equally in Qatari nationals and long-term residents. CONCLUSIONS: Prediabetes Risk Score in Qatar is the first prediabetes screening score developed in a Middle Eastern population. It only uses risk factors measured non-invasively, is simple, cost-effective, and can be easily understood by the general public and health providers. Prediabetes Risk Score in Qatar is an important tool for early detection of prediabetes, and can help tremendously in curbing the diabetes epidemic in the region.
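
The screening metrics reported above can be reproduced for any score and cut-off with a small confusion-matrix calculation. The sketch below is generic and uses made-up toy data; treating a score greater than or equal to the cut-off as a positive screen is an assumption, since the abstract does not state the direction of the threshold, and the function name is illustrative.

```python
def screening_metrics(scores, has_condition, cutoff):
    """Sensitivity, specificity, PPV and NPV when score >= cutoff counts as positive."""
    tp = fp = tn = fn = 0
    for score, positive in zip(scores, has_condition):
        flagged = score >= cutoff
        if flagged and positive:
            tp += 1
        elif flagged and not positive:
            fp += 1
        elif not flagged and positive:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # NaN signals that the metric is undefined for this data set.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    toy_scores = [10, 18, 20, 30, 12, 16]                # hypothetical risk scores
    toy_labels = [False, False, True, True, False, True]  # hypothetical prediabetes status
    print(screening_metrics(toy_scores, toy_labels, cutoff=16))
```

Sweeping the cut-off over the whole score range and plotting sensitivity against 1 - specificity yields the receiver operating characteristic curve mentioned in the abstract.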


Subjects
Biological Specimen Banks/statistics & numerical data , Mass Screening/methods , Prediabetic State/diagnosis , Risk Assessment/methods , Adolescent , Adult , Aged , Aged, 80 and over , Area Under Curve , Blood Pressure , Body Mass Index , Case-Control Studies , Cross-Sectional Studies , Diabetes Mellitus, Type 2/prevention & control , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Qatar , Reference Values , Risk Factors , Sensitivity and Specificity , Waist Circumference , Young Adult
10.
J Bioinform Comput Biol ; 18(4): 2050018, 2020 08.
Article in English | MEDLINE | ID: mdl-32501138

ABSTRACT

Background: Phosphorylation of histidine residues plays crucial roles in signaling pathways and cell metabolism in prokaryotes such as bacteria. While evidence has emerged that protein histidine phosphorylation also occurs in more complex organisms, its role in mammalian cells has remained largely uncharted. Thus, it is highly desirable to develop computational tools that are able to identify histidine phosphorylation sites. Results: Here, we introduce PROSPECT, which enables fast and accurate prediction of proteome-wide histidine phosphorylation substrates and sites. Our tool is based on a hybrid method that integrates the outputs of two convolutional neural network (CNN)-based classifiers and a random forest-based classifier. Three features, namely the one-of-K coding, enhanced grouped amino acids content (EGAAC) and composition of k-spaced amino acid group pairs (CKSAAGP) encodings, were taken as the input to the three classifiers, respectively. Our results show that it is able to accurately predict histidine phosphorylation sites from sequence information. Our PROSPECT web server is user-friendly and publicly available at http://PROSPECT.erc.monash.edu/. Conclusions: PROSPECT is superior to other pHis predictors in both running speed and prediction accuracy, and we anticipate that the PROSPECT web server will become a popular tool for identifying pHis sites in bacteria.


Subjects
Histidine/metabolism , Proteome/metabolism , Software , Computational Biology/methods , Escherichia coli Proteins/metabolism , Neural Networks, Computer , Phosphorylation
11.
Cells ; 9(6)2020 05 26.
Article in English | MEDLINE | ID: mdl-32466437

ABSTRACT

Overactivation of the renin-angiotensin system (RAS) during obesity disrupts adipocyte metabolic homeostasis and induces endoplasmic reticulum (ER) stress and inflammation; however, underlying mechanisms are not well known. We propose that overexpression of angiotensinogen (Agt), the precursor protein of RAS in adipose tissue or treatment of adipocytes with Angiotensin II (Ang II), RAS bioactive hormone, alters specific microRNAs (miRNA), that target ER stress and inflammation leading to adipocyte dysfunction. Epididymal white adipose tissue (WAT) from B6 wild type (Wt) and transgenic male mice overexpressing Agt (Agt-Tg) in adipose tissue and adipocytes treated with Ang II were used. Small RNA sequencing and microarray in WAT identified differentially expressed miRNAs and genes, out of which miR-690 and mitogen-activated protein kinase kinase 3 (MAP2K3) were validated as significantly up- and down-regulated, respectively, in Agt-Tg, and in Ang II-treated adipocytes compared to respective controls. Additionally, the direct regulatory role of miR-690 on MAP2K3 was confirmed using mimic, inhibitors and dual-luciferase reporter assay. Downstream protein targets of MAP2K3 which include p38, NF-κB, IL-6 and CHOP were all reduced. These results indicate a critical post-transcriptional role for miR-690 in inflammation and ER stress. In conclusion, miR-690 plays a protective function and could be a useful target to reduce obesity.


Subjects
Angiotensin II/pharmacology , Endoplasmic Reticulum Stress , Inflammation/genetics , MicroRNAs/metabolism , 3T3-L1 Cells , Adipocytes/drug effects , Adipocytes/metabolism , Adipose Tissue, White/metabolism , Animals , Base Sequence , Biomarkers/metabolism , Endoplasmic Reticulum Stress/drug effects , Endoplasmic Reticulum Stress/genetics , Gene Expression Regulation/drug effects , Inflammation/pathology , MAP Kinase Signaling System/drug effects , MAP Kinase Signaling System/genetics , Male , Mice , Mice, Inbred C57BL , MicroRNAs/genetics , Renin-Angiotensin System/drug effects , Renin-Angiotensin System/genetics , Reproducibility of Results , Signal Transduction/drug effects
12.
Brief Bioinform ; 21(5): 1676-1696, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31714956

ABSTRACT

RNA post-transcriptional modifications play a crucial role in a myriad of biological processes and cellular functions. To date, more than 160 RNA modifications have been discovered; therefore, accurate identification of RNA-modification sites is fundamental for a better understanding of RNA-mediated biological functions and mechanisms. However, due to limitations in experimental methods, systematic identification of different types of RNA-modification sites remains a major challenge. Recently, more than 20 computational methods have been developed to identify RNA-modification sites in tandem with high-throughput experimental methods, with most of these capable of predicting only single types of RNA-modification sites. These methods show high diversity in their dataset size, data quality, core algorithms, features extracted and feature selection techniques and evaluation strategies. Therefore, there is an urgent need to revisit these methods and summarize their methodologies, in order to improve and further develop computational techniques to identify and characterize RNA-modification sites from the large amounts of sequence data. With this goal in mind, first, we provide a comprehensive survey on a large collection of 27 state-of-the-art approaches for predicting N1-methyladenosine and N6-methyladenosine sites. We cover a variety of important aspects that are crucial for the development of successful predictors, including the dataset quality, operating algorithms, sequence and genomic features, feature selection, model performance evaluation and software utility. In addition, we also provide our thoughts on potential strategies to improve the model performance. Second, we propose a computational approach called DeepPromise based on deep learning techniques for simultaneous prediction of N1-methyladenosine and N6-methyladenosine. 
To extract the sequence context surrounding the modification sites, three feature encodings, including enhanced nucleic acid composition, one-hot encoding, and RNA embedding, were used as the input to seven consecutive layers of convolutional neural networks (CNNs), respectively. Moreover, DeepPromise further combined the prediction score of the CNN-based models and achieved around 43% higher area under receiver-operating curve (AUROC) for m1A site prediction and 2-6% higher AUROC for m6A site prediction, respectively, when compared with several existing state-of-the-art approaches on the independent test. In-depth analyses of characteristic sequence motifs identified from the convolution-layer filters indicated that nucleotide presentation at proximal positions surrounding the modification sites contributed most to the classification, whereas those at distal positions also affected classification but to different extents. To maximize user convenience, a web server was developed as an implementation of DeepPromise and made publicly available at http://DeepPromise.erc.monash.edu/, with the server accepting both RNA sequences and genomic sequences to allow prediction of two types of putative RNA-modification sites.


Subjects
Computational Biology/methods , RNA Processing, Post-Transcriptional , RNA/genetics , Sequence Analysis, RNA/methods , Algorithms , Deep Learning
13.
Bioinformatics ; 36(5): 1429-1438, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31603511

ABSTRACT

MOTIVATION: X-ray crystallography has facilitated the majority of protein structures determined to date. Sequence-based predictors that can accurately estimate protein crystallization propensities would be highly beneficial to overcome the high expenditure and large attrition rate, and to reduce the trial-and-error settings required for crystallization. RESULTS: In this study, we present a novel model, BCrystal, which uses an optimized gradient boosting machine (XGBoost) on sequence, structural and physicochemical features extracted from the proteins of interest. BCrystal also provides explanations, highlighting the most important features for the predicted crystallization propensity of an individual protein using the SHAP algorithm. On three independent test sets, BCrystal outperforms state-of-the-art sequence-based methods by more than 12.5% in accuracy, 18% in recall and 0.253 in Matthews correlation coefficient, with an average accuracy of 93.7%, recall of 96.63% and Matthews correlation coefficient of 0.868. For relative solvent accessibility of exposed residues, we observed that higher values associate positively with protein crystallizability, whereas the number of disordered regions, the fraction of coils and tripeptide stretches that contain multiple histidines associate negatively with crystallizability. The higher accuracy of BCrystal enables it to accurately screen for sequence variants with enhanced crystallizability. AVAILABILITY AND IMPLEMENTATION: Our BCrystal webserver is at https://machinelearning-protein.qcri.org/ and source code is available at https://github.com/raghvendra5688/BCrystal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
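
For readers unfamiliar with the evaluation metrics quoted above, the following sketch computes accuracy, recall and the Matthews correlation coefficient from the four cells of a binary confusion matrix. It is a generic illustration with invented counts, not code from BCrystal; returning 0.0 when the coefficient's denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Fraction of true positives that were predicted positive."""
    return tp / (tp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(round(accuracy(tp, tn, fp, fn), 3), round(recall(tp, fn), 3), round(mcc(tp, tn, fp, fn), 3))
```

Unlike accuracy, the coefficient stays near zero for a classifier that simply predicts the majority class, which is why it is often reported alongside accuracy on imbalanced test sets.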


Subjects
Computational Biology , Proteins , Crystallization , Crystallography, X-Ray , Software
14.
PLoS One ; 14(11): e0225382, 2019.
Article in English | MEDLINE | ID: mdl-31756219

ABSTRACT

Reliable identification of inflammatory biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of inflammatory bowel disease (IBD). We present an integrative approach to Network-Based Biomarker Discovery (NBBD) which integrates network analysis methods for prioritizing potential biomarkers and machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of new-onset pediatric IBD metagenomics biopsy samples, we compare the performance of Random Forest (RF) classifiers trained on features selected using a representative set of traditional feature selection methods against the NBBD framework, configured using five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers, as well as a hybrid approach combining the best traditional and NBBD-based feature selection. We also examine how the performance of the predictive models for IBD diagnosis varies as a function of the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some of the state-of-the-art feature selection methods including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective in reliably identifying IBD biomarkers when the number of data samples available for biomarker discovery is small.


Subjects
Biomarkers/analysis , Inflammatory Bowel Diseases/microbiology , Metagenomics/methods , Algorithms , Humans , Inflammatory Bowel Diseases/metabolism , Machine Learning , Models, Theoretical
15.
Sci Rep ; 9(1): 14696, 2019 10 11.
Article in English | MEDLINE | ID: mdl-31604961

ABSTRACT

Broadly neutralizing antibodies (bNAbs) targeting the HIV-1 envelope glycoprotein (Env) have promising utility in prevention and treatment of HIV-1 infection, and several are currently undergoing clinical trials. Due to the high sequence diversity and mutation rate of HIV-1, viral isolates are often resistant to specific bNAbs. Currently, resistant isolates are commonly identified by time-consuming and expensive in vitro neutralization assays. Here, we report machine learning classifiers that accurately predict resistance of HIV-1 isolates to 33 bNAbs. Notably, our classifiers achieved an overall prediction accuracy of 96% for 212 clinical isolates from patients enrolled in four different clinical trials. Moreover, use of gradient boosting machine - a tree-based machine learning method - enabled us to identify critical features, which had high accordance with epitope residues that distinguished between antibody resistance and sensitivity. The availability of an in silico antibody resistance predictor should facilitate informed decisions of antibody usage and sequence-based monitoring of viral escape in clinical settings.


Subjects
Broadly Neutralizing Antibodies/immunology , Data Accuracy , Deep Learning , Drug Resistance, Viral/immunology , HIV Antibodies/immunology , HIV Infections/immunology , HIV-1/immunology , Binding Sites, Antibody , Computer Simulation , Epitope Mapping/methods , HIV Infections/virology , HIV-1/classification , Humans , Immunoglobulin Idiotypes/immunology , Neutralization Tests , Prognosis , env Gene Products, Human Immunodeficiency Virus/immunology
16.
J Phys Chem A ; 123(33): 7323-7334, 2019 Aug 22.
Article in English | MEDLINE | ID: mdl-31343887

ABSTRACT

Forecasting the structural stability of hybrid organic/inorganic compounds, where polyatomic molecules replace atoms, is a challenging task; the composition space is vast, and the reference structure for the organic molecules is ambiguously defined. In this work, we use a range of machine-learning algorithms, constructed from state-of-the-art density functional theory data, to conduct a systematic analysis on the likelihood of a given cation to be housed in the perovskite structure. In particular, we consider both ABC3 chalcogenide (I-V-VI3) and halide (I-II-VII3) perovskites. We find that the effective atomic radius and the number of lone pairs residing on the A-site cation are sufficient features to describe the perovskite phase stability. Thus, the presented machine-learning approach provides an efficient way to map the phase stability of the vast class of compounds, including situations where a cation mixture replaces a single A-site cation. This work demonstrates that advanced electronic structure theory combined with machine-learning analysis can provide an efficient strategy superior to the conventional trial-and-error approach in materials design.

17.
Phys Chem Chem Phys ; 21(5): 2821, 2019 01 30.
Article in English | MEDLINE | ID: mdl-30657154

ABSTRACT

Correction for 'Exploring new approaches towards the formability of mixed-ion perovskites by DFT and machine learning' by Heesoo Park et al., Phys. Chem. Chem. Phys., 2019, DOI: 10.1039/c8cp06528d.

18.
Bioinformatics ; 35(8): 1388-1394, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30192921

ABSTRACT

MOTIVATION: Biological experiments including proteomics and transcriptomics approaches often reveal sets of proteins that are most likely to be involved in a disease/disorder. To understand the functional nature of a set of proteins, it is important to capture the function of the proteins as a group, even in cases where the function of individual proteins is not known. In this work, we propose a model that takes groups of proteins found to work together in a certain biological context, integrates them into functional relevance networks, and subsequently employs an iterative inference on graphical models to identify group functions of the proteins, which are then extended to predict the function of individual proteins. RESULTS: The proposed algorithm, iterative group function prediction (iGFP), depicts proteins as a graph that represents the functional relevance of proteins considering their known functional, proteomics and transcriptional features. Proteins in the graph are clustered into groups by their mutual functional relevance, which is iteratively updated using a probabilistic graphical model, the conditional random field. iGFP showed robust accuracy even when a substantial number of GO annotations were missing. The perspective of 'group' function annotation opens up novel approaches for understanding the functional nature of proteins in biological systems. AVAILABILITY AND IMPLEMENTATION: http://kiharalab.org/iGFP/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Algorithms , Proteins , Proteomics
19.
Bioinformatics ; 35(13): 2216-2225, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30462171

ABSTRACT

MOTIVATION: Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict crystallization propensities of proteins based on their sequences. However, the majority of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins which can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers from the protein sequences to distinguish proteins that will result in diffraction-quality crystals from those that will not. RESULTS: Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves an average improvement in recall of 1.4% and 12.1% when compared to its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains an average improvement of 2.1% and 6.0% for F-score, 1.9% and 3.9% for accuracy, and 3.8% and 7.0% for MCC with respect to Crysalis II and Crysf, respectively, on independent test sets. AVAILABILITY AND IMPLEMENTATION: The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web-server is also available at https://deeplearning-protein.qcri.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
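
As a reading aid, the Matthews correlation coefficient named among the evaluation metrics can be computed from a binary confusion matrix. The sketch below is a minimal illustration, not the DeepCrystal evaluation code; the function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention when any
    row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (crystallizable = positive class).
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(f"MCC = {mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a more informative single number when the two classes are imbalanced.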


Subjects
Deep Learning , Amino Acid Sequence , Computational Biology , Crystallization , Proteins
20.
Genome Res ; 29(1): 125-134, 2019 01.
Article in English | MEDLINE | ID: mdl-30514702

ABSTRACT

Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson's correlation R² for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R² criterion; and (5) R² is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.
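
To make the R² metric concrete, the sketch below computes the squared Pearson correlation between true genotypes (coded 0, 1, 2) and imputed dosages for one variant. It is a minimal illustration under assumed inputs; the function name and the example values are hypothetical and do not come from the study.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences.

    Returns 0.0 when either sequence has zero variance, where the
    correlation is undefined (e.g. a monomorphic variant).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data for one variant across six subjects.
true_genotypes = [0, 1, 2, 0, 1, 0]
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2, 0.3]
r2 = squared_pearson(true_genotypes, imputed_dosages)
print(f"R^2 = {r2:.3f}")
```

In practice this value would be computed per variant and then summarized within minor allele frequency bins, since imputation accuracy differs between rare and common variants.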


Subjects
Black People/genetics , Family , Human Genome , White People/genetics , Female , Humans , Male
...