Results 1 - 17 of 17
1.
J Clin Med ; 13(4)2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38398418

ABSTRACT

Background: The current study explores the genetic underpinnings of cardiac arrhythmia phenotypes within Middle Eastern populations, which are under-represented in genomic medicine research. Methods: Whole-genome sequencing data from 14,259 individuals from the Qatar Biobank were used; the cohort included individuals of Arab (47.8%), South Asian (18.4%), and African (4.6%) ancestry. The frequency of rare functional variants within a set of 410 candidate genes for cardiac arrhythmias was assessed. Polygenic risk score (PRS) performance for atrial fibrillation (AF) prediction was evaluated. Results: This study identified 1196 rare functional variants, including 162 previously linked to arrhythmia phenotypes, with varying frequencies across Arab, South Asian, and African ancestries. Of these, 137 variants met the pathogenic or likely pathogenic (P/LP) criteria according to ACMG guidelines. Of these, 91 were in ACMG actionable genes and were present in 1030 individuals (~7%). Ten P/LP variants showed significant associations with atrial fibrillation (p < 2.4 × 10⁻¹⁰). Five out of ten existing PRSs were significantly associated with AF (e.g., PGS000727, p = 0.03, OR = 1.43 [1.03, 1.97]). Conclusions: Our study is the largest to examine the genetic predisposition to arrhythmia phenotypes in the Middle East using whole-genome sequence data. It underscores the importance of including diverse populations in genomic investigations to elucidate the genetic landscape of cardiac arrhythmias and mitigate health disparities in genomic medicine.
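
The PRS result quoted above (OR = 1.43 [1.03, 1.97], p = 0.03) can be checked for internal consistency. The following is a minimal sketch, assuming the interval is a 95% Wald interval computed on the log-odds scale; the abstract does not state how the interval was derived, and the function name is hypothetical.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided p-value implied by an odds ratio and its CI.

    Assumes a Wald interval on the log scale (an assumption, not stated
    in the abstract): log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of a standard normal variate.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z, p = p_from_or_ci(1.43, 1.03, 1.97)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under that assumption, the recovered p-value rounds to the reported 0.03, so the three reported numbers agree with one another.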

2.
J Clin Med ; 13(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38202283

ABSTRACT

BACKGROUND: Resting electrocardiogram (ECG) is a valuable non-invasive diagnostic tool used in clinical medicine to assess the electrical activity of the heart while the patient is resting. Abnormalities in ECG may be associated with clinical biomarkers and can predict early stages of diseases. In this study, we evaluated the association between ECG traits, clinical biomarkers, and diseases and developed risk scores to predict the risk of developing coronary artery disease (CAD) in the Qatar Biobank. METHODS: This study used 12-lead ECG data from 13,827 participants. The ECG traits used for association analysis were RR, PR, QRS, QTc, PW, and JT. Association analysis using regression models was conducted between ECG variables and serum electrolytes, sugars, lipids, blood pressure (BP), blood and inflammatory biomarkers, and diseases (e.g., type 2 diabetes, CAD, and stroke). ECG-based and clinical risk scores were developed, and their performance was assessed to predict CAD. Classical regression and machine-learning models were used for risk score development. RESULTS: Significant associations were observed with ECG traits. RR showed the largest number of associations: e.g., positive associations with bicarbonate, chloride, HDL-C, and monocytes, and negative associations with glucose, insulin, neutrophils, calcium, and risk of T2D. QRS was positively associated with phosphorus, bicarbonate, and risk of CAD. Elevated QTc was observed in CAD patients, whereas decreased QTc was correlated with decreased levels of calcium and potassium. Risk scores developed using regression models were outperformed by machine-learning models. The area under the receiver operating characteristic curve reached 0.84 using a machine-learning model that contains ECG traits, sugars, lipids, serum electrolytes, and cardiovascular disease risk factors. The odds ratio for the top decile of CAD risk score compared to the remaining deciles was 13.99. CONCLUSIONS: ECG abnormalities were associated with serum electrolytes, sugars, lipids, and blood and inflammatory biomarkers. These abnormalities were also observed in T2D and CAD patients. Risk scores showed strong performance in predicting CAD.
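
The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes it that way from pairwise comparisons; it is an illustration only, with made-up scores and a hypothetical function name, and it is not the study's pipeline.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for a case, 0 for a control. Ties count as half a win.
    Quadratic in sample size, which is fine for a small illustration.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy data: two controls (0.1, 0.4) and two cases (0.35, 0.8).
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Three of the four case-control pairs are ordered correctly in the toy data, hence 0.75; an AUC of 0.84 means roughly 84 of every 100 such pairs are ordered correctly.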

3.
Circ Genom Precis Med ; 15(6): e003712, 2022 12.
Article in English | MEDLINE | ID: mdl-36252120

ABSTRACT

BACKGROUND: Enthusiasm for using polygenic risk scores (PRSs) in clinical practice is tempered by concerns about their portability to diverse ancestry groups, thus motivating genome-wide association studies in non-European ancestry cohorts. METHODS: We conducted a genome-wide association study for coronary heart disease in a Middle Eastern cohort using whole genome sequencing and assessed the performance of 6 PRSs developed with methods including LDpred (PGS000296), metaGRS (PGS000018), Pruning and Thresholding (PGS000337), and an EnsemblePRS we developed. Additionally, we evaluated the burden of rare variants in lipid genes in cases and controls. Whole genome sequencing at 30× coverage was performed in 1067 coronary heart disease cases (mean age=59 years; 70.3% males) and 6170 controls (mean age=40 years; 43.5% males). RESULTS: The majority of PRSs performed well; odds ratio (OR) per 1 SD increase (OR1sd) was highest for PGS000337 (OR1sd=1.81, 95% CI [1.66-1.98], P=3.07×10⁻⁴¹). EnsemblePRS performed better than individual PRSs (OR1sd=1.8, 95% CI [1.66-1.96], P=5.89×10⁻⁴⁴). The OR for the 10th decile versus the remaining deciles was >3.2 for PGS000337, PGS000296, PGS000018, and reached 4.58 for EnsemblePRS. Of 400 known genome-wide significant loci, 33 replicated at P<10⁻⁴. However, the 9p21 locus did not replicate. Six suggestive (P<10⁻⁵) new loci/genes with plausible biological function were identified (eg, CORO7, RBM47, PDE4D). The burden of rare functional variants in LDLR, APOB, PCSK9, and ANGPTL4 was greater in cases than controls. CONCLUSIONS: Overall, we demonstrate that PRSs derived from European ancestry genome-wide association studies performed well in a Middle Eastern cohort, suggesting these could be used in the clinical setting while ancestry-specific PRSs are developed.


Subjects
Coronary Disease , Proprotein Convertase 9 , Male , Humans , Middle Aged , Adult , Female , Genome-Wide Association Study , Risk Factors , Coronary Disease/genetics , Whole Genome Sequencing , RNA-Binding Proteins/genetics
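
The "10th decile versus the remaining deciles" odds ratio in the abstract above compares disease odds among the highest-scoring tenth of the cohort with the odds in everyone else. A minimal sketch of that calculation follows, assuming an unadjusted 2×2 table (the study may well have adjusted for covariates); the function name and toy data are hypothetical.

```python
def top_decile_odds_ratio(scores, is_case):
    """Unadjusted odds ratio: top 10% of scores versus the remaining 90%."""
    n = len(scores)
    k = max(1, n // 10)  # size of the top decile
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = set(order[:k])
    a = sum(1 for i in top if is_case[i])                         # cases, top decile
    b = k - a                                                     # controls, top decile
    c = sum(1 for i in range(n) if i not in top and is_case[i])   # cases, rest
    d = (n - k) - c                                               # controls, rest
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty cell in the 2x2 table")
    return (a * d) / (b * c)

# Toy cohort of 20: the top decile is the two highest scores (indices 19, 18).
scores = list(range(20))
is_case = [False] * 20
for idx in (0, 5, 19):
    is_case[idx] = True
print(top_decile_odds_ratio(scores, is_case))  # 8.0
```

In the toy cohort the top decile holds one case and one control, the rest holds two cases and sixteen controls, so the odds ratio is (1 × 16) / (1 × 2) = 8.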
4.
Metabolites ; 12(6)2022 Jun 03.
Article in English | MEDLINE | ID: mdl-35736450

ABSTRACT

Coronary heart disease (CHD) is a major cause of death in Middle Eastern (ME) populations, with current studies of the metabolic fingerprints of CHD lacking in diversity. Identification of specific biomarkers to uncover potential mechanisms for developing predictive models and targeted therapies for CHD is urgently needed for the least-studied ME populations. A case-control study was carried out in a cohort of 1001 CHD patients and 2999 controls. Untargeted metabolomics was used, generating 1159 metabolites. Univariate and pathway enrichment analyses were performed to understand functional changes in CHD. A metabolite risk score (MRS) was developed using multivariate analysis and machine learning, and its performance in predicting CHD was assessed. A total of 511 metabolites were significantly different between the CHD patients and the controls (FDR p < 0.05). The enriched pathways (FDR p < 10⁻³⁰⁰) included D-arginine and D-ornithine metabolism, glycolysis, oxidation and degradation of branched chain fatty acids, and sphingolipid metabolism. MRS showed good discriminative power between the CHD cases and the controls (AUC = 0.99). In this first study in the Middle East, known and novel circulating metabolites and metabolic pathways associated with CHD were identified. A small panel of metabolites can efficiently discriminate CHD cases and controls and therefore can be used as a diagnostic/predictive tool.
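
The "FDR p" thresholds above refer to p-values adjusted for testing many metabolites at once. The abstract does not name the procedure; the sketch below assumes the common Benjamini-Hochberg step-up adjustment and uses made-up p-values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumed procedure: the abstract only says "FDR p".
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

With these four toy p-values, every adjusted value stays below 0.05, so all four tests would count as significant at FDR p < 0.05.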

5.
Lancet Oncol ; 23(3): 341-352, 2022 03.
Article in English | MEDLINE | ID: mdl-35150601

ABSTRACT

BACKGROUND: Disparities in the genetic risk of cancer among various ancestry groups and populations remain poorly defined. This challenge is even more acute for Middle Eastern populations, where the paucity of genomic data could affect the clinical potential of cancer genetic risk profiling. We used data from the phase 1 cohort of the Qatar Genome Programme to investigate genetic variation in cancer-susceptibility genes in the Qatari population. METHODS: The Qatar Genome Programme generated high-coverage genome sequencing on DNA samples collected from 6142 native Qataris, stratified into six distinct ancestry groups: general Arab, Persian, Arabian Peninsula, Admixture Arab, African, and South Asian. In this population-based, cohort study, we evaluated the performance of polygenic risk scores for the most common cancers in Qatar (breast, prostate, and colorectal cancers). Polygenic risk scores were trained in The Cancer Genome Atlas (TCGA) dataset, and their distributions were subsequently applied to the six different genetic ancestry groups of the Qatari population. Rare deleterious variants within 1218 cancer susceptibility genes were analysed, and their clinical pathogenicity was assessed by ClinVar and the CharGer computational tools. FINDINGS: The cohort included in this study was recruited by the Qatar Biobank between Dec 11, 2012, and June 9, 2016. The initial dataset comprised 6218 cohort participants, and whole genome sequencing quality control filtering led to a final dataset of 6142 samples. Polygenic risk score analyses of the most common cancers in Qatar showed significant differences between the six ancestry groups (p<0·0001). Qataris with Arabian Peninsula ancestry showed the lowest polygenic risk score mean for colorectal cancer (-0·41), and those of African ancestry showed the highest average for prostate cancer (0·85). 
Cancer-gene rare variant analysis identified 76 Qataris (1·2% of 6142 individuals in the Qatar Genome Programme cohort) carrying ClinVar pathogenic or likely pathogenic variants in clinically actionable cancer genes. Variant analysis using CharGer identified 195 carriers (3·17% of the cohort). Breast cancer pathogenic variants were over-represented in Qataris of Persian origin (22 [56·4%] of 39 BRCA1/BRCA2 variant carriers) and completely absent in those of Arabian Peninsula origin. INTERPRETATION: We observed a high degree of heterogeneity for cancer predisposition genes and polygenic risk scores across ancestries in this population from Qatar. Stratification systems could be considered for the implementation of national cancer preventive medicine programmes. FUNDING: Qatar Foundation.


Subjects
Genetic Predisposition to Disease , Neoplasms , Cohort Studies , Humans , Male , Neoplasms/epidemiology , Neoplasms/genetics , Oncogenes , Qatar/epidemiology
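
A polygenic risk score of the kind compared across ancestry groups above is, in its simplest form, a weighted sum of risk-allele dosages that is then standardized so that group means (such as -0·41 or 0·85) are expressed in standard-deviation units. The sketch below illustrates that arithmetic only. The variant identifiers, weights, and the choice to treat missing variants as dosage 0 are all assumptions, and the study's actual scoring pipeline is not described in the abstract.

```python
from statistics import mean, pstdev

def polygenic_risk_score(weights, dosages):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant).

    Variants missing from `dosages` are treated as dosage 0 (an assumption).
    """
    return sum(beta * dosages.get(variant, 0.0) for variant, beta in weights.items())

def standardize(scores):
    """Convert raw scores to z-scores using the population mean and SD."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(s - mu) / sd for s in scores]

# Hypothetical effect sizes for three made-up variants.
weights = {"var_A": 0.20, "var_B": -0.10, "var_C": 0.05}
cohort = [
    {"var_A": 2, "var_B": 1, "var_C": 0},
    {"var_A": 0, "var_B": 2, "var_C": 1},
    {"var_A": 1, "var_B": 0, "var_C": 2},
]
raw = [polygenic_risk_score(weights, person) for person in cohort]
print(standardize(raw))
```

Comparing the mean z-score of each ancestry group against zero then shows which groups sit above or below the reference distribution.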
6.
STAR Protoc ; 3(4): 101809, 2022 12 16.
Article in English | MEDLINE | ID: mdl-36595917

ABSTRACT

Germline genetic variants modulate human immune response. We present analytical pipelines for assessing the contribution of hosts' genetic background to the immune landscape of solid tumors using harmonized data from more than 9,000 patients in The Cancer Genome Atlas (TCGA). These include protocols for heritability, genome-wide association studies (GWAS), colocalization, and rare variant analyses. These workflows are developed around the structure of TCGA but can be adapted to explore other repositories or in the context of cancer immunotherapy. For complete details on the use and execution of this protocol, please refer to Sayaman et al. (2021).


Subjects
Genome-Wide Association Study , Neoplasms , Humans , Genome-Wide Association Study/methods , Neoplasms/genetics , Neoplasms/therapy , Genome , Immunity , Germ Cells
7.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33979427

ABSTRACT

A cancer immune phenotype characterized by an active T-helper 1 (Th1)/cytotoxic response is associated with responsiveness to immunotherapy and favorable prognosis across different tumors. However, in some cancers, such an intratumoral immune activation does not confer protection from progression or relapse. Defining mechanisms associated with immune evasion is imperative to refine stratification algorithms, to guide treatment decisions and to identify candidates for immune-targeted therapy. Molecular alterations governing mechanisms for immune exclusion are still largely unknown. The availability of large genomic datasets offers an opportunity to ascertain key determinants of differential intratumoral immune response. We follow a network-based protocol to identify transcription regulators (TRs) associated with poor immunologic antitumor activity. We use a consensus of four different pipelines consisting of two state-of-the-art gene regulatory network inference techniques, regularized gradient boosting machines and ARACNE to determine TR regulons, and three separate enrichment techniques, including fast gene set enrichment analysis, gene set variation analysis and virtual inference of protein activity by enriched regulon analysis to identify the most important TRs affecting immunologic antitumor activity. These TRs, referred to as master regulators (MRs), are unique to immune-silent and immune-active tumors, respectively. We validated the MRs coherently associated with the immune-silent phenotype across cancers in The Cancer Genome Atlas and a series of additional datasets in the Prediction of Clinical Outcomes from Genomic Profiles repository. A downstream analysis of MRs specific to the immune-silent phenotype resulted in the identification of several enriched candidate pathways, including NOTCH1, TGF-β, Interleukin-1 and TNF-α signaling pathways. TGFB1I1 emerged as one of the main negative immune modulators preventing the favorable effects of a Th1/cytotoxic response.


Subjects
Biomarkers, Tumor , Disease Susceptibility , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Neoplasms/etiology , Neoplasms/metabolism , Phenotype , Computational Biology/methods , Databases, Genetic , Disease Susceptibility/immunology , Gene Expression Profiling/methods , Humans , Immunophenotyping , Reproducibility of Results , Signal Transduction , Transcriptome
8.
Bioinformatics ; 36(5): 1429-1438, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31603511

ABSTRACT

MOTIVATION: X-ray crystallography has facilitated the majority of protein structures determined to date. Sequence-based predictors that can accurately estimate protein crystallization propensities would be highly beneficial to overcome the high expenditure and large attrition rate and to reduce the trial-and-error settings required for crystallization. RESULTS: In this study, we present a novel model, BCrystal, which uses an optimized gradient boosting machine (XGBoost) on sequence, structural and physicochemical features extracted from the proteins of interest. BCrystal also provides explanations, highlighting the most important features for the predicted crystallization propensity of an individual protein using the SHAP algorithm. On three independent test sets, BCrystal outperforms state-of-the-art sequence-based methods by more than 12.5% in accuracy, 18% in recall and 0.253 in Matthews correlation coefficient, with an average accuracy of 93.7%, recall of 96.63% and Matthews correlation coefficient of 0.868. For relative solvent accessibility of exposed residues, higher values associated positively with protein crystallizability, whereas the number of disordered regions, the fraction of coils and tripeptide stretches that contain multiple histidines associated negatively with crystallizability. The higher accuracy of BCrystal enables it to accurately screen for sequence variants with enhanced crystallizability. AVAILABILITY AND IMPLEMENTATION: Our BCrystal webserver is at https://machinelearning-protein.qcri.org/ and source code is available at https://github.com/raghvendra5688/BCrystal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Proteins , Crystallization , Crystallography, X-Ray , Software
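
Accuracy, recall and the Matthews correlation coefficient (MCC), the metrics quoted in the abstract above and in several other records in this list, all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts are made up and the function name is hypothetical.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, recall, MCC) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC ranges from -1 (total disagreement) to +1 (perfect prediction).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, recall, mcc

acc, rec, mcc = classification_metrics(tp=6, fp=2, tn=3, fn=1)
print(f"accuracy={acc:.3f} recall={rec:.3f} MCC={mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside accuracy when the two classes are imbalanced.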
9.
Genome Res ; 29(1): 125-134, 2019 01.
Article in English | MEDLINE | ID: mdl-30514702

ABSTRACT

Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson's correlation R² for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R² criterion; and (5) R² is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.


Subjects
Black People/genetics , Family , Genome, Human , White People/genetics , Female , Humans , Male
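
The Pearson R² accuracy metric discussed above is the squared correlation between the true genotypes and the imputed values at a variant. A minimal sketch follows, assuming true genotypes coded as 0, 1 or 2 copies of an allele and imputed values given as expected dosages; that coding, the toy values and the function name are assumptions for illustration.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # A variant with no variation carries no correlation information.
        raise ValueError("R^2 is undefined when either input is constant")
    return (cov * cov) / (var_t * var_d)

truth = [0, 1, 2, 1, 0, 2]
imputed = [0.1, 0.9, 1.8, 1.2, 0.0, 1.9]
print(round(dosage_r2(truth, imputed), 3))
```

The metric is undefined for a variant that is constant in the sample, which matters for the rare variants the study focuses on, so the sketch raises an error rather than returning a number in that case.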
10.
Bioinformatics ; 35(13): 2216-2225, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30462171

ABSTRACT

MOTIVATION: Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict crystallization propensities of proteins based on their sequences. However, the majority of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins which can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers from the protein sequences to distinguish proteins that will result in diffraction-quality crystals from those that will not. RESULTS: Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves an average improvement of 1.4% and 12.1% in recall when compared to its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% for F-score, 1.9% and 3.9% for accuracy, and 3.8% and 7.0% for MCC w.r.t. Crysalis II and Crysf on independent test sets. AVAILABILITY AND IMPLEMENTATION: The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web-server is also available at https://deeplearning-protein.qcri.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Amino Acid Sequence , Computational Biology , Crystallization , Proteins
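
A k-mer, as mentioned in the abstract above, is simply a contiguous window of k residues in a protein sequence. The sketch below counts k-mers explicitly to make the notion concrete. It is not how DeepCrystal works internally: a convolutional network learns filters over such windows rather than tabulating them, and the sequence shown is made up.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping window of length k in a sequence."""
    if k < 1 or k > len(sequence):
        raise ValueError("k must be between 1 and the sequence length")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

counts = kmer_counts("MKVMKV", 3)
print(counts.most_common())
# [('MKV', 2), ('KVM', 1), ('VMK', 1)]
```

A convolutional filter of width k slides over the same windows, so frequently occurring k-mers are the patterns such a filter can learn to respond to.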
11.
Bioinformatics ; 34(15): 2605-2613, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29554211

ABSTRACT

Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work we propose DeepSol, a novel Deep Learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. Availability and implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018). Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Deep Learning , Proteins/chemistry , Amino Acid Sequence , Computer Simulation , Solubility
12.
Nucleic Acids Res ; 46(7): e39, 2018 04 20.
Article in English | MEDLINE | ID: mdl-29361062

ABSTRACT

We propose a generic framework for gene regulatory network (GRN) inference approached as a feature selection problem. GRNs obtained using Machine Learning techniques are often dense, whereas real GRNs are rather sparse. We use a Tikhonov regularization inspired optimal L-curve criterion that utilizes the edge weight distribution for a given target gene to determine the optimal set of TFs associated with it. Our proposed framework allows incorporating a mechanistic active binding network based on cis-regulatory motif analysis. We evaluate our regularization framework in conjunction with two non-linear ML techniques, namely gradient boosting machines (GBM) and random-forests (GENIE), resulting in regularized feature selection based methods called RGBM and RGENIE, respectively. RGBM has been used to identify the main transcription factors that are causally involved as master regulators of the gene expression signature activated in the FGFR3-TACC3-positive glioblastoma. Here, we illustrate that RGBM identifies the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators characterizing the difference between G-CIMP-high and G-CIMP-low subtypes and between PA-like and LGm6-GBM, thus providing a clue to the yet undetermined nature of the transcriptional events among these subtypes.


Subjects
Gene Regulatory Networks/genetics , Glioma/genetics , Nucleotide Motifs/genetics , Transcription Factors/genetics , Algorithms , Gene Expression Regulation, Neoplastic/genetics , Glioma/classification , Glioma/pathology , Humans , Machine Learning , Microtubule-Associated Proteins/genetics , Receptor, Fibroblast Growth Factor, Type 3/genetics
13.
Bioinformatics ; 34(9): 1591-1593, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29267877

ABSTRACT

Summary: Genome-wide association studies have become common over the last ten years, with a shift towards targeting rare variants, especially in pedigree data. Despite lower costs, sequencing for rare variants still remains expensive. To have a relatively large sample with acceptable cost, imputation approaches may be used, such as GIGI for pedigree data. GIGI is an imputation method that handles large pedigrees and is particularly good for rare variant imputation. GIGI requires a subset of individuals in a pedigree to be fully sequenced, while other individuals are sequenced only at relevant markers. The imputation will infer the missing genotypes at untyped markers. Running GIGI on large pedigrees for large numbers of markers can be very time consuming. We present GIGI-Quick as a method to efficiently split GIGI's input, run GIGI in parallel and efficiently merge the output, so that runtime decreases with the number of cores. This yields imputation results faster and therefore speeds up all subsequent association analyses. Availability and implementation: GIGI-Quick is open source and publicly available via: https://cse-git.qcri.org/Imputation/GIGI-Quick. Contact: msaad@hbku.edu.qa. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study , Genotype , Pedigree , Software
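
The split, run-in-parallel and merge pattern described in the abstract above can be sketched generically. The code below is not GIGI-Quick and does not call GIGI: `impute_chunk` is a hypothetical placeholder standing in for one imputation run on a subset of markers, and the marker names are made up. A thread pool is used here on the assumption that each worker would mainly wait on an external process.

```python
from concurrent.futures import ThreadPoolExecutor

def split_markers(markers, chunk_size):
    """Split the marker list into consecutive chunks."""
    return [markers[i:i + chunk_size] for i in range(0, len(markers), chunk_size)]

def impute_chunk(chunk):
    """HYPOTHETICAL placeholder for one imputation run on a chunk of markers."""
    return [(marker, "imputed") for marker in chunk]

def impute_in_parallel(markers, chunk_size, workers):
    """Run the placeholder on all chunks concurrently and merge in input order."""
    chunks = split_markers(markers, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the submitted chunks,
        # so the merged output keeps the original marker order.
        per_chunk = list(pool.map(impute_chunk, chunks))
    return [row for chunk_result in per_chunk for row in chunk_result]

markers = [f"marker_{i}" for i in range(10)]
merged = impute_in_parallel(markers, chunk_size=3, workers=4)
print(len(merged), merged[0], merged[-1])
```

Because markers are processed independently in this sketch, the achievable speed-up is bounded by the number of chunks and by the slowest chunk, which is why the chunk size is exposed as a parameter.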
14.
Bioinformatics ; 34(7): 1092-1098, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29069295

ABSTRACT

Motivation: Protein solubility can be a decisive factor in both research and production efficiency, and in silico sequence-based predictors that can accurately estimate solubility outcomes are highly sought. Results: In this study, we present a novel approach termed PRotein SolubIlity Predictor (PaRSnIP), which uses a gradient boosting machine algorithm as well as an approximation of sequence and structural features of the protein of interest. Based on an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods by more than 9% in accuracy and 0.17 in Matthews correlation coefficient, with an overall accuracy of 74% and Matthews correlation coefficient of 0.48. Additionally, PaRSnIP provides importance scores for all features used in training. We observed that higher fractions of exposed residues associate positively with protein solubility and that tripeptide stretches with multiple histidines associate negatively with solubility. The improved prediction accuracy of PaRSnIP should enable it to predict protein solubility with greater reliability and to screen for sequence variants with enhanced manufacturability. Availability and implementation: PaRSnIP software is available for download from GitHub (https://github.com/RedaRawi/PaRSnIP). Contact: gwo-yu.chuang@nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms , Computational Biology/methods , Reproducibility of Results , Solubility
15.
BMC Bioinformatics ; 17(1): 533, 2016 Dec 15.
Article in English | MEDLINE | ID: mdl-27978812

ABSTRACT

BACKGROUND: The post-genomic era with its wealth of sequences gave rise to a broad range of protein residue-residue contact detecting methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, their predictions do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detecting method, COUSCOus, by combining the best shrinkage approach, the empirical Bayes covariance estimator and GLasso. RESULTS: Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV w.r.t. the Matthews correlation coefficient criterion on the full list of residue contacts. Furthermore, COUSCOus achieves on average a 10% gain in prediction accuracy over PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that in a simple random forest meta-classifier combining contact detecting techniques and sequence-derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. CONCLUSION: We conclude that superior covariance shrinkage approaches will benefit several research fields that apply the GLasso procedure, including residue-residue contact prediction, presented here, and gene network reconstruction.


Subjects
Computational Biology/methods , Proteins/chemistry , Algorithms , Bayes Theorem , Models, Molecular , Proteins/genetics , Sequence Analysis, Protein/methods , Software
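
The "top L/10" accuracies quoted in the abstract above follow a common convention in contact prediction: for a protein of length L, keep the L/10 highest-scoring predicted residue pairs and report the fraction that are true contacts. The sketch below shows that calculation under simple assumptions (integer division for L/10, pairs stored as (smaller index, larger index)); the data and the function name are made up, and the long, medium and short range split is left out because the abstract does not define it.

```python
def top_l_over_n_precision(predictions, true_contacts, seq_len, divisor=10):
    """Fraction of the top seq_len // divisor predicted pairs that are true contacts.

    predictions: iterable of (i, j, score) tuples.
    true_contacts: set of (i, j) pairs with i < j.
    """
    k = max(1, seq_len // divisor)
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:k]
    if not ranked:
        raise ValueError("no predictions supplied")
    hits = sum(1 for i, j, _ in ranked
               if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(ranked)

# Toy protein of length 30, so the top 3 predictions are evaluated.
predictions = [(1, 10, 0.9), (2, 20, 0.8), (3, 25, 0.7), (4, 28, 0.1)]
true_contacts = {(1, 10), (3, 25)}
print(top_l_over_n_precision(predictions, true_contacts, seq_len=30))
```

In the toy example two of the three top-ranked pairs are true contacts, giving a precision of 2/3.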
17.
PLoS One ; 10(11): e0143245, 2015.
Article in English | MEDLINE | ID: mdl-26579711

ABSTRACT

The HIV-1 Env spike is the main protein complex that facilitates HIV-1 entry into CD4+ host cells. HIV-1 entry is a multistep process that is not yet completely understood. This process involves several protein-protein interactions between HIV-1 Env and a variety of host cell receptors along with many conformational changes within the spike. Owing to its high mutation rates and plasticity, HIV-1 Env has developed strategies to escape immense immune pressure and entry inhibitors. We applied a coevolution and residue-residue contact detecting method to identify coevolution patterns within HIV-1 Env protein sequences representing all group M subtypes. We identified 424 coevolving residue pairs within HIV-1 Env. The majority of predicted pairs are residue-residue contacts and are proximal in 3D structure. Furthermore, many of the detected pairs have functional implications due to contributions in either CD4 or coreceptor binding, or variable loop, gp120-gp41, and interdomain interactions. This study provides a new dimension of information in HIV research. The identified residue couplings may not only be important in assisting gp120 and gp41 coordinate structure prediction, but also in designing new and effective entry inhibitors that incorporate mutation patterns of HIV-1 Env.


Subjects
Evolution, Molecular , Glycoproteins/genetics , HIV-1/metabolism , env Gene Products, Human Immunodeficiency Virus/genetics , Amino Acids/chemistry , CD4 Antigens/metabolism , Glycoproteins/chemistry , Models, Molecular , Protein Structure, Tertiary , Virus Internalization , env Gene Products, Human Immunodeficiency Virus/chemistry