1.
Am J Hum Genet ; 108(12): 2301-2318, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34762822

ABSTRACT

Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.


Subject(s)
Databases, Genetic , Gain of Function Mutation , Loss of Function Mutation , Proteins/genetics , Cloud Computing , Genetic Predisposition to Disease , Genome, Human , Germ-Line Mutation , Humans , Internet-Based Intervention , Machine Learning
2.
Indian J Med Res ; 159(2): 223-231, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38517215

ABSTRACT

BACKGROUND & OBJECTIVES: The Omicron sub-lineages are known to have higher infectivity, immune escape and lower virulence. During December 2022 - January 2023 and March - April 2023, India witnessed increased SARS-CoV-2 infections, mostly due to newer Omicron sub-lineages. With this unprecedented rise in cases, we assessed the neutralization potential of individuals vaccinated with ChAdOx1 nCoV (Covishield) and BBV152 (Covaxin) against emerging Omicron sub-lineages. METHODS: Neutralizing antibody responses were measured in sera collected from individuals six months after two doses (n=88) of Covishield (n=44) or Covaxin (n=44) and after three doses (n=102), with a Covishield (n=46) or Covaxin (n=56) booster, against the prototype B.1 strain and the Omicron lineages XBB.1, BQ.1, BA.5.2 and BF.7. RESULTS: Sera collected six months after the two-dose and three-dose regimens demonstrated neutralizing activity against all variants. The neutralizing antibody (NAb) level was highest against the prototype B.1 strain, followed by BA.5.2 (5-6 fold lower), BF.7 (11-12 fold lower), BQ.1 (12 fold lower) and XBB.1 (18-22 fold lower). INTERPRETATION & CONCLUSIONS: Persistence of NAb responses six months after vaccination was comparable in the two- and three-dose groups. Among the Omicron sub-variants, XBB.1 showed marked neutralization escape, thus pointing towards an eventual immune escape, which may cause more infections. Further, the correlation of study data with the complete clinical profile of the participants, along with observations for cell-mediated immunity, may provide a clear picture of the sustained protection due to three-dose vaccination as well as hybrid immunity against the newer variants.
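The fold reductions quoted in the results are ratios of the titer against the prototype strain to the titer against each variant. The following is a minimal sketch of that arithmetic, assuming titers are summarized by a geometric mean (the abstract does not state the summary statistic). The function names and the titer values are hypothetical and are not data from the study.

```python
import math

def geometric_mean(titers):
    """Geometric mean of positive titers."""
    return math.exp(sum(math.log(t) for t in titers) / len(titers))

def fold_reduction(reference_titers, variant_titers):
    """How many times lower the variant titer is than the reference titer."""
    return geometric_mean(reference_titers) / geometric_mean(variant_titers)

# Hypothetical titers for illustration only; not data from the study.
prototype_b1 = [320, 640, 1280]
variant_xbb1 = [16, 32, 64]

print(round(fold_reduction(prototype_b1, variant_xbb1), 1))
```

A ratio above 1 means the sera neutralize the variant less well than the prototype strain.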


Subject(s)
COVID-19 Vaccines , COVID-19 , ChAdOx1 nCoV-19 , Vaccines, Inactivated , Humans , COVID-19/prevention & control , SARS-CoV-2 , Antibodies, Neutralizing , Vaccination , Antibodies, Viral
3.
Hum Mol Genet ; 30(10): 952-960, 2021 05 29.
Article in English | MEDLINE | ID: mdl-33704450

ABSTRACT

Diabetic retinopathy (DR) is a common consequence in type 2 diabetes (T2D) and a leading cause of blindness in working-age adults. Yet, its genetic predisposition is largely unknown. Here, we examined the polygenic architecture underlying DR by deriving and assessing a genome-wide polygenic risk score (PRS) for DR. We evaluated the PRS in 6079 individuals with T2D of European, Hispanic, African and other ancestries from a large-scale multi-ethnic biobank. Main outcomes were PRS association with DR diagnosis, symptoms and complications, and time to diagnosis, and transferability to non-European ancestries. We observed that PRS was significantly associated with DR. A standard deviation increase in PRS was accompanied by an adjusted odds ratio (OR) of 1.12 [95% confidence interval (CI) 1.04-1.20; P = 0.001] for DR diagnosis. When stratified by ancestry, PRS was associated with the highest OR in European ancestry (OR = 1.22, 95% CI 1.02-1.41; P = 0.049), followed by African (OR = 1.15, 95% CI 1.03-1.28; P = 0.028) and Hispanic ancestries (OR = 1.10, 95% CI 1.00-1.10; P = 0.050). Individuals in the top PRS decile had a 1.8-fold elevated risk for DR versus the bottom decile (P = 0.002). Among individuals without DR diagnosis, the top PRS decile had more DR symptoms than the bottom decile (P = 0.008). The PRS was associated with retinal hemorrhage (OR = 1.44, 95% CI 1.03-2.02; P = 0.03) and earlier DR presentation (10% probability of DR by 4 years in the top PRS decile versus 8 years in the bottom decile). These results establish the significant polygenic underpinnings of DR and indicate the need for more diverse ancestries in biobanks to develop multi-ancestral PRS.
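A polygenic risk score is commonly built as a weighted sum of risk-allele dosages and then standardized, so that an effect can be quoted "per standard deviation". The sketch below illustrates that general construction only; the weights, genotypes, and function names are hypothetical, and the abstract does not describe how this particular PRS was derived. The value 1.12 is the adjusted OR per SD reported above.

```python
from statistics import mean, pstdev

def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) across variants."""
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores):
    """Convert raw scores to z-scores so effects can be read per SD."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]

def odds_multiplier(or_per_sd, z):
    """Odds relative to the cohort mean, assuming a log-linear effect of the PRS."""
    return or_per_sd ** z

# Hypothetical per-variant weights (log odds ratios) and genotypes.
weights = [0.10, -0.05, 0.20]
cohort = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 0]]

raw = [polygenic_risk_score(g, weights) for g in cohort]
z = standardize(raw)
print([round(odds_multiplier(1.12, zi), 3) for zi in z])
```

Under the log-linear assumption, an individual two standard deviations above the mean has odds multiplied by 1.12 squared, about 1.25.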


Subject(s)
Diabetes Mellitus, Type 2/epidemiology , Diabetic Retinopathy/epidemiology , Genetic Predisposition to Disease , Genome-Wide Association Study , Adult , Aged , Black People/genetics , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/pathology , Diabetic Retinopathy/complications , Diabetic Retinopathy/genetics , Diabetic Retinopathy/pathology , Hispanic or Latino/genetics , Humans , Middle Aged , Multifactorial Inheritance/genetics , Risk Assessment , Risk Factors , White People/genetics
4.
Am Heart J ; 250: 29-33, 2022 08.
Article in English | MEDLINE | ID: mdl-35526571

ABSTRACT

Genetic risk for coronary artery disease (CAD) is commonly measured with polygenic risk scores (PRS); yet, the relationship of atherosclerotic burden with PRS in healthy individuals not at high clinical risk for CAD (ie, without a high pooled cohort equations [PCE] score) is unknown. Here, we implemented a novel recall-by-PRS strategy to measure coronary artery calcium (CAC) scores prospectively in 53 healthy individuals with extreme high PRS (median [IQR] PRS = 94% [83-98]) and low PRS (median [IQR] PRS = 3.6% [1.2-10]). The high PRS group was associated with a 2.8-fold greater CAC than the low PRS group, adjusted for age, sex, BMI, smoking, and statin use, and had a 6.7-fold greater proportion of individuals with CAC exceeding 300 HU. These findings reveal that extreme PRS tracks with CAD risk even in those without high clinical risk and demonstrate proof of principle for recall-by-PRS approaches that should be assessed prospectively in larger trials.


Subject(s)
Calcium , Coronary Artery Disease , Calcium, Dietary , Cohort Studies , Coronary Artery Disease/genetics , Humans , Risk Assessment , Risk Factors
5.
J Am Soc Nephrol ; 32(1): 151-160, 2021 01.
Article in English | MEDLINE | ID: mdl-32883700

ABSTRACT

BACKGROUND: Early reports indicate that AKI is common among patients with coronavirus disease 2019 (COVID-19) and associated with worse outcomes. However, AKI among hospitalized patients with COVID-19 in the United States is not well described. METHODS: This retrospective, observational study involved a review of data from electronic health records of patients aged ≥18 years with laboratory-confirmed COVID-19 admitted to the Mount Sinai Health System from February 27 to May 30, 2020. We describe the frequency of AKI and dialysis requirement, AKI recovery, and adjusted odds ratios (aORs) with mortality. RESULTS: Of 3993 hospitalized patients with COVID-19, AKI occurred in 1835 (46%) patients; 347 (19%) of the patients with AKI required dialysis. The proportions with stages 1, 2, or 3 AKI were 39%, 19%, and 42%, respectively. A total of 976 (24%) patients were admitted to intensive care, and 745 (76%) experienced AKI. Of the 435 patients with AKI and urine studies, 84% had proteinuria, 81% had hematuria, and 60% had leukocyturia. Independent predictors of severe AKI were CKD, men, and higher serum potassium at admission. In-hospital mortality was 50% among patients with AKI versus 8% among those without AKI (aOR, 9.2; 95% confidence interval, 7.5 to 11.3). Of survivors with AKI who were discharged, 35% had not recovered to baseline kidney function by the time of discharge. An additional 28 of 77 (36%) patients who had not recovered kidney function at discharge did so on posthospital follow-up. CONCLUSIONS: AKI is common among patients hospitalized with COVID-19 and is associated with high mortality. Of all patients with AKI, only 30% survived with recovery of kidney function by the time of discharge.
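The percentages in the results can be recomputed from the counts given in the abstract. This short sketch does only that arithmetic; the helper name `pct` is an illustrative choice.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

# Counts taken from the abstract above.
hospitalized = 3993
aki = 1835
aki_dialysis = 347
icu = 976
icu_aki = 745

print("AKI among hospitalized:", pct(aki, hospitalized))
print("Dialysis among AKI:", pct(aki_dialysis, aki))
print("ICU among hospitalized:", pct(icu, hospitalized))
print("AKI among ICU:", pct(icu_aki, icu))
print("Recovered on follow-up:", pct(28, 77))
```

Note that each percentage uses a different denominator (all hospitalized patients, patients with AKI, or patients in intensive care), which is why they do not add up across rows.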


Subject(s)
Acute Kidney Injury/etiology , COVID-19/complications , SARS-CoV-2 , Acute Kidney Injury/epidemiology , Acute Kidney Injury/therapy , Acute Kidney Injury/urine , Aged , Aged, 80 and over , COVID-19/mortality , Female , Hematuria/etiology , Hospital Mortality , Hospitals, Private/statistics & numerical data , Hospitals, Urban/statistics & numerical data , Humans , Incidence , Inpatients , Leukocytes , Male , Middle Aged , New York City/epidemiology , Proteinuria/etiology , Renal Dialysis , Retrospective Studies , Treatment Outcome , Urine/cytology
6.
JAMA ; 327(4): 350-359, 2022 01 25.
Article in English | MEDLINE | ID: mdl-35076666

ABSTRACT

Importance: Population-based assessment of disease risk associated with gene variants informs clinical decisions and risk stratification approaches. Objective: To evaluate the population-based disease risk of clinical variants in known disease predisposition genes. Design, Setting, and Participants: This cohort study included 72 434 individuals with 37 780 clinical variants who were enrolled in the BioMe Biobank from 2007 onwards with follow-up until December 2020 and the UK Biobank from 2006 to 2010 with follow-up until June 2020. Participants had linked exome and electronic health record data, were older than 20 years, and were of diverse ancestral backgrounds. Exposures: Variants previously reported as pathogenic or predicted to cause a loss of protein function by bioinformatic algorithms (pathogenic/loss-of-function variants). Main Outcomes and Measures: The primary outcome was the disease risk associated with clinical variants. The risk difference (RD) between the prevalence of disease in individuals with a variant allele (penetrance) vs in individuals with a normal allele was measured. Results: Among 72 434 study participants, 43 395 were from the UK Biobank (mean [SD] age, 57 [8.0] years; 24 065 [55%] women; 2948 [7%] non-European) and 29 039 were from the BioMe Biobank (mean [SD] age, 56 [16] years; 17 355 [60%] women; 19 663 [68%] non-European). Of 5360 pathogenic/loss-of-function variants, 4795 (89%) were associated with an RD less than or equal to 0.05. Mean penetrance was 6.9% (95% CI, 6.0%-7.8%) for pathogenic variants and 0.85% (95% CI, 0.76%-0.95%) for benign variants reported in ClinVar (difference, 6.0 [95% CI, 5.6-6.4] percentage points), with a median of 0% for both groups due to large numbers of nonpenetrant variants. 
Penetrance of pathogenic/loss-of-function variants for late-onset diseases was modified by age: mean penetrance was 10.3% (95% CI, 9.0%-11.6%) in individuals 70 years or older and 8.5% (95% CI, 7.9%-9.1%) in individuals 20 years or older (difference, 1.8 [95% CI, 0.40-3.3] percentage points). Penetrance of pathogenic/loss-of-function variants was heterogeneous even in known disease predisposition genes, including BRCA1 (mean [range], 38% [0%-100%]), BRCA2 (mean [range], 38% [0%-100%]), and PALB2 (mean [range], 26% [0%-100%]). Conclusions and Relevance: In 2 large biobank cohorts, the estimated penetrance of pathogenic/loss-of-function variants was variable but generally low. Further research of population-based penetrance is needed to refine variant interpretation and clinical evaluation of individuals with these variant alleles.
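The risk difference described in the outcome measures is the prevalence of disease in carriers (penetrance) minus the prevalence in non-carriers. A minimal sketch of that calculation follows; the counts and function names are hypothetical and are not taken from the study.

```python
def prevalence(cases, total):
    """Fraction of individuals in a group who have the disease."""
    return cases / total

def risk_difference(carrier_cases, carriers, noncarrier_cases, noncarriers):
    """Penetrance in carriers minus prevalence in non-carriers."""
    return prevalence(carrier_cases, carriers) - prevalence(noncarrier_cases, noncarriers)

# Hypothetical counts for a single variant; not data from the study.
rd = risk_difference(carrier_cases=12, carriers=100,
                     noncarrier_cases=500, noncarriers=10000)
print(round(rd, 3))
```

With these illustrative counts the variant would fall above the 0.05 threshold used in the results; a variant whose carriers are no more often affected than non-carriers has a risk difference of zero.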


Subject(s)
Genetic Predisposition to Disease , Genetic Variation , Loss of Function Mutation , Penetrance , Aged , Biological Specimen Banks , Cohort Studies , Female , Humans , Male , Mutation , United Kingdom
7.
Hum Mutat ; 42(8): 969-977, 2021 08.
Article in English | MEDLINE | ID: mdl-34005834

ABSTRACT

Biobanks with exomes linked to electronic health records (EHRs) enable the study of genetic pleiotropy between rare variants and seemingly disparate diseases. We performed robust clinical phenotyping of rare, putatively deleterious variants (loss-of-function [LoF] and deleterious missense variants) in ERCC6, a gene implicated in inherited retinal disease. We analyzed 213,084 exomes, along with a targeted set of retinal, cardiac, and immune phenotypes from two large-scale EHR-linked biobanks. In the primary analysis, a burden of deleterious variants in ERCC6 was strongly associated with (1) retinal disorders; (2) cardiac and electrocardiogram perturbations; and (3) immunodeficiency and decreased immunoglobulin levels. Meta-analysis of results from the BioMe Biobank and UK Biobank showed a significant association of deleterious ERCC6 burden with retinal dystrophy (odds ratio [OR] = 2.6, 95% confidence interval [CI]: 1.5-4.6; p = 8.7 × 10⁻⁴), atypical atrial flutter (OR = 3.5, 95% CI: 1.9-6.5; p = 6.2 × 10⁻⁵), arrhythmia (OR = 1.5, 95% CI: 1.2-2.0; p = 2.7 × 10⁻³), and lymphocyte immunodeficiency (OR = 3.8, 95% CI: 2.1-6.8; p = 5.0 × 10⁻⁶). Carriers of ERCC6 LoF variants who lacked a diagnosis of these conditions exhibited increased symptoms, indicating underdiagnosis. These results reveal a unique genetic link among retinal, cardiac, and immune disorders and underscore the value of EHR-linked biobanks in assessing the full clinical profile of carriers of rare variants.


Subject(s)
Genetic Pleiotropy , Retinal Dystrophies , Arrhythmias, Cardiac , DNA Helicases , DNA Repair Enzymes , Exome , Humans , Poly-ADP-Ribose Binding Proteins , Retinal Dystrophies/genetics , Exome Sequencing/methods
8.
Blood Purif ; 50(4-5): 621-627, 2021.
Article in English | MEDLINE | ID: mdl-33631752

ABSTRACT

BACKGROUND/AIMS: Acute kidney injury (AKI) in critically ill patients is common, and continuous renal replacement therapy (CRRT) is a preferred mode of renal replacement therapy (RRT) in hemodynamically unstable patients. Prediction of clinical outcomes in patients on CRRT is challenging. We utilized several approaches to predict RRT-free survival (RRTFS) in critically ill patients with AKI requiring CRRT. METHODS: We used the Medical Information Mart for Intensive Care (MIMIC-III) database to identify patients ≥18 years old with AKI on CRRT, after excluding patients who had ESRD on chronic dialysis, and kidney transplantation. We defined RRTFS as patients who were discharged alive and did not require RRT ≥7 days prior to hospital discharge. We utilized all available biomedical data up to CRRT initiation. We evaluated 7 approaches, including logistic regression (LR), random forest (RF), support vector machine (SVM), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), and MLP with long short-term memory (MLP + LSTM). We evaluated model performance by using area under the receiver operating characteristic (AUROC) curves. RESULTS: Out of 684 patients with AKI on CRRT, 205 (30%) patients had RRTFS. The median age of patients was 63 years and their median Simplified Acute Physiology Score (SAPS) II was 67 (interquartile range 52-84). The MLP + LSTM showed the highest AUROC (95% CI) of 0.70 (0.67-0.73), followed by MLP 0.59 (0.54-0.64), LR 0.57 (0.52-0.62), SVM 0.51 (0.46-0.56), AdaBoost 0.51 (0.46-0.55), RF 0.44 (0.39-0.48), and XGBoost 0.43 (CI 0.38-0.47). CONCLUSIONS: A MLP + LSTM model outperformed other approaches for predicting RRTFS. Performance could be further improved by incorporating other data types.
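The AUROC used to compare the models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition; the labels and scores are hypothetical, and this is not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of positive/negative pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = RRT-free survival) and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(labels, scores))
```

A value of 0.5 corresponds to chance-level ranking, so the values below 0.5 reported for RF and XGBoost indicate ranking slightly worse than chance on this task.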


Subject(s)
Acute Kidney Injury/therapy , Renal Replacement Therapy , Acute Kidney Injury/diagnosis , Age Factors , Aged , Critical Care , Female , Humans , Logistic Models , Machine Learning , Male , Middle Aged , Prognosis
9.
Kidney Int ; 97(2): 383-392, 2020 02.
Article in English | MEDLINE | ID: mdl-31883805

ABSTRACT

Symptoms are common in patients on maintenance hemodialysis but identification is challenging. New informatics approaches including natural language processing (NLP) can be utilized to identify symptoms from narrative clinical documentation. Here we utilized NLP to identify seven patient symptoms from notes of maintenance hemodialysis patients of the BioMe Biobank and validated our findings using a separate cohort and the MIMIC-III database. NLP performance was compared for symptom detection with International Classification of Diseases (ICD)-9/10 codes, and the performance of both methods was validated against manual chart review. From 1034 and 519 hemodialysis patients within BioMe and MIMIC-III databases, respectively, the most frequently identified symptoms by NLP were fatigue, pain, and nausea/vomiting. In BioMe, sensitivity for NLP (0.85-0.99) was higher than for ICD codes (0.09-0.59) for all symptoms, with similar results in the BioMe validation cohort and MIMIC-III. ICD codes were significantly more specific for nausea/vomiting in BioMe and more specific for fatigue, depression, and pain in the MIMIC-III database. A majority of patients in both cohorts had four or more symptoms. Patients with more symptoms identified by NLP, ICD, and chart review had more clinical encounters. NLP had higher specificity in inpatient notes but higher sensitivity in outpatient notes and performed similarly across pain severity subgroups. Thus, NLP had higher sensitivity compared to ICD codes for identification of seven common hemodialysis-related symptoms, with comparable specificity between the two methods. Hence, NLP may be useful for the high-throughput identification of patient-centered outcomes when using electronic health records.
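Sensitivity and specificity here compare each method's symptom flags against manual chart review as the reference. A minimal sketch of that comparison follows; the per-patient flags and function names are hypothetical and are not study data.

```python
def confusion_counts(predicted, reference):
    """Count true/false positives and negatives against a reference standard."""
    tp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 1)
    tn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 0)
    return tp, fp, fn, tn

def sensitivity(tp, fn):
    """Share of reference-positive patients that the method flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of reference-negative patients that the method left unflagged."""
    return tn / (tn + fp)

# Hypothetical per-patient flags for one symptom (1 = symptom present).
chart_review = [1, 1, 1, 0, 0, 0]
nlp_flags = [1, 1, 0, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(nlp_flags, chart_review)
print(sensitivity(tp, fn), specificity(tn, fp))
```

A method that rarely flags a symptom, as billing codes often do, can have high specificity while missing most true cases, which is the pattern the abstract reports for ICD codes.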


Subject(s)
Electronic Health Records , Natural Language Processing , Algorithms , Databases, Factual , Humans , Renal Dialysis/adverse effects
10.
Kidney Int ; 98(5): 1323-1330, 2020 11.
Article in English | MEDLINE | ID: mdl-32540406

ABSTRACT

Urinary tract stones have high heritability, indicating a strong genetic component. However, genome-wide association studies (GWAS) have uncovered only a few genome-wide significant single nucleotide polymorphisms (SNPs). Polygenic risk scores (PRS) sum the cumulative effects of many SNPs and shed light on the underlying genetic architecture. Using GWAS summary statistics from 361,141 participants in the United Kingdom Biobank, we generated a PRS and determined its association with stone diagnosis in 28,877 participants in the Mount Sinai BioMe Biobank. In BioMe (1,071 cases and 27,806 controls), every standard deviation increase in PRS was associated with a significant adjusted odds ratio of 1.2 (95% confidence interval 1.13-1.26). In comparison, a risk score composed of GWAS-significant SNPs was not significantly associated with diagnosis. After stratifying individuals into low- and high-risk categories on clinical risk factors, there was a significant increment in adjusted odds ratio of 1.3 (1.12-1.6) in the low- and 1.2 (1.1-1.2) in the high-risk group for every standard deviation increment in PRS. In a 14,348-participant validation cohort (Penn Medicine Biobank), every standard deviation increment was associated with a significant adjusted odds ratio of 1.1 (1.03-1.2). Thus, a genome-wide PRS is associated with urinary tract stones overall and in the absence of known clinical risk factors and illustrates their complex polygenic architecture.


Subject(s)
Genome-Wide Association Study , Urinary Calculi , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance , Polymorphism, Single Nucleotide , United Kingdom/epidemiology
11.
JAMA ; 322(22): 2191-2202, 2019 12 10.
Article in English | MEDLINE | ID: mdl-31821430

ABSTRACT

Importance: Hereditary transthyretin (TTR) amyloid cardiomyopathy (hATTR-CM) due to the TTR V122I variant is an autosomal-dominant disorder that causes heart failure in elderly individuals of African ancestry. The clinical associations of carrying the variant, its effect in other African ancestry populations including Hispanic/Latino individuals, and the rates of achieving a clinical diagnosis in carriers are unknown. Objective: To assess the association between the TTR V122I variant and heart failure and identify rates of hATTR-CM diagnosis among carriers with heart failure. Design, Setting, and Participants: Cross-sectional analysis of carriers and noncarriers of TTR V122I of African ancestry aged 50 years or older enrolled in the Penn Medicine Biobank between 2008 and 2017 using electronic health record data from 1996 to 2017. Case-control study in participants of African and Hispanic/Latino ancestry with and without heart failure in the Mount Sinai BioMe Biobank enrolled between 2007 and 2015 using electronic health record data from 2007 to 2018. Exposures: TTR V122I carrier status. Main Outcomes and Measures: The primary outcome was prevalent heart failure. The rate of diagnosis with hATTR-CM among TTR V122I carriers with heart failure was measured. Results: The cross-sectional cohort included 3724 individuals of African ancestry with a median age of 64 years (interquartile range, 57-71); 1755 (47%) were male, 2896 (78%) had a diagnosis of hypertension, and 753 (20%) had a history of myocardial infarction or coronary revascularization. There were 116 TTR V122I carriers (3.1%); 1121 participants (30%) had heart failure. The case-control study consisted of 2307 individuals of African ancestry and 3663 Hispanic/Latino individuals; the median age was 73 years (interquartile range, 68-80), 2271 (38%) were male, 4709 (79%) had a diagnosis of hypertension, and 1008 (17%) had a history of myocardial infarction or coronary revascularization. 
There were 1376 cases of heart failure. TTR V122I was associated with higher rates of heart failure (cross-sectional cohort: n = 51/116 TTR V122I carriers [44%], n = 1070/3608 noncarriers [30%], adjusted odds ratio, 1.7 [95% CI, 1.2-2.4], P = .006; case-control study: n = 36/1376 heart failure cases [2.6%], n = 82/4594 controls [1.8%], adjusted odds ratio, 1.8 [95% CI, 1.2-2.7], P = .008). Ten of 92 TTR V122I carriers with heart failure (11%) were diagnosed as having hATTR-CM; the median time from onset of symptoms to clinical diagnosis was 3 years. Conclusions and Relevance: Among individuals of African or Hispanic/Latino ancestry enrolled in 2 academic medical center-based biobanks, the TTR V122I genetic variant was significantly associated with heart failure.
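The cross-sectional counts above are enough to compute an unadjusted odds ratio. The sketch below does so with a Woolf (log-scale) 95% confidence interval; this interval method is an assumption for illustration, and the result is a crude estimate that will differ from the adjusted odds ratio of 1.7 reported in the abstract, which accounts for covariates.

```python
import math

def odds_ratio_ci(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval."""
    a = exposed_cases
    b = exposed_total - exposed_cases
    c = unexposed_cases
    d = unexposed_total - unexposed_cases
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts from the cross-sectional cohort: heart failure in 51 of 116 carriers
# and in 1070 of 3608 noncarriers.
crude_or, lower, upper = odds_ratio_ci(51, 116, 1070, 3608)
print(round(crude_or, 2), round(lower, 2), round(upper, 2))
```

The formula fails if any cell of the 2x2 table is zero, in which case a continuity correction or an exact method would be needed.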


Subject(s)
Amyloid Neuropathies, Familial/genetics , Black or African American/genetics , Heart Failure/genetics , Hispanic or Latino/genetics , Prealbumin/genetics , Academic Medical Centers , Aged , Amyloid Neuropathies, Familial/complications , Amyloid Neuropathies, Familial/ethnology , Biological Specimen Banks , Case-Control Studies , Cross-Sectional Studies , Female , Genetic Variation , Heart Failure/ethnology , Humans , Male , Middle Aged
12.
J Proteome Res ; 17(1): 337-347, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29110491

ABSTRACT

Metabolomics holds promise as a new technology to diagnose highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, a class of increasingly popular machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor positive (ER+) and 67 estrogen receptor negative (ER-), to test the accuracies of feed-forward networks, a deep learning (DL) framework, as well as six widely used machine learning models, namely random forest (RF), support vector machines (SVM), recursive partitioning and regression trees (RPART), linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), and generalized boosted models (GBM). The DL framework has the highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value <0.05) that cannot be discovered by other machine learning methods. Among them, the protein digestion and absorption and ATP-binding cassette (ABC) transporters pathways are also confirmed in integrated analysis between metabolomics and gene expression data in these samples. In summary, the deep learning method shows advantages for metabolomics-based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of feed-forward network based deep learning methods in the metabolomics research community for classification.


Subject(s)
Breast Neoplasms/classification , Machine Learning/standards , Metabolomics/methods , Receptors, Estrogen/analysis , Area Under Curve , Female , Humans
13.
J Transl Med ; 16(1): 181, 2018 07 03.
Article in English | MEDLINE | ID: mdl-29970096

ABSTRACT

BACKGROUND: Evidence in the literature strongly supports the potential of immunomodulatory peptides for use as vaccine adjuvants. All the mechanisms by which vaccine adjuvants produce immunostimulatory effects directly or indirectly stimulate antigen presenting cells (APCs). While numerous methods have been developed in the past for predicting B-cell and T-cell epitopes, no method is available for predicting the peptides that can modulate the APCs. METHODS: We named the peptides that can activate APCs as A-cell epitopes and developed methods for their prediction in this study. A dataset of experimentally validated A-cell epitopes was collected and compiled from various resources. To predict A-cell epitopes, we developed support vector machine-based machine learning models using different sequence-based features. RESULTS: A hybrid model developed on a combination of sequence-based features (dipeptide composition and motif occurrence) achieved the highest accuracy of 95.71% with a Matthews correlation coefficient (MCC) value of 0.91 on the training dataset. We also evaluated the hybrid models on an independent dataset and achieved a comparable accuracy of 95.00% with MCC 0.90. CONCLUSION: The models developed in this study were implemented in a web-based platform, VaxinPAD, to predict and design immunomodulatory peptides or A-cell epitopes. This web server, available at http://webs.iiitd.edu.in/raghava/vaxinpad/, will facilitate researchers in designing peptide-based vaccine adjuvants.
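Dipeptide composition, one of the sequence-based features named above, is usually defined as the percentage of each of the 400 possible ordered amino-acid pairs among the overlapping pairs in a peptide. The sketch below follows that common definition; the exact feature encoding used by VaxinPAD is not given in the abstract, and the example peptide and function name are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return a 400-element vector with the percentage of each overlapping dipeptide."""
    seq = sequence.upper()
    total = len(seq) - 1
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [100.0 * counts[d] / total for d in DIPEPTIDES]

# Hypothetical peptide for illustration.
features = dipeptide_composition("ACAC")
print(len(features), round(features[DIPEPTIDES.index("AC")], 2))
```

The fixed-length vector is what makes peptides of different lengths usable as input to a classifier such as a support vector machine.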


Subject(s)
Adjuvants, Immunologic/pharmacology , Antigen-Presenting Cells/drug effects , Computer Simulation , Drug Design , Vaccines, Subunit/pharmacology , Amino Acid Motifs , Amino Acid Sequence , Databases, Protein , Epitopes/metabolism , Humans , Immunologic Factors/pharmacology , Internet , Models, Theoretical , Support Vector Machine , User-Computer Interface , Vaccines, Subunit/chemistry
14.
Nucleic Acids Res ; 44(D1): D1098-103, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26586798

ABSTRACT

CPPsite 2.0 (http://crdd.osdd.net/raghava/cppsite/) is an updated version of the manually curated database (CPPsite) of cell-penetrating peptides (CPPs). The current version holds around 1850 peptide entries, nearly twice the number of entries in the previous version. The updated data were curated from research papers and patents published in the last three years. It was observed that most of the CPPs discovered or tested in the last three years have diverse chemical modifications (e.g. non-natural residues, linkers, lipid moieties, etc.). We have compiled this information on chemical modifications systematically in the updated version of the database. In order to understand the structure-function relationship of these peptides, we predicted the tertiary structure of CPPs, possessing both modified and natural residues, using state-of-the-art techniques. CPPsite 2.0 also maintains information about model systems (in vitro/in vivo) used for CPP evaluation and the different types of cargo (e.g. nucleic acid, protein, nanoparticles, etc.) delivered by these peptides. In order to assist a wide range of users, we developed a user-friendly responsive website, with various tools, suitable for smartphone, tablet and desktop users. In conclusion, CPPsite 2.0 provides significant improvements over the previous version in terms of data content.


Subject(s)
Cell-Penetrating Peptides/chemistry , Databases, Protein , Drug Carriers/chemistry , Protein Conformation , Structure-Activity Relationship
15.
Nucleic Acids Res ; 44(D1): D1119-26, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527728

ABSTRACT

SATPdb (http://crdd.osdd.net/raghava/satpdb/) is a database of structurally annotated therapeutic peptides, curated from 22 public domain peptide databases/datasets including 9 of our own. The current version holds 19192 unique experimentally validated therapeutic peptide sequences having length between 2 and 50 amino acids. It covers peptides having natural, non-natural and modified residues. These peptides were systematically grouped into 10 categories based on their major function or therapeutic property like 1099 anticancer, 10585 antimicrobial, 1642 drug delivery and 1698 antihypertensive peptides. We assigned or annotated structure of these therapeutic peptides using structural databases (Protein Data Bank) and state-of-the-art structure prediction methods like I-TASSER, HHsearch and PEPstrMOD. In addition, SATPdb facilitates users in performing various tasks that include: (i) structure and sequence similarity search, (ii) peptide browsing based on their function and properties, (iii) identification of moonlighting peptides and (iv) searching of peptides having desired structure and therapeutic activities. We hope this database will be useful for researchers working in the field of peptide-based therapeutics.


Subject(s)
Databases, Pharmaceutical , Peptides/chemistry , Peptides/therapeutic use , Antihypertensive Agents/pharmacology , Antineoplastic Agents/pharmacology , Molecular Sequence Annotation , Peptides/pharmacology
16.
Nucleic Acids Res ; 43(Database issue): D956-62, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25392419

ABSTRACT

AHTPDB (http://crdd.osdd.net/raghava/ahtpdb/) is a manually curated database of experimentally validated antihypertensive peptides. Information pertaining to peptides with antihypertensive activity was collected from research articles and from various peptide repositories. These peptides were derived from 35 major sources that include milk, egg, fish, pork, chicken, soybean, etc. In AHTPDB, most of the peptides belong to a family of angiotensin-I converting enzyme inhibiting peptides. The current release of AHTPDB contains 5978 peptide entries among which 1694 are unique peptides. Each entry provides detailed information about a peptide like sequence, inhibitory concentration (IC50), toxicity/bitterness value, source, length, molecular mass and information related to purification of peptides. In addition, the database provides structural information of these peptides that includes predicted tertiary and secondary structures. A user-friendly web interface with various tools has been developed to retrieve and analyse the data. It is anticipated that AHTPDB will be a useful and unique resource for the researchers working in the field of antihypertensive peptides.


Subject(s)
Antihypertensive Agents/chemistry , Databases, Chemical , Peptides/chemistry , Peptides/pharmacology , Antihypertensive Agents/pharmacology , Antihypertensive Agents/toxicity , Internet , Peptides/toxicity , Software
17.
BMC Cancer ; 16: 77, 2016 Feb 09.
Article in English | MEDLINE | ID: mdl-26860193

ABSTRACT

BACKGROUND: In the past, numerous quantitative structure-activity relationship (QSAR) based models have been developed for predicting anticancer activity for a specific class of molecules against different cancer drug targets. In contrast, limited attempts have been made to predict the anticancer activity of a diverse class of chemicals against a wide variety of cancer cell lines. In this study, we describe a hybrid method developed on thousands of anticancer and non-anticancer molecules tested against National Cancer Institute (NCI) 60 cancer cell lines. RESULTS: Our analysis of anticancer molecules revealed that the majority of anticancer molecules contain 18-24 carbon atoms and are dominated by functional groups like R2NH, R3N, ROH, RCOR, and ROR. It was also observed that certain substructures (e.g., 1-methoxy-4-methylbenzene, 1-methoxy benzene, nitrobenzene, indole, propenyl benzene) are more abundant in anticancer molecules. Next, we developed anticancer molecule prediction models using various machine-learning techniques and achieved a maximum Matthews correlation coefficient (MCC) of 0.81 with 90.40% accuracy using support vector machine (SVM) based models. In another approach, a novel similarity or potency score based method was developed using selected fragments/fingerprints and achieved a maximum MCC of 0.82 with 90.65% accuracy. Finally, we combined the strengths of the above methods and developed a hybrid method with a maximum MCC of 0.85 with 92.47% accuracy. CONCLUSIONS: We developed a hybrid method utilizing the best of the machine learning and potency score based methods. The highly accurate hybrid method can be used for classification of anticancer and non-anticancer molecules. To support the scientific community working in the field of anticancer drug discovery, we integrated the hybrid and potency methods in a web server, CancerIN. This server provides various facilities, including virtual screening of anticancer molecules, analog-based drug design, and similarity with known anticancer molecules (http://crdd.osdd.net/oscadd/cancerin).
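
The abstract above reports both accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of how these two metrics are computed from binary confusion-matrix counts, the following may help; the function names and the example counts are illustrative assumptions and are not taken from the study or from CancerIN.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal sum is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    tp, tn, fp, fn = 45, 45, 5, 5
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of unequal size.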


Subject(s)
Anticarcinogenic Agents/chemistry , Cell Line, Tumor/drug effects , Drug Evaluation, Preclinical , Neoplasms/drug therapy , Anticarcinogenic Agents/pharmacology , Carbon/chemistry , Computational Biology , Humans , Models, Molecular , Neoplasms/pathology , Software
18.
Nucleic Acids Res ; 42(Database issue): D444-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24174543

ABSTRACT

Hemolytik (http://crdd.osdd.net/raghava/hemolytik/) is a manually curated database of experimentally determined hemolytic and non-hemolytic peptides. Data were compiled from a large number of published research articles and various databases like Antimicrobial Peptide Database, Collection of Anti-microbial Peptides, Dragon Antimicrobial Peptide Database and Swiss-Prot. The current release of Hemolytik database contains ∼3000 entries that include ∼2000 unique peptides whose hemolytic activities were evaluated on erythrocytes isolated from as many as 17 different sources. Each entry in Hemolytik provides comprehensive information about a peptide, such as its name, sequence, origin, reported function, properties such as chirality, type (linear or cyclic) and end modifications, as well as details pertaining to its hemolytic activity. In addition, the tertiary structure of each peptide has been predicted, and secondary structure states have been assigned. To assist the scientific community, a user-friendly interface has been developed with various tools for data searching and analysis. We hope Hemolytik will be useful for researchers working in the field of designing therapeutic peptides.


Subject(s)
Databases, Protein , Hemolytic Agents/toxicity , Peptides/toxicity , Hemolysis , Hemolytic Agents/chemistry , Internet , Peptides/chemistry , Software
20.
J Transl Med ; 11: 74, 2013 Mar 22.
Article in English | MEDLINE | ID: mdl-23517638

ABSTRACT

BACKGROUND: Cell penetrating peptides have gained much recognition as a versatile transport vehicle for the intracellular delivery of a wide range of cargoes (i.e. oligonucleotides, small molecules, proteins, etc.) that otherwise lack bioavailability, thus offering great potential as future therapeutics. Keeping in mind the therapeutic importance of these peptides, we have developed in silico methods for the prediction of cell penetrating peptides, which can be used for rapid screening of such peptides prior to their synthesis. METHODS: In the present study, support vector machine (SVM)-based models have been developed for predicting and designing highly effective cell penetrating peptides. Various features like amino acid composition, dipeptide composition, binary profile of patterns, and physicochemical properties have been used as input features. The main dataset used in this study consists of 708 peptides. In addition, we have identified various motifs in cell penetrating peptides, and used these motifs for developing a hybrid prediction model. Performance of our method was evaluated on an independent dataset and also compared with that of the existing methods. RESULTS: In cell penetrating peptides, certain residues (e.g. Arg, Lys, Pro, Trp, Leu, and Ala) are preferred at specific locations. Thus, it was possible to discriminate cell-penetrating peptides from non-cell penetrating peptides based on amino acid composition. All models were evaluated using a five-fold cross-validation technique. We have achieved a maximum accuracy of 97.40% using the hybrid model that combines motif information and binary profile of the peptides. On an independent dataset, we achieved a maximum accuracy of 81.31% with an MCC of 0.63. CONCLUSION: The present study demonstrates that features like amino acid composition, binary profile of patterns and motifs can be used to train an SVM classifier that can predict cell penetrating peptides with high accuracy. The hybrid model described in this study achieved higher accuracy than the previous methods and thus may complement the existing methods. Based on the above study, a user-friendly web server, CellPPD, has been developed to help biologists predict and design CPPs with ease. The CellPPD web server is freely accessible at http://crdd.osdd.net/raghava/cellppd/.
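
Two ideas named in the abstract above, amino acid composition as an input feature and five-fold cross-validation, can be sketched briefly. This is an illustrative sketch only: the function names, the use of fractions rather than percentages, and the interleaved fold assignment are assumptions, not the CellPPD implementation, and the SVM training step is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide):
    """Return a 20-element vector with the fraction of each standard residue.

    Non-standard characters are ignored (an assumption of this sketch).
    """
    residues = [r for r in peptide.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are assigned to folds in an interleaved, deterministic way;
    in practice the data would usually be shuffled first.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical arginine- and lysine-rich peptide, for illustration only.
    features = amino_acid_composition("RRKK")
    print(features[AMINO_ACIDS.index("R")], features[AMINO_ACIDS.index("K")])
    for train, test in k_fold_indices(10, k=5):
        print(len(train), "training samples;", "test indices:", test)
```

Each peptide becomes a fixed-length vector regardless of its sequence length, which is what allows a standard classifier such as an SVM to be trained on peptides of varying size.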


Subject(s)
Cell-Penetrating Peptides/pharmacology , Protein Engineering/methods , Amino Acid Motifs , Cell-Penetrating Peptides/chemical synthesis , Cell-Penetrating Peptides/chemistry , Computer Simulation , Databases, Protein , Drug Delivery Systems , Oligonucleotides/genetics , Protein Structure, Tertiary , ROC Curve , Reproducibility of Results , Sequence Analysis, Protein , Support Vector Machine