ABSTRACT
The metabolomic profile of aging is complex. Here, we analyse 325 nuclear magnetic resonance (NMR) biomarkers from 250,341 UK Biobank participants, identifying 54 representative aging-related biomarkers associated with all-cause mortality. We conduct genome-wide association studies (GWAS) for these 325 biomarkers using whole-genome sequencing (WGS) data from 95,372 individuals and perform multivariable Mendelian randomization (MVMR) analyses, discovering 439 candidate "biomarker - disease" causal pairs at the nominal significance level. We develop a metabolomic aging score that outperforms other aging metrics in predicting short-term mortality risk and exhibits strong potential for discriminating aging-accelerated populations and improving disease risk prediction. A longitudinal analysis of 13,263 individuals enables us to calculate a metabolomic aging rate which provides more refined aging assessments and to identify candidate anti-aging and pro-aging NMR biomarkers. Taken together, our study has presented a comprehensive aging-related metabolomic profile and highlighted its potential for personalized aging monitoring and early disease intervention.
Subject(s)
Aging , Biological Specimen Banks , Biomarkers , Genome-Wide Association Study , Metabolomics , Humans , Aging/genetics , Aging/metabolism , United Kingdom/epidemiology , Male , Female , Metabolomics/methods , Aged , Middle Aged , Biomarkers/metabolism , Mendelian Randomization Analysis , Magnetic Resonance Spectroscopy , Metabolome , Longitudinal Studies , Whole Genome Sequencing , Adult , Aged, 80 and over , UK Biobank
ABSTRACT
Background: Smoking is a widespread behavior, yet its relationship with various diseases remains a topic of debate. Objective: We conducted analyses to further examine previously identified associations and assess potential causal relationships. Methods: We used seven single nucleotide polymorphisms (SNPs) known to be linked to smoking and extracted genotype data from the UK Biobank, a large-scale biomedical repository containing comprehensive health-related and genetic information on individuals of European descent. A phenome-wide association study (PheWAS) was conducted to map the association of genetically predicted smoking status with 1,549 phenotypes. The associations identified in the PheWAS were then examined through two-sample Mendelian randomization (MR) analysis, using data from the UK Biobank (n = 487,365) and the GWAS and Sequencing Consortium of Alcohol and Nicotine Use (GSCAN) (n = 337,334). This approach allowed us to comprehensively characterize the links between smoking and disease patterns. Results: The PheWAS analysis identified 34 phenotypes that were significantly associated with smoking at a threshold of P = 0.05/1460. Notably, sickle cell anemia and type 2 diabetes had the highest proportion of significant SNPs (85.71% each). Furthermore, the MR analyses provided evidence supporting causal associations between smoking and the risk of the following diseases: obstructive chronic bronchitis (IVW: Beta = 0.48, 95% confidence interval (CI) 0.36-0.61, P = 1.62 × 10^-13), cancer of the bronchus (IVW: Beta = 0.92, 95% CI 0.68-1.17, P = 2.02 × 10^-13), peripheral vascular disease (IVW: Beta = 1.09, 95% CI 0.71-1.46, P = 1.63 × 10^-8), emphysema (IVW: Beta = 1.63, 95% CI 0.90-2.36, P = 1.29 × 10^-5), pneumococcal pneumonia (IVW: Beta = 0.30, 95% CI 0.11-0.49, P = 1.60 × 10^-3), chronic airway obstruction (IVW: Beta = 0.83, 95% CI 0.30-1.36, P = 2.00 × 10^-3) and type 2 diabetes (IVW: Beta = 0.53, 95% CI 0.16-0.90, P = 5.08 × 10^-3).
Conclusion: This study affirms causal relationships between smoking and obstructive chronic bronchitis, cancer of the bronchus, peripheral vascular disease, emphysema, pneumococcal pneumonia, chronic airway obstruction, and type 2 diabetes in the European population. These findings highlight the broad health impacts of smoking and support smoking cessation efforts.
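For readers unfamiliar with the quantities reported above, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimate and a Bonferroni-style threshold such as 0.05/1460. It is not the authors' pipeline: the function names and the toy inputs are hypothetical, and the weights assume the common first-order approximation (per-SNP Wald ratios weighted by the squared SNP-exposure effect over the squared SNP-outcome standard error).

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    Each SNP contributes a Wald ratio (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2 (first-order weights).
    Returns (estimate, standard error, 95% confidence interval).
    """
    numerator = 0.0
    total_weight = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = bx ** 2 / se ** 2
        numerator += weight * (by / bx)
        total_weight += weight
    estimate = numerator / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
    return estimate, std_error, ci


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Toy (made-up) summary statistics for two SNPs, for illustration only.
estimate, std_error, ci = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
threshold = bonferroni_threshold(0.05, 1460)
```

With real data, the IVW estimate is usually reported alongside sensitivity analyses such as weighted median and MR-Egger, because IVW assumes all instruments are valid.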
ABSTRACT
Sarcopenia presents a critical challenge for healthcare in aging populations. The interplay between brain structure and sarcopenia requires further research. The aim of this study was to explore the causal association between brain structure and sarcopenia. Linkage disequilibrium score regression (LDSC) was conducted to estimate genetic correlations; Mendelian randomization (MR) was then performed to explore the causal relationship between brain imaging-derived phenotypes (BIDPs) and three sarcopenia-related traits: handgrip strength, walking pace, and appendicular lean mass (ALM). The main analyses were conducted using the inverse-variance weighted method, with weighted median and MR-Egger as sensitivity analyses. A genetic association between 6.41% of BIDPs and ALM was observed; 4.68% of BIDPs exhibited a causal MR association with handgrip strength, 2.11% with walking pace, and 2.04% with ALM. Volume of the ventromedial hypothalamus was associated with increased odds of handgrip strength (OR: 1.18, 95% CI: 1.02 to 1.37) and ALM (OR: 1.05, 95% CI: 1.01 to 1.09). Mean thickness of G-pariet-inf-Angular was associated with decreased odds of handgrip strength (OR: 0.83, 95% CI: 0.70 to 0.97) and walking pace (OR: 0.97, 95% CI: 0.93 to 0.99). These results suggest that some aspects of brain structure causally influence sarcopenia, which may provide new perspectives for the prevention of sarcopenia and offer valuable insights for further research on the brain-muscle axis.
Subject(s)
Brain , Mendelian Randomization Analysis , Sarcopenia , Humans , Sarcopenia/genetics , Sarcopenia/pathology , Brain/diagnostic imaging , Brain/pathology , Hand Strength/physiology , Male , Female , Phenotype
ABSTRACT
Substantial evidence has shown that the age at onset (AAO) of Parkinson's disease (PD) is a major determinant of clinical heterogeneity. However, the mechanisms underlying heterogeneity in the AAO remain unclear. To investigate the risk factors associated with the AAO of PD, a total of 3156 patients with PD from the UK Biobank were included in this study. We evaluated the effects of polygenic risk scores (PRS), nongenetic risk factors, and their interaction on the AAO using Mann-Whitney U tests and regression analyses. We further identified the genes interacting with nongenetic risk factors for the AAO using genome-wide environment interaction studies. We newly found that physical activity (P < 0.0001) was positively associated with AAO and excessive daytime sleepiness (P < 0.0001) was negatively associated with AAO, and we reproduced the positive associations of smoking and non-steroidal anti-inflammatory drug intake and the negative association of family history with AAO. In the dose-dependent analyses, smoking duration (P = 1.95 × 10^-6), coffee consumption (P = 0.0150), and tea consumption (P = 0.0008) were positively associated with AAO. Individuals with higher PRS had younger AAO (P = 3.91 × 10^-5). In addition, we observed a significant interaction between the PRS and smoking for AAO (P = 0.0316). Specifically, several genes, including ANGPT1 (P = 7.17 × 10^-7) and PLEKHA6 (P = 4.87 × 10^-6), may influence the positive relationship between smoking and AAO. Our data suggest that genetic and nongenetic risk factors are associated with the AAO of PD and that there is an interaction between the two.
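The abstract does not state how the polygenic risk score was constructed. As an illustration only, a common construction is a weighted sum of risk-allele dosages, with per-variant effect sizes as weights. The sketch below assumes that construction; the function name and the toy values are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 per variant).

    `weights` are per-variant effect sizes, typically taken from an
    independent GWAS; both sequences must be in the same variant order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Toy example: three variants with made-up effect sizes.
score = polygenic_risk_score([0, 1, 2], [0.1, 0.2, 0.3])
```

In practice the raw score is often standardized within the cohort before it is entered into a regression model.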
ABSTRACT
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Subject(s)
Databases, Factual , Genetic Variation , Genome, Human , Software , Humans , Databases, Genetic , DNA Copy Number Variations , Nucleotides , Genome-Wide Association Study
ABSTRACT
BACKGROUND: Genomic variants outside the canonical splice sites (±2) that cause abnormal mRNA splicing are defined as non-canonical splicing variants (NCSVs). However, the clinical interpretation of NCSVs in neurodevelopmental disorders (NDDs) is largely unknown. METHODS: We investigated the contribution of NCSVs to NDDs using 345,787 de novo variants (DNVs) from 47,574 patients with NDDs. We performed functional enrichment and protein-protein interaction analyses to assess the association between genes carrying prioritised NCSVs and NDDs. Minigene assays were used to validate the impact of NCSVs on mRNA splicing. FINDINGS: We observed significantly more NCSVs (p = 0.02, odds ratio [OR] = 2.05) among patients with NDD than in controls. Canonical splicing variants (CSVs) and NCSVs contributed to similar proportions of patients with NDD (0.76% vs. 0.82%). The candidate genes carrying NCSVs were associated with glutamatergic synapse and chromatin remodelling. Minigene assays validated 59 of 79 (74.68%) NCSVs as leading to abnormal splicing in 40 candidate genes, and 9 of the genes (ARID1B, KAT6B, TCF4, SMARCA2, SHANK3, PDHA1, WDR45, SCN2A, SYNGAP1) harboured recurrent NCSVs, with the same variant present in more than two unrelated patients with NDD. Moreover, 36 of 59 (61.02%) NCSVs are novel clinically relevant variants, including 34 unreported NCSVs and 2 NCSVs with conflicting interpretations or of uncertain significance in the ClinVar database. INTERPRETATION: This study highlights the common pathology and clinical importance of NCSVs in unsolved patients with NDD. FUNDING: The present study was funded by grants from the National Natural Science Foundation of China, China Postdoctoral Science Foundation, the Hunan Youth Science and Technology Innovation Talent Project, the Provincial Natural Science Foundation of Hunan, The Scientific Research Program of FuRong laboratory, and the Natural Science Project of the University of Anhui Province.
Subject(s)
Neurodevelopmental Disorders , Adolescent , Humans , Mutation , Neurodevelopmental Disorders/genetics , RNA Splicing/genetics , Exons , RNA, Messenger , Histone Acetyltransferases/genetics , Carrier Proteins/genetics
ABSTRACT
BACKGROUND AND OBJECTIVES: Pulmonary embolism (PE) is a complex disease with high mortality and morbidity, placing an increasing burden on society. However, despite its complex pathology, current diagnosis is based solely on symptoms and laboratory data, which easily leads to misdiagnosis and missed diagnosis by inexperienced doctors. In particular, CT pulmonary angiography, the gold standard method, is not widely available. In this study, we aim to establish a rapid and accurate screening model for pulmonary embolism using machine learning. Importantly, the data required for disease prediction are easily accessed, including patients' routine laboratory data and medical record information. METHODS: We extracted features from patients' routine laboratory results and medical records, including routine blood tests, biochemical panels, routine coagulation tests and other test results, as well as symptoms and medical history information. Samples with a feature loss rate greater than 0.8 were deleted from the original database. Data from 4723 cases were retained, 231 of which were positive for pulmonary embolism. Fifty features were retained through statistical hypothesis testing between positive and negative cases and were used to build the predictive model. To avoid cases being classified as the majority class because of the imbalanced sample proportions, we used the Synthetic Minority Oversampling Technique (SMOTE) to increase the amount of information on minority samples. Five typical machine learning algorithms were used to model the screening of pulmonary embolism: Support Vector Machines, Logistic Regression, Random Forest, XGBoost, and Back Propagation Neural Networks. To evaluate model performance, sensitivity, specificity and the area under the curve (AUC) were analyzed as the main evaluation indicators. Furthermore, a baseline model was established using the characteristics from the pulmonary embolism guidelines as a comparison model.
RESULTS: We found that XGBoost performed better than the other models, with the highest sensitivity and specificity (0.99 and 0.99, respectively). Moreover, it showed a significant improvement in performance compared to the baseline model (sensitivity and specificity were 0.76 and 0.76, respectively). More importantly, our model showed a low missed diagnosis rate (0.46) and a high AUC value (0.992). Finally, our model takes only about 0.05 s to compute the probability of pulmonary embolism. CONCLUSIONS: In this study, five machine learning classification models were established to assess the likelihood of patients suffering from pulmonary embolism, and the XGBoost model most significantly improved the precision, sensitivity, and AUC for pulmonary embolism screening. Collectively, we have established an AI-based model to accurately predict pulmonary embolism at an early stage.
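The evaluation indicators named above can be computed from a confusion matrix. The sketch below is illustrative only and is not the authors' code: the function name is hypothetical, labels are assumed to be coded 1 for pulmonary embolism and 0 otherwise, and "missed diagnosis rate" is assumed here to mean the false-negative rate (1 minus sensitivity), which the abstract does not define.

```python
def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity, and false-negative rate for binary labels.

    Labels are assumed to be 1 (disease present) or 0 (disease absent).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Assumption: missed diagnosis rate = false-negative rate.
        "missed_diagnosis_rate": fn / (tp + fn),
    }


# Toy example with made-up labels and predictions.
metrics = screening_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                            [1, 1, 1, 0, 0, 0, 0, 1])
```

When oversampling such as SMOTE is used, these metrics are normally computed on held-out data that was not oversampled, so that synthetic samples do not inflate the estimates.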
Subject(s)
Algorithms , Pulmonary Embolism , Humans , Sensitivity and Specificity , Electronic Health Records , Machine Learning , Pulmonary Embolism/diagnosis
ABSTRACT
STUDY QUESTION: Can potential mechanisms involved in the likely concurrence of diminished ovarian reserve (DOR) and miscarriage be identified using genetic data? SUMMARY ANSWER: Concurrence between ovarian reserve and spontaneous miscarriage was observed, and may be attributed to shared genetic risk loci enriched in antigen processing and presentation and autoimmune disease pathways. WHAT IS KNOWN ALREADY: Previous studies have shown that lower serum anti-Müllerian hormone (AMH) levels are associated with increased risk of embryo aneuploidy and spontaneous miscarriage, although findings have not been consistent across all studies. A recent meta-analysis suggested that the association between DOR and miscarriage may not be causal, but rather a result of shared underlying causes such as clinical conditions or past exposure. Motivated by this hypothesis, we conducted the present analysis to explore the concurrence between DOR and miscarriage, and to investigate potential mechanisms using genetic data. STUDY DESIGN, SIZE, DURATION: Three data sources were used in the study: the clinical IVF data were retrospectively collected from an academically affiliated Reproductive Medicine Center (17 786 cycles included); the epidemiological data from the UK Biobank (UKB), which is a large-scale, population-based, prospective cohort study (35 316 white women included), were analyzed; and individual-level genotype data from the UKB were extracted for further analysis. PARTICIPANTS/MATERIALS, SETTING, METHODS: There were three modules of analysis. First, clinical IVF data were used to test the association between ovarian reserve biomarkers and the subsequent early spontaneous miscarriage risk. Second, the UKB data were used to test the association of spontaneous miscarriage history and early menopause. Third, individual-level genotype data from the UKB were analyzed to identify specific pleiotropic genes which affect the development of miscarriage and menopause. 
MAIN RESULTS AND THE ROLE OF CHANCE: In the analysis of clinical IVF data, the risk of early spontaneous miscarriage was 1.57 times higher in the group with AMH <1.1 ng/ml (P < 0.001), 1.62 times higher in the group with antral follicle count <5 (P < 0.001), and 1.39 times higher in the group with FSH ≥10 mIU/ml (P < 0.001) in comparison with normal ovarian reserve groups. In the analysis of UKB data, participants with a history of three or more miscarriages had approximately 30% higher odds of experiencing early menopause (odds ratio: 1.30, 95% CI 1.13-1.49, P < 0.001), compared with participants without spontaneous miscarriage history. We identified 158 shared genetic risk loci that affect both miscarriage and menopause, which enrichment analysis showed were involved in antigen processing and presentation and autoimmune disease pathways. LIMITATIONS, REASONS FOR CAUTION: The analyses of the UKB data were restricted to participants of European ancestry, as 94.6% of the cohort were of white ethnicity. Further studies are needed in non-white populations. Additionally, maternal age at the time of spontaneous miscarriage was not available in the UKB cohort, so we adjusted for age at baseline assessment in the models instead. The miscarriage rate in IVF is known to be higher than in natural conception, highlighting a need for caution when generalizing our findings from the IVF cohort to the general population. WIDER IMPLICATIONS OF THE FINDINGS: Our findings have implications for IVF clinicians in terms of patient counseling on the prognosis of IVF treatment, as well as for genetic counseling regarding miscarriage. Our results highlight the importance of further research on the shared genetic architecture and common pathophysiological basis of DOR and miscarriage, which may lead to new therapeutic opportunities.
STUDY FUNDING/COMPETING INTEREST(S): This work was supported by the Hunan Youth Science and Technology Innovation Talent Project (2020RC3060), the International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program, YJ20220220), the fellowship of China Postdoctoral Science Foundation (2022M723564), and the Natural Science Foundation of Hunan Province, China (2023JJ41016). This work has been accepted for poster presentation at the 39th Annual Meeting of ESHRE, Copenhagen, Denmark, 25-28 June 2023 (Poster number: P-477). The authors declare no conflict of interest. TRIAL REGISTRATION NUMBER: N/A.
Subject(s)
Abortion, Spontaneous , Autoimmune Diseases , Menopause, Premature , Ovarian Diseases , Ovarian Reserve , Pregnancy , Humans , Female , Adolescent , Abortion, Spontaneous/epidemiology , Retrospective Studies , Prospective Studies , Anti-Mullerian Hormone , Fertilization in Vitro/methods
ABSTRACT
Background: Common polygenic risk and de novo variants (DNVs) capture a small proportion of autism spectrum disorder (ASD) liability, and ASD phenotypic heterogeneity remains difficult to explain. Integrating multiple genetic factors may help clarify the risk and clinical presentation of ASD. Methods: We investigated the individual and combined effects of polygenic risk, damaging DNVs (including those in ASD risk genes), and sex among 2,591 ASD simplex families in the Simons Simplex Collection. We also explored the interactions among these factors, along with the broad autism phenotypes of ASD probands and their unaffected siblings. Finally, we combined the effects of polygenic risk, damaging DNVs in ASD risk genes, and sex to explain the total liability of the ASD phenotypic spectrum. Results: Both polygenic risk and damaging DNVs contributed to an increased risk for ASD, with females exhibiting higher genetic burdens than males. ASD probands carrying damaging DNVs in ASD risk genes showed reduced polygenic risk. The effects of polygenic risk and damaging DNVs on broad autism phenotypes were inconsistent; probands with higher polygenic risk exhibited improvement in some behaviors, such as adaptive/cognitive behaviors, while those with damaging DNVs exhibited more severe phenotypes. Siblings with higher polygenic risk and damaging DNVs tended to have higher scores on broad autism phenotypes. Females exhibited more severe cognitive and behavioral problems than males among both ASD probands and siblings. The combination of polygenic risk, damaging DNVs in ASD risk genes, and sex explained 1-4% of the total liability of adaptive/cognitive behavior measurements. Conclusion: Our study revealed that the risk for ASD and the broad autism phenotypes likely arise from a combination of common polygenic risk, damaging DNVs (including those in ASD risk genes), and sex.
ABSTRACT
Huntington's disease (HD) is an autosomal dominant neurodegenerative disease caused by the expansion of the CAG trinucleotide repeat sequence in the HTT gene. HD mainly manifests as involuntary dance-like movements and severe mental disorders. As it progresses, patients lose the ability to speak, think, and even swallow. Although the pathogenesis is unclear, studies have found that mitochondrial dysfunction plays an important role in the pathogenesis of HD. Based on the latest research advances, this review summarizes and discusses the role of mitochondrial dysfunction in HD in terms of bioenergetics, abnormal autophagy, and abnormal mitochondrial membranes. This review provides researchers with a more complete perspective on the mechanisms underlying the relationship between mitochondrial dysregulation and HD.
Subject(s)
Arthrogryposis , Huntington Disease , Neurodegenerative Diseases , Humans , Huntington Disease/genetics , Huntington Disease/pathology , Mitochondria/genetics , Mitochondria/pathology , Mitochondrial Membranes/pathology
ABSTRACT
Genetic factors, particularly de novo variants (DNVs), and an environmental factor, exposure to pregnancy-induced hypertension (PIH), have been reported to be associated with the risk of autism spectrum disorder (ASD); however, how they jointly affect the severity of ASD symptoms is unclear. We assessed how the severity of core ASD symptoms is affected by functional DNVs or PIH. We selected phenotype data from the Simons Simplex Collection database, used genotypes from previous studies, and created linear regression models. We found that ASD patients carrying DNVs with PIH exposure had increased adaptive and cognitive ability, decreased social problems, and enhanced repetitive behaviors; however, among patients without DNVs, there was no difference between those with and without PIH exposure. In addition, the DNV genes carried by patients exposed to PIH were enriched in ubiquitin-dependent proteolytic processes, highlighting how candidate genes in pathways and environments interact. The results indicate the joint contribution of DNVs and PIH to ASD.
ABSTRACT
Transcriptomics studies have yielded great insights into disease processes by detecting differentially expressed genes (DEGs). In this study, motivated by the high heritability of Parkinson's disease (PD), we performed bioinformatics analyses on nine transcriptomic datasets of the substantia nigra from the Gene Expression Omnibus database, including seven microarray datasets and two next-generation sequencing datasets. Between age-matched PD patients and normal controls, we identified 630 DEGs, of which 22 hub DEGs involved in PD or ferroptosis were found to be associated with each other at the transcriptional level and in the protein-protein interaction network, suggesting strong correlations among these hub genes. Moreover, 16 DEGs were singled out due to their comparable AUC (>0.6) in random forest classifiers, including seven PD-related genes (MAP4K4, LRP10, UCHL1, PAM, RIT2, SNCA, GCH1) and nine ferroptosis-related genes (GCH1, DDIT4, RGS4, MAPK9, CAV1, RELA, DUSP1, ATP6V1G2, ATF4 and ISCU). Furthermore, to probe the potential of those hub genes in predicting PD progression and survival, we constructed a Cox model based on an eight-gene signature, including four PD-related genes (SNCA, UCHL1, LRP10, and GCH1) and four ferroptosis-related genes (DDIT4, RGS4, RELA, and CAV1), and successfully validated it in an independent dataset, indicating that it could be an effective tool for clinical research to predict PD progression. In conclusion, the ferroptosis-related DEGs identified in this study were closely correlated with the known PD-related genes, revealing the involvement of ferroptosis in the development of PD. This study demonstrates the potential of several ferroptosis-related genes as novel clinical biomarkers for PD.
ABSTRACT
A proportion of variants previously classified as benign or of uncertain significance in humans, which are challenging to identify, may induce abnormal splicing. An increasing number of methods have been developed to predict splicing variants, but their performance has not been completely evaluated using independent benchmarks. Here, we manually sourced ~50 000 positive/negative splicing variants from >8000 studies and selected the independent splicing variants to evaluate the performance of prediction methods. These methods showed different performances in recognizing splicing variants in donor and acceptor regions, reminiscent of different weight coefficient applications to predict novel splicing variants. Of these methods, 66.67% exhibited higher specificities than sensitivities, suggesting that more moderate cut-off values are necessary to distinguish splicing variants. Moreover, the high correlation and consistent prediction ratio validated the feasibility of integrating splicing prediction methods to identify splicing variants. We developed a splicing analytics platform called SPCards, which curates splicing variants from publications and predicts splicing scores of variants in genomes. SPCards also offers variant-level and gene-level annotation information, including allele frequency, non-synonymous prediction and comprehensive functional information. SPCards is suitable for high-throughput genetic identification of splicing variants, particularly those located in non-canonical splicing regions.
Subject(s)
RNA Splicing , Humans , RNA Splicing/genetics , Gene Frequency , Molecular Sequence Annotation
ABSTRACT
Background: Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases and the regulation of gene expression. Long-read sequencing (LRS) offers a potential solution to genome-wide STR analysis. However, characterizing STRs in human genomes using LRS on a large population scale has not been reported. Methods: We conducted a large LRS-based STR analysis in 193 unrelated samples from the Chinese population and performed genome-wide profiling of STR variation in the human genome. The repeat dynamic index (RDI) was introduced to evaluate the variability of STRs. We sourced expression data from the Genotype-Tissue Expression (GTEx) project to explore the tissue specificity of genes related to highly variable STRs. Enrichment analyses were also conducted to identify potential functional roles of the highly variable STRs. Results: This study reports a large-scale analysis of human STR variation by LRS and offers a reference STR database based on the LRS dataset. We found that disease-associated STRs (dSTRs) and STRs associated with the expression of nearby genes (eSTRs) were highly variable in the general population. Moreover, tissue-specific expression analysis showed that genes related to highly variable STRs had the highest expression levels in brain tissues, and pathway enrichment analysis found that those STRs are involved in synaptic function-related pathways. Conclusion: Our study profiled the genome-wide landscape of STRs using LRS and highlighted the highly variable STRs in the human genome, providing a valuable resource for studying the role of STRs in human disease and complex traits.
ABSTRACT
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from the dozens available. To address this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinically relevant sequence variants (ClinVar), (2) rare somatic variants from the catalogue of somatic mutations in cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait loci (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar, with an area under the curve (AUC) of 0.4481-0.8033, and poor for rare somatic variants from COSMIC (AUC: 0.4984-0.7131), common regulatory variants from curated eQTL data (AUC: 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUC: 0.4766-0.5188). We also compared the prediction performance of the 24 methods for non-coding de novo mutations in autism spectrum disorder and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques for interpreting non-coding variants.
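To make the reported AUC ranges easier to interpret: an AUC of 0.5 corresponds to chance-level ranking, so values such as 0.4766-0.5188 indicate almost no discrimination. The sketch below is not the benchmark's actual code. It computes AUC as the probability that a randomly chosen positive variant scores higher than a randomly chosen negative one; the function name and toy data are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Labels are 1 (positive,
    e.g. pathogenic) or 0 (negative, e.g. benign).
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

This pairwise form is quadratic in the number of variants; for large benchmarks a rank-based computation gives the same value more efficiently.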
ABSTRACT
Increasing evidence suggests that mitochondrial dysfunction is implicated in diseases and aging, and whole-genome sequencing (WGS) is the most unbiased method for analyzing the mitochondrial genome (mtDNA). However, the genetic landscape of mtDNA in the Chinese population has not been fully examined. Here, we describe the genetic landscape of mtDNA using WGS data from Chinese individuals (n = 3241). We identified 3892 mtDNA variants, of which 3349 (86%) were rare variants. Interestingly, we observed a trend toward extreme heterogeneity of mtDNA variants. We observed distinct purifying selection on mtDNA, which inhibits the accumulation of harmful heteroplasmies at the individual level: (1) mitochondrial dN/dS ratios were much lower than 1; (2) the dN/dS ratio of heteroplasmies was higher than that of homoplasmies; (3) heteroplasmies had more indels and predicted deleterious variants than homoplasmies. Furthermore, we found that haplogroups M (20.27%) and D (20.15%) had the highest frequencies in the Chinese population, followed by B (18.51%) and F (16.45%). The number of variants per individual differed across haplogroups, with a higher number of homoplasmies for the M lineage. Meanwhile, mtDNA copy number was negatively correlated with age but positively correlated with female sex. Finally, we developed an mtDNA variation database of Chinese populations called MTCards (http://genemed.tech/mtcards/) to facilitate the query of mtDNA variants in this study. In summary, these findings contribute to the understanding of several aspects of mtDNA and of the genetic basis of mitochondrial-related diseases.
Subject(s)
Mitochondrial Genome , Mitochondrial DNA/genetics , Female , Human Genome/genetics , Mitochondrial Genome/genetics , Humans , Mitochondria/genetics , Whole Genome Sequencing
ABSTRACT
The clinical similarity among different neuropsychiatric disorders (NPDs) suggests a shared genetic basis. We catalogued 23,109 coding de novo mutations (DNMs) from 6,511 patients with autism spectrum disorder (ASD), 4,293 with undiagnosed developmental disorder (UDD), 933 with epileptic encephalopathy (EE), 1,022 with intellectual disability (ID), and 1,094 with schizophrenia (SCZ), as well as 3,391 controls. We estimated that putative functional DNMs contribute to 38.11%, 34.40%, 33.31%, 10.98% and 6.91% of patients with ID, EE, UDD, ASD and SCZ, respectively. Consistent with the phenotypic similarity and heterogeneity among NPDs, the disorders show different degrees of genetic association. Cross-disorder analysis of DNMs prioritized 321 candidate genes (FDR < 0.05) and showed that genes shared by more disorders were more likely to exhibit specific expression patterns, functional pathways, genetic convergence, and genetic intolerance.
Subject(s)
Autism Spectrum Disorder , Intellectual Disability , Schizophrenia , Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , Humans , Intellectual Disability/genetics , Mutation , Phenotype
ABSTRACT
Background: Recent years have witnessed an increasing number of studies indicating an essential role of lysosomal dysfunction in Parkinson's disease (PD) at the genetic, biochemical, and cellular pathway levels. In this study, we investigated the association between rare variants in lysosomal storage disorder (LSD) genes and PD in mainland China. Methods: We explored the association between rare variants of 69 LSD genes and PD in 3,879 patients and 2,931 controls from the Parkinson's Disease & Movement Disorders Multicenter Database and Collaborative Network in China (PD-MDCNC) using next-generation sequencing, and analyzed the variants using the optimized sequence kernel association test. Results: We identified a significant burden of rare putative LSD gene variants in Chinese mainland patients with PD. This association was robust in familial or sporadic early-onset patients after excluding the GBA variants but not in sporadic late-onset patients. The burden analysis of variant sets in genes of LSD subgroups revealed a suggestively significant association between variant sets in genes of sphingolipidosis deficiency disorders and familial or sporadic early-onset patients. In contrast, variant sets in genes of sphingolipidoses, mucopolysaccharidoses, and post-translational modification defect disorders were suggestively associated with sporadic late-onset patients. In addition, SMPD1 and four other novel genes (i.e., GUSB, CLN6, PPT1, and SCARB2) were suggestively associated with sporadic early-onset or familial patients, whereas GALNS and NAGA were suggestively associated with late-onset patients. Conclusion: Our findings support the association between LSD genes and PD and reveal several novel risk genes in Chinese mainland patients with PD, confirming the importance of lysosomal mechanisms in PD pathogenesis. Moreover, we identified genetic heterogeneity between early-onset and late-onset patients with PD, which may provide valuable suggestions for treatment.
ABSTRACT
Hearing loss (HL) is one of the most common disabilities in the world. In industrialized countries, HL occurs in 1-2 per 1,000 newborns, and approximately 60% of HL is caused by genetic factors. Next-generation sequencing (NGS) has been widely used to identify candidate genes and variants in patients with HL, but the data are scattered across numerous studies. It is a challenge for scientists, clinicians, and biologists to easily obtain and analyze HL gene and variant data from these studies. Thus, we developed a one-stop database of HL-related genes and variants, Gene4HL (http://www.genemed.tech/gene4hl/), which makes it easy to catalog, search, browse and analyze the genetic data. Gene4HL integrates detailed genetic and clinical data on 326 HL-related genes from 1,608 published studies, along with 62 popular genetic data sources, to provide comprehensive knowledge of candidate genes and variants associated with HL. Additionally, Gene4HL allows users to analyze their own genetic engineering network data, performs comprehensive annotation, and prioritizes candidate genes and variants using custom parameters. Thus, Gene4HL can help users explain the function of HL genes and the clinical significance of variants by correlating genotypes and phenotypes in humans.
ABSTRACT
Parkinson's disease (PD) is a complex neurodegenerative disorder with a strong genetic component. A growing number of variants and genes have been reported to be associated with PD; however, there is no database that integrates different types of genetic data and supports analysis of PD-associated genes (PAGs). Through systematic review and curation of multiple lines of public studies, we integrated multiple layers of genetic data (rare variants and copy-number variants identified in patients with PD, associated variants identified in genome-wide association studies, differentially expressed genes, and differential DNA methylation genes) together with age at onset (AAO) in PD. In total, we integrated five layers of genetic data (8,302 terms) with different levels of evidence from more than 3,000 studies and prioritized 124 PAGs with strong or suggestive evidence. These PAGs interacted significantly with each other and formed an interconnected functional network enriched in several functional pathways involved in PD, suggesting that these genes may contribute to the pathogenesis of PD. Furthermore, we found that 10 genes were associated with juvenile onset (age ≤ 30 years), 11 genes with early onset (age of 30-50 years), and another 10 genes with late onset (age > 50 years). Notably, the AAOs of patients with loss-of-function variants in five genes were significantly lower than those of patients with deleterious missense variants, whereas the opposite was observed for VPS13C (P = 0.01). Finally, we developed an online database named Gene4PD (http://genemed.tech/gene4pd), which integrates published genetic data in PD, the PAGs, and 63 popular genomic data sources, as well as an online pipeline for prioritizing risk variants in PD. In conclusion, Gene4PD provides researchers and clinicians with comprehensive genetic knowledge and an analytic platform for PD, and should also improve the understanding of PD pathogenesis.