Results 1 - 20 of 29
1.
Hum Mol Genet ; 33(7): 624-635, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38129112

ABSTRACT

Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), which improves gene expression prediction accuracy by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models in whole blood using SUMMIT-FA with the comprehensive functional database MACIE and eQTL summary-level data from the eQTLGen consortium. We apply these models to GWAS for 24 complex traits and show that SUMMIT-FA identifies significantly more gene-trait associations and improves predictive power for identifying "silver standard" genes compared to several benchmark methods. We further conduct a simulation study to demonstrate the effectiveness of SUMMIT-FA.


Subjects
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Genome-Wide Association Study/methods , Computer Simulation , Quantitative Trait Loci/genetics , Phenotype , Single Nucleotide Polymorphism , Genetic Predisposition to Disease
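
The first abstract describes TWAS as combining an expression prediction model with GWAS results. A minimal sketch of that combination at the summary-statistic level is given below. It uses the commonly cited gene-level statistic z = wᵀz_GWAS / sqrt(wᵀRw) and is not the SUMMIT-FA procedure itself. The function name `twas_z`, the weights, the z-scores, and the LD matrix are all hypothetical illustration values.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Gene-level TWAS z-score from summary statistics (illustrative only).

    weights: per-SNP expression prediction weights (hypothetical values)
    gwas_z:  per-SNP GWAS z-scores for the trait
    ld:      SNP-by-SNP LD correlation matrix as a list of lists
    """
    n = len(weights)
    # Numerator: weighted sum of the GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Denominator: variance of the weighted sum under the LD structure, w' R w.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return numerator / math.sqrt(variance)


# Two hypothetical SNPs in linkage equilibrium (identity LD matrix).
example_z = twas_z([0.5, 0.5], [2.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
print(f"gene-level z-score: {example_z:.3f}")
```

The sketch makes the abstract's point about power concrete: more accurate weights concentrate the numerator on truly regulatory SNPs, while a larger GWAS sample inflates the input z-scores.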
2.
Am J Hum Genet ; 109(3): 446-456, 2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35216679

ABSTRACT

Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.


Subjects
Human Genome , Genome-Wide Association Study , Human Genome/genetics , Genome-Wide Association Study/methods , Genomics , Humans , Molecular Sequence Annotation , Single Nucleotide Polymorphism/genetics , Probability
3.
J Proteome Res ; 23(5): 1593-1602, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38626392

ABSTRACT

With the rapid expansion of sequencing of genomes, the functional annotation of proteins becomes a bottleneck in understanding proteomes. The Chromosome-centric Human Proteome Project (C-HPP) aims to identify all proteins encoded by the human genome and find functional annotations for them. However, until now there are still 1137 identified human proteins without functional annotation, called uPE1 proteins. Sequence alignment was insufficient to predict their functions, and the crystal structures of most proteins were unavailable. In this study, we demonstrated a new functional annotation strategy, AlphaFun, based on structural alignment using deep-learning-predicted protein structures. Using this strategy, we functionally annotated 99% of the human proteome, including the uPE1 proteins and missing proteins, which have not been identified yet. The accuracy of the functional annotations was validated using the known-function proteins. The uPE1 proteins shared similar functions to the known-function PE1 proteins and tend to express only in very limited tissues. They are evolutionally young genes and thus should conduct functions only in specific tissues and conditions, limiting their occurrence in commonly studied biological models. Such functional annotations provide hints for functional investigations on the uPE1 proteins. This proteome-wide-scale functional annotation strategy is also applicable to any other species.


Subjects
Molecular Sequence Annotation , Proteome , Humans , Proteome/genetics , Proteome/metabolism , Proteome/analysis , Proteome/chemistry , Deep Learning , Sequence Alignment , Human Genome , Proteomics/methods , Protein Databases
4.
Am J Hum Genet ; 104(5): 896-913, 2019 May 02.
Article in English | MEDLINE | ID: mdl-31051114

ABSTRACT

Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.


Subjects
Computational Biology/methods , Gene Regulatory Networks , Genes/genetics , Inborn Genetic Diseases/genetics , Multifactorial Inheritance/genetics , Single Nucleotide Polymorphism , Heritable Quantitative Trait , Humans , Molecular Sequence Annotation , Phenotype , Software
5.
Int J Mol Sci ; 22(18)2021 Sep 16.
Article in English | MEDLINE | ID: mdl-34576183

ABSTRACT

Functional annotation of unknown function genes reveals unidentified functions that can enhance our understanding of complex genome communications. A common approach for inferring gene function involves the ortholog-based method. However, genetic data alone are often not enough to provide information for function annotation. Thus, integrating other sources of data can potentially increase the possibility of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network between human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of unknown function genes. The results show high performance of the model's predictions, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for inferring the functions using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved with muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network and a hybrid deep learning technique can allow us to identify unknown gene functions of malaria parasites. This approach is generalized and can be applied to other diseases that enhance the field of biomedical science.


Subjects
Deep Learning , Algorithms , Humans , Plasmodium falciparum/pathogenicity , Protozoan Proteins/genetics , Protozoan Proteins/metabolism
6.
Proteins ; 88(1): 15-30, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31228283

ABSTRACT

Sequence based DNA-binding protein (DBP) prediction is a widely studied biological problem. Sliding windows on position specific substitution matrices (PSSMs) rows predict DNA-binding residues well on known DBPs but the same models cannot be applied to unequally sized protein sequences. PSSM summaries representing column averages and their amino-acid wise versions have been effectively used for the task, but it remains unclear if these features carry all the PSSM's predictive power, traditionally harnessed for binding site predictions. Here we evaluate if PSSMs scaled up to a fixed size by zero-vector padding (pPSSM) could perform better than the summary based features on similar models. Using multilayer perceptron (MLP) and deep convolutional neural network (CNN), we found that (a) Summary features work well for single-genome (human-only) data but are outperformed by pPSSM for diverse PDB-derived data sets, suggesting greater summary-level redundancy in the former, (b) even when summary features work comparably well with pPSSM, a consensus on the two outperforms both of them (c) CNN models comprehensively outperform their corresponding MLP models and (d) actual predicted scores from different models depend on the choice of input feature sets used whereas overall performance levels are model-dependent in which CNN leads the accuracy.


Subjects
DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Computer Neural Networks , Amino Acids/chemistry , Amino Acids/metabolism , Animals , Arabidopsis/chemistry , Arabidopsis/metabolism , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/metabolism , Binding Sites , DNA/metabolism , Humans , Mice , Biological Models , Protein Conformation
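
The two feature types contrasted in the abstract above, column-average summaries and zero-padded PSSMs (pPSSM), can be sketched as follows. This is an illustration under assumptions rather than the authors' pipeline: the function names are hypothetical, the PSSM is taken to be a list of 20-column rows (one row per residue), and padding rows are appended at the end because the abstract does not say where they go.

```python
def column_averages(pssm):
    """Summary feature: the mean of each PSSM column over all residues."""
    n_rows = len(pssm)
    n_cols = len(pssm[0])
    return [sum(row[c] for row in pssm) / n_rows for c in range(n_cols)]


def pad_pssm(pssm, target_len, n_cols=20):
    """pPSSM-style feature: extend a PSSM to a fixed number of rows.

    Rows of zeros are appended (an assumption) so that proteins of
    different lengths give equally sized inputs for an MLP or CNN.
    """
    if len(pssm) > target_len:
        raise ValueError("protein is longer than the target length")
    padded = [list(row) for row in pssm]
    padded.extend([0.0] * n_cols for _ in range(target_len - len(pssm)))
    return padded


# Hypothetical two-residue PSSM with constant rows, padded to five rows.
toy_pssm = [[1.0] * 20, [3.0] * 20]
summary = column_averages(toy_pssm)  # 20 values, independent of length
fixed = pad_pssm(toy_pssm, 5)        # 5 x 20 matrix, last 3 rows zero
```

The summary vector discards residue order, whereas the padded matrix keeps it, which is the information the abstract asks whether a model can exploit.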
7.
BMC Genomics ; 21(1): 783, 2020 Nov 11.
Article in English | MEDLINE | ID: mdl-33176675

ABSTRACT

BACKGROUND: Specific adaptive features including disease resistance and growth abilities in harsh environments are attributed to indigenous cattle breeds of Benin, but these breeds are endangered due to crossbreeding. So far, there is a lack of systematic trait recording, which is the basis for breed characterization and for the design of structured breeding programs aimed at conservation. To bridge this gap, we phenotyped morphological traits ourselves, measuring height at withers (HAW), sacrum height (SH), heart girth (HG), hip width (HW), body length (BL) and ear length (EL) in 449 cattle from the four indigenous Benin breeds Lagune, Somba, Borgou and Pabli. In order to utilize recent genomic tools for breed characterizations and genetic evaluations, phenotypes for novel traits were merged with high-density SNP marker data. Multi-breed genetic parameter estimations and genome-wide association studies (GWAS) for the six morphometric traits were carried out. Furthermore, we aimed to infer genomic regions and functional loci potentially associated with conformation, carcass and adaptive traits. RESULTS: SNP-based heritability estimates for the morphometric traits ranged between 0.46 ± 0.14 (HG) and 0.74 ± 0.13 (HW). Phenotypic and genetic correlations ranged from 0.25 ± 0.05 (HW-BL) to 0.89 ± 0.01 (HAW-SH), and from 0.14 ± 0.10 (HW-BL) to 0.85 ± 0.02 (HAW-SH), respectively. Three genome-wide and 25 chromosome-wide significant SNPs positioned on different chromosomes were detected, located in very close chromosomal distance (±25 kb) to 15 genes (or located within the genes). The genes PIK3R6 and PIK3R1 showed direct functional associations with height and body size. We inferred the potential candidate genes VEPH1, CNTNAP5, GYPC for conformation, growth and carcass traits including body weight and body fat deposition. According to their functional annotations, detected potential candidate genes were associated with stress or immune response (genes PTAFR, PBRM1, ADAMTS12) and with feed efficiency (genes MEGF11, SLC16A4, CCDC117). CONCLUSIONS: Accurate measurements contributed to large SNP heritabilities for some morphological traits, even for a small mixed-breed sample size. Multi-breed GWAS detected different loci associated with conformation or carcass traits. The identified potential candidate genes for immune response or feed efficiency indicators reflect the evolutionary development and adaptability features of the breeds.


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Animals , Cattle/genetics , Genome , Genomics , Phenotype
8.
Am J Hum Genet ; 101(3): 340-352, 2017 Sep 07.
Article in English | MEDLINE | ID: mdl-28844485

ABSTRACT

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests.


Subjects
Human Genome , Genome-Wide Association Study/methods , Metabolomics , Molecular Sequence Annotation/methods , Single Nucleotide Polymorphism , Cardiovascular Diseases/genetics , Cardiovascular Diseases/metabolism , Cardiovascular Diseases/pathology , Computer Simulation , Genetic Predisposition to Disease , Humans , Lipids/analysis , Organic Cation Transport Proteins/genetics , Phenotype
9.
BMC Bioinformatics ; 20(1): 620, 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31791231

ABSTRACT

BACKGROUND: Cancer arises through accumulation of somatically acquired genetic mutations. An important question is to delineate the temporal order of somatic mutations during carcinogenesis, which contributes to better understanding of cancer biology and facilitates identification of new therapeutic targets. Although a number of statistical and computational methods have been proposed to estimate the temporal order of mutations, they do not account for the differences in the functional impacts of mutations and thus are likely to be obscured by the presence of passenger mutations that do not contribute to cancer progression. In addition, many methods infer the order of mutations at the gene level, which have limited power due to the low mutation rate in most genes. RESULTS: In this paper, we develop a Probabilistic Approach for estimating the Temporal Order of Pathway mutations by leveraging functional Annotations of mutations (PATOPA). PATOPA infers the order of mutations at the pathway level, wherein it uses a probabilistic method to characterize the likelihood of mutational events from different pathways occurring in a certain order. The functional impact of each mutation is incorporated to weigh more on a mutation that is more integral to tumor development. A maximum likelihood method is used to estimate parameters and infer the probability of one pathway being mutated prior to another. Simulation studies and analysis of whole exome sequencing data from The Cancer Genome Atlas (TCGA) demonstrate that PATOPA is able to accurately estimate the temporal order of pathway mutations and provides new biological insights on carcinogenesis of colorectal and lung cancers. CONCLUSIONS: PATOPA provides a useful tool to estimate temporal order of mutations at the pathway level while leveraging functional annotations of mutations.


Subjects
Carcinogenesis/genetics , Molecular Sequence Annotation , Mutation/genetics , Probability , Signal Transduction/genetics , Computer Simulation , Genetic Databases , Humans , Mutation Rate , Neoplasms/genetics , Reproducibility of Results , Time Factors
10.
BMC Bioinformatics ; 20(1): 736, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881961

ABSTRACT

BACKGROUND: With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identification of host genes associated with infectious diseases will improve our understanding of the mechanisms behind their development and help to identify novel therapeutic targets. RESULTS: We developed a machine learning-based classification approach to identify infectious disease-associated host genes by integrating sequence and protein interaction network features. Among different methods, a Deep Neural Network (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy of 86.33% with sensitivity of 85.61% and specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins from the database. Seventy-six out of 100 highly-predicted infectious disease-associated genes from our study were also found in experimentally-verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly-predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared by one or more of the other diseases, such as cancer, metabolic and immune related diseases. CONCLUSIONS: To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious diseases. However, our results indicated that for small datasets, the advanced DNN-based method does not offer a significant advantage over simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF), for the prediction of infectious disease-associated host genes. Significant overlap of infectious disease with cancer and metabolic disease on disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identification of novel candidate genes associated with infectious diseases would help us to explain disease pathogenesis further and develop novel therapeutics.


Subjects
Communicable Diseases/genetics , Machine Learning , Amino Acids/analysis , Gene Ontology , Humans , Computer Neural Networks , Protein Interaction Maps
11.
BMC Genomics ; 17(1): 807, 2016 Oct 18.
Article in English | MEDLINE | ID: mdl-27756223

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is a complex progressive neurodegenerative disorder commonly characterized by short-term memory loss. Presently no effective therapeutic treatments exist that can completely cure this disease. The cause of Alzheimer's is still unclear; however, genetic factors are among the major factors involved in AD pathogenesis, and around 70% of the risk of the disease is assumed to be due to the large number of genes involved. Although genetic association studies have revealed a number of potential AD susceptibility genes, there still exists a need to identify further AD-associated genes and therapeutic targets in order to better understand the disease-causing mechanisms of Alzheimer's and develop effective AD therapeutics. RESULTS: In the present study, we have used a machine learning approach to identify candidate AD-associated genes by integrating topological properties of the genes from the protein-protein interaction networks, sequence features and functional annotations. We also used a molecular docking approach and screened already known anti-Alzheimer drugs against the novel predicted probable targets of AD and observed that an investigational drug, AL-108, had high affinity for the majority of the possible therapeutic targets. Furthermore, we performed molecular dynamics simulations and MM/GBSA calculations on the docked complexes to validate our preliminary findings. CONCLUSIONS: To the best of our knowledge, this is the first comprehensive study of its kind for identification of putative Alzheimer-associated genes using machine learning approaches, and we propose that such computational studies can improve our understanding of the core etiology of AD, which could lead to the development of effective anti-Alzheimer drugs.


Subjects
Alzheimer Disease/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Machine Learning , Alzheimer Disease/metabolism , Computational Biology/methods , Datasets as Topic , Gene Expression Profiling , Gene Ontology , Humans , Molecular Sequence Annotation , Protein Interaction Mapping , Reproducibility of Results
12.
Proteomics ; 14(9): 1014-9, 2014 May.
Article in English | MEDLINE | ID: mdl-24677806

ABSTRACT

One of the major bottlenecks in the proteomics field today resides in the computational interpretation of the massive data generated by the latest generation of high-throughput MS instruments. MS/MS datasets are constantly increasing in size and complexity and it becomes challenging to comprehensively process such huge datasets and afterwards deduce most relevant biological information. The Mass Spectrometry Data Analysis (MSDA, https://msda.unistra.fr) online software suite provides a series of modules for in-depth MS/MS data analysis. It includes a custom databases generation toolbox, modules for filtering and extracting high-quality spectra, for running high-performance database and de novo searches, and for extracting modified peptides spectra and functional annotations. Additionally, MSDA enables running the most computationally intensive steps, namely database and de novo searches, on a computer grid thus providing a net time gain of up to 99% for data processing.


Subjects
Protein Databases , Mass Spectrometry/methods , Proteomics/methods , Protein Sequence Analysis/methods , Software , Peptides/analysis , Peptides/chemistry , Proteins/analysis , Proteins/chemistry
13.
Plants (Basel) ; 13(6)2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38592820

ABSTRACT

Flowering in cassava (Manihot esculenta Crantz) is crucial for the generation of botanical seed for breeding. However, genotypes preferred by most farmers are erect and poor at flowering or never flower. To elucidate the genetic basis of flowering, 293 diverse cassava accessions were evaluated for flowering-associated traits at two locations and seasons in Uganda. Genotyping using the Diversity Array Technology Pty Ltd. (DArTseq) platform identified 24,040 single-nucleotide polymorphisms (SNPs) distributed on the 18 cassava chromosomes. Population structure analysis using principal components (PCs) and kinships showed three clusters; the first five PCs accounted for 49.2% of the observed genetic variation. Linkage disequilibrium (LD) estimation averaged 0.32 at a distance of ~2850 kb (kilo base pairs). Polymorphism information content (PIC) and minor allele frequency (MAF) were 0.25 and 0.23, respectively. A genome-wide association study (GWAS) analysis uncovered 53 significant marker-trait associations (MTAs) with flowering-associated traits involving 27 loci. Two loci, SNPs S5_29309724 and S15_11747301, were associated with all the traits. Using five of the 27 SNPs with a Phenotype_Variance_Explained (PVE) ≥ 5%, 44 candidate genes were identified in the peak SNP sites located within 50 kb upstream or downstream, with most associated with branching traits. Eight of the genes, orthologous to Arabidopsis and other plant species, had known functional annotations related to flowering, e.g., eukaryotic translation initiation factor and myb family transcription factor. This study identified genomic regions associated with flowering-associated traits in cassava, and the identified SNPs can be useful in marker-assisted selection to overcome hybridization challenges, like unsynchronized flowering, and candidate gene validation.
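
The two marker statistics reported in the abstract above, minor allele frequency (MAF) and polymorphism information content (PIC), can be computed for a single biallelic SNP as sketched below. The sketch assumes the standard Botstein-style PIC formula for two alleles, PIC = 1 - (p² + q²) - 2p²q². Whether the study used exactly this form is not stated in the abstract, and the function names and genotype counts are hypothetical.

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele A from diploid genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    return (2 * n_aa + n_ab) / total_alleles


def minor_allele_frequency(p):
    """MAF: the smaller of the two allele frequencies at a biallelic SNP."""
    return min(p, 1.0 - p)


def pic_biallelic(p):
    """PIC for a biallelic marker with allele frequencies p and q = 1 - p."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


# Hypothetical SNP genotyped in 293 accessions.
p = allele_frequency(n_aa=180, n_ab=90, n_bb=23)
print(f"MAF = {minor_allele_frequency(p):.3f}, PIC = {pic_biallelic(p):.3f}")
```

Under this formula the PIC of a biallelic marker cannot exceed 0.375, which it reaches at p = 0.5, so a mean PIC of 0.25 corresponds to moderately informative SNPs.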

14.
medRxiv ; 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38496672

ABSTRACT

The co-occurrence of insulin resistance (IR)-related metabolic conditions with neuropsychiatric disorders is a complex public health challenge. Evidence of the genetic links between these phenotypes is emerging, but little is currently known about the genomic regions and biological functions that are involved. To address this, we performed Local Analysis of [co]Variant Association (LAVA) using large-scale (N=9,725-933,970) genome-wide association studies (GWASs) results for three IR-related conditions (type 2 diabetes mellitus, obesity, and metabolic syndrome) and nine neuropsychiatric disorders. Subsequently, positional and expression quantitative trait locus (eQTL)-based gene mapping and downstream functional genomic analyses were performed on the significant loci. Patterns of negative and positive local genetic correlations (|rg|=0.21-1, pFDR<0.05) were identified at 109 unique genomic regions across all phenotype pairs. Local correlations emerged even in the absence of global genetic correlations between IR-related conditions and Alzheimer's disease, bipolar disorder, and Tourette's syndrome. Genes mapped to the correlated regions showed enrichment in biological pathways integral to immune-inflammatory function, vesicle trafficking, insulin signalling, oxygen transport, and lipid metabolism. Colocalisation analyses further prioritised 10 genetically correlated regions for likely harbouring shared causal variants, displaying high deleterious or regulatory potential. These variants were found within or in close proximity to genes, such as SLC39A8 and HLA-DRB1, that can be targeted by supplements and already known drugs, including omega-3/6 fatty acids, immunomodulatory, antihypertensive, and cholesterol-lowering drugs. Overall, our findings underscore the complex genetic landscape of IR-neuropsychiatric multimorbidity, advocating for an integrated disease model and offering novel insights for research and treatment strategies in this domain.

15.
J Fungi (Basel) ; 9(2)2023 Feb 13.
Article in English | MEDLINE | ID: mdl-36836361

ABSTRACT

Potatoes rank third in terms of human consumption after rice and wheat. Globodera spp. are significant pests of potato crop worldwide. Globodera rostochiensis was found in Weining County, Guizhou Province, China, in 2019. We collected soil from the rhizosphere zone from infected potato plants and separated mature cysts through simple floatation and sieving methods. The selected cysts were surface-sterilized, and the colonized fungi were isolated and purified. At the same time, the preliminary identification of fungi and fungi parasites on the cysts of nematodes was carried out. This study aimed to define the species and frequency of fungi-colonizing cysts of G. rostochiensis collected from Weining County, Guizhou Province, China, and provide a basis for the control of G. rostochiensis. As a result, 139 strains of colonized fungi were successfully isolated. Multigene analyses showed that these isolates included 11 orders, 17 families, and 23 genera. The genera Fusarium (with a separation frequency of 59%), Penicillium (11%), Edenia (3.6%), and Paraphaeosphaeria (3.6%) were the most frequently occurring. Among the 44 strains, 27 had a colonization rate of 100% on the cysts of G. rostochiensis. Meanwhile, the functional annotation of 23 genera indicated that some fungi have multitrophic lifestyles combining endophytic, pathogenic, and saprophytic behavior. In conclusion, this study showed the species composition and lifestyle diversity of colonized fungi from G. rostochiensis and demonstrated these isolates as potential sources of biocontrol agents. Colonized fungi were isolated from G. rostochiensis for the first time in China, and the taxonomic diversity of fungi from G. rostochiensis was clarified.

16.
Methods Mol Biol ; 2493: 289-314, 2022.
Article in English | MEDLINE | ID: mdl-35751823

ABSTRACT

Variant annotations, in general, refer to the process of information enrichment of genomic variants from a sequencing experiment. Typically these annotations include functional predictions, such as predicting the amino acid sequence changes from the DNA variant, predicting whether the variant will induce a splice anomaly, or predicting nonsense mediated decay. But other annotations also include combining with genomic databases, adding conservation scores, or comparing to allele frequencies from large population databases. Finally, all these annotations are combined to prioritize and filter variants into a reduced set of highly relevant variants for the study or clinical assay.


Subjects
Genetic Databases , Genomics , Gene Frequency , Molecular Sequence Annotation , Mutation , Software
17.
Front Bioinform ; 2: 968327, 2022.
Article in English | MEDLINE | ID: mdl-36388843

ABSTRACT

Functional enrichment analysis or pathway enrichment analysis (PEA) is a bioinformatics technique which identifies the most over-represented biological pathways in a list of genes compared to those that would be associated with them by chance. These biological functions are found on bioinformatics annotated databases such as The Gene Ontology or KEGG; the more abundant pathways are identified through statistical techniques such as Fisher's exact test. All PEA tools require a list of genes as input. A few tools, however, read lists of genomic regions as input rather than lists of genes, and first associate these chromosome regions with their corresponding genes. These tools perform a procedure called genomic regions enrichment analysis, which can be useful for detecting the biological pathways related to a set of chromosome regions. In this brief survey, we analyze six tools for genomic regions enrichment analysis (BEHST, g:Profiler g:GOSt, GREAT, LOLA, Poly-Enrich, and ReactomePA), outlining and comparing their main features. Our comparison results indicate that the inclusion of data for regulatory elements, such as ChIP-seq, is common among these tools and could therefore improve the enrichment analysis results.
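
The over-representation test named in the abstract above can be illustrated with a one-sided Fisher's exact test, which for a gene list reduces to an upper-tail hypergeometric probability. The sketch below is a generic illustration, not the implementation used by any of the six surveyed tools. The function name and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test for pathway over-representation.

    k: genes in the input list that belong to the pathway
    n: size of the input gene list
    K: genes in the background that belong to the pathway
    N: size of the background gene set
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of seeing k or more pathway genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical example: 8 of 50 input genes hit a 200-gene pathway
# drawn from a 20,000-gene background.
p_value = enrichment_p_value(k=8, n=50, K=200, N=20_000)
print(f"enrichment p-value: {p_value:.3g}")
```

Tools that accept genomic regions first map each region to nearby genes and then run a test of this kind, so the region-to-gene mapping step largely determines which `k` the test sees.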

18.
Front Genet ; 13: 958217, 2022.
Article in English | MEDLINE | ID: mdl-36186472

ABSTRACT

Crop Brassicas contain monogenomic and digenomic species, with no evidence of a trigenomic Brassica in nature. Through somatic fusion (Sinapis alba + B. juncea), a novel allohexaploid trigenomic Brassica (H1 = AABBSS; 2n = 60) was produced and used for transcriptome analysis to uncover genes for thermotolerance, annotations, and microsatellite markers for future molecular breeding. Illumina NovaSeq 6000 sequencing generated a total of 76,055,546 paired-end raw reads, which were used for de novo assembly, yielding 486,066 transcripts. A total of 133,167 coding sequences (CDSs) were predicted from the transcripts, with a mean length of 507.12 bp and 46.15% GC content. The BLASTX search of CDSs against public protein databases showed a maximum of 126,131 (94.72%) and a minimum of 29,810 (22.39%) positive hits. Furthermore, 953,773 gene ontology (GO) terms were found in 77,613 (58.28%) CDSs, which were divided into biological processes (49.06%), cellular components (31.67%), and molecular functions (19.27%). CDSs were assigned to 144 pathways by a pathway analysis using the KEGG database and to 1,551 pathways by a similar analysis using the Reactome database. Further investigation led to the discovery of genes encoding over 2,000 heat shock proteins (HSPs). The discovery of a large number of HSPs in allohexaploid Brassica validated our earlier findings for heat tolerance at seed maturity. A total of 15,736 SSRs were found in 13,595 CDSs, with an average of one SSR per 4.29 kb length and an SSR frequency of 11.82%. The first transcriptome assembly of a meiotically stable allohexaploid Brassica is presented in this article, along with functional annotations and the presence of SSRs, which could aid future genetic and genomic studies.

19.
Front Genet ; 13: 928783, 2022.
Article in English | MEDLINE | ID: mdl-36081994

ABSTRACT

Objective: Despite being a powerful tool to identify novel variants, genome-wide association studies (GWAS) are not sufficient to explain the biological function of variants. In this study, we aimed to elucidate at the gene level the biological mechanisms involved in gastric cancer (GC) development and to identify candidate drug target genes. Materials and methods: We conducted a systematic review of GWAS on GC following the PRISMA guidelines. Single nucleotide polymorphism (SNP)-level meta-analysis and gene-based analysis (GBA) were performed to identify SNPs and genes significantly associated with GC. Expression quantitative trait loci (eQTL), disease network, pathway enrichment, gene ontology, gene-drug, and chemical interaction analyses were conducted to elucidate the function of the genes identified by GBA. Results: A review of GWAS on GC identified 226 SNPs located in 91 genes. In the comprehensive GBA, 44 genes associated with GC were identified, among which 12 genes (THBS3, GBAP1, KRTCAP2, TRIM46, HCN3, MUC1, DAP3, EFNA1, MTX1, PRKAA1, PSCA, and ABO) were eQTL. Using disease network and pathway analyses, we identified that PRKAA1, THBS3, and EFNA1 were significantly associated with the PI3K-Akt-mTOR signaling pathway, which is involved in various oncogenic processes, and that MUC1 acts as a regulator in both the PI3K-Akt-mTOR and P53 signaling pathways. Furthermore, PRKAA1 had the highest number of interactions with drugs and chemicals. Conclusion: Our study suggests that PRKAA1, a gene in the PI3K-Akt-mTOR signaling pathway, could be a potential target gene for drug development associated with GC in the future. Systematic Review Registration: website, identifier registration number.

20.
HGG Adv ; 3(1): 100063, 2022 Jan 13.
Article in English | MEDLINE | ID: mdl-35047852

ABSTRACT

Genome-wide association studies (GWASs) have identified hundreds of thousands of genetic variants associated with complex diseases and traits. However, most variants are noncoding and not clearly linked to genes, making it challenging to interpret these GWAS signals. We present a systematic variant-to-function study, prioritizing the most likely functional elements of the genome for experimental follow-up, for >148,000 variants identified for hematological traits. Specifically, we developed VAMPIRE: Variant Annotation Method Pointing to Interesting Regulatory Effects, an interactive web application implemented in R Shiny. This tool efficiently integrates and displays information from multiple complementary sources, including epigenomic signatures from blood-cell-relevant tissues or cells, functional and conservation summary scores, variant impact on protein and gene expression, chromatin conformation information, as well as publicly available GWAS and phenome-wide association study (PheWAS) results. Leveraging data generated from independently performed functional validation experiments, we demonstrate that our prioritized variants, genes, or variant-gene links are significantly more likely to be experimentally validated. This study not only has important implications for systematic and efficient revelation of functional mechanisms underlying GWAS variants for hematological traits but also provides a prototype that can be adapted to many other complex traits, paving the path for efficient variant-to-function (V2F) analyses.
