Results 1 - 20 of 589
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38747283

ABSTRACT

The analysis and comparison of gene neighborhoods is a powerful approach for exploring microbial genome structure, function, and evolution. Although numerous tools exist for genome visualization and comparison, genome exploration across large genomic databases or user-generated datasets remains a challenge. Here, we introduce AnnoView, a web server designed for interactive exploration of gene neighborhoods across the bacterial and archaeal tree of life. Our server offers users the ability to identify, compare, and visualize gene neighborhoods of interest from 30 238 bacterial genomes and 1672 archaeal genomes, through integration with the comprehensive Genome Taxonomy Database and AnnoTree databases. Identified gene neighborhoods can be visualized using pre-computed functional annotations from different sources such as KEGG, Pfam and TIGRFAM, or clustered based on similarity. Alternatively, users can upload and explore their own custom genomic datasets in GBK, GFF or CSV format, or use AnnoView as a genome browser for relatively small genomes (e.g. viruses and plasmids). Ultimately, we anticipate that AnnoView will catalyze biological discovery by enabling user-friendly search, comparison, and visualization of genomic data. AnnoView is available at http://annoview.uwaterloo.ca.


Subject(s)
Software; Databases, Genetic; Genome, Bacterial; Genome, Archaeal; Genomics/methods; Archaea/genetics; Genes, Microbial/genetics; Computational Biology/methods; Bacteria/genetics; Bacteria/classification
2.
Mol Cell Proteomics ; 23(5): 100763, 2024 May.
Article in English | MEDLINE | ID: mdl-38608842

ABSTRACT

The human gut microbiome is closely associated with human health and disease. Metaproteomics has emerged as a valuable tool for studying the functionality of the gut microbiome by analyzing the entire set of proteins present in microbial communities. Recent advancements in liquid chromatography and tandem mass spectrometry (LC-MS/MS) techniques have expanded the detection range of metaproteomics. However, the overall coverage of the proteome in metaproteomics is still limited. While metagenomics studies have revealed substantial microbial diversity and functional potential of the human gut microbiome, few studies have summarized and studied the human gut microbiome landscape revealed by metaproteomics. In this article, we present the current landscape of human gut metaproteomics studies by re-analyzing the identification results from 15 published studies. We quantified the limited proteome coverage in metaproteomics and revealed high annotation coverage of metaproteomics-identified proteins. We conducted a preliminary comparison between the metaproteomics view and the metagenomics view of the human gut microbiome, identifying key areas of consistency and divergence. Based on the current landscape of human gut metaproteomics, we discuss the feasibility of using metaproteomics to study functionally unknown proteins and propose a whole-workflow peptide-centric analysis. Additionally, we suggest enhancing metaproteomics analysis by refining taxonomic classification and calculating confidence scores, as well as developing tools for analyzing the interaction between taxonomy and function.


Subject(s)
Gastrointestinal Microbiome; Metagenomics; Proteomics; Humans; Proteomics/methods; Metagenomics/methods; Proteome/metabolism; Tandem Mass Spectrometry; Chromatography, Liquid
3.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36847701

ABSTRACT

Emerging studies have shown that circular RNAs (circRNAs) are involved in a variety of biological processes and play a key role in disease diagnosis, treatment and inference. Although many methods, including traditional machine learning and deep learning, have been developed to predict associations between circRNAs and diseases, the biological function of circRNAs has not been fully exploited. Some methods have explored disease-related circRNAs based on different views, but how to use multi-view circRNA data efficiently is still not well studied. Therefore, we propose a computational model, CLCDA, to predict potential circRNA-disease associations based on collaborative learning with circRNA multi-view functional annotations. First, we extract circRNA multi-view functional annotations and build a circRNA association network for each view, to enable effective network fusion. Then, a collaborative deep learning framework for multi-view information is designed to obtain circRNA multi-source information features, making full use of the internal relationships among the circRNA views. We build a network consisting of circRNAs and diseases from their functional similarity and extract the consistency description information of circRNAs and diseases. Last, we predict potential associations between circRNAs and diseases with a graph autoencoder. Our computational model performs better in predicting candidate disease-related circRNAs than existing models. Furthermore, case studies on several common diseases, in which the method finds previously unknown related circRNAs, show its high practicability. The experiments show that CLCDA can efficiently predict disease-related circRNAs and is helpful for the diagnosis and treatment of human disease.


Subject(s)
Deep Learning; Interdisciplinary Placement; Humans; RNA, Circular/genetics; Machine Learning; Computational Biology/methods
4.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36988160

ABSTRACT

Small open reading frames (smORFs) encoding proteins shorter than 100 amino acids (aa) are known to be important regulators of key cellular processes. However, their computational identification remains a challenge. Based on a comprehensive analysis of known prokaryotic small ORFs, we have developed the ProsmORF-pred resource, which uses a machine learning (ML)-based method for prediction of smORFs in prokaryotic genome sequences. ProsmORF-pred consists of two ML models: one recognizes initiation sites in nucleic acid sequences upstream of putative start codons, and the other uses translated amino acid sequences to identify functional protein-like sequences. The nucleotide sequence-based initiation site recognition model has been trained using longer ORFs (>100 aa) in the same genome, while the ML model for identification of protein-like sequences has been trained using annotated smORFs from Escherichia coli. Comprehensive benchmarking of ProsmORF-pred reveals that its performance is comparable to other state-of-the-art approaches on the annotated smORF set derived from 32 prokaryotic genomes. Its performance is distinctly superior to other tools like PRODIGAL and RANSEPS for prediction of newly identified smORFs which have a length range of 10-30 aa, where prediction of smORFs has been a major challenge. Apart from identification of smORFs in genomic sequences, ProsmORF-pred can also aid in functional annotation of the predicted smORFs based on sequence similarity and genomic neighbourhood similarity searches in ProsmORFDB, a well-curated database of known smORFs. ProsmORF-pred along with its backend database ProsmORFDB is available as a user-friendly web server (http://www.nii.ac.in/prosmorfpred.html).
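To make the size constraint in this abstract concrete, the following is a minimal sketch of how candidate smORFs could be enumerated before any scoring. It is not ProsmORF-pred's method (which relies on two trained ML models); it assumes ATG as the only start codon, scans the forward strand only, and the function name `find_small_orfs` and its parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(dna, min_aa=10, max_aa=100):
    """List (start, end, aa_length) for ATG-initiated ORFs on the forward strand.

    Assumption: only ATG starts are considered and the reverse strand is ignored.
    aa_length counts codons from the start codon up to, but excluding, the stop.
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                aa_length = (pos - start) // 3
                if min_aa <= aa_length < max_aa:
                    hits.append((start, pos + 3, aa_length))
                break  # first in-frame stop ends this ORF
    return hits


# Toy sequence: 2 nt of padding, ATG, ten GCT codons, a TAA stop, 2 nt of padding.
toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
print(find_small_orfs(toy))
```

Nested in-frame start codons each produce their own hit here, so a real pipeline would still need to choose among overlapping candidates, which is the kind of decision the initiation-site model described above is meant to inform.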


Subject(s)
Genome; Proteins; Open Reading Frames; Proteins/genetics; Genomics; Amino Acid Sequence
5.
Genomics ; 116(5): 110932, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39216707

ABSTRACT

Dendrobium officinale is a rare and precious medicinal plant. Southern blight is a destructive disease in the artificial cultivation of D. officinale, and one of its pathogens is Sclerotium delphinii. S. delphinii is a phytopathogenic fungus with a wide host range and extremely strong pathogenicity. In this study, S. delphinii was isolated from D. officinale with southern blight. This strain then underwent whole-genome sequencing on the PacBio Sequel II platform, which employs single-molecule real-time (SMRT) technology. Comprehensive annotations were obtained through functional annotation of protein sequences using various publicly available databases. The genome of S. delphinii measures 73.66 Mb, with an N90 contig size of 2,707,110 bp, and it contains 18,506 predicted genes. This study is the first report on the genome size, assembly and annotation of S. delphinii, making it the first species within the Sclerotium genus to undergo whole-genome sequencing; it provides solid data and a theoretical basis for further research on the pathogenesis and omics of S. delphinii.


Subject(s)
Dendrobium; Genome, Fungal; Plant Diseases; Whole Genome Sequencing; Dendrobium/microbiology; Dendrobium/genetics; Plant Diseases/microbiology; Molecular Sequence Annotation; Basidiomycota/genetics; Basidiomycota/pathogenicity
6.
BMC Bioinformatics ; 25(1): 65, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38336614

ABSTRACT

BACKGROUND: Genetic variants can contribute differently to trait heritability by their functional categories, and recent studies have shown that incorporating functional annotation can improve the predictive performance of polygenic risk scores (PRSs). In addition, when only a small proportion of variants are causal variants, PRS methods that employ a Bayesian framework with shrinkage can account for such sparsity. It is possible that the annotation group level effect is also sparse. However, the number of PRS methods that incorporate both annotation information and shrinkage on effect sizes is limited. We propose a PRS method, PRSbils, which utilizes the functional annotation information with a bilevel continuous shrinkage prior to accommodate the varying genetic architectures both on the variant-specific level and on the functional annotation level. RESULTS: We conducted simulation studies and investigated the predictive performance in settings with different genetic architectures. Results indicated that when there was a relatively large variability of group-wise heritability contribution, the gain in prediction performance from the proposed method was on average 8.0% higher AUC compared to the benchmark method PRS-CS. The proposed method also yielded higher predictive performance compared to PRS-CS in settings with different overlapping patterns of annotation groups and obtained on average 6.4% higher AUC. We applied PRSbils to binary and quantitative traits in three real world data sources (the UK Biobank, the Michigan Genomics Initiative (MGI), and the Korean Genome and Epidemiology Study (KoGES)), and two sources of annotations: ANNOVAR, and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG), and demonstrated that the proposed method holds the potential for improving predictive performance by incorporating functional annotations. 
CONCLUSIONS: By utilizing a bilevel shrinkage framework, PRSbils enables the incorporation of both overlapping and non-overlapping annotations into PRS construction to improve the performance of genetic risk prediction. The software is available at https://github.com/styvon/PRSbils .


Subject(s)
Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Bayes Theorem; Genome-Wide Association Study/methods; Multifactorial Inheritance; Software; Risk Factors
7.
Proteins ; 92(6): 776-794, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38258321

ABSTRACT

Three-dimensional (3D) structure information, now available at the proteome scale, may facilitate the detection of remote evolutionary relationships in protein superfamilies. Here, we illustrate this with the identification of a novel family of protein domains related to the ferredoxin-like superfold, by combining (i) transitive sequence similarity searches, (ii) clustering approaches, and (iii) the use of AlphaFold2 3D structure models. Domains of this family were initially identified in relation with the intracellular biomineralization of calcium carbonates by Cyanobacteria. They are part of the large heavy-metal-associated (HMA) superfamily, departing from the latter by specific sequence and structural features. In particular, most of them share conserved basic amino acids (hence their name CoBaHMA for Conserved Basic residues HMA), forming a positively charged surface, which is likely to interact with anionic partners. CoBaHMA domains are found in diverse modular organizations in bacteria, existing in the form of monodomain proteins or as part of larger proteins, some of which are membrane proteins involved in transport or lipid metabolism. This suggests that the CoBaHMA domains may exert a regulatory function, involving interactions with anionic lipids. This hypothesis might have a particular resonance in the context of the compartmentalization observed for cyanobacterial intracellular calcium carbonates.


Subject(s)
Amino Acid Sequence; Bacterial Proteins; Metals, Heavy; Models, Molecular; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Bacterial Proteins/genetics; Metals, Heavy/chemistry; Metals, Heavy/metabolism; Protein Domains; Cyanobacteria/metabolism; Cyanobacteria/chemistry; Cyanobacteria/genetics; Ferredoxins/chemistry; Ferredoxins/metabolism; Protein Folding
8.
BMC Genomics ; 25(1): 96, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38262929

ABSTRACT

BACKGROUND: Angelica sinensis (Danggui), a renowned medicinal herb, has gained significant recognition for its therapeutic effects in treating a wide range of ailments. Genome information serves as a valuable resource, enabling researchers to gain a deeper understanding of gene function. Recently, the availability of chromosome-level genomes for A. sinensis has opened up vast opportunities for exploring gene functionality. Integrating multiomics data can allow researchers to unravel the intricate mechanisms underlying gene function in A. sinensis and further enhance our knowledge of its medicinal properties. RESULTS: In this study, we utilized genomic and transcriptomic data to construct a coexpression network for A. sinensis. To annotate genes, we aligned them with sequences from various databases, such as the NR, TAIR, trEMBL, UniProt, and SwissProt databases. For GO and KEGG annotations, we employed InterProScan and GhostKOALA software. Additionally, gene families were predicted using iTAK, HMMER, OrthoFinder, and KEGG annotation. To facilitate gene functional analysis in A. sinensis, we developed a comprehensive platform that integrates genomic and transcriptomic data with processed functional annotations. The platform includes several tools, such as BLAST, GSEA, Heatmap, JBrowse, and Sequence Extraction. This integrated resource and approach will enable researchers to explore the functional aspects of genes in A. sinensis more effectively. CONCLUSION: We developed a platform, named ASAP, to facilitate gene functional analysis in A. sinensis. ASAP ( www.gzybioinformatics.cn/ASAP ) offers a comprehensive collection of genome data, transcriptome resources, and analysis tools. This platform serves as a valuable resource for researchers conducting gene functional research in their projects, providing them with the necessary data and tools to enhance their studies.


Subject(s)
Angelica sinensis; Genomics; Databases, Protein; Gene Expression Profiling; Genetic Research
9.
BMC Genomics ; 25(1): 587, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862915

ABSTRACT

BACKGROUND: The field of bee genomics has advanced considerably in recent years; however, the most diverse group of honey producers on the planet, the stingless bees, is still largely neglected. In fact, only eleven of the ~600 described stingless bee species have been sequenced, and only three using a long-read (LR) sequencing technology. Here, we sequenced the nuclear and mitochondrial genomes of the most common, widespread and broadly reared stingless bee in Brazil and other neotropical countries, Tetragonisca angustula (popularly known in Brazil as jataí). RESULTS: A total of 48.01 Gb of DNA data were generated, including 2.31 Gb of Pacific Biosciences HiFi reads and 45.70 Gb of Illumina short reads (SRs). Our preferred assembly comprised 683 contigs encompassing 284.49 Mb, 62.84 Mb of which (22.09%) corresponded to 445,793 repetitive elements. N50, L50 and complete BUSCOs reached 1.02 Mb, 91 contigs and 97.1%, respectively. We predicted that the genome of T. angustula comprises 17,459 protein-coding genes and 4,108 non-coding RNAs. The mitogenome consisted of 17,410 bp, and all 37 genes were found to be on the positive strand, an unusual feature among bees. A phylogenomic analysis of 26 hymenopteran species revealed that six odorant receptor orthogroups of T. angustula were found to be experiencing rapid evolution, four of them undergoing significant contractions. CONCLUSIONS: Here, we provided the first nuclear and mitochondrial genome assemblies for the ecologically and economically important T. angustula, the fourth stingless bee species to be sequenced with LR technology thus far. We demonstrated that even relatively small amounts of LR data in combination with sufficient SR data can yield high-quality genome assemblies for bees.
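For readers unfamiliar with the N50 and L50 statistics quoted in this abstract, here is a minimal sketch of the usual definitions: N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The function name and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # The first contig that pushes the running sum to half the assembly
        # defines N50; the number of contigs used so far is L50.
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy assembly of 300 units in seven contigs.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

Read this way, the reported values mean that the 91 longest contigs, each at least 1.02 Mb long, together cover at least half of the 284.49 Mb assembly.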


Subject(s)
Genome, Mitochondrial; Phylogeny; Animals; Bees/genetics; Cell Nucleus/genetics; Molecular Sequence Annotation; Pollination; Genomics/methods; Genome, Insect; Sequence Analysis, DNA
10.
BMC Genomics ; 25(1): 6, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38166563

ABSTRACT

BACKGROUND: Microsporidia are a large taxon of intracellular pathogens characterized by extraordinarily streamlined genomes with unusually high sequence divergence and many species-specific adaptations. These unique factors pose challenges for traditional genome annotation methods based on sequence similarity. As a result, many of the microsporidian genomes sequenced to date contain numerous genes of unknown function. Recent innovations in rapid and accurate structure prediction and comparison, together with the growing amount of data in structural databases, provide new opportunities to assist in the functional annotation of newly sequenced genomes. RESULTS: In this study, we established a workflow that combines sequence and structure-based functional gene annotation approaches employing a ChimeraX plugin named ANNOTEX (Annotation Extension for ChimeraX), allowing for visual inspection and manual curation. We employed this workflow on a high-quality telomere-to-telomere sequenced tetraploid genome of Vairimorpha necatrix. First, the 3080 predicted protein-coding DNA sequences, of which 89% were confirmed with RNA sequencing data, were used as input. Next, ColabFold was used to create protein structure predictions, followed by a Foldseek search for structural matching to the PDB and AlphaFold databases. The subsequent manual curation, using sequence and structure-based hits, increased the accuracy and quality of the functional genome annotation compared to results using only traditional annotation tools. Our workflow resulted in a comprehensive description of the V. necatrix genome, along with a structural summary of the most prevalent protein groups, such as the ricin B lectin family. In addition, and to test our tool, we identified the functions of several previously uncharacterized Encephalitozoon cuniculi genes. 
CONCLUSION: We provide a new functional annotation tool for divergent organisms and employ it on a newly sequenced, high-quality microsporidian genome to shed light on this uncharacterized intracellular pathogen of Lepidoptera. The addition of a structure-based annotation approach can serve as a valuable template for studying other microsporidian or similarly divergent species.


Subject(s)
Genome; Genomics; Molecular Sequence Annotation
11.
BMC Genomics ; 25(1): 690, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003468

ABSTRACT

BACKGROUND: Heritability partitioning approaches estimate the contribution of different functional classes, such as coding or regulatory variants, to the genetic variance. This information allows a better understanding of the genetic architecture of complex traits, including complex diseases, but can also help improve the accuracy of genomic selection in livestock species. However, methods have mainly been tested on human genomic data, whereas livestock populations have specific characteristics, such as high levels of relatedness, small effective population size or long-range levels of linkage disequilibrium. RESULTS: Here, we used data from 14,762 cows, imputed at the whole-genome sequence level for 11,537,240 variants, to simulate traits in a typical livestock population and evaluate the accuracy of two state-of-the-art heritability partitioning methods, GREML and a Bayesian mixture model. In simulations where a single functional class had increased contribution to heritability, we observed that the estimators were unbiased but had low precision. When causal variants were enriched in variants with low (< 0.05) or high (> 0.20) minor allele frequency or low (below 1st quartile) or high (above 3rd quartile) linkage disequilibrium scores, it was necessary to partition the genetic variance into multiple classes defined on the basis of allele frequencies or LD scores to obtain unbiased results. When multiple functional classes had variable contributions to heritability, estimators showed higher levels of variation and confounding between certain categories was observed. In addition, estimators from small categories were particularly imprecise. However, the estimates and their ranking were still informative about the contribution of the classes. We also demonstrated that using methods that estimate the contribution of a single category at a time, a commonly used approach, results in an overestimation. 
Finally, we applied the methods to phenotypes for muscular development and height and estimated that, on average, variants in open chromatin regions had a higher contribution to the genetic variance (> 45%), while variants in coding regions had the strongest individual effects (> 25-fold enrichment on average). Conversely, variants in intergenic or intronic regions showed lower levels of enrichment (0.2 and 0.6-fold on average, respectively). CONCLUSIONS: Heritability partitioning approaches should be used cautiously in livestock populations, in particular for small categories. Two-component approaches that fit only one functional category at a time lead to biased estimators and should not be used.
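The fold-enrichment figures in this abstract can be read through the conventional definition used in heritability partitioning: the share of genetic variance attributed to a category divided by the share of variants that fall in that category. The sketch below assumes that definition; the abstract itself does not spell out its formula, and the input numbers are hypothetical.

```python
def fold_enrichment(share_of_heritability, share_of_variants):
    """Share of genetic variance explained by a category, relative to its share of variants.

    Values above 1 indicate enrichment; values below 1 indicate depletion.
    """
    if not 0 < share_of_variants <= 1:
        raise ValueError("share_of_variants must be in (0, 1]")
    return share_of_heritability / share_of_variants


# Hypothetical category: 5% of variants explaining 45% of the genetic variance.
print(round(fold_enrichment(0.45, 0.05), 2))
```

Because the denominator is the category's share of variants, small categories can show very large enrichment from modest absolute contributions, which is consistent with the abstract's caution about imprecise estimates for small categories.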


Subject(s)
Linkage Disequilibrium; Livestock; Animals; Livestock/genetics; Cattle/genetics; Bayes Theorem; Models, Genetic; Gene Frequency; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable; Genetic Variation; Genomics/methods; Phenotype
12.
Trends Genet ; 37(12): 1081-1094, 2021 12.
Article in English | MEDLINE | ID: mdl-34315631

ABSTRACT

Large-scale human genetic association studies have identified sequence variations at thousands of genetic risk loci that are more common in patients with diverse metabolic diseases than in healthy controls. While these genetic associations have been replicated in multiple large cohorts and sometimes can explain up to 50% of heritability, the molecular and cellular mechanisms affected by common genetic variation associated with metabolic disease remain mostly unknown. A variety of new genome-wide data types, in conjunction with novel biostatistical and computational analytical methodologies and foundational experimental technologies, are paving the way for a principled approach to systematic variant-to-function (V2F) studies for metabolic diseases, turning associated regions into causal variants, cell types and states of action, effector genes, and cellular and physiological mechanisms. Identification of new target genes and cellular programs for metabolic risk loci will improve mechanistic understanding of disease biology and identification of novel therapeutic strategies.


Subject(s)
Genome-Wide Association Study; Metabolic Diseases; Genetic Association Studies; Genetic Loci; Genetic Predisposition to Disease; Genetic Variation/genetics; Genome-Wide Association Study/methods; Human Genetics; Humans; Metabolic Diseases/genetics; Polymorphism, Single Nucleotide
13.
Am J Hum Genet ; 108(7): 1190-1203, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34146516

ABSTRACT

A combination of genetic and functional approaches has identified three independent breast cancer risk loci at 2q35. A recent fine-scale mapping analysis to refine these associations resulted in 1 (signal 1), 5 (signal 2), and 42 (signal 3) credible causal variants at these loci. We used publicly available in silico DNase I and ChIP-seq data with in vitro reporter gene and CRISPR assays to annotate signals 2 and 3. We identified putative regulatory elements that enhanced cell-type-specific transcription from the IGFBP5 promoter at both signals (30- to 40-fold increased expression by the putative regulatory element at signal 2, 2- to 3-fold by the putative regulatory element at signal 3). We further identified one of the five credible causal variants at signal 2, a 1.4 kb deletion (esv3594306), as the likely causal variant; the deletion allele of this variant was associated with an average additional increase in IGFBP5 expression of 1.3-fold (MCF-7) and 2.2-fold (T-47D). We propose a model in which the deletion allele of esv3594306 juxtaposes two transcription factor binding regions (annotated by estrogen receptor alpha ChIP-seq peaks) to generate a single extended regulatory element. This regulatory element increases cell-type-specific expression of the tumor suppressor gene IGFBP5 and, thereby, reduces risk of estrogen receptor-positive breast cancer (odds ratio = 0.77, 95% CI 0.74-0.81, p = 3.1 × 10⁻³¹).


Subject(s)
Insulin-Like Growth Factor Binding Protein 5/genetics; Molecular Sequence Annotation; Promoter Regions, Genetic; Breast Neoplasms/genetics; CRISPR-Cas Systems; Cell Line; Chromosome Mapping; Chromosomes, Human, Pair 2; Female; Genetic Association Studies; Genetic Variation; Humans; Risk Factors; Sequence Deletion
14.
Am J Hum Genet ; 108(8): 1488-1501, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34214457

ABSTRACT

Across species, offspring of related individuals often exhibit significant reduction in fitness-related traits, known as inbreeding depression (ID), yet the genetic and molecular basis for ID remains elusive. Here, we develop a method to quantify enrichment of ID within specific genomic annotations and apply it to human data. We analyzed the phenomes and genomes of ∼350,000 unrelated participants of the UK Biobank and found, on average over 11 traits, significant enrichment of ID within genomic regions with high recombination rates (>21-fold; p < 10⁻⁵), with conserved function across species (>19-fold; p < 10⁻⁴), and within regulatory elements such as DNase I hypersensitive sites (∼5-fold; p = 8.9 × 10⁻⁷). We also quantified enrichment of ID within trait-associated regions and found suggestive evidence that genomic regions contributing to additive genetic variance in the population are enriched for ID signal. We find strong correlations between functional enrichment of SNP-based heritability and that of ID (r = 0.8, standard error: 0.1). These findings provide empirical evidence that ID is most likely due to many partially recessive deleterious alleles in low linkage disequilibrium regions of the genome. Our study suggests that functional characterization of ID may further elucidate the genetic architectures and biological mechanisms underlying complex traits and diseases.


Subject(s)
Genome-Wide Association Study; Genomics/methods; Inbreeding Depression/genetics; Linkage Disequilibrium; Multifactorial Inheritance/genetics; Phenotype; Polymorphism, Single Nucleotide; Female; Humans; Male
15.
BMC Plant Biol ; 24(1): 410, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760710

ABSTRACT

Rosa roxburghii Tratt, a valuable plant in China with a long history, is famous for its fruit. It contains various secondary metabolites, such as L-ascorbic acid (vitamin C), alkaloids and polysaccharides, which give it high nutritional and medicinal value. Here we characterized the chromosome-level genome sequence of R. roxburghii, comprising seven pseudo-chromosomes with a total size of 531 Mb and a heterozygosity of 0.25%. We also annotated 45,226 coding gene loci after masking repeat elements. Orthologs for 90.1% of the Complete Single-Copy BUSCOs were found in the R. roxburghii annotation. By aligning with protein sequences from public platforms, we annotated 85.89% of the genes from R. roxburghii. Comparative genomic analysis revealed that R. roxburghii diverged from Rosa chinensis approximately 5.58 to 13.17 million years ago, and no whole-genome duplication event occurred after the divergence from eudicots. To fully utilize this genomic resource, we constructed a genomic database, RroFGD, with various analysis tools. In addition, 69 enzyme genes involved in L-ascorbate biosynthesis were identified, and a key enzyme in the biosynthesis of vitamin C, GDH (L-Gal-1-dehydrogenase), is used as an example to introduce the functions of the database. This genome and database will facilitate future investigations into gene function and molecular breeding in R. roxburghii.


Subject(s)
Chromosomes, Plant; Genome, Plant; Rosa; Rosa/genetics; Rosa/metabolism; Chromosomes, Plant/genetics; Databases, Genetic; Secondary Metabolism/genetics; Ascorbic Acid/metabolism; Ascorbic Acid/biosynthesis
16.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34643213

ABSTRACT

Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to predict genomic loci in terms of disease category-specific risk. Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk prediction of 24 major categories of diseases throughout the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of the disease-specific and disease category-specific risk for non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to facilitate easy access to CASAVA scores.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genome, Human; Genomics; Humans; Machine Learning
17.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36088548

ABSTRACT

A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existing and two conceptually novel functional classification systems, with respect to their discovery power and generality in gene set enrichment testing. We base our assessment on a collection of nearly 2000 functional genomics datasets provided by users of the STRING database. With these real-life and diverse queries, we assess which systems typically provide the most specific and complete enrichment results. We find many structural and performance differences between classification systems. Overall, the well-established, hierarchically organized pathway annotation systems yield the best enrichment performance, despite covering substantial parts of the human genome in general terms only. On the other hand, the more recent unsupervised annotation systems perform strongest in understudied areas and organisms, and in detecting more specific pathways, albeit with less informative labels.
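Gene set enrichment testing, the benchmark task in this abstract, is commonly illustrated with a one-sided hypergeometric (over-representation) test. The sketch below shows that textbook test only; it is an assumption for illustration, not the specific statistic used by STRING or by the study, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway, query, overlap):
    """P(X >= overlap) when `query` genes are drawn without replacement from
    `universe` genes, of which `pathway` belong to the gene set of interest."""
    upper = min(pathway, query)
    total = comb(universe, query)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, query - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, 4 in the pathway, 3 in the query list, 2 shared.
print(hypergeom_enrichment_p(universe=10, pathway=4, query=3, overlap=2))
```

Under this test a broad, general pathway needs a larger overlap to reach the same p-value as a narrow one, which is one way to see why the level of specificity of a classification system affects its enrichment results.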


Subject(s)
Genomics , Software , Databases, Factual , Databases, Genetic , Genomics/methods , Humans
18.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36168811

ABSTRACT

Time-course single-cell RNA sequencing (scRNA-seq) data have been widely used to explore dynamic changes in gene expression of transcription factors (TFs) and their target genes. This information is useful to reconstruct cell-type-specific gene regulatory networks (GRNs). However, the existing tools are commonly designed to analyze either time-course bulk gene expression data or static scRNA-seq data via pseudo-time cell ordering. Few methods successfully utilize the information from multiple time points while also considering the characteristics of scRNA-seq data. We proposed dynDeepDRIM, a novel deep learning model to reconstruct GRNs using time-course scRNA-seq data. It represents the joint expression of a gene pair as an image and utilizes the image of the target TF-gene pair and those of the potential neighbors to reconstruct GRNs from time-course scRNA-seq data. dynDeepDRIM can effectively remove transitive TF-gene interactions by considering neighborhood context, and it models gene expression dynamics using high-dimensional tensors. We compared dynDeepDRIM with six GRN reconstruction methods on both simulated data and four real time-course scRNA-seq datasets. dynDeepDRIM achieved substantially better performance than the other methods in inferring TF-gene interactions and eliminated false positives effectively. We also applied dynDeepDRIM to annotate gene functions and found that it achieved evidently better performance than the other tools because it considers the neighbor genes.
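The idea of representing the joint expression of a gene pair as an image can be sketched as a two-dimensional histogram over cells, with one image per time point stacked into a tensor. This is a minimal sketch under stated assumptions, not the dynDeepDRIM implementation: the function names `gene_pair_image` and `time_course_tensor`, the equal-width binning, and the bin count are all hypothetical choices made for illustration.

```python
def gene_pair_image(tf_expr, target_expr, bins=8):
    """Build a bins x bins count image from per-cell expression of two genes.

    tf_expr and target_expr hold one expression value per cell, in the same
    cell order. Equal-width binning is an assumption of this sketch.
    """
    if len(tf_expr) != len(target_expr):
        raise ValueError("both genes need one value per cell")

    def to_bin(value, lo, hi):
        # A constant gene falls entirely into the first bin.
        if hi == lo:
            return 0
        idx = int((value - lo) / (hi - lo) * bins)
        return min(idx, bins - 1)  # the maximum value goes in the last bin

    lo_x, hi_x = min(tf_expr), max(tf_expr)
    lo_y, hi_y = min(target_expr), max(target_expr)
    image = [[0] * bins for _ in range(bins)]
    for x, y in zip(tf_expr, target_expr):
        image[to_bin(x, lo_x, hi_x)][to_bin(y, lo_y, hi_y)] += 1
    return image


def time_course_tensor(pairs_by_time, bins=8):
    """Stack one gene-pair image per time point into a 3-D nested list.

    pairs_by_time is a list of (tf_expr, target_expr) tuples, one per
    time point (a hypothetical input layout).
    """
    return [gene_pair_image(tf, tg, bins) for tf, tg in pairs_by_time]


if __name__ == "__main__":
    tensor = time_course_tensor(
        [([0, 1, 2, 3], [0, 1, 2, 3]), ([0, 1, 2, 3], [3, 2, 1, 0])],
        bins=4,
    )
    for image in tensor:
        print(image)
```

In this toy input the first time point fills the main diagonal of the image (co-varying genes) and the second fills the anti-diagonal, which is the kind of pattern an image-based model could learn to distinguish.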


Subject(s)
Deep Learning , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Gene Expression
19.
Hum Genomics ; 17(1): 65, 2023 07 17.
Article in English | MEDLINE | ID: mdl-37461066

ABSTRACT

BACKGROUND: A pathogenic filamentous fungus causing eyelid cellulitis was isolated from secretions of a patient's left eyelid, and a phylogenetic analysis based on the rDNA internal transcribed spacer region (ITS) and single-copy gene families identified the isolated strain as Paraconiothyrium brasiliense. The genus Paraconiothyrium contains major plant-pathogenic fungi, and in our study, P. brasiliense was identified for the first time as causing human infection. To comprehensively analyze the pathogenicity and proteomics of the isolated strain from a genetic perspective, whole-genome sequencing was performed with the Illumina NovaSeq and Oxford Nanopore Technologies platforms, and a bioinformatics analysis was performed with BLAST against genome sequences in various publicly available databases. RESULTS: The genome of P. brasiliense GGX 413 is 39.49 Mb in length, with a 51.2% GC content, and encodes 13,057 protein-coding genes and 181 noncoding RNAs. Functional annotation showed that 592 genes encode virulence factors that are involved in human disease, including 61 lethal virulence factors and 30 hypervirulence factors. Of these 592 virulence genes, 54 are related to carbohydrate-active enzymes (including 46 genes encoding secretory CAZymes), 119 are associated with peptidases (including 70 genes encoding secretory peptidases), and 27 are involved in secondary metabolite synthesis (including four associated with terpenoid metabolism). CONCLUSIONS: This study establishes the genomic resources of P. brasiliense and provides a theoretical basis for future studies of the pathogenic mechanism of its infection of humans, the treatment of the diseases caused, and related research.


Subject(s)
Cellulitis , Virulence Factors , Humans , Phylogeny , Peptide Hydrolases/genetics
20.
Mol Biol Rep ; 51(1): 406, 2024 Mar 09.
Article in English | MEDLINE | ID: mdl-38459415

ABSTRACT

BACKGROUND: Bursera trees are conspicuous elements of the tropical dry forests in the Neotropics that have significant cultural value due to their fragrant resins (incense), wood sources (handcrafts), and ecological benefits. Despite their relevance, genetic resources developed for the genus are scarce. METHODS AND RESULTS: We obtained the complete chloroplast (Cp) genome sequence, analyzed the genome structure, and performed functional annotation of three Bursera species of the Bullockia section: Bursera cuneata, B. palmeri, and B. bipinnata. The Cp genome sizes ranged from 159,824 to 159,872 bp in length, including a large single-copy (LSC) region from 87,668 to 87,656 bp, a small single-copy (SSC) region from 18,581 to 18,571 bp, and two inverted repeat regions (IRa and IRb) of 26,814 bp each. The three Cp genomes consisted of 135 genes, of which 90 were protein-coding genes, 37 were tRNAs, and 8 were rRNAs. The Cp genomes were relatively conserved, with the LSC region exhibiting the greatest nucleotide divergence (psbJ, trnQ-UCC, trnG-UCC, and petL genes), whereas few changes were observed in the IR border regions. Between 589 and 591 simple sequence repeats were identified. Analysis of phylogenetic relationships using our data for each Cp region (LSC, SSC, IRa, and IRb) and of seven species within Burseraceae confirmed that Commiphora is the sister genus of Bursera. Only the phylogenetic trees based on the SSC and LSC regions resolved the close relationship between B. bipinnata and B. palmeri. CONCLUSION: Our work contributes to the development of Bursera's genomic resources for taxonomic, evolutionary, and ecological-genetic studies.


Subject(s)
Bursera , Genome, Chloroplast , Phylogeny , Bursera/genetics , Sulindac , Genome, Chloroplast/genetics , Genomics/methods