1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38747283

ABSTRACT

The analysis and comparison of gene neighborhoods is a powerful approach for exploring microbial genome structure, function, and evolution. Although numerous tools exist for genome visualization and comparison, genome exploration across large genomic databases or user-generated datasets remains a challenge. Here, we introduce AnnoView, a web server designed for interactive exploration of gene neighborhoods across the bacterial and archaeal tree of life. Our server offers users the ability to identify, compare, and visualize gene neighborhoods of interest from 30 238 bacterial genomes and 1672 archaeal genomes, through integration with the comprehensive Genome Taxonomy Database and AnnoTree databases. Identified gene neighborhoods can be visualized using pre-computed functional annotations from different sources such as KEGG, Pfam and TIGRFAM, or clustered based on similarity. Alternatively, users can upload and explore their own custom genomic datasets in GBK, GFF or CSV format, or use AnnoView as a genome browser for relatively small genomes (e.g. viruses and plasmids). Ultimately, we anticipate that AnnoView will catalyze biological discovery by enabling user-friendly search, comparison, and visualization of genomic data. AnnoView is available at http://annoview.uwaterloo.ca.
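The core operation behind a gene-neighborhood view can be sketched in a few lines. This minimal Python example assumes each gene is a record with a contig name and coordinates, and it defines a neighborhood as the target gene plus up to `k` flanking genes on the same contig. The `Gene` record, the `neighborhood` function and the window size `k` are hypothetical illustrations, not AnnoView's actual implementation or its neighborhood definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; real GBK/GFF features carry much more.
    contig: str
    start: int
    end: int
    name: str


def neighborhood(genes, target, k=5):
    """Return the target gene and up to k flanking genes on each side,
    restricted to the target's contig and ordered by start coordinate."""
    same_contig = sorted(
        (g for g in genes if g.contig == target.contig), key=lambda g: g.start
    )
    idx = same_contig.index(target)  # raises ValueError if target is absent
    return same_contig[max(0, idx - k): idx + k + 1]


# Usage: ten evenly spaced genes on contig "c1" plus one gene on contig "c2".
genes = [Gene("c1", i * 1000, i * 1000 + 900, f"g{i}") for i in range(10)]
genes.append(Gene("c2", 0, 900, "x0"))
print([g.name for g in neighborhood(genes, genes[5], k=2)])
```

Coloring or clustering the returned genes by KEGG, Pfam or TIGRFAM labels would be a separate step on top of this window.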


Subjects
Software , Databases, Genetic , Genome, Bacterial , Genome, Archaeal , Genomics/methods , Archaea/genetics , Genes, Microbial/genetics , Computational Biology/methods , Bacteria/genetics , Bacteria/classification
2.
Mol Cell Proteomics ; 23(5): 100763, 2024 May.
Article in English | MEDLINE | ID: mdl-38608842

ABSTRACT

The human gut microbiome is closely associated with human health and disease. Metaproteomics has emerged as a valuable tool for studying the functionality of the gut microbiome by analyzing all proteins present in microbial communities. Recent advances in liquid chromatography and tandem mass spectrometry (LC-MS/MS) techniques have expanded the detection range of metaproteomics. However, the overall proteome coverage of metaproteomics is still limited. While metagenomics studies have revealed the substantial microbial diversity and functional potential of the human gut microbiome, few studies have summarized and examined the landscape of the human gut microbiome as revealed by metaproteomics. In this article, we present the current landscape of human gut metaproteomics studies by re-analyzing the identification results from 15 published studies. We quantified the limited proteome coverage of metaproteomics and found that a high proportion of metaproteomics-identified proteins have functional annotations. We conducted a preliminary comparison between the metaproteomics and metagenomics views of the human gut microbiome, identifying key areas of consistency and divergence. Based on the current landscape of human gut metaproteomics, we discuss the feasibility of using metaproteomics to study functionally unknown proteins and propose a peptide-centric analysis across the whole workflow. Additionally, we suggest enhancing metaproteomics analysis by refining taxonomic classification and calculating confidence scores, as well as by developing tools for analyzing the interaction between taxonomy and function.


Subjects
Gastrointestinal Microbiome , Metagenomics , Proteomics , Humans , Proteomics/methods , Metagenomics/methods , Proteome/metabolism , Tandem Mass Spectrometry , Chromatography, Liquid
3.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36847701

ABSTRACT

Emerging studies have shown that circular RNAs (circRNAs) are involved in a variety of biological processes and play a key role in disease diagnosis, treatment and inference. Although many methods, including traditional machine learning and deep learning approaches, have been developed to predict associations between circRNAs and diseases, the biological functions of circRNAs have not been fully exploited. Some methods have explored disease-related circRNAs from different views, but how to use multi-view circRNA data efficiently is still not well studied. Therefore, we propose a computational model, CLCDA, that predicts potential circRNA-disease associations through collaborative learning with circRNA multi-view functional annotations. First, we extract circRNA multi-view functional annotations and build a circRNA association network for each view to enable effective network fusion. Then, a collaborative deep learning framework for multi-view information is designed to extract circRNA multi-source features, making full use of the internal relationships among the views. We build a network of circRNAs and diseases based on their functional similarity and extract consistency descriptions of circRNAs and diseases. Finally, we predict potential associations between circRNAs and diseases with a graph autoencoder. Our model outperforms existing methods in predicting candidate disease-related circRNAs. Furthermore, case studies on several common diseases, in which the model identified previously unknown circRNAs related to them, demonstrate its practical utility. The experiments show that CLCDA can efficiently predict disease-related circRNAs and can be helpful for the diagnosis and treatment of human disease.


Subjects
Deep Learning , Interdisciplinary Placement , Humans , RNA, Circular/genetics , Machine Learning , Computational Biology/methods
4.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36988160

ABSTRACT

Small open reading frames (smORFs) encoding proteins of fewer than 100 amino acids (aa) are known to be important regulators of key cellular processes. However, their computational identification remains a challenge. Based on a comprehensive analysis of known prokaryotic smORFs, we have developed the ProsmORF-pred resource, which uses a machine learning (ML)-based method to predict smORFs in prokaryotic genome sequences. ProsmORF-pred consists of two ML models: one recognizes initiation sites in nucleic acid sequences upstream of putative start codons, and the other uses translated amino acid sequences to identify functional protein-like sequences. The nucleotide sequence-based initiation site recognition model has been trained using longer ORFs (>100 aa) in the same genome, while the ML model for identifying protein-like sequences has been trained using annotated smORFs from Escherichia coli. Comprehensive benchmarking reveals that ProsmORF-pred performs comparably to other state-of-the-art approaches on the annotated smORF set derived from 32 prokaryotic genomes. It performs distinctly better than other tools, such as PRODIGAL and RANSEPS, in predicting newly identified smORFs 10-30 aa long, a length range in which smORF prediction has been a major challenge. Apart from identifying smORFs in genomic sequences, ProsmORF-pred can also aid in the functional annotation of predicted smORFs through sequence similarity and genomic neighbourhood similarity searches in ProsmORFDB, a well-curated database of known smORFs. ProsmORF-pred, along with its backend database ProsmORFDB, is available as a user-friendly web server (http://www.nii.ac.in/prosmorfpred.html).
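Before any ML scoring, smORF prediction needs a pool of candidate ORFs. The Python sketch below illustrates only that first step, under simplifying assumptions: it scans the forward strand in three frames for ATG-initiated ORFs whose protein length (excluding the stop codon) falls in a chosen range. It ignores the reverse strand and the alternative start codons (GTG, TTG) common in prokaryotes, and it reproduces neither of ProsmORF-pred's two models; `find_small_orfs` and its default bounds are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, min_aa=10, max_aa=99):
    """Scan the forward strand in all three frames for ATG...stop ORFs.

    Returns (start, end, aa_len) tuples with 0-based start, exclusive end
    (stop codon included), and protein length excluding the stop codon.
    Only the first ATG after the previous stop is used in each frame, so
    each stop codon yields at most one (the longest) ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                aa_len = (i - start) // 3
                if min_aa <= aa_len <= max_aa:
                    orfs.append((start, i + 3, aa_len))
                start = None  # look for the next start codon in this frame
    return orfs


# Usage: ATG + 12 alanine codons + TAA encodes a 13-aa peptide.
print(find_small_orfs("ATG" + "GCT" * 12 + "TAA"))
```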


Subjects
Genome , Proteins , Open Reading Frames , Proteins/genetics , Genomics , Amino Acid Sequence
5.
BMC Bioinformatics ; 25(1): 65, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38336614

ABSTRACT

BACKGROUND: Genetic variants can contribute differently to trait heritability depending on their functional categories, and recent studies have shown that incorporating functional annotation can improve the predictive performance of polygenic risk scores (PRSs). In addition, when only a small proportion of variants are causal, PRS methods that employ a Bayesian framework with shrinkage can account for such sparsity. Effects at the annotation-group level may also be sparse. However, few PRS methods incorporate both annotation information and shrinkage on effect sizes. We propose a PRS method, PRSbils, which uses functional annotation information with a bilevel continuous shrinkage prior to accommodate varying genetic architectures at both the variant-specific level and the functional annotation level. RESULTS: We conducted simulation studies and investigated predictive performance in settings with different genetic architectures. When group-wise heritability contributions varied relatively widely, the proposed method achieved on average 8.0% higher AUC than the benchmark method PRS-CS. The proposed method also outperformed PRS-CS in settings with different overlap patterns among annotation groups, with on average 6.4% higher AUC. We applied PRSbils to binary and quantitative traits in three real-world data sources (the UK Biobank, the Michigan Genomics Initiative (MGI), and the Korean Genome and Epidemiology Study (KoGES)) with two sources of annotations, ANNOVAR and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG), and demonstrated that the proposed method has the potential to improve predictive performance by incorporating functional annotations.
CONCLUSIONS: By using a bilevel shrinkage framework, PRSbils enables the incorporation of both overlapping and non-overlapping annotations into PRS construction to improve genetic risk prediction. The software is available at https://github.com/styvon/PRSbils .
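Whatever prior is used to estimate the effect sizes, the final score in PRS methods is conventionally a weighted sum of allele dosages, PRS_i = Σ_j x_ij β_j, and predictive performance is then summarized as AUC. The Python sketch below shows only these two generic steps. It does not implement PRSbils's bilevel continuous shrinkage prior (the part that distinguishes the method), and the function names and toy numbers are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, effects):
    """PRS_i = sum_j x_ij * beta_j for allele dosages x (individuals x variants)
    and per-variant effect sizes beta (e.g. posterior means from a PRS method)."""
    x = np.asarray(dosages, dtype=float)
    beta = np.asarray(effects, dtype=float)
    if x.ndim != 2 or x.shape[1] != beta.shape[0]:
        raise ValueError("dosages must be (n_individuals, n_variants) matching effects")
    return x @ beta


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control
    (ties count 1/2), computed over all case-control pairs."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels).astype(bool)
    cases, controls = s[y], s[~y]
    diff = cases[:, None] - controls[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Usage with toy data: 4 individuals, 3 variants, hypothetical effect sizes.
dosages = [[0, 1, 2], [2, 0, 1], [1, 1, 0], [0, 0, 0]]
effects = [0.1, -0.2, 0.05]
scores = polygenic_score(dosages, effects)
print(scores, auc(scores, [0, 1, 0, 1]))
```

The pairwise AUC is quadratic in sample size; for biobank-scale data a rank-based formula would be the practical choice.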


Subjects
Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Bayes Theorem , Genome-Wide Association Study/methods , Multifactorial Inheritance , Software , Risk Factors
6.
Proteins ; 92(6): 776-794, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38258321

ABSTRACT

Three-dimensional (3D) structure information, now available at the proteome scale, may facilitate the detection of remote evolutionary relationships in protein superfamilies. Here, we illustrate this with the identification of a novel family of protein domains related to the ferredoxin-like superfold, by combining (i) transitive sequence similarity searches, (ii) clustering approaches, and (iii) AlphaFold2 3D structure models. Domains of this family were initially identified in connection with the intracellular biomineralization of calcium carbonates by Cyanobacteria. They belong to the large heavy-metal-associated (HMA) superfamily but depart from it through specific sequence and structural features. In particular, most of them share conserved basic amino acids (hence their name CoBaHMA, for Conserved Basic residues HMA) that form a positively charged surface likely to interact with anionic partners. CoBaHMA domains are found in diverse modular organizations in bacteria, occurring as monodomain proteins or as parts of larger proteins, some of which are membrane proteins involved in transport or lipid metabolism. This suggests that CoBaHMA domains may exert a regulatory function involving interactions with anionic lipids. This hypothesis may be particularly relevant to the compartmentalization observed for cyanobacterial intracellular calcium carbonates.


Subjects
Amino Acid Sequence , Bacterial Proteins , Metals, Heavy , Models, Molecular , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Bacterial Proteins/genetics , Metals, Heavy/chemistry , Metals, Heavy/metabolism , Protein Domains , Cyanobacteria/metabolism , Cyanobacteria/chemistry , Cyanobacteria/genetics , Ferredoxins/chemistry , Ferredoxins/metabolism , Protein Folding
7.
BMC Genomics ; 25(1): 96, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38262929

ABSTRACT

BACKGROUND: Angelica sinensis (Danggui), a renowned medicinal herb, has gained significant recognition for its therapeutic effects in treating a wide range of ailments. Genome information is a valuable resource that enables researchers to gain a deeper understanding of gene function. Recently, the availability of chromosome-level genomes for A. sinensis has opened up vast opportunities for exploring gene function. Integrating multiomics data can allow researchers to unravel the intricate mechanisms underlying gene function in A. sinensis and further enhance our knowledge of its medicinal properties. RESULTS: In this study, we used genomic and transcriptomic data to construct a coexpression network for A. sinensis. To annotate genes, we aligned them with sequences from various databases, such as NR, TAIR, trEMBL, UniProt, and SwissProt. For GO and KEGG annotations, we employed InterProScan and GhostKOALA. Additionally, gene families were predicted using iTAK, HMMER, OrthoFinder, and KEGG annotation. To facilitate gene functional analysis in A. sinensis, we developed a comprehensive platform that integrates genomic and transcriptomic data with processed functional annotations. The platform includes several tools, such as BLAST, GSEA, Heatmap, JBrowse, and Sequence Extraction. This integrated resource and approach will enable researchers to explore the functional aspects of genes in A. sinensis more effectively. CONCLUSION: We developed a platform, named ASAP, to facilitate gene functional analysis in A. sinensis. ASAP (www.gzybioinformatics.cn/ASAP) offers a comprehensive collection of genome data, transcriptome resources, and analysis tools. This platform serves as a valuable resource for researchers conducting gene functional research, providing them with the data and tools needed to enhance their studies.


Subjects
Angelica sinensis , Genomics , Databases, Protein , Gene Expression Profiling , Genetic Research
8.
BMC Genomics ; 25(1): 587, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862915

ABSTRACT

BACKGROUND: The field of bee genomics has advanced considerably in recent years; however, the most diverse group of honey producers on the planet, the stingless bees, remains largely neglected. In fact, only eleven of the ~600 described stingless bee species have been sequenced, and only three using long-read (LR) sequencing technology. Here, we sequenced the nuclear and mitochondrial genomes of the most common, widespread and broadly reared stingless bee in Brazil and other Neotropical countries, Tetragonisca angustula (popularly known in Brazil as jataí). RESULTS: A total of 48.01 Gb of DNA data were generated, including 2.31 Gb of Pacific Biosciences HiFi reads and 45.70 Gb of Illumina short reads (SRs). Our preferred assembly comprised 683 contigs spanning 284.49 Mb, of which 62.84 Mb (22.09%) corresponded to 445,793 repetitive elements. N50, L50 and complete BUSCOs reached 1.02 Mb, 91 contigs and 97.1%, respectively. We predicted that the genome of T. angustula contains 17,459 protein-coding genes and 4,108 non-coding RNAs. The mitogenome is 17,410 bp long, and all 37 genes were found on the positive strand, an unusual feature among bees. A phylogenomic analysis of 26 hymenopteran species revealed that six odorant receptor orthogroups of T. angustula are evolving rapidly, four of them undergoing significant contractions. CONCLUSIONS: Here, we provide the first nuclear and mitochondrial genome assemblies for the ecologically and economically important T. angustula, the fourth stingless bee species to be sequenced with LR technology thus far. We demonstrate that even relatively small amounts of LR data, combined with sufficient SR data, can yield high-quality genome assemblies for bees.
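The contiguity statistics quoted in this abstract follow the standard definitions: taking contigs from longest to shortest, N50 is the length of the contig at which the running total first reaches at least half of the assembly size, and L50 is the number of contigs needed to reach that point. A minimal Python sketch (the function name is illustrative):

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contig first
    if not lengths:
        raise ValueError("empty assembly")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:  # first contig that pushes coverage past 50%
            return length, count


# Usage: total = 300, so half = 150; 100 + 80 = 180 is the first total >= 150.
print(n50_l50([20, 100, 60, 80, 40]))
```

As a consistency check on the reported figures, 62.84 Mb / 284.49 Mb ≈ 0.2209, matching the stated 22.09% repeat content.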


Subjects
Genome, Mitochondrial , Phylogeny , Animals , Bees/genetics , Cell Nucleus/genetics , Molecular Sequence Annotation , Pollination , Genomics/methods , Genome, Insect , Sequence Analysis, DNA
9.
BMC Genomics ; 25(1): 6, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38166563

ABSTRACT

BACKGROUND: Microsporidia are a large taxon of intracellular pathogens characterized by extraordinarily streamlined genomes with unusually high sequence divergence and many species-specific adaptations. These unique features pose challenges for traditional genome annotation methods based on sequence similarity. As a result, many of the microsporidian genomes sequenced to date contain numerous genes of unknown function. Recent innovations in rapid and accurate structure prediction and comparison, together with the growing amount of data in structural databases, provide new opportunities to assist in the functional annotation of newly sequenced genomes. RESULTS: In this study, we established a workflow that combines sequence- and structure-based functional gene annotation approaches, employing a ChimeraX plugin named ANNOTEX (Annotation Extension for ChimeraX) that allows visual inspection and manual curation. We applied this workflow to a high-quality, telomere-to-telomere sequenced tetraploid genome of Vairimorpha necatrix. First, the 3080 predicted protein-coding sequences, 89% of which were confirmed with RNA sequencing data, were used as input. Next, ColabFold was used to predict protein structures, followed by a Foldseek search for structural matches in the PDB and AlphaFold databases. Subsequent manual curation, using sequence- and structure-based hits, increased the accuracy and quality of the functional genome annotation compared with results using only traditional annotation tools. Our workflow resulted in a comprehensive description of the V. necatrix genome, along with a structural summary of the most prevalent protein groups, such as the ricin B lectin family. In addition, to test our tool, we identified the functions of several previously uncharacterized Encephalitozoon cuniculi genes.
CONCLUSION: We provide a new functional annotation tool for divergent organisms and apply it to a newly sequenced, high-quality microsporidian genome to shed light on this uncharacterized intracellular pathogen of Lepidoptera. The addition of a structure-based annotation approach can serve as a valuable template for studying other microsporidian or similarly divergent species.


Subjects
Genome , Genomics , Molecular Sequence Annotation
10.
Trends Genet ; 37(12): 1081-1094, 2021 12.
Article in English | MEDLINE | ID: mdl-34315631

ABSTRACT

Human large-scale genetic association studies have identified sequence variants at thousands of genetic risk loci that are more common in patients with diverse metabolic diseases than in healthy controls. While these genetic associations have been replicated in multiple large cohorts and can sometimes explain up to 50% of heritability, the molecular and cellular mechanisms affected by common genetic variation associated with metabolic disease remain mostly unknown. A variety of new genome-wide data types, together with novel biostatistical and computational methods and foundational experimental technologies, are paving the way for a principled approach to systematic variant-to-function (V2F) studies of metabolic diseases, which turn associated regions into causal variants, cell types and states of action, effector genes, and cellular and physiological mechanisms. Identifying new target genes and cellular programs for metabolic risk loci will improve mechanistic understanding of disease biology and help identify novel therapeutic strategies.


Subjects
Genome-Wide Association Study , Metabolic Diseases , Genetic Association Studies , Genetic Loci , Genetic Predisposition to Disease , Genetic Variation/genetics , Genome-Wide Association Study/methods , Genetics, Human , Humans , Metabolic Diseases/genetics , Polymorphism, Single Nucleotide
11.
Am J Hum Genet ; 108(7): 1190-1203, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34146516

ABSTRACT

A combination of genetic and functional approaches has identified three independent breast cancer risk loci at 2q35. A recent fine-scale mapping analysis to refine these associations resulted in 1 (signal 1), 5 (signal 2), and 42 (signal 3) credible causal variants at these loci. We used publicly available in silico DNase I and ChIP-seq data with in vitro reporter gene and CRISPR assays to annotate signals 2 and 3. We identified putative regulatory elements that enhanced cell-type-specific transcription from the IGFBP5 promoter at both signals (30- to 40-fold increased expression by the putative regulatory element at signal 2, 2- to 3-fold by the putative regulatory element at signal 3). We further identified one of the five credible causal variants at signal 2, a 1.4 kb deletion (esv3594306), as the likely causal variant; the deletion allele of this variant was associated with an average additional increase in IGFBP5 expression of 1.3-fold (MCF-7) and 2.2-fold (T-47D). We propose a model in which the deletion allele of esv3594306 juxtaposes two transcription factor binding regions (annotated by estrogen receptor alpha ChIP-seq peaks) to generate a single extended regulatory element. This regulatory element increases cell-type-specific expression of the tumor suppressor gene IGFBP5 and, thereby, reduces risk of estrogen receptor-positive breast cancer (odds ratio = 0.77, 95% CI 0.74-0.81, p = 3.1 × 10⁻³¹).


Subjects
Insulin-Like Growth Factor Binding Protein 5/genetics , Molecular Sequence Annotation , Promoter Regions, Genetic , Breast Neoplasms/genetics , CRISPR-Cas Systems , Cell Line , Chromosome Mapping , Chromosomes, Human, Pair 2 , Female , Genetic Association Studies , Genetic Variation , Humans , Risk Factors , Sequence Deletion
12.
Am J Hum Genet ; 108(8): 1488-1501, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34214457

ABSTRACT

Across species, offspring of related individuals often exhibit significant reductions in fitness-related traits, known as inbreeding depression (ID), yet the genetic and molecular basis of ID remains elusive. Here, we develop a method to quantify the enrichment of ID within specific genomic annotations and apply it to human data. We analyzed the phenomes and genomes of ∼350,000 unrelated participants in the UK Biobank and found, averaged over 11 traits, significant enrichment of ID within genomic regions with high recombination rates (>21-fold; p < 10⁻⁵), within regions with function conserved across species (>19-fold; p < 10⁻⁴), and within regulatory elements such as DNase I hypersensitive sites (∼5-fold; p = 8.9 × 10⁻⁷). We also quantified the enrichment of ID within trait-associated regions and found suggestive evidence that genomic regions contributing to additive genetic variance in the population are enriched for ID signal. We found a strong correlation between the functional enrichment of SNP-based heritability and that of ID (r = 0.8, standard error: 0.1). These findings provide empirical evidence that ID is most likely due to many partially recessive deleterious alleles in low-linkage-disequilibrium regions of the genome. Our study suggests that functional characterization of ID may further elucidate the genetic architectures and biological mechanisms underlying complex traits and diseases.


Subjects
Genome-Wide Association Study , Genomics/methods , Inbreeding Depression/genetics , Linkage Disequilibrium , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide , Female , Humans , Male
13.
BMC Plant Biol ; 24(1): 410, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760710

ABSTRACT

Rosa roxburghii Tratt, a valuable plant with a long history in China, is famous for its fruit. It contains various secondary metabolites, such as L-ascorbic acid (vitamin C), alkaloids and polysaccharides, which give it high nutritional and medicinal value. Here we characterized the chromosome-level genome sequence of R. roxburghii, comprising seven pseudo-chromosomes with a total size of 531 Mb and a heterozygosity of 0.25%. We also annotated 45,226 coding gene loci after masking repeat elements. Orthologs for 90.1% of the complete single-copy BUSCOs were found in the R. roxburghii annotation. By aligning with protein sequences from public databases, we annotated 85.89% of R. roxburghii genes. Comparative genomic analysis revealed that R. roxburghii diverged from Rosa chinensis approximately 5.58 to 13.17 million years ago, and that no whole-genome duplication event occurred after the divergence from eudicots. To make full use of this genomic resource, we constructed a genomic database, RroFGD, with various analysis tools. In addition, 69 enzyme genes involved in L-ascorbate biosynthesis were identified, and a key enzyme in vitamin C biosynthesis, GDH (L-Gal-1-dehydrogenase), is used as an example to introduce the functions of the database. This genome and database will facilitate future investigations into gene function and molecular breeding in R. roxburghii.


Subjects
Chromosomes, Plant , Genome, Plant , Rosa , Rosa/genetics , Rosa/metabolism , Chromosomes, Plant/genetics , Databases, Genetic , Secondary Metabolism/genetics , Ascorbic Acid/metabolism , Ascorbic Acid/biosynthesis
14.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34643213

ABSTRACT

Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to predict the disease category-specific risk of genomic loci. Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk predictions for 24 major categories of diseases across the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of the disease-specific and disease category-specific risk of non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to provide easy access to CASAVA scores.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome, Human , Genomics , Humans , Machine Learning
15.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36168811

ABSTRACT

Time-course single-cell RNA sequencing (scRNA-seq) data have been widely used to explore dynamic changes in the expression of transcription factors (TFs) and their target genes. This information is useful for reconstructing cell-type-specific gene regulatory networks (GRNs). However, existing tools are commonly designed to analyze either time-course bulk gene expression data or static scRNA-seq data via pseudo-time cell ordering. Few methods successfully use the information from multiple time points while also accounting for the characteristics of scRNA-seq data. We propose dynDeepDRIM, a novel deep learning model to reconstruct GRNs from time-course scRNA-seq data. It represents the joint expression of a gene pair as an image and uses the image of the target TF-gene pair, together with those of its potential neighbors, to reconstruct GRNs. dynDeepDRIM can effectively remove transitive TF-gene interactions by considering neighborhood context, and it models gene expression dynamics using high-dimensional tensors. We compared dynDeepDRIM with six GRN reconstruction methods on both simulated data and four real time-course scRNA-seq datasets. dynDeepDRIM achieved substantially better performance than the other methods in inferring TF-gene interactions and effectively eliminated false positives. We also applied dynDeepDRIM to annotate gene functions and found that it performed markedly better than the other tools because it takes neighbor genes into account.
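The idea of representing a gene pair's joint expression as an image can be illustrated with a 2D histogram. In the Python sketch below, each time point contributes one normalized histogram of log1p-transformed TF and target-gene expression across cells, and the images are stacked into a tensor of shape (time points, bins, bins). The bin count, the log1p transform, the shared axis range and the function name are assumptions made for illustration; dynDeepDRIM's exact image construction and its neighbor-gene images are not reproduced here.

```python
import numpy as np


def time_course_pair_tensor(tf_by_time, target_by_time, bins=32):
    """Stack one normalized joint-expression histogram per time point.

    tf_by_time / target_by_time: lists (one entry per time point) of
    per-cell, non-negative expression values for a TF and a target gene.
    """
    tf = [np.log1p(np.asarray(v, dtype=float)) for v in tf_by_time]
    tg = [np.log1p(np.asarray(v, dtype=float)) for v in target_by_time]
    # Shared axis range, so that a given bin means the same expression
    # level at every time point (fall back to 1.0 if all values are zero).
    x_max = max(float(v.max()) for v in tf) or 1.0
    y_max = max(float(v.max()) for v in tg) or 1.0
    images = []
    for x, y in zip(tf, tg):
        hist, _, _ = np.histogram2d(
            x, y, bins=bins, range=[[0.0, x_max], [0.0, y_max]]
        )
        images.append(hist / hist.sum())  # assumes at least one cell per time point
    return np.stack(images)


# Usage: two time points with four cells each, 4 x 4 bins.
tensor = time_course_pair_tensor(
    [[0, 1, 2, 3], [1, 1, 5, 0]], [[3, 2, 1, 0], [0, 2, 2, 4]], bins=4
)
print(tensor.shape)
```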


Subjects
Deep Learning , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Gene Expression
16.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36088548

ABSTRACT

A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existing and two conceptually novel functional classification systems, with respect to their discovery power and generality in gene set enrichment testing. We base our assessment on a collection of nearly 2000 functional genomics datasets provided by users of the STRING database. With these real-life and diverse queries, we assess which systems typically provide the most specific and complete enrichment results. We find many structural and performance differences between classification systems. Overall, the well-established, hierarchically organized pathway annotation systems yield the best enrichment performance, despite covering substantial parts of the human genome in general terms only. On the other hand, the more recent unsupervised annotation systems perform strongest in understudied areas and organisms, and in detecting more specific pathways, albeit with less informative labels.
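Gene set enrichment testing of the kind assessed here is commonly based on a one-sided hypergeometric (Fisher's exact) test of the overlap between a query gene set and each pathway. The Python sketch below computes that tail probability with exact integer arithmetic. It is a generic illustration, not the STRING database's own enrichment procedure, and it omits the multiple-testing correction that would be needed when many pathways are tested.

```python
from math import comb


def hypergeometric_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background, K: genes in the pathway,
    n: genes in the query set, k: query genes that fall in the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the integer numerators first, then divide once by C(N, n).
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)


# Usage: all 5 query genes fall in a 5-gene pathway from a 10-gene background.
print(hypergeometric_enrichment_p(10, 5, 5, 5))
```

In this toy case p = 1/252 ≈ 0.004; with k = 0 the tail covers every outcome and p = 1.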


Subjects
Genomics , Software , Databases, Factual , Databases, Genetic , Genomics/methods , Humans
17.
Hum Genomics ; 17(1): 65, 2023 07 17.
Article in English | MEDLINE | ID: mdl-37461066

ABSTRACT

BACKGROUND: A pathogenic filamentous fungus causing eyelid cellulitis was isolated from the secretion of a patient's left eyelid, and a phylogenetic analysis based on the rDNA internal transcribed spacer (ITS) region and single-copy gene families identified the isolate as Paraconiothyrium brasiliense. The genus Paraconiothyrium includes major plant-pathogenic fungi, and in our study, P. brasiliense was identified for the first time as a cause of human infection. To comprehensively analyze the pathogenicity and proteomics of the isolate from a genetic perspective, whole-genome sequencing was performed on the Illumina NovaSeq and Oxford Nanopore Technologies platforms, and a bioinformatics analysis was performed with BLAST against genome sequences in various publicly available databases. RESULTS: The genome of P. brasiliense GGX 413 is 39.49 Mb in length, with a GC content of 51.2%, and encodes 13,057 protein-coding genes and 181 non-coding RNAs. Functional annotation showed that 592 genes encode virulence factors involved in human disease, including 61 lethal virulence factors and 30 hypervirulence factors. Of these 592 virulence genes, 54 are related to carbohydrate-active enzymes (CAZymes), including 46 encoding secretory CAZymes; 119 are associated with peptidases, including 70 encoding secretory peptidases; and 27 are involved in secondary metabolite synthesis, including four associated with terpenoid metabolism. CONCLUSIONS: This study establishes genomic resources for P. brasiliense and provides a theoretical basis for future studies of the mechanism by which it infects humans, the treatment of the diseases it causes, and related research.


Subjects
Cellulitis , Virulence Factors , Humans , Phylogeny , Peptide Hydrolases/genetics
18.
Mol Biol Rep ; 51(1): 406, 2024 Mar 09.
Article in English | MEDLINE | ID: mdl-38459415

ABSTRACT

BACKGROUND: Bursera trees are conspicuous elements of Neotropical tropical dry forests and have significant cultural value for their fragrant resins (incense), their wood (handicrafts), and their ecological benefits. Despite their relevance, genetic resources developed for the genus are scarce. METHODS AND RESULTS: We obtained the complete chloroplast (Cp) genome sequences, analyzed the genome structure, and performed functional annotation of three Bursera species of section Bullockia: Bursera cuneata, B. palmeri, and B. bipinnata. The Cp genome sizes ranged from 159,824 to 159,872 bp, comprising a large single-copy (LSC) region of 87,656 to 87,668 bp, a small single-copy (SSC) region of 18,571 to 18,581 bp, and two inverted repeat regions (IRa and IRb) of 26,814 bp each. Each of the three Cp genomes contained 135 genes: 90 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. The Cp genomes were relatively conserved, with the LSC region exhibiting the greatest nucleotide divergence (psbJ, trnQ-UCC, trnG-UCC, and petL genes), whereas few changes were observed in the IR border regions. Between 589 and 591 simple sequence repeats were identified. Phylogenetic analyses of each Cp region (LSC, SSC, IRa, and IRb), using our data together with data from seven species within Burseraceae, confirmed that Commiphora is the sister genus of Bursera. Only the phylogenetic trees based on the SSC and LSC regions resolved the close relationship between B. bipinnata and B. palmeri. CONCLUSION: Our work contributes to the development of genomic resources for Bursera for taxonomic, evolutionary, and ecological-genetic studies.


Subjects
Bursera , Chloroplast Genome , Phylogeny , Bursera/genetics , Sulindac , Chloroplast Genome/genetics , Genomics/methods
19.
Biochem Genet ; 62(2): 621-632, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37507643

ABSTRACT

Metagenomics has evolved into a promising technology for understanding microbial populations in the environment. Using metagenomics, researchers have explored the microbial populations of a number of extreme and complex environments. With this technology, they have uncovered novel genes and their potential properties, which have wide-ranging applications in food, pharmaceuticals, scientific research, and other biotechnological fields. A sequencing platform can generate sequence data for the microbial population of any given environment, but these sequences must then be analysed computationally to derive meaningful information. It is often assumed that only bioinformaticians with extensive computational skills can process sequencing data through to downstream analysis. However, numerous open-source software packages and online servers, developed for biologists with limited computational skills, are available for analysing metagenomic data. This review focuses on bioinformatics tools for analysing metagenomic data, such as Galaxy, the CSI-NGS portal, ANASTASIA, SHAMAN, EBI Metagenomics, IDseq, and MG-RAST.

20.
Genomics ; 115(2): 110576, 2023 03.
Article in English | MEDLINE | ID: mdl-36758876

ABSTRACT

Many fungal members of the family Diatrypaceae are plant pathogens and are distributed worldwide. Cryptosphaeria pullmanensis is a pathogenic fungus that infects poplar (Populus) and walnut trees, causing their death. We sequenced the genome of C. pullmanensis using a combination of the Nanopore PromethION and Illumina NovaSeq PE150 platforms and functionally annotated the sequences using a number of open-access databases. This is the first report of a genome-scale assembly and annotation of C. pullmanensis, the first species of the genus Cryptosphaeria to be sequenced. The assembly comprised 13 contigs with a contig N50 of 7,095,780 bp, a GC content of 43.23%, and a genome size of 56.72 Mb, with 10,474 putative coding genes. A comparative genomic analysis was performed against the genomes of seven ascomycete fungal strains. Among the seven species tested, the Eutypa lata genome displayed the highest similarity to the C. pullmanensis genome in terms of collinearity and homologous gene content. This study provides a genetic resource and a framework for future investigations of the transcriptome, proteome, and metabonome of C. pullmanensis aimed at understanding its molecular pathogenesis.


Subjects
Ascomycota , Forestry , Ascomycota/genetics , Fungal Genome , Whole Genome Sequencing , Molecular Sequence Annotation