Results 1 - 20 of 145
1.
Nat Methods ; 21(7): 1349-1363, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38849569

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.


Subjects
Gene Expression Profiling; RNA-Seq; Humans; Animals; Mice; RNA-Seq/methods; Gene Expression Profiling/methods; Transcriptome; Sequence Analysis, RNA/methods; Molecular Sequence Annotation/methods
2.
Sci Data ; 11(1): 447, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38702363

ABSTRACT

Cinnamomum chago is a tree species endemic to Yunnan province, China, with potential economic value, phylogenetic importance, and conservation priority. We assembled the genome of C. chago using multiple sequencing technologies, resulting in a high-quality, chromosomal-level genome with annotation information. The assembled genome size is approximately 1.06 Gb, with a contig N50 length of 92.10 Mb. About 99.92% of the assembled sequences could be anchored to 12 pseudo-chromosomes, with only one gap, and 63.73% of the assembled genome consists of repeat sequences. In total, 30,497 genes were recognized according to annotation, including 28,681 protein-coding genes. This high-quality chromosome-level assembly and annotation of C. chago will assist us in the conservation and utilization of this valuable resource, while also providing crucial data for studying the evolutionary relationships within the Cinnamomum genus, offering opportunities for further research and exploration of its diverse applications.


Subjects
Cinnamomum; Genome, Plant; Cinnamomum/genetics; Chromosomes, Plant/genetics; China; Molecular Sequence Annotation; Endangered Species
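The contig N50 quoted in the abstract above is a length-weighted measure of assembly contiguity: with contigs sorted from longest to shortest, it is the length of the contig at which the running total first reaches at least half of the assembly size. A minimal Python sketch of that standard definition follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # first contig to reach 50%
            return length


# Hypothetical toy assembly: total length 300, so the halfway point is 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches the halfway point, so N50 = 70
```

Because the statistic is weighted by length, a handful of very long contigs dominates it, which is why a chromosome-level assembly can report an N50 in the tens of megabases.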
3.
Mar Genomics ; 70: 101044, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37196472

ABSTRACT

Haliotis midae or "perlemoen" is one of five abalone species endemic to South Africa, and being palatable, the only commercially important abalone species with a high international demand. The higher demand for this abalone species has resulted in the decrease of natural stocks due to overexploitation by capture fisheries and poaching. Facilitating aquaculture production of H. midae should assist in minimising the pressure on the wild populations. Here, the draft genome of H. midae has been sequenced, assembled, and annotated. The draft assembly resulted in a total length of 1.5 Gb, contig N50 of 0.238 Mb, scaffold N50 of 0.238 Mb and GC level of 40%. Gene annotation, combining ab initio and evidence-based pipelines identified 52,280 genes with protein coding potential. The genes identified were used to predict orthologous genes shared among the four other abalone species (H. laevigata, H. rubra, H. discus hannai and H. rufescens) and 4702 orthologous genes were shared across the five species. Among the orthologous genes in abalones, single copy genes were further analysed for signatures of selection and several molecular regulatory proteins involved in developmental functions were found to be under positive selection in specific abalone lineages. Furthermore, whole genome SNP-based phylogenomic assessment was performed to confirm the evolutionary relationship among the considered abalone species with draft genomes, reaffirming that H. midae is closely related to the Australian Greenlip (H. laevigata) and Blacklip (H. rubra). The study assists in the understanding of genes related to various biological systems underscoring the evolution and development of abalones, with potential applications for genetic improvement of commercial stocks.


Subjects
Gastropoda; Genomics; Animals; Australia; Genome; Molecular Sequence Annotation; Aquaculture/methods; Gastropoda/genetics
4.
G3 (Bethesda) ; 12(8)2022 07 29.
Article in English | MEDLINE | ID: mdl-35758619

ABSTRACT

Brachymystax tsinlingensis Li, 1966 is an endangered freshwater fish with economic, ecological, and scientific values. Study of the genome of B. tsinlingensis might be particularly insightful given that this is the only Brachymystax species with an available genome. We present a high-quality chromosome-level genome assembly and protein-coding gene annotation for B. tsinlingensis with Illumina short reads, Nanopore long reads, Hi-C sequencing reads, and RNA-seq reads from 5 tissues/organs. The final chromosome-level genome size is 2,031,709,341 bp with 40 chromosomes. We found that the salmonids have a unique GC content and codon usage, have a slower evolutionary rate, and possess specific positively selected genes. We also confirmed that the salmonids have undergone a whole-genome duplication event and a burst of transposon-mediated repeat expansion, and have lost the HoxAbβ Hox cluster; highly expressed genes in muscle may partially explain the migratory habits of B. tsinlingensis. The high-quality B. tsinlingensis assembled genome could provide a valuable reference for the study of other salmonids as well as aid the conservation of this endangered species.


Subjects
Salmonidae; Animals; Base Composition; Chromosomes/genetics; Genome Size; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Phylogeny; Salmonidae/genetics
5.
Sci Rep ; 12(1): 11075, 2022 06 30.
Article in English | MEDLINE | ID: mdl-35773379

ABSTRACT

The genes associated with fruiting body formation of Sparassis latifolia are valuable for improving mushroom breeding. To investigate this process, 4.8 × 10^8 RNA-Seq reads were acquired from three stages: hyphal knot (SM), primordium (SP), and primordium differentiation (SPD). The de novo assembly generated a total of 48,549 unigenes, of which 71.53% (34,728) unigenes could be annotated by at least one of the KEGG (Kyoto Encyclopedia of Genes and Genomes), GO (Gene Ontology), and KOG (Eukaryotic Orthologous Group) databases. KEGG and KOG analyses respectively mapped 32,765 unigenes to 202 pathways and 19,408 unigenes to 25 categories. KEGG pathway enrichment analysis of DEGs (differentially expressed genes) indicated primordium initiation was significantly related to 66 pathways, such as "Ribosome", "metabolism of xenobiotics by cytochrome P450", and "glutathione metabolism" (among others). The MAPK and mTOR signal transduction pathways underwent significant adjustments during the SM to SP transition. Further, our research revealed the PI3K-Akt signaling pathway related to cell proliferation could play crucial functions during the development of SP and SPD. These findings provide crucial candidate genes and pathways related to primordium differentiation and development in S. latifolia, and advance our knowledge about mushroom morphogenesis.


Subjects
Agaricales; Transcriptome; Agaricales/genetics; Gene Expression Profiling; Molecular Sequence Annotation; Phosphatidylinositol 3-Kinases/genetics; Plant Breeding; Polyporales
7.
Am J Hum Genet ; 109(1): 50-65, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34919805

ABSTRACT

Lack of diversity in human genomics limits our understanding of the genetic underpinnings of complex traits, hinders precision medicine, and contributes to health disparities. To map genetic effects on gene regulation in the underrepresented Indonesian population, we have integrated genotype, gene expression, and CpG methylation data from 115 participants across three island populations that capture the major sources of genomic diversity in the region. In a comparison with European datasets, we identify eQTLs shared between Indonesia and Europe as well as population-specific eQTLs that exhibit differences in allele frequencies and/or overall expression levels between populations. By combining local ancestry and archaic introgression inference with eQTLs and methylQTLs, we identify regulatory loci driven by modern Papuan ancestry as well as introgressed Denisovan and Neanderthal variation. GWAS colocalization connects QTLs detected here to hematological traits, and further comparison with European datasets reflects the poor overall transferability of GWAS statistics across diverse populations. Our findings illustrate how population-specific genetic architecture, local ancestry, and archaic introgression drive variation in gene regulation across genetically distinct and in admixed populations and highlight the need for performing association studies on non-European populations.


Subjects
Gene Expression Regulation; Genetics, Population; Genome, Human; Quantitative Trait Loci; Computational Biology/methods; DNA Methylation; Databases, Genetic; Genome-Wide Association Study; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Indonesia; Male; Models, Genetic; Molecular Sequence Annotation; Multifactorial Inheritance; Quantitative Trait, Heritable; Selection, Genetic; Whole Genome Sequencing
8.
PLoS Comput Biol ; 17(10): e1009463, 2021 10.
Article in English | MEDLINE | ID: mdl-34710081

ABSTRACT

Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.


Subjects
Crowdsourcing/methods; Gene Ontology; Molecular Sequence Annotation/methods; Computational Biology; Databases, Genetic; Humans; Proteins/genetics; Proteins/physiology
9.
Curr Genet ; 67(6): 891-907, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34195871

ABSTRACT

Diverse agriculturally important microbes have been studied with known potential in plant growth promotion. Providing several opportunities, Stenotrophomonas species are characterized as promising plant enhancers, inducers, and protectors against environmental stressors. The S. indicatrix BOVIS40 isolated from the sunflower root endosphere possessed unique features, as genome insights into the Stenotrophomonas species isolated from oilseed crops in Southern Africa have not been reported. Plant growth-promotion screening and genome analysis of S. indicatrix BOVIS40 were presented in this study. The genomic information reveals various genes underlying plant growth promotion and resistance to environmental stressors. The genome of S. indicatrix BOVIS40 harbors genes involved in the degradation and biotransformation of organic molecules. Also, other genes involved in biofilm production, chemotaxis, and flagellation that facilitate bacterial colonization in the root endosphere and phytohormone genes that modulate root development and stress response in plants were detected in strain BOVIS40. IAA activity of the bacterial strain may be a factor responsible for root formation. A measurable approach to the S. indicatrix BOVIS40 lifestyle can strategically provide several opportunities in their use as bioinoculants in developing environmentally friendly agriculture sustainably. The findings presented here provide insights into the genomic functions of S. indicatrix BOVIS40, which has set a foundation for future comparative studies for a better understanding of the synergism among microbes inhabiting plant endosphere. These findings highlight the potential of S. indicatrix BOVIS40 upon inoculation under greenhouse conditions and suggest its application in enhancing plant and soil health sustainably.


Subjects
Genome, Bacterial; Genomics; Helianthus/physiology; Plant Development; Stenotrophomonas/physiology; Symbiosis; Computational Biology/methods; Endophytes; Environment; Gene Expression Regulation, Bacterial; Hydrogen-Ion Concentration; Molecular Sequence Annotation; Phenotype; Phylogeny; Secondary Metabolism/genetics; Soil Microbiology
10.
Sci Rep ; 11(1): 12358, 2021 06 11.
Article in English | MEDLINE | ID: mdl-34117303

ABSTRACT

Novel platelet and megakaryocyte transcriptome analysis allows prediction of the full or theoretical proteome of a representative human platelet. Here, we integrated the established platelet proteomes from six cohorts of healthy subjects, encompassing 5.2 k proteins, with two novel genome-wide transcriptomes (57.8 k mRNAs). For 14.8 k protein-coding transcripts, we assigned the proteins to 21 UniProt-based classes, based on their preferential intracellular localization and presumed function. This classified transcriptome-proteome profile of platelets revealed: (i) Absence of 37.2 k genome-wide transcripts. (ii) High quantitative similarity of platelet and megakaryocyte transcriptomes (R = 0.75) for 14.8 k protein-coding genes, but not for 3.8 k RNA genes or 1.9 k pseudogenes (R = 0.43-0.54), suggesting redistribution of mRNAs upon platelet shedding from megakaryocytes. (iii) Copy numbers of 3.5 k proteins that were restricted in size by the corresponding transcript levels. (iv) Near complete coverage of identified proteins in the relevant transcriptome (log2fpkm > 0.20) except for plasma-derived secretory proteins, pointing to adhesion and uptake of such proteins. (v) Underrepresentation in the identified proteome of nuclear-related, membrane and signaling proteins, as well as proteins with low-level transcripts. We then constructed a prediction model, based on protein function, transcript level and (peri)nuclear localization, and calculated the achievable proteome at ~ 10 k proteins. Model validation identified 1.0 k additional proteins in the predicted classes. Network and database analysis revealed the presence of 2.4 k proteins with a possible role in thrombosis and hemostasis, and 138 proteins linked to platelet-related disorders. This genome-wide platelet transcriptome and (non)identified proteome database thus provides a scaffold for discovering the roles of unknown platelet proteins in health and disease.


Subjects
Blood Platelets/metabolism; Hematologic Diseases/genetics; Megakaryocytes/metabolism; Proteome/genetics; Transcriptome; Humans; Molecular Sequence Annotation; Proteome/classification; Proteome/metabolism
11.
Nat Commun ; 12(1): 2845, 2021 05 14.
Article in English | MEDLINE | ID: mdl-33990588

ABSTRACT

Quantifying the overall magnitude of every single locus' genetic effect on the widely measured human phenome is a great challenge. We introduce a unified modelling technique that can consistently provide a total genetic contribution assessment (TGCA) of a gene or genetic variant without thresholding genetic association signals. Genome-wide TGCA in five UK Biobank phenotype domains highlights loci such as the HLA locus for medical conditions, the bone mineral density locus WNT16 for physical measures, and the skin tanning locus MC1R and smoking behaviour locus CHRNA3 for lifestyle. Tissue-specificity investigation reveals several tissues associated with total genetic contributions, including the brain tissues for mental health. Such associations are driven by tissue-specific gene expressions, which share genetic basis with the total genetic contributions. TGCA can provide a genome-wide atlas for the overall genetic contributions in each particular domain of human complex traits.


Subjects
Genome, Human; Models, Genetic; Biological Specimen Banks/statistics & numerical data; Computer Simulation; Genome-Wide Association Study/statistics & numerical data; Humans; Molecular Sequence Annotation/statistics & numerical data; Multifactorial Inheritance/genetics; Organ Specificity/genetics; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci
12.
Nucleic Acids Res ; 49(W1): W60-W66, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33963861

ABSTRACT

The Bologna ENZyme Web Server (BENZ WS) annotates four-level Enzyme Commission numbers (EC numbers) as defined by the International Union of Biochemistry and Molecular Biology (IUBMB). BENZ WS filters a target sequence with a combined system of Hidden Markov Models, modelling protein sequences annotated with the same molecular function, and Pfams, carrying along conserved protein domains. BENZ returns, when successful, for any enzyme target sequence an associated four-level EC number. Our system can annotate both monofunctional and polyfunctional enzymes, and it can be a valuable resource for sequence functional annotation.


Subjects
Enzymes/chemistry; Molecular Sequence Annotation/methods; Sequence Analysis, Protein/methods; Software; Internet; Markov Chains; Protein Domains; Sequence Alignment
13.
PLoS Comput Biol ; 17(2): e1007948, 2021 02.
Article in English | MEDLINE | ID: mdl-33600408

ABSTRACT

Gene function annotation is important for a variety of downstream analyses of genetic data. But experimental characterization of function remains costly and slow, making computational prediction an important endeavor. Phylogenetic approaches to prediction have been developed, but implementation of a practical Bayesian framework for parameter estimation remains an outstanding challenge. We have developed a computationally efficient model of evolution of gene annotations using phylogenies based on a Bayesian framework using Markov Chain Monte Carlo for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well on leave-one-out cross-validation, and we further validated some of the predictions in the experimental scientific literature.


Subjects
Models, Genetic; Molecular Sequence Annotation/methods; Phylogeny; Algorithms; Animals; Bayes Theorem; Computational Biology; Databases, Genetic; Evolution, Molecular; Gene Ontology/statistics & numerical data; Humans; Likelihood Functions; Markov Chains; Mice; Models, Statistical; Molecular Sequence Annotation/statistics & numerical data; Monte Carlo Method; Multigene Family
14.
G3 (Bethesda) ; 11(2)2021 02 09.
Article in English | MEDLINE | ID: mdl-33604669

ABSTRACT

Roan antelope (Hippotragus equinus) is the second-largest member of the Hippotraginae (Bovidae), and is widely distributed across sub-Saharan mesic woodlands. Despite being listed as "Least Concern" across its African range, population numbers are decreasing with many regional Red List statuses varying between Endangered and Locally Extinct. Although the roan antelope has become an economically-important game species in Southern Africa, the vast majority of wild populations are found only in fragmented protected areas, which is of conservation concern. Genomic information is crucial in devising optimal management plans. To this end, we report here the first de novo assembly and annotation of the whole-genome sequence of a male roan antelope from a captive-breeding program. Additionally, we uncover single-nucleotide variants (SNVs) through re-sequencing of five wild individuals representing five of the six described subspecies. We used 10X Genomics Chromium chemistry to produce a draft genome of 2.56 Gb consisting of 16,880 scaffolds with N50 = 8.42 Mb and a BUSCO completeness of 91.2%. The draft roan genome includes 1.1 Gbp (42.2%) repetitive sequences. De novo annotation identified 20,518 protein-coding genes. Genome synteny to the domestic cow showed an average identity of 92.7%. Re-sequencing of five wild individuals to an average sequencing depth of 9.8x resulted in the identification of a filtered set of 3.4 × 10^6 bi-allelic SNVs. The proportion of alternative homozygous SNVs for the individuals representing different subspecies, as well as differentiation as measured by PCA, were consistent with expected divergence from the reference genome and among samples. The roan antelope genome is a valuable resource for evolutionary and population genomic questions, as well as management and conservation actions.


Subjects
Antelopes; Africa, Northern; Animals; Antelopes/genetics; Biological Evolution; Genome; Genomics; Male; Molecular Sequence Annotation
15.
Int J Biol Macromol ; 167: 151-159, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33249160

ABSTRACT

Poly(3-hydroxybutyrate) (PHB) is a common polyhydroxyalkanoate (PHA) with potential as an alternative for petroleum-based plastics. Previously, we reported a new strain, Halomonas sp. YLGW01, which hyperproduces PHB with 94% yield using fructose. In this study, we examined the PHB production machinery of Halomonas sp. YLGW01 in more detail by deep-genome sequencing, which revealed a 3,453,067-bp genome with 65.1% guanine-cytosine content and 3054 genes. We found two acetyl-CoA acetyltransferases (Acetoacetyl-CoA thiolase, PhaA), one acetoacetyl-CoA reductase (PhaB), two PHB synthases (PhaC1, PhaC2), PHB depolymerase (PhaZ), and Enoyl-CoA hydratase (PhaJ) in the genome, along with two fructose kinases and fructose transporter systems, including the phosphotransferase system (PTS) and ATP-binding transport genes. We then examined the PHB production by Halomonas sp. YLGW01 using high-fructose corn syrup (HFCS) containing fructose, glucose, and sucrose in sea water medium, resulting in 7.95 ± 0.11 g/L PHB (content, 67.39 ± 0.34%). PHB was recovered from Halomonas sp. YLGW01 using different detergents; the use of Tween 20 and SDS yielded micro-sized granules with high purity. Overall, these results reveal the distribution of PHB synthetic genes and the sugar utilization system in Halomonas sp. YLGW01 and suggest a possible method for PHB recovery.


Subjects
Culture Media; Fermentation; Halomonas/metabolism; Hydroxybutyrates/metabolism; Polyesters/metabolism; Sugars/chemistry; Sugars/metabolism; Biomass; Biosynthetic Pathways/genetics; Computational Biology/methods; Genome, Bacterial; Halomonas/genetics; Molecular Sequence Annotation; Whole Genome Sequencing
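The guanine-cytosine content reported in the abstract above is the percentage of G and C bases among all bases of the sequence. A minimal Python sketch of that calculation is given below; the function name `gc_content`, the choice to ignore ambiguous bases such as N, and the example sequences are assumptions made for illustration and do not come from the study.

```python
def gc_content(sequence):
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical toy sequences, not taken from the Halomonas genome.
print(gc_content("atgc"))    # 2 of 4 bases are G or C
print(gc_content("GGCCAT"))  # 4 of 6 bases are G or C
```

For a real genome the same ratio would be computed over every record of the assembly FASTA file rather than over a single short string.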
16.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-33080021

ABSTRACT

Recent advances in transcriptomics have uncovered lots of novel transcripts in plants. To annotate such transcripts, dissecting their coding potential is a critical step. Computational approaches have been proven fruitful in this task; however, most current tools are designed/optimized for mammals and only a few of them have been tested on a limited number of plant species. In this work, we present NAMS webserver, which contains a novel coding potential classifier, NAMS, specifically optimized for plants. We have evaluated the performance of NAMS using a comprehensive dataset containing more than 3 million transcripts from various plant species, where NAMS demonstrates high accuracy and remarkable performance improvements over state-of-the-art software. Moreover, our webserver also furnishes functional annotations, aiming to provide users informative clues to the functions of their transcripts. Considering that most plant species are poorly characterized, our NAMS webserver could serve as a valuable resource to facilitate the transcriptomic studies. The webserver with testing dataset is freely available at http://sunlab.cpy.cuhk.edu.hk/NAMS/.


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; Gene Expression Regulation, Plant; Internet; Molecular Sequence Annotation/methods; Plants/genetics; Genetic Code/genetics; Plants/classification; RNA, Messenger/genetics; RNA, Plant/genetics; Reproducibility of Results; Species Specificity; Support Vector Machine
17.
Cell Host Microbe ; 29(1): 121-131.e4, 2021 01 13.
Article in English | MEDLINE | ID: mdl-33290720

ABSTRACT

Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.


Subjects
Bacteria/genetics; Genome, Bacterial; Molecular Sequence Annotation; Open Reading Frames; Bacterial Proteins/genetics; Computational Biology; Deep Learning; Humans; Markov Chains; Microbiota; Models, Theoretical
18.
Mol Plant Microbe Interact ; 33(8): 1022-1024, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32364420

ABSTRACT

The genus Stagonosporopsis is classified within the Didymellaceae family and has around 40 associated species. Among them, several species are important plant pathogens responsible for significant losses in economically important crops worldwide. Stagonosporopsis vannaccii is a newly described species pathogenic to soybean. Here, we present the draft whole-genome sequence, gene prediction, and annotation of S. vannaccii isolate LFN0148 (also known as IMI 507030). To our knowledge, this is the first genome sequenced of this species and represents a new useful source for future research on fungal comparative genomics studies.


Subjects
Ascomycota; Genome, Fungal; Glycine max/microbiology; Plant Diseases/microbiology; Ascomycota/genetics; Ascomycota/pathogenicity; Genomics; Molecular Sequence Annotation
19.
Epigenetics Chromatin ; 13(1): 20, 2020 04 07.
Article in English | MEDLINE | ID: mdl-32264931

ABSTRACT

BACKGROUND: Understanding the transcriptome is critical for explaining the functional as well as regulatory roles of genomic regions. Current methods for the identification of transcription units (TUs) use RNA-seq that, however, require large quantities of mRNA rendering the identification of inherently unstable TUs, e.g. miRNA precursors, difficult. This problem can be alleviated by chromatin-based approaches due to a correlation between histone modifications and transcription. RESULTS: Here, we introduce EPIGENE, a novel chromatin segmentation method for the identification of active TUs using transcription-associated histone modifications. Unlike the existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate hidden Markov model (HMM) that models the observed combination of histone modifications using a product of independent Bernoulli random variables, to identify active TUs. Our results show that EPIGENE can identify genome-wide TUs in an unbiased manner. EPIGENE-predicted TUs show an enrichment of RNA Polymerase II at the transcription start site and in gene body indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE outperformed the existing RNA-seq-based approaches in TU prediction precision across human cell lines. Finally, we identified 232 novel TUs in K562 and 43 novel cell-specific TUs all of which were supported by RNA Polymerase II ChIP-seq and Nascent RNA-seq data. CONCLUSION: We demonstrate the applicability of EPIGENE to identify genome-wide active TUs and to provide valuable information about unannotated TUs. EPIGENE is an open-source method and is freely available at: https://github.com/imbbLab/EPIGENE.


Subjects
Chromatin Immunoprecipitation Sequencing/methods; Histone Code; Molecular Sequence Annotation/methods; Software; Transcription Initiation Site; Epigenomics/methods; Hep G2 Cells; Humans; K562 Cells; Markov Chains; Transcriptome
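The abstract above describes an HMM whose emissions are a product of independent Bernoulli random variables, one per histone modification. As a rough illustration of that emission term only, the Python sketch below computes the log-probability of one binarized observation under one hidden state. It is not the EPIGENE implementation: the function name, the three-mark example and the probabilities are hypothetical values chosen for the demonstration.

```python
import math


def bernoulli_log_emission(observed, probs):
    """Log-probability of a binary mark vector under independent Bernoullis.

    observed: 0/1 flags, one per histone mark (1 = mark present in the bin).
    probs:    per-mark presence probabilities for a single hidden state.
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    total = 0.0
    for present, p in zip(observed, probs):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
        # Each mark contributes log(p) if present and log(1 - p) if absent.
        total += math.log(p) if present else math.log(1.0 - p)
    return total


# Hypothetical state with three marks: the product is 0.9 * (1 - 0.2) * 0.5.
log_p = bernoulli_log_emission([1, 0, 1], [0.9, 0.2, 0.5])
print(math.exp(log_p))
```

Working in log space keeps the product numerically stable when many marks and many genomic bins are combined in the forward-backward or Viterbi recursions of an HMM.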
20.
Microb Genom ; 6(3)2020 03.
Article in English | MEDLINE | ID: mdl-32124724

ABSTRACT

Although gene-finding in bacterial genomes is relatively straightforward, the automated assignment of gene function is still challenging, resulting in a vast quantity of hypothetical sequences of unknown function. But how prevalent are hypothetical sequences across bacteria, what proportion of genes in different bacterial genomes remain unannotated, and what factors affect annotation completeness? To address these questions, we surveyed over 27 000 bacterial genomes from the Genome Taxonomy Database, and measured genome annotation completeness as a function of annotation method, taxonomy, genome size, 'research bias' and publication date. Our analysis revealed that 52 and 79 % of the average bacterial proteome could be functionally annotated based on protein and domain-based homology searches, respectively. Annotation coverage using protein homology search varied significantly from as low as 14 % in some species to as high as 98 % in others. We found that taxonomy is a major factor influencing annotation completeness, with distinct trends observed across the microbial tree (e.g. the lowest level of completeness was found in the Patescibacteria lineage). Most lineages showed a significant association between genome size and annotation incompleteness, likely reflecting a greater degree of uncharacterized sequences in 'accessory' proteomes than in 'core' proteomes. Finally, research bias, as measured by publication volume, was also an important factor influencing genome annotation completeness, with early model organisms showing high completeness levels relative to other genomes in their own taxonomic lineages. Our work highlights the disparity in annotation coverage across the bacterial tree of life and emphasizes a need for more experimental characterization of accessory proteomes as well as understudied lineages.


Subjects
Genome, Bacterial; Molecular Sequence Annotation; Bacteria/genetics