Results 1 - 20 of 41
1.
Genome Res ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39251347

ABSTRACT

Much of the profound interspecific variation in genome content has been attributed to transposable elements (TEs). To explore the extent of TE variation within species, we developed an optimized open-source algorithm, panEDTA, to de novo annotate TEs in a pangenome context. We then generated a unified TE annotation for a maize pangenome derived from 26 reference-quality genomes, which reveals an excess of 35.1 Mb of TE sequences per genome in tropical maize relative to temperate maize. A small number (n = 216) of TE families, mainly LTR retrotransposons, drive these differences. Evidence from the methylome, transcriptome, LTR age distribution, and LTR insertional polymorphisms reveals that 64.7% of the variability is contributed by LTR families that are young, less methylated, and more expressed in tropical maize, whereas 18.5% is driven by LTR families with removal or loss in temperate maize. Additionally, we find enrichment for young LTR families adjacent to nucleotide-binding and leucine-rich repeat (NLR) clusters of varying copy number across lines, suggesting TE activity may be associated with disease resistance in maize.

3.
Theor Appl Genet ; 137(5): 117, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38700534

ABSTRACT

KEY MESSAGE: A large-effect QTL was fine mapped, which revealed 79 gene models, with 10 promising candidate genes, along with a novel inversion. In commercial maize breeding, doubled haploid (DH) technology is arguably the most efficient resource for rapidly developing novel, completely homozygous lines. However, the DH strategy, using in vivo haploid induction, currently requires the use of mutagenic agents, which can be not only hazardous but also laborious. This study focuses on an alternative approach to develop DH lines: spontaneous haploid genome duplication (SHGD) via naturally restored haploid male fertility (HMF). Inbred lines A427 and Wf9, the former with high HMF and the latter with low HMF, were selected to fine-map qshgd1, a large-effect QTL associated with SHGD. SHGD alleles were derived from A427, with novel haploid recombinant groups having varying levels of the A427 chromosomal region recovered. The chromosomal region of interest spans 45 megabases (Mb) on chromosome 5. Significant differences between haploid recombinant groups for HMF were identified, signaling the possibility of mapping the QTL more closely. Due to suppression of recombination from the proximity of the centromere, and a newly discovered inversion region, the associated QTL could only be confined to a 25 Mb region, within which only a single recombinant was observed among ca. 9,000 BC1 individuals. Nevertheless, 79 gene models were identified within this 25 Mb region. Additionally, 10 promising candidate genes, based on RNA-seq data, are described for future evaluation, while the narrowed down genome region is accessible for straightforward introgression into elite germplasm by BC methods.


Subjects
Chromosome Mapping; Haploidy; Quantitative Trait Loci; Zea mays; Zea mays/genetics; Chromosome Mapping/methods; Plant Breeding; Genome, Plant; Phenotype; Alleles; Chromosomes, Plant/genetics; Genes, Plant
4.
Biol Reprod ; 110(2): 310-328, 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-37883444

ABSTRACT

The fetal brain of the mouse is thought to be dependent upon the placenta as a source of serotonin (5-hydroxytryptamine; 5-HT) and other factors. How these factors reach the developing brain remains uncertain, but they are postulated here to be part of the cargo carried by placental extracellular vesicles (EV). We have analyzed the protein, catecholamine, and small RNA content of EV from mouse trophoblast stem cells (TSC) and TSC differentiated into parietal trophoblast giant cells (pTGC), potential primary purveyors of 5-HT. Current studies examined how exposure of mouse neural progenitor cells (NPC) to EV from either TSC or pTGC affects their transcriptome profiles. The EV from trophoblast cells contained relatively high amounts of 5-HT, as well as dopamine and norepinephrine, but there were no significant differences between EV derived from pTGC and from TSC. Content of microRNA (miRNA) and small nucleolar (sno)RNA, however, did differ according to EV source, and snoRNA were upregulated in EV from pTGC. The primary inferred targets of the miRNA from both pTGC and TSC were mRNA enriched in the fetal brain. NPC readily internalized EV, leading to changes in their transcriptome profiles. Transcripts regulated were mainly ones enriched in neural tissues. The transcripts in EV-treated NPC that demonstrated a likely complementarity with miRNA in EV were mainly up- rather than downregulated, with functions linked to neuronal processes. Our results are consistent with placenta-derived EV providing direct support for fetal brain development and being an integral part of the placenta-brain axis.


Subjects
Extracellular Vesicles; MicroRNAs; Humans; Pregnancy; Female; Animals; Mice; Serotonin/metabolism; Placenta/metabolism; MicroRNAs/genetics; MicroRNAs/metabolism; Extracellular Vesicles/metabolism; Brain/metabolism; Trophoblasts/metabolism; Stem Cells/metabolism
5.
Genome Biol ; 24(1): 108, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37158941

ABSTRACT

BACKGROUND: Genetic variation in regulatory sequences that alter transcription factor (TF) binding is a major cause of phenotypic diversity. Brassinosteroid is a growth hormone that has major effects on plant phenotypes. Genetic variation in brassinosteroid-responsive cis-elements likely contributes to trait variation. Pinpointing such regulatory variations and quantitative genomic analysis of the variation in TF-target binding, however, remains challenging. How variation in transcriptional targets of signaling pathways such as the brassinosteroid pathway contributes to phenotypic variation is an important question to be investigated with innovative approaches. RESULTS: Here, we use a hybrid allele-specific chromatin binding sequencing (HASCh-seq) approach and identify variations in target binding of the brassinosteroid-responsive TF ZmBZR1 in maize. HASCh-seq in the B73xMo17 F1s identifies thousands of target genes of ZmBZR1. Allele-specific ZmBZR1 binding (ASB) has been observed for 18.3% of target genes and is enriched in promoter and enhancer regions. About a quarter of the ASB sites correlate with sequence variation in BZR1-binding motifs and another quarter correlate with haplotype-specific DNA methylation, suggesting that both genetic and epigenetic variations contribute to the high level of variation in ZmBZR1 occupancy. Comparison with GWAS data shows linkage of hundreds of ASB loci to important yield and disease-related traits. CONCLUSION: Our study provides a robust method for analyzing genome-wide variations of TF occupancy and identifies genetic and epigenetic variations of the brassinosteroid response transcription network in maize.
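
One way to picture an allele-specific binding (ASB) call is an exact binomial test on the reads that carry each parental allele at a heterozygous site in the F1. The sketch below is only an illustration under that assumption: the abstract does not say which statistical test HASCh-seq uses, and the function name and read counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: sum the probabilities of every
    outcome that is no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

# Hypothetical read counts at one heterozygous binding site.
b73_reads, mo17_reads = 42, 18
total = b73_reads + mo17_reads
p_value = binomial_two_sided_p(b73_reads, total)
print(f"B73 allelic ratio = {b73_reads / total:.2f}, p = {p_value:.4f}")
```

In a genome-wide setting the per-site p-values would also need a multiple-testing correction, which this sketch leaves out.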


Subjects
Brassinosteroids; Zea mays; Zea mays/genetics; Alleles; Chromatin Immunoprecipitation Sequencing; Phenotype; Transcription Factors/genetics
6.
G3 (Bethesda) ; 13(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37002915

ABSTRACT

Poa pratensis, commonly known as Kentucky bluegrass, is a popular cool-season grass species used as turf in lawns and recreation areas globally. Despite its substantial economic value, a reference genome had not previously been assembled due to the genome's relatively large size and biological complexity that includes apomixis, polyploidy, and interspecific hybridization. We report here a fortuitous de novo assembly and annotation of a P. pratensis genome. Instead of sequencing the genome of a C4 grass, we accidentally sampled and sequenced tissue from a weedy P. pratensis whose stolon was intertwined with that of the C4 grass. The draft assembly consists of 6.09 Gbp with an N50 scaffold length of 65.1 Mbp, and a total of 118 scaffolds, generated using PacBio long reads and Bionano optical map technology. We annotated 256K gene models and found 58% of the genome to be composed of transposable elements. To demonstrate the applicability of the reference genome, we evaluated population structure and estimated genetic diversity in P. pratensis collected from three North American prairies, two in Manitoba, Canada and one in Colorado, USA. Our results support previous studies that found high genetic diversity and population structure within the species. The reference genome and annotation will be an important resource for turfgrass breeding and study of bluegrasses.


Subjects
Plant Breeding; Poa; Genome; Poa/genetics; Plant Weeds/genetics; Base Sequence; Molecular Sequence Annotation
7.
Stem Cell Reports ; 17(6): 1289-1302, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35594861

ABSTRACT

The observation that trophoblast (TB) can be generated from primed pluripotent stem cells (PSCs) by exposure to bone morphogenetic protein-4 (BMP4) when FGF2 and ACTIVIN signaling is minimized has recently been challenged with the suggestion that the procedure instead produces amnion. Here, by analyzing transcriptome data from multiple sources, including bulk and single-cell data, we show that the BMP4 procedure generates bona fide TB with similarities to both placental villous TB and TB generated from TB stem cells. The analyses also suggest that the transcriptomic signatures between embryonic amnion and different forms of TB have commonalities. Our data provide justification for the continued use of TB derived from PSCs as a model for investigating placental development.


Subjects
Pluripotent Stem Cells; Trophoblasts; Amnion; Cell Differentiation; Embryonic Stem Cells; Female; Humans; Placenta; Pregnancy
8.
J Vasc Access ; : 11297298221080369, 2022 Feb 27.
Article in English | MEDLINE | ID: mdl-35220831

ABSTRACT

BACKGROUND: Peripheral intravenous catheters (PIVCs) are frequently used in clinical settings for intravenous access. Multiple PIVC insertion attempts lead to patient discomfort, delay in treatment, associated complications, and increased expenditure. Reducing the number of attempts improves patient and nursing personnel satisfaction and lowers costs. The present study evaluated the performance efficacy of BD Venflon™ I with Instaflash needle technology (investigational device) as compared to the BD Venflon™ without Instaflash needle technology (control device). METHODOLOGY: The PIVC insertions were randomized in the ratio 1:1 using either the investigational or the control device and were monitored for first stick success rate, ease of insertion, and patient satisfaction. Data were analyzed using R 4.0.3 and Microsoft Excel. The chi-square test was used to establish association between two categorical variables. RESULTS: In total, 1402 patients were analyzed for first attempt insertion success, which showed a 98.72% success rate with the investigational device as compared to an 88.87% success rate with the control device (p = 0.0004). Marginal differences were observed in ease of insertion between the investigational (98.71%) and control devices (99%), signifying high satisfaction levels among nursing personnel. Positive responses were observed for the investigational (98.01%) and control devices (99%), underlining satisfactory overall patient experience. CONCLUSION: The present study showed that BD Venflon™ I with Instaflash needle technology enhanced the first attempt insertion success rate, with marginal differences in its efficacy in comparison with the BD Venflon™ without Instaflash needle technology, thus enhancing patient and nursing personnel satisfaction and making it a better alternative for use in hospitals.
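
The chi-square test named in the methodology can be sketched for a 2x2 table of device by first-stick outcome. This is a minimal illustration in Python rather than the R analysis the authors ran. The counts below are made up and are not the study's data, and no continuity correction is applied.

```python
from math import erfc, sqrt

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts (NOT the study's data):
# rows are devices, columns are first-stick success and failure.
stat, p = chi_square_2x2(95, 5, 85, 15)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```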

9.
Nucleic Acids Res ; 50(7): e37, 2022 04 22.
Article in English | MEDLINE | ID: mdl-34928390

ABSTRACT

Proteins encoded by newly-emerged genes ('orphan genes') share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates ab initio predictions of BRAKER and Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly-curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Each pipeline under-predicts orphan genes (as few as 11 percent, under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall, with BIND identifying 68% of annotated orphan genes and 99% of ancient genes and giving the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.
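
The per-phylostratum percentages above are sensitivities: the share of gold-standard genes in a group that a pipeline recovers. The sketch below shows that arithmetic on hypothetical gene identifiers. It matches genes by identifier only, whereas a real benchmark would compare predicted and annotated gene structures, and it is not the authors' evaluation code.

```python
def sensitivity(annotated: set, predicted: set) -> float:
    """Fraction of gold-standard genes recovered: TP / (TP + FN)."""
    if not annotated:
        raise ValueError("the gold-standard set is empty")
    return len(annotated & predicted) / len(annotated)

# Hypothetical gold-standard annotation grouped by phylostratum.
gold = {
    "orphan": {"g1", "g2", "g3", "g4"},
    "ancient": {"g5", "g6", "g7", "g8"},
}
# Hypothetical pipeline output; "g9" is a prediction with no gold-standard match.
predicted = {"g1", "g5", "g6", "g7", "g8", "g9"}

per_stratum = {name: sensitivity(genes, predicted) for name, genes in gold.items()}
print(per_stratum)
```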


Subjects
Arabidopsis; Oryza; Arabidopsis/genetics; Genome; Oryza/genetics; RNA-Seq; Software
10.
Front Plant Sci ; 12: 710383, 2021.
Article in English | MEDLINE | ID: mdl-34671369

ABSTRACT

In this work, we sequenced and annotated the genome of Streptochaeta angustifolia, one of two genera in the grass subfamily Anomochlooideae, a lineage sister to all other grasses. The final assembly size is over 99% of the estimated genome size. We find good collinearity with the rice genome and have captured most of the gene space. Streptochaeta is similar to other grasses in the structure of its fruit (a caryopsis or grain) but has peculiar flowers and inflorescences that are distinct from those in the outgroups and in other grasses. To provide tools for investigations of floral structure, we analyzed two large families of transcription factors, AP2-like and R2R3 MYBs, that are known to control floral and spikelet development in rice and maize among other grasses. Many of these are also regulated by small RNAs. Structure of the gene trees showed that the well documented whole genome duplication at the origin of the grasses (ρ) occurred before the divergence of the Anomochlooideae lineage from the lineage leading to the rest of the grasses (the spikelet clade) and thus that the common ancestor of all grasses probably had two copies of the developmental genes. However, Streptochaeta (and by inference other members of Anomochlooideae) has lost one copy of many genes. The peculiar floral morphology of Streptochaeta may thus have derived from an ancestral plant that was morphologically similar to the spikelet-bearing grasses. We further identify 114 loci producing microRNAs and 89 loci generating phased, secondary siRNAs, classes of small RNAs known to be influential in transcriptional and post-transcriptional regulation of several plant functions.

11.
Front Cell Dev Biol ; 9: 695248, 2021.
Article in English | MEDLINE | ID: mdl-34368143

ABSTRACT

One model to study the emergence of the human trophoblast (TB) has been the exposure of pluripotent stem cells to bone morphogenetic protein 4 (BMP4) in the presence of inhibitors of ACTIVIN/TGFB (A83-01) and FGF2 (PD173074), a treatment abbreviated BAP, which generates a mixture of cytotrophoblast, syncytiotrophoblast, and cells with similarities to extravillous trophoblast. Here, H1 human embryonic stem cells were BAP-exposed under two O2 conditions (20% and 5%, respectively). At day 8, single nuclei RNA sequencing was used for transcriptomics analysis, thereby allowing profiling of fragile syncytial structures as well as the more resilient mononucleated cells. Following cluster analysis, two major groupings, one comprised of five (2,4,6,7,8) and the second of three (1,3,5) clusters were evident, all of which displayed recognized TB markers. Of these, two (2 and 3) weakly resembled extravillous trophoblast, two (5 and 6) strongly carried the hallmark transcripts of syncytiotrophoblast, while the remaining five were likely different kinds of mononucleated cytotrophoblast. We suggest that the two populations of nuclei within syncytiotrophoblast may have arisen from fusion events involving two distinct species of precursor cells. The number of differentially expressed genes between O2 conditions varied among the clusters, and the number of genes upregulated in cells cultured under 5% O2 was highest in syncytiotrophoblast cluster 6. In summary, the BAP model reveals an unexpectedly complex picture of trophoblast lineage emergence that will need to be resolved further in time-course studies.

12.
Science ; 373(6555): 655-662, 2021 08 06.
Article in English | MEDLINE | ID: mdl-34353948

ABSTRACT

We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
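
The statement that about a third of pan-genes are found across all genotypes amounts to a core versus dispensable split of a presence/absence table. The sketch below shows that bookkeeping on a toy table. The gene and genome labels are hypothetical, and the authors' actual pan-gene construction is not described in the abstract.

```python
def classify_pan_genes(presence: dict, n_genomes: int) -> dict:
    """Split pan-genes into core (present in every genome) and dispensable."""
    core = {gene for gene, genomes in presence.items() if len(genomes) == n_genomes}
    return {"core": core, "dispensable": set(presence) - core}

# Hypothetical presence/absence table: pan-gene -> set of genomes carrying it.
presence = {
    "pan_gene_1": {"genome_A", "genome_B", "genome_C"},
    "pan_gene_2": {"genome_A"},
    "pan_gene_3": {"genome_B", "genome_C"},
}
groups = classify_pan_genes(presence, n_genomes=3)
core_fraction = len(groups["core"]) / len(presence)
print(f"core fraction = {core_fraction:.2f}")
```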


Subjects
Genome, Plant; Molecular Sequence Annotation; Zea mays/genetics; Centromere/genetics; Chromosome Mapping; Chromosomes, Plant; DNA Methylation; Disease Resistance/genetics; Genes, Plant; Genetic Variation; Genotype; High-Throughput Nucleotide Sequencing; Multifactorial Inheritance/genetics; Phenotype; Plant Diseases; Polymorphism, Single Nucleotide; Regulatory Sequences, Nucleic Acid; Sequence Analysis, DNA; Tetraploidy; Transcriptome; Whole Genome Sequencing
13.
Plant Genome ; 14(3): e20114, 2021 11.
Article in English | MEDLINE | ID: mdl-34275202

ABSTRACT

The stiff-stalk heterotic group in Maize (Zea mays L.) is an important source of inbreds used in U.S. commercial hybrid production. Founder inbreds B14, B37, B73, and, to a lesser extent, B84, are found in the pedigrees of a majority of commercial seed parent inbred lines. We created high-quality genome assemblies of B84 and four expired Plant Variety Protection (ex-PVP) lines LH145 representing B14, NKH8431 of mixed descent, PHB47 representing B37, and PHJ40, which is a Pioneer Hi-Bred International (PHI) early stiff-stalk type. Sequence was generated using long-read sequencing achieving highly contiguous assemblies of 2.13-2.18 Gbp with N50 scaffold lengths >200 Mbp. Inbred-specific gene annotations were generated using a core five-tissue gene expression atlas, whereas transposable element (TE) annotation was conducted using de novo and homology-directed methodologies. Compared with the reference inbred B73, synteny analyses revealed extensive collinearity across the five stiff-stalk genomes, although unique components of the maize pangenome were detected. Comparison of this set of stiff-stalk inbreds with the original Iowa Stiff Stalk Synthetic breeding population revealed that these inbreds represent only a proportion of variation in the original stiff-stalk pool and there are highly conserved haplotypes in released public and ex-Plant Variety Protection inbreds. Despite the reduction in variation from the original stiff-stalk population, substantial genetic and genomic variation was identified supporting the potential for continued breeding success in this pool. The assemblies described here represent stiff-stalk inbreds that have historical and commercial relevance and provide further insight into the emerging maize pangenome.


Subjects
Plant Breeding; Zea mays; Genomics; Haplotypes; Hybrid Vigor; Zea mays/genetics
14.
NAR Genom Bioinform ; 3(2): lqab049, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34085037

ABSTRACT

The availability of terabytes of RNA-Seq data and the continuous emergence of new analysis tools enable unprecedented biological insight. There is a pressing requirement for a framework that allows for fast, efficient, manageable, and reproducible RNA-Seq analysis. We have developed a Python package, pyrpipe, that enables straightforward development of flexible, reproducible and easy-to-debug computational pipelines purely in Python, in an object-oriented manner. pyrpipe provides access to popular RNA-Seq tools, within Python, via high-level APIs. Pipelines can be customized by integrating new Python code, third-party programs, or Python libraries. Users can create checkpoints in the pipeline or integrate pyrpipe into a workflow management system, thus allowing execution on multiple computing environments, and enabling efficient resource management. pyrpipe produces detailed analysis and benchmark reports, which can be shared or included in publications. pyrpipe is implemented in Python and is compatible with Python versions 3.6 and higher. To illustrate the rich functionality of pyrpipe, we provide case studies using RNA-Seq data from GTEx, SARS-CoV-2-infected human cells, and Zea mays. All source code is freely available at https://github.com/urmi-21/pyrpipe; the package can be installed from the source, from PyPI (https://pypi.org/project/pyrpipe), or from bioconda (https://anaconda.org/bioconda/pyrpipe). Documentation is available at http://pyrpipe.rtfd.io.

15.
Nucleic Acids Res ; 49(7): 4037-4053, 2021 04 19.
Article in English | MEDLINE | ID: mdl-33744974

ABSTRACT

Cas9 is an RNA-guided endonuclease in the bacterial CRISPR-Cas immune system and a popular tool for genome editing. The commonly used Streptococcus pyogenes Cas9 (SpCas9) is relatively non-specific and prone to off-target genome editing. Other Cas9 orthologs and engineered variants of SpCas9 have been reported to be more specific. However, previous studies have focused on specificity of double-strand break (DSB) or indel formation, potentially overlooking alternative cleavage activities of these Cas9 variants. In this study, we employed in vitro cleavage assays of target libraries coupled with high-throughput sequencing to systematically compare cleavage activities and specificities of two natural Cas9 variants (SpCas9 and Staphylococcus aureus Cas9) and three engineered SpCas9 variants (SpCas9 HF1, HypaCas9 and HiFi Cas9). We observed that all Cas9s tested could cleave target sequences with up to five mismatches. However, the rate of cleavage of both on-target and off-target sequences varied based on target sequence and Cas9 variant. In addition, SaCas9 and engineered SpCas9 variants nick targets with multiple mismatches but have a defect in generating a DSB, while SpCas9 creates DSBs at these targets. Overall, these differences in cleavage rates and DSB formation may contribute to varied specificities observed in genome editing studies.
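
Grouping a target library by the number of guide-target mismatches is the basic bookkeeping behind a statement such as "up to five mismatches". The sketch below counts position-wise mismatches for equal-length sequences. The guide and library sequences are hypothetical, PAM handling is ignored, and this is not the authors' analysis pipeline.

```python
from collections import Counter

def count_mismatches(guide: str, target: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return sum(1 for g, t in zip(guide.upper(), target.upper()) if g != t)

# Hypothetical 20-nt guide and a toy target library.
guide = "GACGTTACCGGATTACGCTA"
library = [
    "GACGTTACCGGATTACGCTA",  # perfect match
    "TACGTTACCGGATTACGCTC",  # differs at the first and last positions
    "AAGGATTCGGCATTACGCTA",  # differs at six positions
]

by_mismatches = Counter(count_mismatches(guide, t) for t in library)
within_five = [t for t in library if count_mismatches(guide, t) <= 5]
print(dict(by_mismatches), len(within_five))
```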


Subjects
CRISPR-Associated Protein 9; CRISPR-Cas Systems; Staphylococcus aureus/genetics; CRISPR-Associated Protein 9/genetics; CRISPR-Associated Protein 9/metabolism; Gene Editing; Substrate Specificity
16.
Commun Biol ; 4(1): 253, 2021 02 26.
Article in English | MEDLINE | ID: mdl-33637860

ABSTRACT

While it is well known that the genome can affect social behavior, recent models posit that social lifestyles can, in turn, influence genome evolution. Here, we perform the most phylogenetically comprehensive comparative analysis of 16 bee genomes to date: incorporating two published and four new carpenter bee genomes (Apidae: Xylocopinae) for a first-ever genomic comparison with a monophyletic clade containing solitary through advanced eusocial taxa. We find that eusocial lineages have undergone more gene family expansions, feature more signatures of positive selection, and have higher counts of taxonomically restricted genes than solitary and weakly social lineages. Transcriptomic data reveal that caste-affiliated genes are deeply-conserved; gene regulatory and functional elements are more closely tied to social phenotype than phylogenetic lineage; and regulatory complexity increases steadily with social complexity. Overall, our study provides robust empirical evidence that social evolution can act as a major and surprisingly consistent driver of macroevolutionary genomic change.


Subjects
Bees/genetics; Behavior, Animal; Evolution, Molecular; Genes, Insect; Genome, Insect; Social Behavior; Animals; Gene Expression Profiling; Gene Expression Regulation; Gene-Environment Interaction; Genomics; Phylogeny; Species Specificity; Transcriptome
17.
BMC Bioinformatics ; 21(1): 429, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-33004007

ABSTRACT

BACKGROUND: PacBio sequencing is an incredibly valuable third-generation DNA sequencing method due to very long read lengths, ability to detect methylated bases, and its real-time sequencing methodology. Yet, hitherto no tool was available for analyzing the quality of, subsampling, and filtering PacBio data. RESULTS: Here we present SequelTools, a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells producing multiple statistics and publication-quality plots describing the quality of the data including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool allows the user to subsample reads by one or more of the following criteria: longest subreads per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by filtering out certain low-quality scraps reads and/or by minimum CLR length. SequelTools is implemented in bash, R, and Python using only standard libraries and packages and is platform independent. CONCLUSIONS: SequelTools is a program that provides the only free, fast, and easy-to-use quality control tool, and the only program providing this kind of read subsampling and read filtering for PacBio Sequel raw sequence data, and is available at https://github.com/ISUgenomics/SequelTools.
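
The "longest subreads per CLR" criterion can be sketched as follows. This is not SequelTools' own code: it assumes subread names follow the PacBio movie/ZMW/start_end convention, so that subreads from the same CLR share the movie and ZMW fields, and the read names and sequences are hypothetical.

```python
def longest_subread_per_clr(subreads: dict) -> dict:
    """Keep only the longest subread for each CLR (one CLR per movie/ZMW pair)."""
    best = {}
    for name, seq in subreads.items():
        movie, zmw, _coords = name.split("/")
        key = f"{movie}/{zmw}"
        # Replace the stored subread only when this one is strictly longer.
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (name, seq)
    return {name: seq for name, seq in best.values()}

# Hypothetical subreads keyed by read name.
subreads = {
    "m1/101/0_8": "ACGTACGT",
    "m1/101/50_62": "ACGTACGTACGT",
    "m1/202/0_5": "ACGTA",
}
selected = longest_subread_per_clr(subreads)
print(sorted(selected))
```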


Subjects
High-Throughput Nucleotide Sequencing/methods; Software; Arabidopsis/genetics; Benchmarking; High-Throughput Nucleotide Sequencing/standards; Quality Control
18.
Nat Commun ; 11(1): 2288, 2020 05 08.
Article in English | MEDLINE | ID: mdl-32385271

ABSTRACT

Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75× genomic depth and with N50 subread lengths of 11-21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
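
Genomic depth, as used above, is the total number of sequenced bases divided by the genome size. The sketch below shows that arithmetic and its inverse for budgeting. The read lengths are toy values, and the 2.3 Gb genome size is an assumption made for illustration, since the abstract does not state the NC358 genome size.

```python
def sequencing_depth(read_lengths, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size

def bases_needed(target_depth: float, genome_size: int) -> float:
    """Total bases that must be sequenced to reach a target depth."""
    return target_depth * genome_size

# Toy example: four reads over a 2 kb "genome".
toy_depth = sequencing_depth([12_000, 18_000, 15_000, 15_000], genome_size=2_000)

# Assumed genome size of 2.3 Gb (illustrative, not taken from the abstract).
assumed_genome_size = 2_300_000_000
for depth in (20, 30, 75):
    gigabases = bases_needed(depth, assumed_genome_size) / 1e9
    print(f"{depth}x depth needs about {gigabases:.0f} Gb of sequence")
```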


Subjects
High-Throughput Nucleotide Sequencing/methods; Inbreeding; Zea mays/genetics; Base Sequence; DNA Transposable Elements/genetics; Genome, Plant; Repetitive Sequences, Nucleic Acid/genetics
19.
Genome Biol ; 21(1): 121, 2020 05 20.
Article in English | MEDLINE | ID: mdl-32434565

ABSTRACT

Creating gapless telomere-to-telomere assemblies of complex genomes is one of the ultimate challenges in genomics. We use two independent assemblies and an optical map-based merging pipeline to produce a maize genome (B73-Ab10) composed of 63 contigs and a contig N50 of 162 Mb. This genome includes gapless assemblies of chromosome 3 (236 Mb) and chromosome 9 (162 Mb), and 53 Mb of the Ab10 meiotic drive haplotype. The data also reveal the internal structure of seven centromeres and five heterochromatic knobs, showing that the major tandem repeat arrays (CentC, knob180, and TR-1) are discontinuous and frequently interspersed with retroelements.


Subjects
Chromosomes, Plant; Genome, Plant; Genomics/methods; Physical Chromosome Mapping/methods; Zea mays/genetics
20.
BMC Genomics ; 21(1): 193, 2020 Mar 02.
Article in English | MEDLINE | ID: mdl-32122303

ABSTRACT

BACKGROUND: Genome assemblies are foundational for understanding the biology of a species. They provide a physical framework for mapping additional sequences, thereby enabling characterization of, for example, genomic diversity and differences in gene expression across individuals and tissue types. Quality metrics for genome assemblies gauge both the completeness and contiguity of an assembly and help provide confidence in downstream biological insights. To compare quality across multiple assemblies, a set of common metrics are typically calculated and then compared to one or more gold standard reference genomes. While several tools exist for calculating individual metrics, applications providing comprehensive evaluations of multiple assembly features are, perhaps surprisingly, lacking. Here, we describe a new toolkit that integrates multiple metrics to characterize both assembly and gene annotation quality in a way that enables comparison across multiple assemblies and assembly types. RESULTS: Our application, named GenomeQC, is an easy-to-use and interactive web framework that integrates various quantitative measures to characterize genome assemblies and annotations. GenomeQC provides researchers with a comprehensive summary of these statistics and allows for benchmarking against gold standard reference assemblies. CONCLUSIONS: The GenomeQC web application is implemented in R/Shiny version 1.5.9 and Python 3.6 and is freely available at https://genomeqc.maizegdb.org/ under the GPL license. All source code and a containerized version of the GenomeQC pipeline are available in the GitHub repository https://github.com/HuffordLab/GenomeQC.
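
N50 is one of the standard contiguity metrics of the kind described above: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below computes it for two toy assemblies. It is not GenomeQC's implementation, and the contig lengths are hypothetical.

```python
def n50(lengths) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if 2 * running >= total:
            return length

# Hypothetical contig lengths for two assemblies of the same total size.
assembly_a = [80, 70, 50, 40, 30, 20, 10]
assembly_b = [100, 100, 100]
for name, contigs in (("A", assembly_a), ("B", assembly_b)):
    print(f"assembly {name}: total = {sum(contigs)}, N50 = {n50(contigs)}")
```

Because N50 depends only on the length distribution, it says nothing about completeness or correctness, which is why it is usually reported alongside other metrics.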


Subjects
Genomics/methods; Chromosome Mapping; Computational Biology/methods; High-Throughput Nucleotide Sequencing; Humans; Molecular Sequence Annotation; Sequence Analysis, DNA; Software