ABSTRACT
Lima bean, Phaseolus lunatus, is closely related to common bean and is high in fiber and protein, with a low glycemic index. Lima bean is widely grown in the state of Delaware, where late summer and early fall weather is conducive to pod production. The same weather conditions also promote diseases such as pod rot and downy mildew, the latter of which has caused previous epidemics. A better understanding of the genes underlying resistance to this and other pathogens is needed to keep this industry thriving in the region. Our current study sought to sequence, assemble, and annotate a commercially available cultivar called Bridgeton, which could then serve as a reference genome, a basis of comparison to other Phaseolus taxa, and a resource for the identification of potential resistance genes. Combined efforts of sequencing, linkage, and comparative analysis resulted in a 623 Mb annotated assembly for lima bean, as well as a better understanding of an evolutionarily dynamic resistance locus in legumes.
Subjects
Phaseolus; Genetic Linkage; Phaseolus/genetics
ABSTRACT
BACKGROUND: Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. RESULTS: A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (composed of multiple low-quality subreads from which higher-quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single-sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. CONCLUSIONS: C3S-LAA uses a divide-and-conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.
Subjects
Genetic Testing/methods; Genomics/methods; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Humans
ABSTRACT
Isolating and sequencing specific regions in a genome is a cornerstone of molecular biology. This has been facilitated by computationally encoding the thermodynamics of DNA hybridization for automated design of hybridization and priming oligonucleotides. However, the repetitive composition of genomes challenges the identification of target-specific oligonucleotides, which limits genetics and genomics research on many species. Here, a tool called ThermoAlign was developed that ensures the design of target-specific primer pairs for DNA amplification. This is achieved by evaluating the thermodynamics of hybridization for full-length oligonucleotide-template alignments - thermoalignments - across the genome to identify primers predicted to bind specifically to the target site. For amplification-based resequencing of regions that cannot be amplified by a single primer pair, a directed graph analysis method is used to identify minimum amplicon tiling paths. Laboratory validation by standard and long-range polymerase chain reaction and amplicon resequencing with maize, one of the most repetitive genomes sequenced to date (≈85% repeat content), demonstrated the specificity-by-design functionality of ThermoAlign. ThermoAlign is released under an open source license and bundled in a dependency-free container for wide distribution. It is anticipated that this tool will facilitate multiple applications in genetics and genomics and be useful in the workflow of high-throughput targeted resequencing studies.
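The directed graph idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not ThermoAlign's implementation: the function name `min_tiling_path`, the half-open `(start, end)` amplicon coordinates, and the `min_overlap` parameter are all assumptions made for illustration. Each candidate amplicon is treated as a node, an edge joins two amplicons when the second extends the first while overlapping it sufficiently, and a breadth-first search returns a path with the fewest amplicons.

```python
from collections import deque


def min_tiling_path(amplicons, region_start, region_end, min_overlap=1):
    """Return the fewest amplicons that tile [region_start, region_end), or None.

    Hypothetical illustration only. Amplicons are half-open (start, end) tuples.
    """
    # Source nodes: amplicons that already cover the left edge of the region.
    starts = [a for a in amplicons if a[0] <= region_start]
    queue = deque((a, [a]) for a in starts)
    seen = set(starts)

    while queue:
        current, path = queue.popleft()
        # Sink condition: the current amplicon reaches the right edge.
        if current[1] >= region_end:
            return path
        for nxt in amplicons:
            # Directed edge current -> nxt: nxt lies further right and
            # shares at least min_overlap bases with current.
            extends = nxt[0] > current[0] and nxt[1] > current[1]
            overlaps = current[1] - nxt[0] >= min_overlap
            if extends and overlaps and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None  # no chain of overlapping amplicons spans the region


# Toy coordinates, invented for this example.
candidates = [(0, 500), (400, 900), (450, 1200), (800, 1500), (1100, 2000)]
path = min_tiling_path(candidates, 0, 2000, min_overlap=50)
print(path)
```

Breadth-first search is used here because every edge has the same cost (one more amplicon), so the first path that reaches the right edge of the region is also one with the fewest amplicons.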
Subjects
DNA Primers/metabolism; Genome, Plant; Nucleic Acid Hybridization/methods; Polymerase Chain Reaction/methods; Sequence Analysis, DNA/methods; Zea mays/genetics; Base Sequence; DNA Primers/chemical synthesis; High-Throughput Nucleotide Sequencing; Microsatellite Repeats; Polymorphism, Single Nucleotide; Sequence Alignment; Software; Thermodynamics
ABSTRACT
Viral and bacterial pathogens are a significant economic concern to the US broiler industry, and the ecological epicenter for poultry pathogens is the mixture of bedding material, chicken excrement and feathers that comprises the litter of a poultry house. This study used high-throughput sequencing to assess the richness and diversity of poultry litter bacterial communities, and to look for connections between these communities and the environmental characteristics of a poultry house, including its history of gangrenous dermatitis (GD). Cluster analysis of 16S rRNA gene sequences revealed differences in the distribution of bacterial phylotypes between Wet and Dry litter samples and between houses. Wet litter contained greater diversity, with 90% of total bacterial abundance occurring within the top 214 OTU clusters. In contrast, only 50 clusters accounted for 90% of Dry litter bacterial abundance. The sixth largest OTU cluster across all samples was classified as an Arcobacter sp., an emerging human pathogen, and occurred only in the Wet litter samples of a house with a modern evaporative cooling system. Unexpectedly, the primary pathogenic clostridial and staphylococcal species associated with GD were not found in any house; however, there were thirteen 16S rRNA gene phylotypes of mostly gram-positive phyla that were unique to GD-affected houses and primarily occurred in Wet litter samples. Overall, the poultry house environment appeared to substantially impact the composition of litter bacterial communities and may play a key role in the emergence of food-borne pathogens.
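The "top N OTU clusters account for 90% of abundance" summary used in the abstract above can be computed as in the following minimal sketch. The function name `clusters_for_fraction` and the toy counts are assumptions for illustration; they are not the study's data or pipeline.

```python
def clusters_for_fraction(counts, fraction=0.9):
    """Return how many of the largest clusters are needed to reach
    the given fraction of total abundance (hypothetical helper)."""
    ranked = sorted(counts, reverse=True)  # largest clusters first
    total = sum(ranked)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")

    running = 0
    for n, count in enumerate(ranked, start=1):
        running += count
        if running >= fraction * total:
            return n
    return len(ranked)


# Invented read counts per OTU cluster: one community dominated by a few
# clusters, and one spread evenly across many clusters.
skewed = [50, 30, 15, 3, 2]
even = [1] * 25

print(clusters_for_fraction(skewed))  # few clusters needed: lower evenness
print(clusters_for_fraction(even))    # many clusters needed: higher evenness
```

A community that needs more clusters to reach the same fraction has its abundance spread across more phylotypes, which is the sense in which the abstract describes Wet litter (214 clusters) as more diverse than Dry litter (50 clusters).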