ABSTRACT
Mutant populations are crucial for functional genomics and for discovering novel traits for crop breeding. Sorghum, a drought- and heat-tolerant C4 species, requires a large-scale, annotated, and sequenced mutant resource to enhance crop improvement through functional genomics research. Here, we report a large-scale sequenced sorghum mutant population, developed in the inbred line BTx623, with 9.5 million ethyl methanesulfonate (EMS)-induced mutations that cover 98% of sorghum's annotated genes. Remarkably, a total of 610 320 mutations within the promoter and enhancer regions of 18 000 and 11 790 genes, respectively, can be leveraged for novel research on cis-regulatory elements. A comparison of the distribution of mutations in the large-scale mutant library and the sorghum association panel (SAP) provides insights into the influence of selection. EMS-induced mutations appeared to be random across different regions of the genome, without significant enrichment in different sections of a gene, including the 5'-UTR, gene body, and 3'-UTR. In contrast, there was low variation density in the coding and UTR regions in the SAP. Based on the Ka/Ks value, the mutant library (~1) experienced little selection, unlike the SAP (0.40), which has been strongly selected through breeding. All mutation data are publicly searchable through SorbMutDB (https://www.depts.ttu.edu/igcast/sorbmutdb.php) and SorghumBase (https://sorghumbase.org/). This large-scale sequence-indexed sorghum mutant population is a crucial resource that enriches the sorghum gene pool with novel diversity and a highly valuable tool for the Poaceae family that will advance plant biology research and crop breeding.
Subjects
Sorghum, Sorghum/genetics, Reverse Genetics, Plant Breeding, Mutation, Phenotype, Edible Grain/genetics, Ethyl Methanesulfonate/pharmacology, Plant Genome/genetics
ABSTRACT
BACKGROUND: Single-nucleotide polymorphisms (SNPs) are the form of molecular genetic variation most widely used in genetic studies. As reference genomes and resequencing data sets expand exponentially, tools must be in place to call SNPs at a similar pace. The genome analysis toolkit (GATK) is one of the most widely used SNP calling software tools publicly available, but unfortunately, high-performance computing versions of this tool have yet to become widely available and affordable. RESULTS: Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy, with results comparable to previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a "subpopulation aware" 16-genome rice reference panel with ~3000 resequenced rice accessions. The entire process took ~16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq). CONCLUSIONS: This study developed an open-source pipeline (HPC-GVCW) to run GATK on HPC platforms, which significantly improved the speed at which SNPs can be called. The workflow is widely applicable as demonstrated successfully for four major crop species with genomes ranging in size from 400 Mb to 2.4 Gb. Using HPC-GVCW in production mode to call SNPs on a 25 multi-crop-reference genome data set produced over 1.1 billion SNPs that were publicly released for functional and breeding studies. For rice, many novel SNPs were identified and were found to reside within genes and open chromatin regions that are predicted to have functional consequences. Combined, our results demonstrate the usefulness of combining a high-performance SNP calling architecture solution with a subpopulation-aware reference genome panel for rapid SNP discovery and public deployment.
Subjects
Plant Genome, Single Nucleotide Polymorphism, Workflow, Plant Breeding, Software, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Gramene (http://www.gramene.org), a knowledgebase founded on comparative functional analyses of genomic and pathway data for model plants and major crops, supports agricultural researchers worldwide. The resource is committed to open access and reproducible science based on the FAIR data principles. Since the last NAR update, we made nine releases; doubled the genome portal's content; expanded curated genes, pathways and expression sets; and implemented the Domain Informational Vocabulary Extraction (DIVE) algorithm for extracting gene function information from publications. The current release, #63 (October 2020), hosts 93 reference genomes, with over 3.9 million genes in 122 947 families with orthologous and paralogous classifications. Plant Reactome portrays pathway networks using a combination of manual biocuration in rice (320 reference pathways) and orthology-based projections to 106 species. The Reactome platform facilitates comparison between reference and projected pathways, gene expression analyses and overlays of gene-gene interactions. Gramene integrates ontology-based protein structure-function annotation; information on genetic, epigenetic, expression, and phenotypic diversity; and gene functional annotations extracted from plant-focused journals using DIVE. We train plant researchers in biocuration of genes and pathways; host curated maize gene structures as tracks in the maize genome browser; and integrate curated rice genes and pathways in the Plant Reactome.
Subjects
Genetic Databases, Plant Gene Expression Regulation, Plant Genome, Genomics/methods, Plant Proteins/genetics, Plants/genetics, Agricultural Crops, DNA Transposable Elements, Gene Duplication, Gene Ontology, Gene Regulatory Networks, Internet, Knowledge Bases, Metabolic Networks and Pathways, Molecular Sequence Annotation, Oryza/genetics, Oryza/metabolism, Plant Proteins/metabolism, Plants/classification, Plants/metabolism, Polyploidy, Protein Interaction Mapping, Software, Zea mays/genetics, Zea mays/metabolism
ABSTRACT
MAIN CONCLUSION: SorghumBase provides a community portal that integrates genetic, genomic, and breeding resources for sorghum germplasm improvement. Public research and development in agriculture rely on proper data and resource sharing within stakeholder communities. For plant breeders, agronomists, molecular biologists, geneticists, and bioinformaticians, centralizing desirable data into a user-friendly hub for crop systems is essential for successful collaborations and breakthroughs in germplasm development. Here, we present the SorghumBase web portal (https://www.sorghumbase.org), a resource for the sorghum research community. SorghumBase hosts a wide range of sorghum genomic information in a modular framework, built with open-source software, to provide a sustainable platform. This initial release of SorghumBase includes: (1) five sorghum reference genome assemblies in a pan-genome browser; (2) genetic variant information for natural diversity panels and ethyl methanesulfonate (EMS)-induced mutant populations; (3) a search interface and integrated views of various data types; (4) links supporting interconnectivity with other repositories including genebank, QTL, and gene expression databases; and (5) a content management system to support access to community news and training materials. SorghumBase offers sorghum investigators improved data collation and access that will facilitate the growth of a robust research community to support genomics-assisted breeding.
Subjects
Sorghum, Genetic Databases, Edible Grain, Plant Genome/genetics, Genomics, Internet, Plant Breeding, Sorghum/genetics
ABSTRACT
Plant Reactome (https://plantreactome.gramene.org) is an open-source, comparative plant pathway knowledgebase of the Gramene project. It uses Oryza sativa (rice) as a reference species for manual curation of pathways and extends pathway knowledge to another 82 plant species via gene-orthology projection using the Reactome data model and framework. It currently hosts 298 reference pathways, including metabolic and transport pathways, transcriptional networks, hormone signaling pathways, and plant developmental processes. In addition to browsing plant pathways, users can upload and analyze their omics data, such as gene-expression data, and overlay curated or experimental gene-gene interaction data to extend pathway knowledge. The curation team actively engages researchers and students in gene and pathway curation by offering workshops and online tutorials. The Plant Reactome supports, implements and collaborates with the wider community to make data and tools related to genes, genomes, and pathways Findable, Accessible, Interoperable and Re-usable (FAIR).
Subjects
Computational Biology/methods, Genetic Databases, Genomics, Metabolomics, Plants/genetics, Plants/metabolism, Proteomics, Gene Regulatory Networks, Genomics/methods, Humans, Metabolic Networks and Pathways, Metabolomics/methods, Proteomics/methods, Signal Transduction, Web Browser
ABSTRACT
Heterophile antibody assays have been used to aid the diagnosis of infectious mononucleosis caused by the Epstein-Barr virus. Seven commercially available assays currently widely used in clinical laboratories were compared in this study. Variable performance characteristics and assay times were observed, and these data may assist clinical laboratories in assay selection and result interpretation.
Subjects
Heterophile Antibodies/blood, Viral Antibodies/blood, Clinical Laboratory Techniques/standards, Epstein-Barr Virus Infections/diagnosis, Infectious Mononucleosis/diagnosis, Infectious Mononucleosis/immunology, Diagnostic Reagent Kits/standards, Adolescent, Heterophile Antibodies/immunology, Child, Clinical Laboratory Techniques/methods, Epstein-Barr Virus Infections/blood, Humans, Immunoglobulin M/blood, Infectious Mononucleosis/blood, Young Adult
ABSTRACT
Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene-gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.
Subjects
Genetic Databases, Plant Gene Expression Regulation, Genomics/methods, Knowledge Bases, Plants/genetics, Genetic Epigenesis, Gene Ontology, Genetic Research, Genetic Variation, Plant Genome, Metabolic Networks and Pathways/genetics, Molecular Sequence Annotation, Plants/metabolism, Software, User-Computer Interface
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.
Subjects
Archaea/genetics, Bacteria/genetics, Genetic Databases, Protein Databases, Eukaryota/genetics, Genomics, Amino Acid Sequence, Animals, Base Sequence, Data Mining, Forecasting, Genome, Molecular Sequence Annotation, RNA/genetics, User-Computer Interface
ABSTRACT
Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ~200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.
Subjects
Genetic Databases, Plant Genome, Plants/metabolism, Gene Expression, Genetic Variation, Genomics, Internet, Metabolic Networks and Pathways, Molecular Sequence Annotation, Plants/genetics
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.
Subjects
Genetic Databases, Bacterial Genome, Fungal Genome, Plant Genome, Invertebrates/genetics, Animals, Diploidy, Eukaryota/genetics, Genetic Variation, Genome, Polyploidy, Sequence Alignment
ABSTRACT
Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
Subjects
Genetic Databases, Plant Genome, Genomics, Agricultural Crops/genetics, Genetic Variation, Internet, Metabolic Networks and Pathways/genetics, Molecular Sequence Annotation, Plants/genetics, Plants/metabolism
ABSTRACT
INTRODUCTION: Epilepsy is a neurological disorder characterized by a predisposition to recurrent unprovoked seizures. It can broadly be classified as focal, generalized, unclassified, and unknown in its onset. Focal epilepsy originates in and involves networks localized to one region of the brain. Generalized epilepsy engages broader, more diffuse networks. The etiology of epilepsy can be structural, genetic, infectious, metabolic, immune, or unknown. Many generalized epilepsies have presumed genetic etiologies. The aim of this study is to compare the role of genetic testing with that of brain MRI as diagnostic tools for identifying the underlying causes of idiopathic (genetic) generalized epilepsy (IGE). METHODS: We evaluated the diagnostic yield of these two categories in children diagnosed with IGE. Data collection was completed using ICD-10 codes filtered by TriNetX to select 982 individual electronic medical records (EMRs) of children in the Penn State Children's Hospital who received a diagnosis of IGE. The diagnosis was confirmed after reviewing the clinical history and electroencephalogram (EEG) data for each patient. RESULTS: From this dataset, neuroimaging and genetic testing results were gathered. A retrospective chart review was done on 982 children with epilepsy, of which 143 (14.5%) met the criteria for IGE. Only 18 patients underwent genetic testing. Abnormalities that could be a potential cause for epilepsy were seen in 72.2% (13/18) of patients with IGE and abnormal genetic testing, compared to 30% (37/123) for patients who had a brain MRI with genetic testing. CONCLUSION: This study suggests that genetic testing may be more useful than neuroimaging for identifying an etiological diagnosis in pediatric patients with IGE.
ABSTRACT
Now in its 10th year, the Gramene database (http://www.gramene.org) has grown from its primary focus on rice, the first fully-sequenced grass genome, to become a resource for major model and crop plants including Arabidopsis, Brachypodium, maize, sorghum, poplar and grape in addition to several species of rice. Gramene began with the addition of an Ensembl genome browser and has expanded in the last decade to become a robust resource for plant genomics hosting a wide array of data sets including quantitative trait loci (QTL), metabolic pathways, genetic diversity, genes, proteins, germplasm, literature, ontologies and a fully-structured markers and sequences database integrated with genome browsers and maps from various published studies (genetic, physical, bin, etc.). In addition, Gramene now hosts a variety of web services including a Distributed Annotation Server (DAS), BLAST and a public MySQL database. Twice a year, Gramene releases a major build of the database and makes interim releases to correct errors or to make important updates to software and/or data.
Subjects
Genetic Databases, Plant Genome, Plants/genetics, Chromosome Mapping, Plant Genes, Genetic Variation, Genomics, Metabolic Networks and Pathways, Plants/metabolism, Quantitative Trait Loci, Synteny
ABSTRACT
After the completion of a draft human genome sequence, the International Human Genome Sequencing Consortium has proceeded to finish and annotate each of the 24 chromosomes comprising the human genome. Here we describe the sequencing and analysis of human chromosome 3, one of the largest human chromosomes. Chromosome 3 comprises just four contigs, one of which currently represents the longest unbroken stretch of finished DNA sequence known so far. The chromosome is remarkable in having the lowest rate of segmental duplication in the genome. It also includes a chemokine receptor gene cluster as well as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion that occurred some time after the split of Homininae from Ponginae, and propose an evolutionary history of the inversion.
Subjects
Human Chromosome Pair 3/genetics, Animals, Base Sequence, Chromosome Breakage/genetics, Chromosome Inversion/genetics, Contig Mapping, CpG Islands/genetics, Complementary DNA/genetics, Molecular Evolution, Expressed Sequence Tags, Human Genome Project, Humans, Macaca mulatta/genetics, Molecular Sequence Data, Pan troglodytes/genetics, DNA Sequence Analysis, Synteny/genetics
ABSTRACT
We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
Subjects
Plant Genome, Molecular Sequence Annotation, Zea mays/genetics, Centromere/genetics, Chromosome Mapping, Plant Chromosomes, DNA Methylation, Disease Resistance/genetics, Plant Genes, Genetic Variation, Genotype, High-Throughput Nucleotide Sequencing, Multifactorial Inheritance/genetics, Phenotype, Plant Diseases, Single Nucleotide Polymorphism, Nucleic Acid Regulatory Sequences, DNA Sequence Analysis, Tetraploidy, Transcriptome, Whole Genome Sequencing
ABSTRACT
Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11-21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
Subjects
High-Throughput Nucleotide Sequencing/methods, Inbreeding, Zea mays/genetics, Base Sequence, DNA Transposable Elements/genetics, Plant Genome, Nucleic Acid Repetitive Sequences/genetics
ABSTRACT
The maize W22 inbred has served as a platform for maize genetics since the mid twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using short-read sequencing technologies. We show that significant structural heterogeneity exists in comparison to the B73 reference genome at multiple scales, from transposon composition and copy number variation to single-nucleotide polymorphisms. The generation of this reference genome enables accurate placement of thousands of Mutator (Mu) and Dissociation (Ds) transposable element insertions for reverse and forward genetics studies. Annotation of the genome has been achieved using RNA-seq analysis, differential nuclease sensitivity profiling and bisulfite sequencing to map open reading frames, open chromatin sites and DNA methylation profiles, respectively. Collectively, the resources developed here integrate W22 as a community reference genome for functional genomics and provide a foundation for the maize pan-genome.
Subjects
DNA Transposable Elements/genetics, Plant Genes/genetics, Plant Genome/genetics, Zea mays/genetics, Chromatin/genetics, Plant Chromosomes/genetics, DNA Copy Number Variations/genetics, DNA Methylation/genetics, Plant DNA/genetics, Genomics/methods, Open Reading Frames/genetics, DNA Sequence Analysis/methods
ABSTRACT
This article was not made open access when initially published online, which was corrected before print publication. In addition, ORCID links were missing for 12 authors and have been added to the HTML and PDF versions of the article.
ABSTRACT
The genus Oryza is a model system for the study of molecular evolution over time scales ranging from a few thousand to 15 million years. Using 13 reference genomes spanning the Oryza species tree, we show that despite few large-scale chromosomal rearrangements rapid species diversification is mirrored by lineage-specific emergence and turnover of many novel elements, including transposons, and potential new coding and noncoding genes. Our study resolves controversial areas of the Oryza phylogeny, showing a complex history of introgression among different chromosomes in the young 'AA' subclade containing the two domesticated species. This study highlights the prevalence of functionally coupled disease resistance genes and identifies many new haplotypes of potential use for future crop protection. Finally, this study marks a milestone in modern rice research with the release of a complete long-read assembly of IR 8 'Miracle Rice', which relieved famine and drove the Green Revolution in Asia 50 years ago.
Subjects
Agricultural Crops/genetics, Molecular Evolution, Genetic Variation, Oryza/classification, Oryza/genetics, Conserved Sequence, Domestication, Genetic Speciation, Plant Genome, Phylogeny
ABSTRACT
Identification of single nucleotide polymorphisms (SNPs) and mutations is important for the discovery of genetic predisposition to complex diseases. PCR resequencing is the method of choice for de novo SNP discovery. However, manual curation of putative SNPs has been a major bottleneck in the application of this method to high-throughput screening. Therefore, it is critical to develop a more sensitive and accurate computational method for automated SNP detection. We developed a software tool, SNPdetector, for automated identification of SNPs and mutations in fluorescence-based resequencing reads. SNPdetector was designed to model the process of human visual inspection and has very low false positive and false negative rates. We demonstrate the superior performance of SNPdetector in SNP and mutation analysis by comparing its results with those derived by human inspection, PolyPhred (a popular SNP detection tool), and independent genotype assays in three large-scale investigations. The first study identified and validated inter- and intra-subspecies variations in 4,650 traces of 25 inbred mouse strains that belong to either the Mus musculus species or the M. spretus species. Unexpected heterozygosity in the CAST/Ei strain was observed in two out of 1,167 mouse SNPs. The second study identified 11,241 candidate SNPs in five ENCODE regions of the human genome covering 2.5 Mb of genomic sequence. Approximately 50% of the candidate SNPs were selected for experimental genotyping; the validation rate exceeded 95%. The third study detected ENU-induced mutations (at 0.04% allele frequency) in 64,896 traces of 1,236 zebrafish. Our analysis of three large and diverse test datasets demonstrated that SNPdetector is an effective tool for genome-scale research and for large-sample clinical studies. SNPdetector runs on Unix/Linux platforms and is publicly available (http://lpg.nci.nih.gov).