ABSTRACT
Massively parallel sequencing (MPS) is gaining attention as a new technology for routine forensic casework, including paternity testing. Recently released MPS multiplex panels cover many more loci than capillary electrophoresis (CE) methods and also provide sequence-based alleles; together, these improve the statistical power of genetic testing. Here, an MPS system (PowerSeq™ AUTO/Y) was applied for STR sequencing to study first-degree inheritance of STR sequence alleles in families from Southern Brazil. In the 29 trios (mother-child-father) analyzed, paternity index values generally increased when sequence-based data were used in place of length-based data. Further, allele inconsistencies between child and parents (e.g., single-repeat mutation events) could be resolved with MPS by assessing the core repeat and flanking region sequences. Lastly, the sequence information allowed identification of isoalleles (alleles of the same size but different sequence), making it possible to determine specific paternal and maternal inheritances. The results of this study show the advantages of implementing sequence-based analysis by MPS in paternity testing: improved statistical calculations and greater resolution for the trios/families tested.
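To illustrate why sequence-based alleles tend to raise the paternity index (PI), the sketch below uses the standard single-locus trio formula for the simple case in which the obligate paternal allele is unambiguous: PI is the alleged father's probability of transmitting that allele divided by its population frequency. The function name and both frequencies are hypothetical and are not taken from the study; the point is only that a length-based allele split into rarer isoalleles yields a smaller denominator.

```python
def paternity_index(obligate_allele_freq: float, father_is_homozygous: bool) -> float:
    """Single-locus PI for a trio in which the paternal allele is unambiguous.

    PI = P(alleged father transmits the allele) / P(random man transmits it).
    """
    if not 0.0 < obligate_allele_freq <= 1.0:
        raise ValueError("allele frequency must be in (0, 1]")
    # A homozygous father always transmits the allele; a heterozygous one
    # transmits it half of the time.
    transmission_prob = 1.0 if father_is_homozygous else 0.5
    return transmission_prob / obligate_allele_freq


# Hypothetical frequencies (assumptions for illustration only):
# the length-based allele pools several isoalleles, so it is more common
# than the specific sequence variant shared by father and child.
length_based_freq = 0.20
sequence_based_freq = 0.05

pi_length = paternity_index(length_based_freq, father_is_homozygous=False)
pi_sequence = paternity_index(sequence_based_freq, father_is_homozygous=False)

print(f"length-based PI:   {pi_length:.2f}")
print(f"sequence-based PI: {pi_sequence:.2f}")
```

Because a sequence-defined allele can never be more frequent than the length class it belongs to, the sequence-based PI under this simplified model is always at least as large as the length-based one.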
Subjects
DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Microsatellite Repeats , Paternity , Alleles , Brazil , DNA Fingerprinting/methods , Female , Humans , Male , Sequence Analysis, DNA/methods
ABSTRACT
BACKGROUND: A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. RESULTS: We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. CONCLUSIONS: The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing.
Subjects
Genome, Mitochondrial , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Haplotypes , Humans , Robotics
ABSTRACT
Forensically relevant single nucleotide polymorphisms (SNPs) can provide valuable supplemental information to short tandem repeats (STRs) for investigative leads, and genotyping can now be streamlined using massively parallel sequencing (MPS). Dust is an attractive evidence source, as it accumulates on undisturbed surfaces, is often overlooked by perpetrators, and contains sufficient human DNA for analysis. To assess whether SNPs genotyped from indoor dust using MPS could be used to detect known household occupants, 13 households were recruited and provided buccal samples from each occupant and dust from five predefined indoor locations. Thermo Fisher Scientific Precision ID Identity and Ancestry Panels were utilized for SNP genotyping, and sequencing was completed using Illumina® chemistry. FastID, software developed to permit mixture analysis and identity searching, was used to assess whether known occupants could be detected from associated household dust samples. A modified "subtraction" method was also used in FastID to estimate the percentage of alleles in each dust sample contributed by known and unknown occupants. On average, 72% of autosomal SNPs were recovered from dust samples. When using FastID, (a) 93% of known occupants were detected in at least one indoor dust sample and could not be excluded as contributors to the mixture, and (b) non-contributor alleles were detected in 54% of dust samples (29 ± 11 alleles per dust sample). Overall, this study highlights the potential of analyzing human DNA present in indoor dust to detect known household occupants, which could be valuable for investigative leads.
Subjects
DNA Fingerprinting , Polymorphism, Single Nucleotide , Humans , DNA Fingerprinting/methods , Genotype , DNA/analysis , Software , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA , Microsatellite Repeats
ABSTRACT
Molecular-based taxonomy, specifically DNA barcoding, has streamlined organism identification. For land plants, the recommended 2-locus barcode of rbcL and matK is not suitable for all groups; thus, the second subunit of the nuclear internal transcribed spacer (ITS2) has received attention as a possible alternative. To date, evaluations of ITS2 have mostly been limited in scope to specific plant orders/families and single-source material. Before ITS2 can be used routinely to characterize land plants present in environmental samples (i.e., DNA metabarcoding), a wet lab protocol optimized for bulk sample types is needed. To address this gap, in this study we determined the broad recoverability across land plants when using published ITS2 primer pairs, and subsequently optimized the PCR constituents and cycling conditions for the two best-performing primer pairs (ITS2F/ITSp4 and ITSp3/ITSu4). Using these conditions, both primer pairs were used to characterize land plants present in 17 diverse soils collected from across the US. The resulting PCR amplicons were prepared into libraries and pooled for sequencing on an Illumina® MiniSeq. Our existing bioinformatics workflow was used to process raw sequencing data and taxonomically assign unique ITS2 plant sequences by comparison to GenBank. Because strict quality criteria were imposed on sequences for inclusion in data analysis, only 43.6% and 7.5% of sequences from ITS2F/ITSp4 and ITSp3/ITSu4, respectively, remained for taxonomic comparisons; ~7-11% of sequences originated from fungal co-amplification. The number of orders and families recovered differed between primer pairs, with ITS2F/ITSp4 consistently outperforming ITSp3/ITSu4 by >15%. Primer pair bias was observed in the recovery of certain taxonomic groups; ITS2F/ITSp4 preferentially recovered flowering plants and grasses, whereas ITSp3/ITSu4 recovered more moss taxa. 
To maximize data recovery and reduce potential bias, we advocate that studies using ITS2 to characterize land plants from environmental samples such as soil use a multiple primer pair approach.
Subjects
DNA Barcoding, Taxonomic/methods , DNA, Intergenic/genetics , DNA, Plant/genetics , Metagenomics/methods , Bryophyta/classification , Bryophyta/genetics , DNA Barcoding, Taxonomic/standards , DNA, Intergenic/chemistry , DNA, Plant/chemistry , Ferns/classification , Ferns/genetics , Magnoliopsida/classification , Magnoliopsida/genetics , Metagenomics/standards , Polymerase Chain Reaction/methods , Polymerase Chain Reaction/standards , Soil/chemistry
ABSTRACT
The forensic science community is poised to utilize modern advances in massively parallel sequencing (MPS) technologies to better characterize biological samples with higher resolution. A critical component towards the advancement of forensic DNA analysis with these technologies is a comprehensive understanding of the diversity and population distribution of sequence-based short tandem repeat (STR) alleles. Here we analyzed 786 samples of individuals from different population groups, including four of the most commonly encountered in forensic casework in the USA. DNA samples were amplified with the PowerSeq™ Auto/Y System Prototype Kit (Promega Corp.), and sequencing was performed on an Illumina® MiSeq instrument. Sequence data were analyzed using a bioinformatics processing tool, Altius. For additional data analysis and profile comparison, capillary electrophoresis (CE) size-based STR genotypes were generated for a subset of individuals and, where possible, genotypes were also generated with a second commercially available MPS STR assay. Autosomal STR loci were analyzed and frequencies were calculated based on sequence composition. Population genetics analyses were also performed, assessing Hardy-Weinberg equilibrium, polymorphic information content (PIC), and observed and expected heterozygosity. Overall, sequence-based allelic variants of the repeat region were observed in 20 out of 22 different STR loci commonly used in forensic DNA genotyping, with the highest number of sequence variants observed at locus D12S391. The highest increase in allelic diversity and in PIC through sequence-based genotyping was observed at loci D3S1358 and D8S1179. Detailed sequence analysis such as that performed in the present study is important to help understand the diversity of sequence-based STR alleles across different populations and to demonstrate how such allelic variation can improve statistics used for forensic casework.
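As a rough illustration of the population statistics named above, the following sketch computes expected heterozygosity and polymorphic information content (PIC) from a list of allele frequencies, using the commonly cited Botstein et al. definition of PIC. The function names and the two frequency lists are hypothetical and are not data from the study; they only show how splitting one length-based allele into two isoalleles raises both statistics.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus typed by length: two alleles at equal frequency.
length_based = [0.5, 0.5]
# Same locus typed by sequence: one length allele resolves into two isoalleles.
sequence_based = [0.5, 0.25, 0.25]

for label, freqs in (("length", length_based), ("sequence", sequence_based)):
    he = expected_heterozygosity(freqs)
    pic = polymorphic_information_content(freqs)
    print(f"{label:>8}: He = {he:.4f}, PIC = {pic:.4f}")
```

Both functions assume the frequencies sum to 1; a production workflow would validate that and would estimate frequencies from observed genotype counts.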
Subjects
DNA Fingerprinting , Genetics, Population , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Racial Groups/genetics , Electrophoresis, Capillary , Female , Gene Frequency , Genotype , Heterozygote , Humans , Male , Polymorphism, Genetic , Sequence Analysis, DNA , United States
ABSTRACT
In forensic geology casework, sample size typically limits routine characterization of material using bulk approaches. To address this, DNA-based characterization of biological taxa has received attention, as the taxa present can be useful for sample-to-sample comparisons and source attribution. In our initial work, low biodiversity was captured when DNA barcodes were Sanger-sequenced from plant and insect fragments isolated from 10 forensic-type surface soils. Considering some forensic laboratories now have access to massively parallel sequencing platforms, we assessed whether biological taxa present in the same surface soils could be better characterized using DNA metabarcoding. To achieve this, plant and animal barcodes were amplified and sequenced on an Illumina MiniSeq for three different DNA sample types (n = 50): individual fragments used in our initial study, and 250 and 100 mg of bulk soil (from the 10 sites used in the initial study). A total of 572 unique target barcode sequences passed quality filtering and were used in downstream statistical analyses: 54, 321, and 285 for individual fragments, 100 mg, and 250 mg bulk soil samples, respectively. Plant barcodes permitted some spatial separation of sample sites in non-metric multidimensional scaling plots; better separation was obtained for samples prepared from bulk soil. This study confirmed that bulk soil DNA metabarcoding is a better approach for characterizing biological taxa present in surface soils, which could supplement traditional geologic examinations.
Subjects
DNA Barcoding, Taxonomic/methods , Forensic Genetics/methods , Metagenome/genetics , Soil/chemistry , Animals , Biodiversity , DNA Fingerprinting , High-Throughput Nucleotide Sequencing , Insects/chemistry , Insects/genetics , Plants/chemistry , Plants/genetics
ABSTRACT
With the advent of next-generation sequencing technology, sequencing of short tandem repeats (STRs) allows for a more detailed analysis than size-based fragment methods (capillary electrophoresis, CE). The implementation of high-throughput sequencing can help uncover deeper genetic diversity in different populations. Subjects from the South region of Brazil have a distinctive and more homogeneous ancestry background than those from other regions of the country. Both autosomal and Y-STRs have been analyzed in these individuals; however, all analyses published to date rely on data from CE-based fragment analysis. In this study, a genetic analysis of 59 individuals from Southern Brazil was performed on STR sequences. Forensically relevant STRs were PCR-enriched using a prototype of the PowerSeq™ AUTO/Y system (Promega Corp.). Next-generation sequencing was performed on an Illumina MiSeq instrument. The raw data (FASTQ files) were processed using a custom-designed sequence processing tool, Altius. Isoalleles, which are sequence-based allelic variants that do not differ in length, were observed in nine autosomal and six Y-STRs from the core global forensic marker set. The number of distinct alleles based on sequence was higher than the number based on length: 37.3% higher in autosomal STRs and 13.8% higher in Y-STRs. The most polymorphic autosomal locus was D12S391, which presented 38 different sequence-based alleles. Among the loci on the Y chromosome, DYS389II presented the highest number of isoalleles. In comparison to CE analysis, observed and expected heterozygosity, polymorphic information content (PIC), and genetic diversity also presented higher values when the alleles were analyzed based on their sequence. For autosomal loci, PIC was 2.6% higher for sequence-based data. Diversity was 9.3% and 6.5% higher for autosomal and Y markers, respectively. 
In the analysis of the repeat structures for the STR loci, a new allele variant was found for allele 18 in the vWA locus. The STR flanking regions were also further investigated and sixteen variations were observed at nine autosomal STR loci and one Y-STR locus. The results obtained in this study demonstrate the importance of genetic analysis based on sequencing and highlight the diversity of the South Brazilian population when characterized by STR sequencing.
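The gain in distinct alleles from sequence-based typing can be sketched as follows: alleles that share a length but differ in sequence (isoalleles) collapse into one class under length-based typing and remain separate under sequence-based typing. The function name and the three repeat-region strings are hypothetical and are not sequences reported in the study; sequence length in bases is used here only as a stand-in for a CE size call.

```python
def count_alleles(repeat_sequences):
    """Return (distinct alleles by length, distinct alleles by sequence)."""
    by_sequence = set(repeat_sequences)
    # Length in bases stands in for the CE size-based allele call.
    by_length = {len(seq) for seq in by_sequence}
    return len(by_length), len(by_sequence)


# Hypothetical repeat regions observed at one locus (illustrative only).
observed = [
    "TCTA" * 10,           # 40 bases
    "TCTA" * 11,           # 44 bases
    "TCTG" + "TCTA" * 10,  # 44 bases, isoallele of the previous entry
    "TCTA" * 10,           # repeat observation of the first allele
]

n_length, n_sequence = count_alleles(observed)
percent_increase = 100.0 * (n_sequence - n_length) / n_length

print(f"alleles by length:   {n_length}")
print(f"alleles by sequence: {n_sequence}")
print(f"increase:            {percent_increase:.1f}%")
```

Summing such counts over all loci in a panel gives the kind of overall percentage increase in distinct alleles that the abstract reports for autosomal and Y-STRs.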
Subjects
Genetics, Population , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Sequence Analysis, DNA , Brazil , Chromosomes, Human, Y , DNA Fingerprinting , Female , Gene Frequency , Genetic Variation , Humans , Male , Polymerase Chain Reaction
ABSTRACT
Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity.
Subjects
Cloud Computing , Databases, Nucleic Acid , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Sequence Analysis, DNA , Computer Security , Electronic Data Processing , Humans , User-Computer Interface
ABSTRACT
Though investigations into the use of massively parallel sequencing technologies for the generation of complete mitochondrial genome (mtGenome) profiles from difficult forensic specimens are well underway in multiple laboratories, the high quality population reference data necessary to support full mtGenome typing in the forensic context are lacking. To address this deficiency, we have developed 588 complete mtGenome haplotypes, spanning three U.S. population groups (African American, Caucasian and Hispanic) from anonymized, randomly-sampled specimens. Data production utilized an 8-amplicon, 135 sequencing reaction Sanger-based protocol, performed in semi-automated fashion on robotic instrumentation. Data review followed an intensive multi-step strategy that included a minimum of three independent reviews of the raw data at two laboratories; repeat screenings of all insertions, deletions, heteroplasmies, transversions and any additional private mutations; and a check for phylogenetic feasibility. For all three populations, nearly complete resolution of the haplotypes was achieved with full mtGenome sequences: 90.3-98.8% of haplotypes were unique per population, an improvement of 7.7-29.2% over control region sequencing alone, and zero haplotypes overlapped between populations. Inferred maternal biogeographic ancestry frequencies for each population and heteroplasmy rates in the control region were generally consistent with published datasets. In the coding region, nearly 90% of individuals exhibited length heteroplasmy in the 12418-12425 adenine homopolymer; and despite a relatively high rate of point heteroplasmy (23.8% of individuals across the entire molecule), coding region point heteroplasmies shared by more than one individual were notably absent, and transversion-type heteroplasmies were extremely rare. 
The ratio of nonsynonymous to synonymous changes among point heteroplasmies in the protein-coding genes (1:1.3) and average pathogenicity scores in comparison to data reported for complete substitutions in previous studies seem to provide some additional support for the role of purifying selection in the evolution of the human mtGenome. Overall, these thoroughly vetted full mtGenome population reference data can serve as a standard against which the quality and features of future mtGenome datasets (especially those developed via massively parallel sequencing) may be evaluated, and will provide a solid foundation for the generation of complete mtGenome haplotype frequency estimates for forensic applications.
Subjects
Forensic Genetics , Genome, Mitochondrial , Haplotypes , Humans , United States
ABSTRACT
Forensic mitochondrial DNA (mtDNA) testing requires appropriate, high quality reference population data for estimating the rarity of questioned haplotypes and, in turn, the strength of the mtDNA evidence. Available reference databases (SWGDAM, EMPOP) currently include information from the mtDNA control region; however, novel methods that quickly and easily recover mtDNA coding region data are becoming increasingly available. Though these assays promise to both facilitate the acquisition of mitochondrial genome (mtGenome) data and maximize the general utility of mtDNA testing in forensics, the appropriate reference data and database tools required for their routine application in forensic casework are lacking. To address this deficiency, we have undertaken an effort to: (1) increase the large-scale availability of high-quality entire mtGenome reference population data, and (2) improve the information technology infrastructure required to access/search mtGenome data and employ them in forensic casework. Here, we describe the application of a data generation and analysis workflow to the development of more than 400 complete, forensic-quality mtGenomes from low DNA quantity blood serum specimens as part of a U.S. National Institute of Justice funded reference population databasing initiative. We discuss the minor modifications made to a published mtGenome Sanger sequencing protocol to maintain a high rate of throughput while minimizing manual reprocessing with these low template samples. The successful use of this semi-automated strategy on forensic-like samples provides practical insight into the feasibility of producing complete mtGenome data in a routine casework environment, and demonstrates that large (>2kb) mtDNA fragments can regularly be recovered from high quality but very low DNA quantity specimens. 
Further, the detailed empirical data we provide on the amplification success rates across a range of DNA input quantities will be useful moving forward as PCR-based strategies for mtDNA enrichment are considered for targeted next-generation sequencing workflows.