ABSTRACT
Since 2013, STRait Razor has enabled analysis of massively parallel sequencing (MPS) data from various marker systems such as short tandem repeats, single nucleotide polymorphisms, insertion/deletions, and mitochondrial DNA. In this paper, STRait Razor Online (SRO), available at https://www.unthsc.edu/straitrazor, is introduced as an interactive, Shiny-based user interface for primary analysis of MPS data and secondary analysis of STRait Razor haplotype pileups. This software can be accessed from any common browser via desktop, tablet, or smartphone device. SRO is available also as a standalone application and open-source R script available at https://github.com/ExpectationsManaged/STRaitRazorOnline. The local application is capable of batch processing of both fastq files and primary analysis output. Processed batches generate individual report folders and summary reports at the locus- and haplotype-level in a matter of minutes. For example, the processing of data from ~700 samples generated with the ForenSeq Signature Preparation Kit from allsequences.txt to a final table can be performed in ~40 min whereas the Excel-based workbooks can take 35-60 h to compile a subset of the tables generated by SRO. To facilitate analysis of single-source, reference samples, a preliminary triaging system was implemented that calls potential alleles and flags loci suspected of severe heterozygote imbalance. When compared to published, manually curated data sets, 98.72 % of software-assigned allele calls without manual interpretation were consistent with curated data sets, 0.99 % loci were presented to the user for interpretation due to heterozygote imbalance, and the remaining 0.29 % of loci were inconsistent due to the analytical thresholds used across the studies.
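The triaging idea described in this abstract (call potential alleles, flag loci with suspected heterozygote imbalance) can be illustrated with a minimal Python sketch. This is not SRO's implementation: the function name `triage_locus`, the status labels, and all three thresholds (`analytical_threshold`, `noise_ratio`, `balance_threshold`) are hypothetical values chosen only for illustration.

```python
from typing import Dict, List, Tuple


def triage_locus(
    read_counts: Dict[str, int],
    analytical_threshold: int = 10,   # assumed minimum reads to consider a haplotype
    noise_ratio: float = 0.15,        # assumed ratio below which a minor haplotype is ignored
    balance_threshold: float = 0.5,   # assumed minimum minor/major ratio for a clean heterozygote
) -> Tuple[List[str], str]:
    """Return (candidate alleles, status) for one single-source locus."""
    # Keep haplotypes at or above the analytical threshold, most reads first.
    kept = sorted(
        ((hap, n) for hap, n in read_counts.items() if n >= analytical_threshold),
        key=lambda item: (-item[1], item[0]),
    )
    if not kept:
        return [], "no_call"
    if len(kept) == 1:
        return [kept[0][0]], "homozygote"

    (major, major_reads), (minor, minor_reads) = kept[0], kept[1]
    ratio = minor_reads / major_reads
    if ratio < noise_ratio:
        # Second haplotype is treated as noise (for example stutter).
        return [major], "homozygote"
    if ratio >= balance_threshold:
        return [major, minor], "heterozygote"
    # In between: present the locus to the user instead of calling it.
    return [major, minor], "flag_imbalance"
```

For example, `triage_locus({"A": 500, "B": 150})` returns the two candidates with the status `"flag_imbalance"`, because the minor/major ratio of 0.3 falls between the two assumed ratios.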
Subjects
High-Throughput Nucleotide Sequencing , Software , User-Computer Interface , DNA Fingerprinting , Humans , Internet , Microsatellite Repeats , Sequence Analysis, DNA
ABSTRACT
The short tandem repeat allele identification tool (STRait Razor), a program used to characterize the haplotypes of short tandem repeats (STRs) in massively parallel sequencing (MPS) data, was redesigned. STRait Razor v3.0 performs ~660× faster allele identification than its previous version (v2s), a speedup that is largely due to a novel indexing strategy used to perform "fuzzy" (approximate) string matching of anchor sequences. Written in a portable compiled language, C++, STRait Razor v3.0 functions on all major operating systems including Microsoft Windows, and it has cross-platform multithreading support. In silico estimates of precision and accuracy of STRait Razor v3.0 were 100% in this evaluation and results were highly concordant with those of STRait Razor v2s. STRait Razor v3.0 adds several key features that simplify the haplotype reporting process, including simple filters to remove low frequency haplotypes as well as merging haplotypes within a locus encoded on opposite strands of the DNA molecule.
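The three ideas named in this abstract (approximate anchor matching, merging haplotypes reported on opposite strands, and filtering low-frequency haplotypes) can be sketched in Python as follows. This is an illustration under stated assumptions, not the STRait Razor v3.0 code, which is written in C++. In particular, `find_anchor` is a naive Hamming-distance scan and not the indexing strategy the abstract credits for the speedup. All function names and the `min_fraction` default are hypothetical.

```python
from collections import Counter
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_anchor(read: str, anchor: str, max_mismatches: int = 1) -> int:
    """Naive fuzzy search: first offset where the anchor matches the read
    with at most max_mismatches substitutions, or -1 if there is none."""
    k = len(anchor)
    for i in range(len(read) - k + 1):
        mismatches = sum(1 for a, b in zip(read[i:i + k], anchor) if a != b)
        if mismatches <= max_mismatches:
            return i
    return -1


def merge_strands(forward: Dict[str, int], reverse: Dict[str, int]) -> Dict[str, int]:
    """Add reverse-strand counts to the forward-strand haplotype they encode."""
    merged = Counter(forward)
    for seq, n in reverse.items():
        merged[reverse_complement(seq)] += n
    return dict(merged)


def drop_low_frequency(counts: Dict[str, int], min_fraction: float = 0.02) -> Dict[str, int]:
    """Remove haplotypes below an assumed fraction of the locus read depth."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n for seq, n in counts.items() if n / total >= min_fraction}
```

A typical use would merge the per-strand counts for a locus first and apply the frequency filter afterwards, so that a haplotype split across strands is not filtered out before its reads are pooled.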
Subjects
Alleles , Haplotypes , Microsatellite Repeats , Software , Algorithms , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
STRait Razor has provided the forensic community a free-to-use, open-source tool for short tandem repeat (STR) analysis of massively parallel sequencing (MPS) data. STRait Razor v2s (SRv2s) allows users to capture physically phased haplotypes within the full amplicon of both commercial (ForenSeq) and "early access" panels (PowerSeq, Mixture ID). STRait Razor v2s may be run in batch mode to facilitate population-level analysis and is supported by all Unix distributions (including macOS). Data are reported in tables in string (haplotype), length-based (e.g., vWA allele 14), and International Society of Forensic Genetics (ISFG)-recommended (vWA [CE 14]-GRCh38-chr12:5983950-5984049 (TAGA)10 (CAGA)3 TAGA) formats. STRait Razor v2s currently contains a database of ~2500 unique sequences. This database is used by SRv2s to match strings to the appropriate allele in ISFG-recommended format. In addition to STRs, SRv2s has configuration files necessary to capture and report haplotypes from all marker types included in these multiplexes (e.g., SNPs, InDels, and microhaplotypes). To facilitate mixture interpretation, data may be displayed from all markers in a format similar to that of electropherograms displayed by traditional forensic software. The download package for SRv2s may be found at https://www.unthsc.edu/graduate-school-of-biomedical-sciences/molecular-and-medical-genetics/laboratory-faculty-and-staff/strait-razor.
Subjects
Alleles , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Software , Haplotypes , Humans , Polymorphism, Single Nucleotide
ABSTRACT
Massively parallel sequencing (MPS) offers advantages over current capillary electrophoresis-based analysis of short tandem repeat (STR) loci for human identification testing. In particular, STR repeat motif sequence information can be obtained, thereby increasing the discrimination power of some loci. While sequence variation within the repeat region is observed relatively frequently in some of the commonly used STRs, there is an additional degree of variation found in the flanking regions adjacent to the repeat motif. Repeat motif and flanking region sequence variation have been described for major population groups but not for more isolated populations. Flanking region sequence variation in STR and single nucleotide polymorphism (SNP) loci in the Yavapai population was analyzed using the ForenSeq™ DNA Signature Prep Kit and STRait Razor v2s. Seven and 14 autosomal STRs and identity-informative single nucleotide polymorphisms (iiSNPs), respectively, had some degree of flanking region variation. Three and four of these identity-informative loci, respectively, showed ≥5% increase in expected heterozygosity. The combined length- and sequence-based random match probabilities (RMPs) for 27 autosomal STRs were 6.11×10⁻²⁶ and 2.79×10⁻²⁹, respectively. When combined with 94 iiSNPs (a subset of which became microhaplotypes) the combined RMP was 5.49×10⁻⁶³. Analysis of length-based and sequence-based autosomal STRs in STRUCTURE indicated that the Yavapai are most similar to the Hispanic population. While producing minimal increase in X- and Y-STR discrimination potential, access to flanking region data enabled identification of one novel X-STR and three Y-STR alleles relative to previous reports. Five ancestry-informative SNPs (aiSNPs) and two phenotype-informative SNPs (piSNPs) exhibited notable flanking region variation.
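To show why resolving alleles by sequence raises expected heterozygosity and lowers the random match probability, the following Python sketch computes both from allele frequencies. It assumes Hardy-Weinberg proportions, independence between loci, and no sample-size or population-substructure corrections; the abstract does not state which estimators the study used, so the formulas and function names here are illustrative assumptions, and the frequencies in the usage note are invented.

```python
from itertools import combinations
from math import prod
from typing import Iterable, Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def locus_rmp(freqs: Sequence[float]) -> float:
    """Random match probability at one locus: sum of squared genotype
    frequencies, assuming Hardy-Weinberg proportions."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def combined_rmp(loci: Iterable[Sequence[float]]) -> float:
    """Product of per-locus RMPs, assuming the loci are independent."""
    return prod(locus_rmp(freqs) for freqs in loci)
```

With made-up frequencies, a locus with two length-based alleles at 0.5 each has an expected heterozygosity of 0.5; if sequencing splits one of them into two variants at 0.3 and 0.2, `expected_heterozygosity([0.5, 0.3, 0.2])` rises to 0.62 and the per-locus RMP falls accordingly.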
Subjects
Indians, North American/genetics , Microsatellite Repeats , Polymorphism, Single Nucleotide , Chromosomes, Human, X , Chromosomes, Human, Y , DNA Fingerprinting , Gene Frequency , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
Short tandem repeat (STR) loci are the traditional markers used for kinship, missing persons, and direct comparison human identity testing. These markers hold considerable value due to their highly polymorphic nature, amplicon size, and ability to be multiplexed. However, many STRs are still too large for use in analysis of highly degraded DNA. Small bi-allelic polymorphisms, such as insertions/deletions (INDELs), may be better suited for analyzing compromised samples, and their allele size differences are amenable to analysis by capillary electrophoresis. The INDEL marker allelic states range in size from 2 to 6 base pairs, enabling small amplicon size. In addition, heterozygote balance may be increased by minimizing preferential amplification of the smaller allele, as is more common with STR markers. Multiplexing a large number of INDELs allows for generating panels with high discrimination power. The Nextera™ Rapid Capture Custom Enrichment Kit (Illumina, Inc., San Diego, CA) and massively parallel sequencing (MPS) on the Illumina MiSeq were used to sequence 68 well-characterized INDELs in four major US population groups. In addition, the STR Allele Identification Tool: Razor (STRait Razor) was used in a novel way to analyze INDEL sequences and detect adjacent single nucleotide polymorphisms (SNPs) and other polymorphisms. This application enabled the discovery of unique allelic variants, which increased the discrimination power and decreased the single-locus random match probabilities (RMPs) of 22 of these well-characterized INDELs which can be considered as microhaplotypes. These findings suggest that additional microhaplotypes containing human identification (HID) INDELs may exist elsewhere in the genome.
Subjects
DNA Fingerprinting/methods , Genetic Markers , Haplotypes , High-Throughput Nucleotide Sequencing , INDEL Mutation , Genetics, Population , Heterozygote , Humans , Polymorphism, Single Nucleotide , Racial Groups/genetics
ABSTRACT
Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty-six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy-six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data.
Subjects
Genetic Variation , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Racial Groups/genetics , Chromosomes, Human, X , Chromosomes, Human, Y , DNA Fingerprinting , Gene Frequency , Genetic Markers , Genetics, Population , Humans , Polymerase Chain Reaction , United States
ABSTRACT
STRait Razor (the STR Allele Identification Tool - Razor) was developed as a bioinformatic software tool to detect short tandem repeat (STR) alleles in massively parallel sequencing (MPS) raw data. The method of detection used by STRait Razor allows it to make reliable allele calls for all STR types in a manner that is similar to that of capillary electrophoresis. STRait Razor v2.0 incorporates several new features and improvements upon the original software, such as a larger default locus configuration file that increases the number of detectable loci (now including X-chromosome STRs and Amelogenin), an enhanced custom locus list generator, a novel output sorting method that highlights unique sequences for intra-repeat variation detection, and a genotyping tool that emulates traditional electropherogram data. Users also now have the option to choose whether the program detects autosomal, X-chromosome, Y-chromosome, or all STRs. Concordance testing was performed, and allele calls produced by STRait Razor v2.0 were completely consistent with those made by the original software.
Subjects
Alleles , Microsatellite Repeats/genetics , Base Sequence , Chromosomes, Human, X , DNA/genetics , Genotype , Humans , Molecular Sequence Data
ABSTRACT
Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles.