Results 1 - 20 of 57
1.
Forensic Sci Int Genet ; 69: 103005, 2024 03.
Article in English | MEDLINE | ID: mdl-38171224

ABSTRACT

The genetic component of forensic genetic genealogy (FGG) is an estimate of kinship, often conducted at genome scales between a great number of individuals. The promise of FGG is substantial: in concert with genealogical records and other nongenetic information, it can indirectly identify a person of interest. A downside of FGG is cost, as it is currently expensive and requires chemistries uncommon to forensic genetic laboratories (microarrays and high throughput sequencing). The more common benchtop sequencers can be coupled with a targeted PCR assay to conduct FGG, though such approaches have limited resolution for kinship. This study evaluates low-pass sequencing, an alternative strategy that is accessible to benchtop sequencers and can produce resolutions comparable to high-pass sequencing. Samples from a three-generation pedigree were augmented to include up to 7th degree relatives (using whole genome pedigree simulations) and the ability to recover the true kinship coefficient was assessed using algorithms qualitatively similar to those found in GEDmatch. We show that up to 7th degree relatives can be reliably inferred from 1× whole genome sequencing obtainable from desktop sequencers.


Subjects
Algorithms, High-Throughput Nucleotide Sequencing, Humans, Pedigree, Polymorphism, Single Nucleotide, Genotype, DNA Fingerprinting
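
The abstract above speaks of recovering the kinship coefficient for relatives up to the 7th degree. As a rough illustration (not the study's pipeline), the sketch below uses the standard expectation that the kinship coefficient halves with each degree, and assigns a degree using geometric-midpoint cutoffs. The function names and the cutoff scheme are assumptions made for this example.

```python
from typing import Optional


def expected_kinship(degree: int) -> float:
    """Expected kinship coefficient for relatives of a given degree.

    Assumes an outbred pedigree: 1st degree -> 1/4, 2nd -> 1/8, and so on.
    """
    if degree < 1:
        raise ValueError("degree must be >= 1")
    return 0.5 ** (degree + 1)


def infer_degree(kinship: float, max_degree: int = 7) -> Optional[int]:
    """Assign a degree from an estimated kinship coefficient.

    Cutoffs sit at geometric midpoints between expected values
    (an assumed convention for this sketch). Returns 0 for
    duplicate-level sharing and None when the estimate falls
    below the window for max_degree.
    """
    if kinship > 2 ** -1.5:
        return 0
    for degree in range(1, max_degree + 1):
        lower = 2 ** -(degree + 1.5)
        upper = 2 ** -(degree + 0.5)
        if lower < kinship <= upper:
            return degree
    return None
```

Under these assumptions a 7th degree pair has an expected coefficient of 1/256, which shows how narrow the window is that a low-pass estimator has to resolve.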
2.
Forensic Sci Int Genet ; 69: 102980, 2024 03.
Article in English | MEDLINE | ID: mdl-38016331

ABSTRACT

The de facto genetic markers of forensics are short tandem repeats (STRs). There are many analytical tools designed to work with STRs, including techniques for analyzing and assessing DNA mixtures. In contrast, the nascent field of forensic genetic genealogy often relies on biallelic single nucleotide polymorphisms (SNPs). Tools designed for the forensic assessment of SNPs are somewhat lacking, especially for DNA mixtures. In this paper we introduce Demixtify, a program that detects DNA mixtures using biallelic SNPs. Demixtify is quite powerful; highly imbalanced mixtures can be detected (≤1:99, considering in silico and in vitro mixtures) when coverage is ample. Demixtify can also detect mixtures in low coverage (∼1×) samples (when the mixture is relatively balanced). Demixtify includes an empirical estimator of sequence error that is specific to the markers assayed, making it especially relevant to the forensic community. Orthogonal techniques are also developed to characterize in vitro mixtures, as well as samples thought to be single source, and the results of these approaches serve to validate the techniques presented.


Subjects
DNA Fingerprinting, DNA, Humans, DNA/genetics, Sequence Analysis, DNA/methods, Polymorphism, Single Nucleotide, Microsatellite Repeats, High-Throughput Nucleotide Sequencing
3.
Forensic Sci Int Genet ; 63: 102807, 2023 03.
Article in English | MEDLINE | ID: mdl-36462297

ABSTRACT

PCR artifacts are an ever-present challenge in sequencing applications. These artifacts can seriously limit the analysis and interpretation of low-template samples and mixtures, especially with respect to a minor contributor. In medicine, molecular barcoding techniques have been employed to decrease the impact of PCR error and to allow the examination of low-abundance somatic variation. In principle, it should be possible to apply the same techniques to the forensic analysis of mixtures. To that end, several short tandem repeat loci were selected for targeted sequencing, and a bioinformatic pipeline for analyzing the sequence data was developed. The pipeline notes the relevant unique molecular identifiers (UMIs) attached to each read and, using machine learning, filters the noise products out of the set of potential alleles. To evaluate this pipeline, DNA from pairs of individuals was mixed at different ratios (1:1, 1:9) and sequenced with different starting amounts of DNA (10, 1 and 0.1 ng). Naïvely using the information in the molecular barcodes led to increased performance, with the machine learning resulting in an additional benefit. In concrete terms, using the UMI data results in less noise for a given amount of drop-out. For instance, if thresholds are selected that filter out a quarter of the true alleles, using read counts accepts 2381 noise alleles and using raw UMI counts accepts 1726 noise alleles, while the machine learning approach only accepts 307.


Subjects
DNA, High-Throughput Nucleotide Sequencing, Humans, Alleles, DNA/analysis, DNA Fingerprinting/methods, Sequence Analysis, DNA, Microsatellite Repeats
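
The contrast the abstract above draws between read counts and raw UMI counts can be sketched as follows. This is a minimal illustration only: the input layout (one `(locus, allele, umi)` tuple per read), the function names, and the threshold are assumptions, and the machine learning stage described in the abstract is not reproduced.

```python
from collections import defaultdict


def count_alleles(reads):
    """Tally reads and distinct UMIs per (locus, allele).

    reads: iterable of (locus, allele, umi) tuples (hypothetical layout).
    Returns {(locus, allele): (read_count, umi_count)}.
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for locus, allele, umi in reads:
        key = (locus, allele)
        read_counts[key] += 1
        umis[key].add(umi)  # PCR duplicates share a UMI and collapse here
    return {key: (read_counts[key], len(umis[key])) for key in read_counts}


def filter_by_umi(counts, min_umis):
    """Keep alleles supported by at least min_umis distinct molecules."""
    return {key for key, (_, n_umi) in counts.items() if n_umi >= min_umis}
```

An artifact amplified many times from a single template molecule can carry a high read count yet only one UMI, so a threshold on UMI counts removes it while a threshold on read counts might not.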
4.
Forensic Sci Int Genet ; 61: 102785, 2022 11.
Article in English | MEDLINE | ID: mdl-36206658

ABSTRACT

One of the fundamental goals of forensic genetics is sample attribution, i.e., whether an item of evidence can be associated with some person or persons. The most common scenario involves a direct comparison, e.g., between DNA profiles from an evidentiary item and a sample collected from a person of interest. Less common is an indirect comparison in which kinship is used to potentially identify the source of the evidence. Because of the sheer amount of information lost in the hereditary process for comparison purposes, sampling a limited set of loci may not provide enough resolution to accurately resolve a relationship. Instead, whole genome techniques can sample the entirety of the genome or a sufficiently large portion of the genome and as such they may effect better relationship determinations. While relatively common in other areas of study, whole genome techniques have only begun to be explored in the forensic sciences. As such, bioinformatic pipelines are introduced for estimating kinship by massively parallel sequencing of whole genomes using approaches adapted from the medical and population genomic literature. The pipelines are designed to characterize a person's entire genome, not just some set of targeted markers. Two different variant callers are considered, contrasting a classical variant calling algorithm (BCFtools) to a more modern deep convolution neural network (DeepVariant). Two different bioinformatic pipelines specific to each variant caller are introduced and evaluated in a titration series. Filters and thresholds are then optimized specifically for the purposes of estimating kinship as determined by the KING-robust algorithm. With the appropriate filtering and thresholds in place both tools perform similarly, with DeepVariant tending to produce more accurate genotypes, though the resultant types of inaccuracies tended to produce slightly less accurate overall estimates of relatedness.


Subjects
High-Throughput Nucleotide Sequencing, Polymorphism, Single Nucleotide, Humans, High-Throughput Nucleotide Sequencing/methods, Computational Biology/methods, Genotype, Algorithms
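
The abstract above evaluates kinship with the KING-robust algorithm. The sketch below implements the pairwise estimator as it is commonly written (heterozygote concordance minus opposite-homozygote counts, scaled by the smaller heterozygote count). It should be read as an illustration under that assumption, not as the KING software itself or the pipelines described; the genotype coding and function name are choices made for this example.

```python
def king_robust(genotypes_i, genotypes_j):
    """Pairwise kinship estimate in the style of KING-robust.

    genotypes_i, genotypes_j: equal-length sequences of alternate-allele
    counts (0, 1, 2), with None marking a missing call (assumed coding).
    """
    het_i = het_j = het_both = opposite_hom = 0
    for a, b in zip(genotypes_i, genotypes_j):
        if a is None or b is None:
            continue  # use only sites called in both individuals
        het_i += (a == 1)
        het_j += (b == 1)
        het_both += (a == 1 and b == 1)
        opposite_hom += (abs(a - b) == 2)  # AA in one, aa in the other
    min_het = min(het_i, het_j)
    if min_het == 0:
        raise ValueError("no heterozygous sites to normalise by")
    return (
        (het_both - 2 * opposite_hom) / (2 * min_het)
        + 0.5
        - 0.25 * (het_i + het_j) / min_het
    )
```

Because the estimate depends on heterozygote and opposite-homozygote counts, a genotyping error that converts a heterozygote to a homozygote shifts it directly, which is consistent with the abstract's point that filters need tuning for kinship specifically.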
5.
Forensic Sci Int Genet ; 61: 102776, 2022 11.
Article in English | MEDLINE | ID: mdl-36152508

ABSTRACT

The recent advent of genetic genealogy has brought about a renewed interest in genome-scale forensic analyses, of which kinship estimation is a critical component. Most genomic kinship estimators consider SNPs (single nucleotide polymorphisms), often leveraging the co-inheritance of shared alleles to inform their analyses. While current estimators cannot directly evaluate mixed samples, there exist well-established SNP-based kinship estimators tailored to considering challenged samples, including low-pass whole genome sequencing. As an example, several studies have shown remarkable success in imputing genotype posterior probabilities in low template samples when linked sites are considered. Critical to these approaches is the ability to account for genotype uncertainty; the lack of an expression for a genotype likelihood in imbalanced mixtures has prevented direct application. This work develops such an expression. The formulation is fully compatible with genotype imputation software, suggesting a genomic pipeline that estimates genotype likelihoods, performs imputation, and then estimates kinship when the sample is a mixture. Further, when framed as an imbalanced mixture, the problem of mixture deconvolution is reducible to the problem of genotyping mixed samples. Herein, the ability to genotype two-person mixtures is assessed through example and in silico settings. While certain mixture scenarios and classes of sites are inherently inseparable, simulations of read depths between 60 and 190 appear to produce likelihoods of sufficient magnitude to deconvolve two-person mixtures whenever the mixture fraction is moderately imbalanced. The described approach and results suggest a path forward for estimating the kinship coefficient (and similar inferences on relatedness) when the sample is a mixture.


Subjects
DNA Fingerprinting, DNA, Humans, Likelihood Functions, Genotype, DNA Fingerprinting/methods, Alleles, DNA/genetics, DNA/analysis
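
To give a feel for what a genotype likelihood in an imbalanced two-person mixture might look like, the sketch below uses a simple binomial read-count model at one biallelic site. This is an assumed, simplified model for illustration; it is not the expression derived in the work above, and the parameter names and default error rate are hypothetical.

```python
from itertools import product
from math import comb


def alt_read_probability(g_minor, g_major, minor_fraction, error):
    """Probability that a read shows the alternate allele.

    g_minor, g_major: alternate-allele counts (0, 1, 2) of each contributor.
    minor_fraction: share of the DNA from the minor contributor.
    error: symmetric per-read error rate (assumed model).
    """
    p = (minor_fraction * g_minor + (1 - minor_fraction) * g_major) / 2
    return p * (1 - error) + (1 - p) * error


def mixture_genotype_likelihoods(n_alt, n_ref, minor_fraction, error=0.01):
    """Binomial likelihood of the read counts for all nine genotype pairs."""
    depth = n_alt + n_ref
    likelihoods = {}
    for g_minor, g_major in product((0, 1, 2), repeat=2):
        p = alt_read_probability(g_minor, g_major, minor_fraction, error)
        likelihoods[(g_minor, g_major)] = (
            comb(depth, n_alt) * p ** n_alt * (1 - p) ** n_ref
        )
    return likelihoods
```

With a balanced mixture (`minor_fraction = 0.5`) the pairs (0, 2), (1, 1) and (2, 0) predict the same read fraction and cannot be told apart, which mirrors the abstract's remark that some scenarios are inherently inseparable.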
6.
Front Genet ; 13: 882268, 2022.
Article in English | MEDLINE | ID: mdl-35846115

ABSTRACT

Technological advances in sequencing and single nucleotide polymorphism (SNP) genotyping microarray technology have facilitated advances in forensic analysis beyond short tandem repeat (STR) profiling, enabling the identification of unknown DNA samples and distant relationships. Forensic genetic genealogy (FGG) has facilitated the identification of distant relatives of both unidentified remains and unknown donors of crime scene DNA, invigorating the use of biological samples to resolve open cases. Forensic samples are often degraded or contain only trace amounts of DNA. In this study, the accuracy of genome-wide relatedness methods and identity by descent (IBD) segment approaches was evaluated in the presence of challenges commonly encountered with forensic data: missing data and genotyping error. Pedigree whole-genome simulations were used to estimate the genotypes of thousands of individuals with known relationships using multiple populations with different biogeographic ancestral origins. Simulations were also performed with varying error rates and types. Using these data, the performance of different methods for quantifying relatedness was benchmarked across these scenarios. When the genotyping error was low (<1%), IBD segment methods outperformed genome-wide relatedness methods for close relationships and are more accurate at distant relationship inference. However, with an increasing genotyping error (1-5%), methods that do not rely on IBD segment detection are more robust and outperform IBD segment methods. The reduced call rate had little impact on either class of methods. These results have implications for the use of dense SNP data in forensic genomics for distant kinship analysis and FGG, especially when the sample quality is low.

7.
Forensic Sci Int Genet ; 59: 102719, 2022 07.
Article in English | MEDLINE | ID: mdl-35526505

ABSTRACT

Forensic genetic investigations typically rely on analysis of DNA for attribution purposes. There are times, however, when the amount and/or the quality of the DNA is limited, and thus little or no information can be obtained regarding the source of the sample. An alternative biochemical target that also contains genetic signatures is protein. One class of genetic signatures is protein polymorphisms that are a direct consequence of simple/single/short nucleotide polymorphisms (SNPs) in DNA. However, to interpret protein polymorphisms in a forensic context, certain complexities must be understood and addressed. These complexities include: 1) SNPs can generate 0, 1, or arbitrarily many polymorphisms in a polypeptide; and 2) as an object of expression that is modulated by alleles, genes and interactions with the environment, proteins may be present or absent in a given sample. To address these issues, a novel approach was taken to generate the expected protein alleles in a reference sample based on whole genome (or exome) sequence data and assess the significance of the evidence using a haplotype-based semi-continuous likelihood algorithm that leverages whole proteome data. Converting the genomic information into the proteomic information allows for the zero-to-many relationship between SNPs and genetically variable peptides (GVPs) to be abstracted away. When viewed as a haplotype, many GVPs that correspond to the same SNP are equivalent to many SNPs in perfect linkage disequilibrium (LD). As long as the likelihood formulation correctly accounts for LD, the correspondence between the SNP and the proteome can be safely neglected. Tests were performed on simulated samples, including single-source and two-person mixtures, and the power of using a classical semi-continuous likelihood versus one that has been adapted to neglect drop-out was compared. Additionally, summary statistics and a rudimentary set of decision guidelines were introduced to help identify mixtures from protein data.


Subjects
Proteome, Proteomics, DNA/genetics, Genotype, High-Throughput Nucleotide Sequencing/methods, Humans, Peptides/analysis, Peptides/genetics, Polymorphism, Single Nucleotide, Proteome/genetics, Sequence Analysis, DNA
8.
Appl Environ Microbiol ; 88(7): e0005222, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35285713

ABSTRACT

The skin microbiome is a highly abundant and relatively stable source of DNA that may be utilized for human identification (HID). In this study, a set of single nucleotide polymorphisms (SNPs) with a high mean estimated Wright's fixation index (FST) (>0.1) and widespread abundance (found in ≥75% of samples compared) were selected from a diverse set of markers in the hidSkinPlex panel. The least absolute shrinkage and selection operator (LASSO) was used in a novel machine learning framework to generate a SNP panel and predict the human host from skin microbiome samples collected from the hand, manubrium, and foot. The framework was devised to emulate a new unknown person introduced to the algorithm and to match samples from that person against a population database. Unknown samples were classified with 96% accuracy (Matthews correlation coefficient [MCC], 0.954) in the test (n = 225 samples) data set. A final panel of informative SNPs was determined for HID (hidSkinPlex+) using all 51 individuals sampled at three body sites in triplicate. The hidSkinPlex+ panel comprises 365 SNPs and yielded prediction accuracy for the correct host of 95% (MCC = 0.949). The accuracy of the hidSkinPlex+ panel may be somewhat overestimated due to using 26 individuals from the training data set for the selection of the final panel. However, this accuracy still provides an indication of performance when tested on new samples. IMPORTANCE One of the fundamental goals in forensic genetics is to identify the source of biological evidence. Methods for detecting human DNA have advanced and can be quite sensitive, but not all DNA samples are amenable to current methods. However, the human skin microbiome is a source of DNA with high copy numbers, and it has the potential for high discriminatory power. The hidSkinPlex panel has been used for HID; however, some aspects of it could be improved. 
Missing information is ambiguous, as it is unclear if marker drop-out is a by-product of a low-template sample or if the reasons for not observing a marker are biological. Such ambiguity may confound methods for HID, and as such, an improved marker set (hidSkinPlex+) was designed that is considerably smaller and more robust to drop-out (365 SNPs contained in 135 markers) yet still can be used to accurately predict the human host.


Subjects
Microbiota, Polymorphism, Single Nucleotide, DNA, Forensic Anthropology, Genotype, High-Throughput Nucleotide Sequencing/methods, Humans, Microbiota/genetics, Sequence Analysis, DNA
9.
F1000Res ; 11: 18, 2022.
Article in English | MEDLINE | ID: mdl-35222994

ABSTRACT

Motivation: SNP-based kinship analysis with genome-wide relationship estimation and IBD segment analysis methods produces results that often require further downstream processing and manipulation. A dedicated software package that consistently and intuitively implements this analysis functionality is needed. Results: Here we present the skater R package for SNP-based kinship analysis, testing, and evaluation with R. The skater package contains a suite of well-documented tools for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing IBD segment data. Availability: The skater package is implemented as an R package and is released under the MIT license at https://github.com/signaturescience/skater. Documentation is available at https://signaturescience.github.io/skater.


Subjects
Genome, Pedigree, Polymorphism, Single Nucleotide, Computational Biology, Humans, Software
10.
Bioinformatics ; 38(7): 2052-2053, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35020788

ABSTRACT

MOTIVATION: Read-merging algorithms that look solely at the reads can misalign and mis-merge the reads (especially near repetitive sequences). RESULTS: The C++ program ProSynAR has been written to take the reads' position in the reference into account when performing (and deciding whether to perform) a merge. AVAILABILITY: *Nix users can retrieve the source from GitHub (https://github.com/Benjamin-Crysup/prosynar). Windows binary available at https://github.com/Benjamin-Crysup/prosynar/releases/download/1.0/prosynar.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing, Software, Sequence Analysis, DNA, Algorithms, Repetitive Sequences, Nucleic Acid
11.
J Proteomics ; 249: 104360, 2021 10 30.
Article in English | MEDLINE | ID: mdl-34481086

ABSTRACT

We present an efficient protein extraction and in-solution enzymatic digestion protocol optimized for mass spectrometry-based proteomics studies of human skin samples. Human skin cells are a proteinaceous matrix that can enable forensic identification of individuals. We performed a systematic optimization of proteomic sample preparation for a protein-based human forensic identification application. Digestion parameters, including incubation duration, temperature, and the type and concentration of surfactant, were systematically varied to maximize digestion completeness. Through replicate digestions, parameter optimization was performed to maximize repeatability and increase the number of identified peptides and proteins. Final digestion conditions were selected based on the parameters that yielded the greatest percent of peptides with zero missed tryptic cleavages, which benefit the analysis of genetically variable peptides (GVPs). We evaluated the final digestion conditions for identification of GVPs by applying MS-based proteomics on a mixed-donor sample. The results were searched against a human proteome database appended with a database of GVPs constructed from known non-synonymous single nucleotide polymorphisms (SNPs) that occur at known population frequencies. The aim of this study was to demonstrate the potential of our proteomics sample preparation for future implementation of GVP analysis by forensic laboratories to facilitate human identification. SIGNIFICANCE: Genetically variable peptides (GVPs) can provide forensic evidence that is complementary to traditional DNA profiling and be potentially used for human identification. An efficient protein extraction and reproducible digestion method of skin proteins is a key contributor for downstream analysis of GVPs and further development of this technology in forensic application. In this study, we optimized the enzymatic digestion conditions, such as incubation time and temperature, for skin samples. 
Our study is among the first attempts towards optimization of proteomics sample preparation for protein-based skin identification in forensic applications such as touch samples. Our digestion method employs RapiGest (an acid-labile surfactant), trypsin enzymatic digestion, and an incubation time of 16 h at 37 °C.


Subjects
Peptides, Proteomics, Forensic Medicine, Humans, Mass Spectrometry, Proteome, Trypsin
12.
Appl Environ Microbiol ; 87(20): e0120821, 2021 09 28.
Article in English | MEDLINE | ID: mdl-34379455

ABSTRACT

Microbial DNA, shed from human skin, can be distinctive to its host and, thus, help individualize donors of forensic biological evidence. Previous studies have utilized single-locus microbial DNA markers (e.g., 16S rRNA) to assess the presence/absence of personal microbiota to profile human hosts. However, since the taxonomic composition of the microbiome is in constant fluctuation, this approach may not be sufficiently robust for human identification (HID). Multimarker approaches may be more powerful. Additionally, genetic differentiation, rather than taxonomic distinction, may be more individualizing. To this end, the nondominant hands of 51 individuals were sampled in triplicate (n = 153). They were analyzed for markers in the hidSkinPlex, a multiplex panel comprising candidate markers for skin microbiome profiling. Single-nucleotide polymorphisms (SNPs) with the highest Wright's fixation index (FST) estimates were then selected for predicting donor identity using a support vector machine (SVM) learning model. FST is an estimate of the genetic differences within and between populations. Three different SNP selection criteria were employed: SNPs with the highest-ranking FST estimates (i) common between any two samples regardless of markers present (termed overall); (ii) each marker common between samples (termed per marker); and (iii) common to all samples used to train the SVM algorithm for HID (termed selected). The SNPs chosen based on criteria for overall, per marker, and selected methods resulted in an accuracy of 92.00%, 94.77%, and 88.00%, respectively. The results support that estimates of FST, combined with SVM, can notably improve forensic HID via skin microbiome profiling. IMPORTANCE There is a need for additional genetic information to help identify the source of biological evidence found at a crime scene. The human skin microbiome is a potentially abundant source of DNA that can enable the identification of a donor of biological evidence. 
With microbial profiling for human identification, there will be an additional source of DNA to identify individuals as well as to exclude individuals wrongly associated with biological evidence, thereby improving the utility of forensic DNA profiling to support criminal investigations.


Subjects
Microbiota, Skin/microbiology, Bacteria/genetics, Forensic Anthropology, Humans, Machine Learning, Polymorphism, Single Nucleotide, Support Vector Machine
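
The abstract above ranks SNPs by estimates of Wright's fixation index (FST) before classification. One textbook way to estimate FST at a biallelic site is from heterozygosities, as sketched below. The equal weighting of groups, the function names, and the data layout are assumptions for illustration; the estimator actually used in the study may differ, and the SVM step is not shown.

```python
def fst_biallelic(group_frequencies):
    """Heterozygosity-based FST at one biallelic SNP.

    group_frequencies: alternate-allele frequency in each group
    (for example, each donor's samples), weighted equally (assumption).
    """
    if not group_frequencies:
        raise ValueError("need at least one group")
    n = len(group_frequencies)
    # Mean within-group expected heterozygosity.
    h_s = sum(2 * p * (1 - p) for p in group_frequencies) / n
    # Expected heterozygosity of the pooled population.
    p_bar = sum(group_frequencies) / n
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic site carries no differentiation signal
    return (h_t - h_s) / h_t


def rank_snps_by_fst(snp_frequencies):
    """Sort SNP identifiers from highest to lowest FST.

    snp_frequencies: {snp_id: [frequency per group]} (hypothetical layout).
    """
    return sorted(
        snp_frequencies,
        key=lambda snp: fst_biallelic(snp_frequencies[snp]),
        reverse=True,
    )
```

A SNP fixed for different alleles in different hosts scores 1, while a SNP at the same frequency in every host scores 0, so high-FST SNPs are the ones that separate hosts.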
13.
Forensic Sci Int Genet ; 55: 102568, 2021 11.
Article in English | MEDLINE | ID: mdl-34416654

ABSTRACT

Short tandem repeats of the nuclear genome have been the preferred markers for analyzing forensic DNA mixtures. However, when nuclear DNA in a sample is degraded or limited, mitochondrial DNA (mtDNA) markers provide a powerful alternative. Though historically considered challenging, the interpretation and analysis of mtDNA mixtures have recently seen renewed interest with the advent of massively parallel sequencing. However, there are only a few software tools available for mtDNA mixture interpretation. To address this gap, the Mitochondrial Mixture Deconvolution and Interpretation Tool (MMDIT) was developed. MMDIT is an interactive application complete with a graphical user interface that allows users to deconvolve mtDNA (whole or partial genomes) mixtures into constituent donor haplotypes and estimate random match probabilities on these resultant haplotypes. In cases where deconvolution might not be feasible, the software allows mixture analysis directly within a binary framework (i.e. qualitatively, only using data on allele presence/absence). This paper explains the functionality of MMDIT, using an example of an in vitro two-person mtDNA mixture with a ratio of 1:4. The uniqueness of MMDIT lies in its ability to resolve mixtures into complete donor haplotypes using a statistical phasing framework before mixture analysis and evaluating statistical weights employing a novel graph algorithm approach. MMDIT is the first available open-source software that can automate mtDNA mixture deconvolution and analysis. The MMDIT web application can be accessed online at https://www.unthsc.edu/mmdit/. The source code is available at https://github.com/SammedMandape/MMDIT_UI and archived on zenodo (https://doi.org/10.5281/zenodo.4770184).


Subjects
DNA, Mitochondrial, High-Throughput Nucleotide Sequencing, DNA, Mitochondrial/genetics, Haplotypes, Humans, Sequence Analysis, DNA, Software
14.
Forensic Sci Int Genet ; 53: 102516, 2021 07.
Article in English | MEDLINE | ID: mdl-33878618

ABSTRACT

Forensic DNA typing typically relies on the length-based (LB) separation of PCR products containing short tandem repeat loci (STRs). Massively parallel sequencing (MPS) elucidates an additional level of STR motif and flanking region variation. Also, MPS enables simultaneous analysis of different marker types (autosomal STRs and SNPs for lineage and identification purposes), reducing both the amount of sample used and the turn-around time of analysis. Therefore, MPS methodologies are being considered as an additional tool in forensic genetic casework. The PowerSeq™ Auto/Y System (Promega Corp), a multiplex forensic kit for MPS, enables analysis of the 22 autosomal STR markers (plus Amelogenin) from the PowerPlex® Fusion 6C kit and 23 Y-STR markers from the PowerPlex® Y23 kit. Population data were generated from 140 individuals from an admixed sample from Rio de Janeiro, Brazil. All samples were processed according to the manufacturers' recommended protocols. Raw data (FastQ) were generated for each indexed sample and analyzed using STRait Razor v2s and the PowerSeqv2.config file. The subsequent population data showed the largest increase in expected heterozygosity (23%), from LB to sequence-based (SB) analyses, at the D5S818 locus. An unreported allele was found at the D21S11 locus. The random match probability across all loci decreased from 5.9 × 10^-28 to 7.6 × 10^-33. Sensitivity studies using 1, 0.25, 0.062 and 0.016 ng of DNA input were analyzed in triplicate. Full Y-STR profiles were detected in all samples, and no autosomal allele drop-out was observed with 62 pg of input DNA. For mixture studies, 1 ng of genomic DNA from a male and female sample at 1:1, 1:4, 1:9, 1:19 and 1:49 proportions were analyzed in triplicate. Clearly resolvable alleles (i.e., no stacking or shared alleles) were obtained at a 1:19 male to female contributor ratio. 
The minus one stutter (-1) increased with the longest uninterrupted stretch (LUS) allele size reads and according to simple or compound/complex repeats. The haplotype-specific stutter rates add more information for the interpretation of mixed samples. These data support the use of the PowerSeq™ Auto/Y System prototype kit (22 autosomal STR loci, 23 Y-STR loci and Amelogenin) for forensic genetics applications.


Subjects
DNA Fingerprinting/instrumentation, High-Throughput Nucleotide Sequencing/instrumentation, Microsatellite Repeats, Brazil, Chromosomes, Human, Y, Female, Gene Frequency, Genetic Markers, Humans, Male, Polymerase Chain Reaction, Sequence Analysis, DNA
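
The gain in expected heterozygosity from length-based (LB) to sequence-based (SB) typing reported above follows from SB analysis splitting same-length alleles into distinct sequences. A small sketch of that arithmetic, using the usual definition (one minus the sum of squared allele frequencies), is given below. The data layout and function names are assumptions, and any frequencies passed in are the caller's own rather than values from the study.

```python
from collections import defaultdict


def expected_heterozygosity(frequencies):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(f * f for f in frequencies)


def collapse_to_length(sb_frequencies):
    """Merge sequence-based alleles that share a length.

    sb_frequencies: {(length, sequence_label): frequency} (hypothetical layout).
    Returns {length: frequency}, i.e. what a length-based assay would observe.
    """
    lb = defaultdict(float)
    for (length, _label), freq in sb_frequencies.items():
        lb[length] += freq
    return dict(lb)
```

For instance, if an allele of length 12 at frequency 0.5 is really two sequences at 0.3 and 0.2, alongside a length 11 allele at 0.5, heterozygosity rises from 0.50 (LB) to 0.62 (SB) in this made-up case.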
15.
Genes (Basel) ; 12(2)2021 01 27.
Article in English | MEDLINE | ID: mdl-33514030

ABSTRACT

The scale of genetic methods is presently being expanded: forensic genetic assays previously were limited to tens of loci, but now technologies allow for a transition to forensic genomic approaches that assess thousands to millions of loci. However, there are subtle distinctions between genetic assays and their genomic counterparts (especially in the context of forensics). For instance, forensic genetic approaches tend to describe a locus as a haplotype, be it a microhaplotype or a short tandem repeat with its accompanying flanking information. In contrast, genomic assays tend to provide not haplotypes but sequence variants or differences, variants which in turn describe how the alleles apparently differ from the reference sequence. By the given construction, mitochondrial genetic assays can be thought of as genomic as they often describe genetic differences in a similar way. The mitochondrial genetics literature makes clear that sequence differences, unlike the haplotypes they encode, are not comparable to each other. Different alignment algorithms and different variant calling conventions may cause the same haplotype to be encoded in multiple ways. This ambiguity can affect evidence and reference profile comparisons as well as how "match" statistics are computed. In this study, a graph algorithm is described (and implemented in the MMDIT (Mitochondrial Mixture Database and Interpretation Tool) R package) that permits the assessment of forensic match statistics on mitochondrial DNA mixtures in a way that is invariant to both the variant calling conventions followed and the alignment parameters considered. The algorithm described, given a few modest constraints, can be used to compute the "random man not excluded" statistic or the likelihood ratio. The performance of the approach is assessed in in silico mitochondrial DNA mixtures.


Subjects
Algorithms, Computational Biology/methods, DNA, Mitochondrial, High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA/methods, Software, Alleles, Genetic Variation, Genotype, Haplotypes
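
The ambiguity described above, where one haplotype can be encoded by different variant calls, is easy to reproduce on a toy sequence. The sketch below only demonstrates the problem by rebuilding the haplotype string from each encoding; it is not the graph algorithm the study describes. The variant tuple layout and function name are assumptions for this example.

```python
def apply_variants(reference, variants):
    """Rebuild a haplotype string from a reference and a set of variants.

    variants: iterable of (position, ref_allele, alt_allele) with 0-based
    positions and non-overlapping spans (assumed layout).
    """
    sequence = reference
    # Apply from right to left so earlier coordinates stay valid.
    for position, ref_allele, alt_allele in sorted(variants, reverse=True):
        end = position + len(ref_allele)
        if sequence[position:end] != ref_allele:
            raise ValueError("variant does not match reference at %d" % position)
        sequence = sequence[:position] + alt_allele + sequence[end:]
    return sequence


# One C deleted from a homopolymer run, reported two different ways.
REFERENCE = "GACCCTA"
LEFT_ALIGNED = [(1, "AC", "A")]
RIGHT_ALIGNED = [(3, "CC", "C")]
```

Comparing `LEFT_ALIGNED` and `RIGHT_ALIGNED` as variant lists suggests a mismatch, yet both rebuild to the same haplotype, which is why comparisons made on variants rather than haplotypes can mislead.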
16.
Genes (Basel) ; 12(2)2021 01 20.
Article in English | MEDLINE | ID: mdl-33498312

ABSTRACT

Despite the benefits of quantitative data generated by massively parallel sequencing, resolving mitotypes from mixtures occurring in certain ratios remains challenging. In this study, a bioinformatic mixture deconvolution method centered on population-based phasing was developed and validated. The method was first tested on 270 in silico two-person mixtures varying in mixture proportions. An assortment of external reference panels containing information on haplotypic variation (from similar and different haplogroups) was leveraged to assess the effect of panel composition on phasing accuracy. Building on these simulations, mitochondrial genomes from the Human Mitochondrial DataBase were sourced to populate the panels and key parameter values were identified by deconvolving an additional 7290 in silico two-person mixtures. Finally, employing an optimized reference panel and phasing parameters, the approach was validated with in vitro two-person mixtures with differing proportions. Deconvolution was most accurate when the haplotypes in the mixture were similar to haplotypes present in the reference panel and when the mixture ratios were neither highly imbalanced nor subequal (e.g., 4:1). Overall, errors in haplotype estimation were largely bounded by the accuracy of the mixture's genotype results. The proposed framework is the first available approach that automates the reconstruction of complete individual mitotypes from mixtures, even in ratios that have traditionally been considered problematic.


Subjects
DNA, Mitochondrial, Forensic Genetics/methods, High-Throughput Nucleotide Sequencing, Models, Statistical, Algorithms, Bayes Theorem, Computational Biology/methods, Genome, Mitochondrial, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Humans, Polymorphism, Single Nucleotide, Reproducibility of Results, Sequence Analysis, DNA/methods
17.
Forensic Sci Int Genet ; 52: 102463, 2021 05.
Article in English | MEDLINE | ID: mdl-33493821

ABSTRACT

Since 2013, STRait Razor has enabled analysis of massively parallel sequencing (MPS) data from various marker systems such as short tandem repeats, single nucleotide polymorphisms, insertion/deletions, and mitochondrial DNA. In this paper, STRait Razor Online (SRO), available at https://www.unthsc.edu/straitrazor, is introduced as an interactive, Shiny-based user interface for primary analysis of MPS data and secondary analysis of STRait Razor haplotype pileups. This software can be accessed from any common browser via desktop, tablet, or smartphone device. SRO is also available as a standalone application and open-source R script at https://github.com/ExpectationsManaged/STRaitRazorOnline. The local application is capable of batch processing of both fastq files and primary analysis output. Processed batches generate individual report folders and summary reports at the locus and haplotype levels in a matter of minutes. For example, the processing of data from ∼700 samples generated with the ForenSeq Signature Preparation Kit from allsequences.txt to a final table can be performed in ∼40 min whereas the Excel-based workbooks can take 35-60 h to compile a subset of the tables generated by SRO. To facilitate analysis of single-source reference samples, a preliminary triaging system was implemented that calls potential alleles and flags loci suspected of severe heterozygote imbalance. When compared to published, manually curated data sets, 98.72 % of software-assigned allele calls without manual interpretation were consistent with curated data sets, 0.99 % of loci were presented to the user for interpretation due to heterozygote imbalance, and the remaining 0.29 % of loci were inconsistent due to the analytical thresholds used across the studies.
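
The triaging idea in the abstract above (call potential alleles, flag loci with suspected heterozygote imbalance) can be illustrated with a minimal Python sketch. This is not SRO's code (SRO is distributed as an R script), and the function name `triage_locus`, the analytical threshold, and both ratio cut-offs are hypothetical values chosen only for illustration; the abstract does not state the thresholds SRO uses.

```python
def triage_locus(allele_reads, analytical_threshold=10,
                 noise_ratio=0.2, balance_ratio=0.5):
    """Call alleles at one locus of a single-source sample.

    allele_reads maps an allele label to its read count.
    All thresholds are illustrative assumptions, not SRO defaults.
    """
    # Keep alleles above the analytical threshold, most-supported first.
    kept = sorted(
        (a for a, n in allele_reads.items() if n >= analytical_threshold),
        key=lambda a: allele_reads[a],
        reverse=True,
    )
    if len(kept) < 2:
        # Zero or one candidate allele: nothing to compare.
        return {"alleles": kept, "flagged": False}

    major, minor = kept[0], kept[1]
    ratio = allele_reads[minor] / allele_reads[major]

    if ratio < noise_ratio:
        # Second allele is weak enough to be treated as noise or stutter.
        return {"alleles": [major], "flagged": False}

    # Heterozygous call; flag it for review when the two alleles are imbalanced.
    return {"alleles": [major, minor], "flagged": ratio < balance_ratio}
```

With these assumed cut-offs, a locus with read counts of 500 and 150 would be called heterozygous but flagged for manual review, because the minor-to-major ratio (0.3) falls between the two ratio thresholds.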


Subjects
High-Throughput Nucleotide Sequencing , Software , User-Computer Interface , DNA Fingerprinting , Humans , Internet , Microsatellite Repeats , DNA Sequence Analysis
18.
Bioinformatics ; 37(16): 2479-2480, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33459758

ABSTRACT

MOTIVATION: Current read-mapping software uses a singular specification of alignment parameters with respect to the reference. In the presence of varying reference structures (such as the repetitive regions of the human genome), alignments can be improved if those parameters are allowed to vary. RESULTS: To that end, the C++ program ProDerAl was written to refine previously generated alignments using varying parameters for these problematic regions. Synthetic benchmarks show that this realignment can result in an order of magnitude fewer misaligned bases. AVAILABILITY AND IMPLEMENTATION: *Nix users can retrieve the source from GitHub (https://github.com/Benjamin-Crysup/proderal.git). A Windows binary is available at https://github.com/Benjamin-Crysup/proderal/releases/download/v1.1/proderal.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
BMC Bioinformatics ; 22(1): 12, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407074

ABSTRACT

BACKGROUND: Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage disequilibrium; however, the dependence of one panel might be triplet, quartet, or other. Therefore, compatible and user-friendly software is necessary for testing and assessing the global linkage disequilibrium among mixed genetic data. RESULTS: This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as no non-random associations among all subsets of the tested panel. The new R package "mixIndependR" calculates basic genetic parameters like allele frequency, genotype frequency, heterozygosity, Hardy-Weinberg equilibrium, and linkage disequilibrium (LD) by mutual independence from population data, regardless of the type of markers, such as single nucleotide polymorphisms, short tandem repeats, insertions and deletions, and any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and functionally analyzed in the software package. By comparing the observed distribution of two common summary statistics (the number of heterozygous loci [K] and the number of shared alleles [X]) with their expected distributions under the assumption of mutual independence, the overall independence is tested. CONCLUSION: The package "mixIndependR" is compatible with all categories of genetic markers and detects the overall non-random associations. Compared to pairwise disequilibrium, the approach described herein tends to have higher power, especially when the number of markers is large. With this package, more multi-functional or stronger genetic panels can be developed, like mixed panels with different kinds of markers.
In population genetics, the package "mixIndependR" makes it possible to discover more about admixture of populations, natural selection, genetic drift, and population demographics, as a more powerful method of detecting LD. Moreover, this new approach can optimize variant selection in disease studies and contribute to panel combination for treatments in multimorbidity. Application of this approach to real data is expected in the future, and this might bring a leap in the field of genetic technology. AVAILABILITY: The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at: https://cran.r-project.org/web/packages/mixIndependR/index.html.
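
To make the "expected distribution under mutual independence" concrete for the number of heterozygous loci K, the following Python sketch may help. It is not taken from mixIndependR (an R package) and does not use its API. It rests on one stated assumption: if loci are mutually independent and locus i is heterozygous with probability h_i, then K is a sum of independent Bernoulli variables, so its distribution can be built by convolving one locus at a time. The function name is hypothetical.

```python
def heterozygous_count_distribution(heterozygosities):
    """Return [P(K = 0), ..., P(K = n)] for n independent loci.

    heterozygosities holds the per-locus probability of being heterozygous.
    Independence between loci is the assumption being modelled here.
    """
    dist = [1.0]  # With zero loci, K = 0 with probability 1.
    for h in heterozygosities:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - h)   # this locus is homozygous
            nxt[k + 1] += p * h       # this locus is heterozygous
        dist = nxt
    return dist


# Two loci, each heterozygous half of the time.
print(heterozygous_count_distribution([0.5, 0.5]))  # [0.25, 0.5, 0.25]
```

An observed histogram of K across individuals could then be compared against this expected distribution with a goodness-of-fit test; which test the package applies is not stated in the abstract.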


Subjects
Genetic Loci/genetics , Genomics/methods , Software , Genetic Databases , Genotype , Linkage Disequilibrium/genetics
20.
Forensic Sci Int Genet ; 51: 102459, 2021 03.
Article in English | MEDLINE | ID: mdl-33429137

ABSTRACT

Unique molecular identifiers (UMIs) are a promising approach to contend with errors generated during PCR and massively parallel sequencing (MPS). With UMI technology, random molecular barcodes are ligated to template DNA molecules prior to PCR, allowing PCR and sequencing error to be tracked and corrected bioinformatically. UMIs have the potential to be particularly informative for the interpretation of short tandem repeats (STRs). Traditional MPS approaches may simply lead to the observation of alleles that are consistent with the hypotheses of stutter, while with UMIs stutter products may be bioinformatically re-associated with their parental alleles and subsequently removed. Herein, a bioinformatics pipeline named strumi is described that is designed for the analysis of STRs that are tagged with UMIs. Unlike other tools, strumi is an alignment-free, machine-learning-driven algorithm that clusters individual MPS reads into UMI families, infers consensus super-reads that represent each family, and provides an estimate of the resulting haplotype's accuracy. Super-reads, in turn, approximate independent measurements not of the PCR products, but of the original template molecules, both in terms of quantity and sequence identity. Provisional assessments show that naïve threshold-based approaches generate super-reads that are accurate (∼97 % haplotype accuracy, compared to ∼78 % when UMIs are not used), and the application of a more nuanced machine learning approach increases the accuracy to ∼99.5 % depending on the level of certainty desired. With these features, UMIs may greatly simplify probabilistic genotyping systems and reduce uncertainty. However, the ability to interpret alleles at trace levels also permits the interpretation, characterization and quantification of contamination as well as somatic variation (including somatic stutter), which may present newfound challenges.
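
As a rough illustration of what a naïve threshold-based super-read could look like, the Python sketch below groups reads by UMI and keeps the majority haplotype of each sufficiently large family. It is not strumi's algorithm: it assumes exact UMI matching (no clustering of UMIs that contain errors), and the function name, the minimum family size, and the agreement threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def naive_umi_consensus(reads, min_family_size=3, min_agreement=0.5):
    """Build one consensus haplotype per UMI family.

    reads is an iterable of (umi, haplotype_sequence) pairs.
    Families smaller than min_family_size, or without a haplotype
    exceeding min_agreement of the family, are discarded.
    Both thresholds are illustrative assumptions.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to trust this family
        seq, count = Counter(seqs).most_common(1)[0]
        if count / len(seqs) > min_agreement:
            consensus[umi] = seq  # majority haplotype becomes the super-read
    return consensus
```

In this simplification, a stutter product that shares a UMI with its parental allele is outvoted inside the family rather than reported as a separate allele, and the number of surviving families serves as a count of template molecules.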


Subjects
High-Throughput Nucleotide Sequencing/methods , Microsatellite Repeats , DNA Sequence Analysis/methods , DNA Fingerprinting , Humans
...