ABSTRACT
The weight of DNA evidence for forensic applications is typically assessed through the calculation of the likelihood ratio (LR). In the standard workflow, DNA is extracted from a collection of cells where the cells of an unknown number of donors are mixed. The DNA is then genotyped, and the LR is calculated through well-established methods. Recently, a method for calculating the LR from single-cell data has been presented. Rather than extracting the DNA while the cells are still mixed, single-cell data is procured by first isolating each cell. Extraction and fragment analysis of relevant forensic loci follows such that individual cells are genotyped. This workflow leads to significantly stronger weights of evidence, but it does not account for extracellular DNA that could also be present in the sample. In this paper, we present a method for calculation of an LR that combines single-cell and extracellular data. We demonstrate the calculation on example data and show that the combined LR can lead to stronger conclusions than would be obtained from calculating LRs on the single-cell and extracellular DNA separately.
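For reference, the LR contrasts the probability of the evidence under the prosecution and defense hypotheses. A schematic form of the combined statistic is shown below; the factorization on the right assumes the single-cell data (E_sc) and extracellular data (E_ex) are conditionally independent given each hypothesis, a simplifying assumption made here for illustration rather than a statement of the published model.

```latex
\mathrm{LR} = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)}, \qquad
\mathrm{LR}_{\mathrm{combined}}
  = \frac{\Pr(E_{\mathrm{sc}}, E_{\mathrm{ex}} \mid H_p)}{\Pr(E_{\mathrm{sc}}, E_{\mathrm{ex}} \mid H_d)}
  = \frac{\Pr(E_{\mathrm{sc}} \mid H_p)\,\Pr(E_{\mathrm{ex}} \mid H_p)}
         {\Pr(E_{\mathrm{sc}} \mid H_d)\,\Pr(E_{\mathrm{ex}} \mid H_d)}
\quad \text{(under conditional independence)}
```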
ABSTRACT
In the absence of a suspect, the forensic aim is investigative, and the focus is one of discerning which genotypes best explain the evidence. In traditional systems, the list of candidate genotypes may become vast if the sample contains DNA from many donors or the information from a minor contributor is swamped by that of major contributors, leading to lower evidential value for a true donor's contribution and, as a result, possibly overlooked or inefficient investigative leads. Recent developments in single-cell analysis offer a way forward by producing data capable of discriminating genotypes. This is accomplished by first clustering single-cell data by similarity without reference to a known genotype. With good clustering it is reasonable to assume that the single-cell electropherograms (scEPGs) in a cluster are of a single contributor. With that assumption we determine the probability of a cluster's content given each possible genotype at each locus, which is then used to determine the posterior probability mass distribution over all genotypes by application of Bayes' rule. A decision criterion is then applied such that the sum of the ranked probabilities of all genotypes falling in the set is at least 1-α. This is the credible genotype set and is used to inform database search criteria. Within this work we demonstrate the salience of single-cell analysis by performance testing a set of 630 previously constructed admixtures containing up to 5 donors of balanced and unbalanced contributions. We use scEPGs that were generated by isolating single cells, employing a direct-to-PCR extraction treatment, amplifying STRs that are compliant with existing national databases, and applying post-PCR treatments that elicit a detection limit of one DNA copy. We determined that, for these test data, 99.3% of the true genotypes are included in the 99.8% credible set, regardless of the number of donors that comprised the mixture. We also determined that the most probable genotype was the true genotype for 97% of the loci when the number of cells in a cluster was at least two. Since efficient investigative leads will be borne by posterior mass distributions that are narrow and concentrated at the true genotype, we report that, for this test set, 47,900 (86%) loci returned only one credible genotype and of these 47,551 (99%) were the true genotype. When determining the LR for true contributors, 91% of the clusters rendered LR > 10^18, showing the potential of single-cell data to positively affect investigative reporting.
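As a concrete illustration of the Bayes'-rule-plus-decision-criterion step described above, the sketch below builds a credible genotype set from per-genotype likelihoods at a single locus. It is a minimal sketch, not the published implementation: the likelihood values, the flat prior, and the genotype labels are hypothetical.

```python
# Illustrative sketch: given per-genotype likelihoods for the scEPGs in one
# cluster at one locus, apply Bayes' rule and build the smallest set of
# genotypes whose posterior mass is >= 1 - alpha (here, a 99.8% credible set).

def credible_genotype_set(likelihoods, priors, alpha=0.002):
    # Posterior mass for each candidate genotype (Bayes' rule, then normalize).
    unnorm = {g: likelihoods[g] * priors[g] for g in likelihoods}
    total = sum(unnorm.values())
    posterior = {g: p / total for g, p in unnorm.items()}

    # Rank genotypes by posterior mass and accumulate until >= 1 - alpha.
    credible, cum = [], 0.0
    for g, p in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
        credible.append(g)
        cum += p
        if cum >= 1.0 - alpha:
            break
    return credible, posterior

# Example with made-up numbers: three candidate genotypes at one locus.
likes = {("12", "14"): 3e-4, ("12", "12"): 1e-6, ("14", "15"): 2e-7}
prior = {g: 1.0 / 3 for g in likes}          # flat prior, for illustration only
cred, post = credible_genotype_set(likes, prior, alpha=0.002)
print(cred, {g: round(p, 4) for g, p in post.items()})
```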
Subject(s)
DNA Fingerprinting, Microsatellite Repeats, Humans, DNA Fingerprinting/methods, Bayes Theorem, Genotype, DNA/genetics, Likelihood Functions
ABSTRACT
The consistency between DNA evidence and person(s) of interest (PoI) is summarized by a likelihood ratio (LR): the probability of the data given that the PoI contributed divided by the probability given that they did not. It is often the case that there are several PoI who may have individually or jointly contributed to the stain. If there is more than one PoI, or the number of contributors (NoC) cannot easily be determined, then several sets of hypotheses are needed, requiring significant resources to complete the interpretation. Recent technological developments in laboratory systems offer a way forward by enabling production of single-cell data. Though single-cell data may be procured by next generation sequencing or capillary electrophoresis workflows, in this work we focus our attention on assessing the consistency between PoIs and a collection of single-cell electropherograms (scEPGs) from diploid cells (i.e., leukocytes and epithelial cells). Specifically, we introduce a framework that: I) clusters scEPGs into collections, each originating from one genetic source; II) for each PoI, determines a LR for each cluster of scEPGs; and III) averages the likelihood ratios for each PoI across all clusters to provide a whole-sample weight-of-evidence summary. By using Model Based Clustering (MBC) in step I) and an algorithm, named EESCIt for Evidentiary Evaluation of Single Cells, that computes single-cell LRs in step II), we show that 99% of the comparisons rendered log LR values > 0 for true contributors, and of these all but one gave log LR > 5, regardless of the number of donors or whether the smallest contributor donated less than 20% of the cells, greatly expanding the collection of cases for which DNA forensics provides informative results.
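A minimal sketch of step III as stated above: per-cluster LRs for a given PoI are combined into a whole-sample summary by averaging. The numbers are invented, and the use of a simple arithmetic mean is an assumption made for illustration; it is not drawn from the MBC/EESCIt description.

```python
# Hypothetical per-cluster LRs for one PoI, averaged into a whole-sample value.
import math

def whole_sample_lr(cluster_lrs):
    """Arithmetic mean of per-cluster LRs for one PoI (illustrative choice)."""
    return sum(cluster_lrs) / len(cluster_lrs)

cluster_lrs = [3.2e19, 8.5e-4, 1.1e-6]   # PoI matches one of three clusters
lr = whole_sample_lr(cluster_lrs)
print(f"whole-sample LR = {lr:.3e}, log10 LR = {math.log10(lr):.1f}")
```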
Subject(s)
DNA Fingerprinting, Microsatellite Repeats, Humans, Likelihood Functions, DNA Fingerprinting/methods, Algorithms, DNA/genetics
ABSTRACT
An urgent need exists for a rapid, cost-effective, facile, and reliable nucleic acid assay for mass screening to control and prevent the spread of emerging pandemic diseases. This urgent need is not fully met by current diagnostic tools. In this review, we summarize the current state-of-the-art research in novel nucleic acid amplification and detection that could be applied to point-of-care (POC) diagnosis and mass screening of diseases. The critical technological breakthroughs will be discussed for their advantages and disadvantages. Finally, we will discuss the future challenges of developing nucleic acid-based POC diagnosis.
Subject(s)
Nucleic Acids, Nucleic Acid Amplification Techniques, Pandemics, Point-of-Care Systems
ABSTRACT
Interpreting forensic DNA signal is arduous since the total intensity is a cacophony of signal from noise, artifact, and allele from an unknown number of contributors (NOC). An alternative to traditional bulk-processing pipelines is a single-cell one, where the sample is collected and each cell is sequestered, resulting in n single-source, single-cell EPGs (scEPGs) that must be interpreted using applicable strategies. As with all forensic DNA interpretation strategies, high quality electropherograms are required; thus, to enhance the credibility of single-cell forensics, it is necessary to produce an efficient direct-to-PCR treatment that is compatible with prevailing downstream laboratory processes. We incorporated the semi-automated micro-fluidic DEPArray™ technology into the single-cell laboratory and optimized its implementation by testing the effects of four laboratory treatments on single-cell profiles. We focused on testing effects of phosphate-buffered saline (PBS) since it is an important reagent that mitigates cell rupture but is also a PCR inhibitor. Specifically, we explored the effect of decreasing PBS concentrations on five electropherogram-quality metrics from 241 leukocytes: profile drop-out, allele drop-out, allele peak heights, peak height ratios, and scEPG sloping. In an effort to improve reagent use, we also assessed two concentrations of proteinase K. The results indicate that decreasing PBS concentrations to 0.5X or 0.25X improves scEPG quality, while modest modifications to proteinase K concentrations did not significantly impact it. We, therefore, conclude that a lower than recommended proteinase K concentration coupled with a lower than recommended PBS concentration results in enhanced scEPGs within the semi-automated single-cell pipeline.
Subject(s)
DNA Fingerprinting, DNA, Endopeptidase K, Alleles, DNA/analysis, DNA Fingerprinting/methods, Endopeptidase K/genetics, Forensic Genetics, Microsatellite Repeats, Polymerase Chain Reaction/methods
ABSTRACT
Forensic DNA signal is notoriously challenging to assess, requiring computational tools to support its interpretation. Over-expression of stutter, allele drop-out, allele drop-in, degradation, differential degradation, and the like make forensic DNA profiles too complicated to evaluate by manual methods. In response, computational tools that make point estimates on the Number of Contributors (NOC) to a sample have been developed, as have Bayesian methods that evaluate an A Posteriori Probability (APP) distribution on the NOC. In cases where an overly narrow NOC range is assumed, the downstream strength of evidence may be incomplete insofar as the evidence is evaluated with an inadequate set of propositions. In the current paper, we extend previous work on NOCIt, a Bayesian method that determines an APP on the NOC given an electropherogram, by reporting on an implementation where the user can add assumed contributors. NOCIt is a continuous system that incorporates models of peak height (including degradation and differential degradation), forward and reverse stutter, noise, and allelic drop-out, while being cognizant of allele frequencies in a reference population. When conditioned on a known contributor, we found that the mode of the APP distribution can shift upward by one compared with the circumstance where no known contributor is assumed, and that this occurred most often when the assumed contributor was the minor constituent of the mixture. In a development of a result of Slooten and Caliebe (FSI:G, 2018) that, under suitable assumptions, establishes that the NOC can be treated as a nuisance variable in the computation of a likelihood ratio between the prosecution and defense hypotheses, we show that this computation must not only use coincident models, but also coincident contextual information. The results reported here, therefore, illustrate the power of modern probabilistic systems to assess full weights-of-evidence, and to provide information on reasonable NOC ranges across multiple contexts.
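The nuisance-variable treatment referenced above amounts to marginalizing the NOC out of both the numerator and the denominator under the same model and the same contextual information. A schematic rendering (notation ours, not taken from the cited paper) is:

```latex
\mathrm{LR} = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)}
            = \frac{\sum_{n} \Pr(E \mid H_p, N = n)\,\Pr(N = n \mid H_p)}
                   {\sum_{n} \Pr(E \mid H_d, N = n)\,\Pr(N = n \mid H_d)}
```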
Subject(s)
DNA Fingerprinting, Alleles, Bayes Theorem, DNA, Humans
ABSTRACT
Complex DNA mixtures are challenging to interpret and require computational tools that aid in that interpretation. Recently, several computational methods that estimate the number of contributors (NOC) to a sample have been developed. Unlike analogous tools that interpret profiles and report LRs, NOC tools vary widely in their operational principle, where some are Bayesian and others are machine learning tools. In addition, NOC tools may return a single n estimate or a distribution on n. This vast array of constructs, coupled with a gap in standardized methods by which to validate NOC systems, warrants an exploration into the measures by which differing NOC systems might be tested for operation. In the current paper, we use two exemplar NOC systems: a probabilistic system named NOCIt, which renders an a posteriori probability (APP) distribution on the number of contributors given an electropherogram, and an artificial neural network (ANN). NOCIt is a continuous Bayesian inference system incorporating models of peak height, degradation, differential degradation, forward and reverse stutter, noise and allelic drop-out while considering allele frequencies in a reference population. The ANN is also a continuous method, taking all the same features (barring degradation) into account. Unlike its Bayesian counterpart, it demands substantively more data to parameterize, requiring synthetic data. We explore each system's performance by conducting tests on 214 PROVEDIt mixtures where the limit of detection was 1 copy of DNA. We found that after a lengthy training period of approximately 24 h, the ANN's evaluation process was very fast and perfectly repeatable. In contrast, NOCIt only took a few minutes to train but took tens of minutes to complete each sample and was less repeatable. In addition, it rendered a probability distribution that was more sensitive and specific, affording a reasonable method by which to report all reasonable n that explain the evidence for a given sample. Whatever the method, by acknowledging the inherent differences between NOC systems, we demonstrate that validation constructs will necessarily be guided by the needs of the forensic domain and be dependent upon whether the laboratory seeks to assign a single n or a range of n.
Subject(s)
DNA Fingerprinting, Microsatellite Repeats, Bayes Theorem, DNA/genetics, Humans, Neural Networks (Computer)
ABSTRACT
Current analysis of forensic DNA stains relies on the probabilistic interpretation of bulk-processed samples that represent mixed profiles consisting of an unknown number of potentially partial representations of each contributor. Single-cell methods, in contrast, offer a solution to the forensic DNA mixture problem by incorporating a step that separates cells before extraction. A forensically relevant single-cell pipeline relies on efficient direct-to-PCR extractions that are compatible with standard downstream forensic reagents. Here we demonstrate the feasibility of implementing single-cell pipelines into the forensic process by exploring four metrics of electropherogram (EPG) signal quality (allele detection rates, peak heights, peak height ratios, and peak height balance across low- to high-molecular-weight short tandem repeat (STR) markers) obtained with four direct-to-PCR extraction treatments and a common post-PCR laboratory procedure. Each treatment was used to extract DNA from 102 single buccal cells, whereupon the amplification reagents were immediately added to the tube and the DNA was amplified/injected using post-PCR conditions known to elicit a limit of detection (LoD) of one DNA molecule. The results show that most cells, regardless of extraction treatment, rendered EPGs with at least a 50% true positive allele detection rate and that allele drop-out was not cell independent. Statistical tests demonstrated that extraction treatments significantly impacted all metrics of EPG quality, where the Arcturus® PicoPure™ extraction method resulted in the lowest median allele drop-out rate, highest median average peak height, highest median average peak height ratio, and least negative median values of EPG sloping for GlobalFiler™ STR loci amplified at half volume. We, therefore, conclude that single-cell pipelines are feasible for casework purposes and demonstrate that inferential systems assuming cell independence will not be appropriate in the probabilistic interpretation of a collection of single-cell EPGs.
Subject(s)
Alleles, DNA Fingerprinting/methods, DNA/analysis, DNA/isolation & purification, Polymerase Chain Reaction/methods, Single-Cell Analysis, Capillary Electrophoresis, Humans, Limit of Detection, Microsatellite Repeats, Mouth Mucosa
ABSTRACT
Forensic DNA signal is notoriously challenging to interpret and requires the implementation of computational tools that support its interpretation. While data from high-copy, low-contributor samples result in electropherogram signal that is readily interpreted by probabilistic methods, electropherogram signal from forensic stains is often garnered from low-copy, high-contributor-number samples and is frequently obfuscated by allele sharing, allele drop-out, stutter and noise. Since forensic DNA profiles are too complicated to quantitatively assess by manual methods, continuous, probabilistic frameworks that draw inferences on the Number of Contributors (NOC) and compute the Likelihood Ratio (LR) given the prosecution's and defense's hypotheses have been developed. In the current paper, we validate a new version of the NOCIt inference platform that determines an A Posteriori Probability (APP) distribution of the number of contributors given an electropherogram. NOCIt is a continuous inference system that incorporates models of peak height (including degradation and differential degradation), forward and reverse stutter, noise and allelic drop-out while taking into account allele frequencies in a reference population. We established the algorithm's performance by conducting tests on samples that were representative of types often encountered in practice. In total, we tested NOCIt's performance on 815 degraded, UV-damaged, inhibited, differentially degraded, or uncompromised DNA mixture samples containing up to 5 contributors. We found that the model makes accurate, repeatable and reliable inferences about the NOC and significantly outperformed methods that rely on signal filtering. By leveraging recent theoretical results of Slooten and Caliebe (FSI:G, 2018) that, under suitable assumptions, establish that the NOC can be treated as a nuisance variable, we demonstrated that when NOCIt's APP is used in conjunction with a downstream likelihood ratio (LR) inference system that employs the same probabilistic model, a full evaluation across multiple contributor numbers is rendered. This work, therefore, illustrates the power of modern probabilistic systems to report holistic and interpretable weights-of-evidence to the trier-of-fact without assigning a specified number of contributors or filtering signal.
Subject(s)
DNA Fingerprinting, DNA/genetics, Likelihood Functions, Forensic Genetics/methods, Humans, Statistical Models
ABSTRACT
BACKGROUND: In order to isolate an individual's genotype from a sample of biological material, most laboratories use PCR and Capillary Electrophoresis (CE) to construct a genetic profile based on polymorphic loci known as Short Tandem Repeats (STRs). The resulting profile consists of CE signal which contains information about the length and number of STR units amplified. For samples collected from the environment, interpretation of the signal can be challenging given that information regarding the quality and quantity of the DNA is often limited. The signal can be further confounded by the presence of noise and PCR artifacts such as stutter, which can mask or mimic biological alleles. Because manual interpretation methods cannot comprehensively account for such nuances, it would be valuable to develop a signal model that can effectively characterize the various components of STR signal independent of a priori knowledge of the quantity or quality of DNA. RESULTS: First, we seek to mathematically characterize the quality of the profile by measuring changes in the signal with respect to amplicon size. Next, we examine the noise, allele, and stutter components of the signal and develop distinct models for each. Using cross-validation and model selection, we identify a model that can be effectively utilized for downstream interpretation. Finally, we show an implementation of the model in NOCIt, a software system that calculates the a posteriori probability distribution on the number of contributors. CONCLUSION: The model was selected using a large, diverse set of DNA samples obtained from 144 different laboratory conditions, with DNA amounts ranging from a single copy of DNA to hundreds of copies and profile quality ranging from pristine to highly degraded. Implemented in NOCIt, the model enables a probabilistic approach to estimating the number of contributors to complex, environmental samples.
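To make the cross-validation and model-selection step concrete, the sketch below compares candidate distribution families for one signal component (noise peak heights) by held-out log-likelihood. The data, the candidate set, and the single train/test split are illustrative placeholders, not the procedure or calibration data behind NOCIt.

```python
# Illustrative model selection: fit each candidate family on a training split
# and score it by held-out log-likelihood on a test split.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
noise_rfu = rng.lognormal(mean=3.0, sigma=0.4, size=500)   # stand-in noise data

candidates = {"gaussian": stats.norm, "lognormal": stats.lognorm, "gamma": stats.gamma}
train, test = noise_rfu[:400], noise_rfu[400:]

scores = {}
for name, dist in candidates.items():
    params = dist.fit(train)                         # MLE fit on the training split
    scores[name] = dist.logpdf(test, *params).sum()  # held-out log-likelihood

best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```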
Subject(s)
Capillary Electrophoresis/methods, Microsatellite Repeats/genetics, Statistical Models, Alleles, DNA/genetics, Humans, Probability, Software
ABSTRACT
Continuous mixture interpretation methods that employ probabilistic genotyping to compute the Likelihood Ratio (LR) utilize more information than threshold-based systems. The continuous interpretation schemes described in the literature, however, do not all use the same underlying probabilistic model, and standards outlining which probabilistic models may or may not be implemented into casework do not exist; thus, it is the individual forensic laboratory or expert that decides which model and corresponding software program to implement. For countries, such as the United States, with an adversarial legal system, one can envision a scenario where two probabilistic models are used to present the weight of evidence, and two LRs are presented by two experts. Conversely, if no independent review of the evidence is requested, one expert using one model may present one LR, as there is no standard or guideline requiring that the uncertainty in the LR estimate be presented. The choice of model determines the underlying probability calculation, and changes to it can result in non-negligible differences in the reported LR or corresponding verbal categorization presented to the trier-of-fact. In this paper, we study the impact of model differences on the LR and on the corresponding verbal expression computed using four variants of a continuous mixture interpretation method. The four models were each tested five times on 101 one-, two- and three-person experimental samples with known contributors. For each sample, LRs were computed using the known contributor as the person of interest. In all four models, intra-model variability increased with an increase in the number of contributors and with a decrease in the contributor's template mass. Inter-model variability in the associated verbal expression of the LR was observed in 32 of the 195 LRs used for comparison. Moreover, in 11 of these profiles there was a change from LR > 1 to LR < 1. These results indicate that modifications to existing continuous models do have the potential to significantly impact the final statistic, justifying the continuation of broad-based, large-scale, independent studies to quantify the limits of reliability and variability of existing forensically relevant systems.
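The sketch below shows why small inter-model LR differences can change the reported verbal category: the mapping from LR to verbal expression is a step function, so an LR near a cut-point can fall on either side of it. The cut-points and labels are invented for illustration and are not the scale used in the study.

```python
# Hypothetical verbal scale: (upper bound on LR, label). Not the study's scale.
import math

VERBAL_SCALE = [
    (1, "does not support the prosecution hypothesis"),
    (1e2, "limited support"),
    (1e4, "moderate support"),
    (1e6, "strong support"),
    (math.inf, "very strong support"),
]

def verbal(lr):
    # Return the label of the first bin whose upper bound exceeds the LR.
    for upper, label in VERBAL_SCALE:
        if lr < upper:
            return label

# Two models reporting nearly the same LR can straddle a cut-point.
for lr in (9.6e3, 1.2e4):
    print(f"LR = {lr:.2e} -> {verbal(lr)}")
```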
Subject(s)
DNA Fingerprinting/methods, Forensic Genetics/methods, Algorithms, Humans, Likelihood Functions, Statistical Models, Software, United States
ABSTRACT
The interpretation of DNA evidence may rely upon the assumption that the forensic short tandem repeat (STR) profile is composed of multiple genotypes, or partial genotypes, originating from n contributors. In cases where the number of contributors (NOC) is in dispute, it may be justifiable to compute likelihood ratios that utilize different NOC parameters in the numerator and denominator, or to present different likelihoods separately. Therefore, in this work, we evaluate the impact of allele dropout on estimating the NOC for simulated mixtures with up to six contributors in the presence or absence of a major contributor. These simulations demonstrate that in the presence of dropout, or with the application of an analytical threshold (AT), estimating the NOC using counting methods was unreliable for mixtures containing one or more minor contributors present at low levels. The number of misidentifications was only slightly reduced when we expanded the number of STR loci from 16 to 21. In many of the simulations tested herein, the minimum and actual NOC differed by more than two, suggesting that low-template, high-order mixtures with allele counts fewer than six may originate from as many as four, five, or six persons. Thus, there is justification for the use of differing or multiple assumptions on the NOC when computing the weight of DNA evidence for low-template mixtures, particularly when the peak heights are in the vicinity of the signal threshold or allele counting methods are the mechanism by which the NOC is assessed.
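For reference, the allele-counting lower bound discussed above follows from each diploid contributor showing at most two alleles per locus. A minimal sketch, with a made-up profile, is shown below; dropout only lowers the observed allele counts, so this bound can badly understate the true NOC.

```python
# Allele-counting lower bound on the number of contributors: each diploid
# donor contributes at most two alleles per locus, so
# min NOC = ceil(max allele count over loci / 2).
import math

def min_noc_by_allele_count(profile):
    """profile: dict mapping locus name -> set of detected alleles."""
    max_alleles = max(len(alleles) for alleles in profile.values())
    return math.ceil(max_alleles / 2)

profile = {
    "D8S1179": {"10", "12", "13"},
    "FGA": {"20", "22", "24", "25"},
    "TH01": {"6", "9.3"},
}
print(min_noc_by_allele_count(profile))   # -> 2, even if more donors were present
```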
Subject(s)
Complex Mixtures/genetics, DNA Fingerprinting/methods, DNA/genetics, Forensic Genetics/methods, Algorithms, Alleles, Genotype, Humans, Likelihood Functions, Microsatellite Repeats, Specimen Handling
ABSTRACT
DNA-based human identity testing is conducted by comparison of PCR-amplified polymorphic Short Tandem Repeat (STR) motifs from a known source with the STR profiles obtained from uncertain sources. Samples such as those found at crime scenes often result in signal that is a composite of incomplete STR profiles from an unknown number of unknown contributors, making interpretation an arduous task. To facilitate advancement in STR interpretation challenges we provide over 25,000 multiplex STR profiles produced from one to five known individuals at target levels ranging from one to 160 copies of DNA. The data, generated under 144 laboratory conditions, are classified by total copy number and contributor proportions. For the 70% of samples that were synthetically compromised, we report the level of DNA damage using quantitative and end-point PCR. In addition, we characterize the complexity of the signal by exploring the number of detected alleles in each profile.
Subject(s)
DNA Fingerprinting, Datasets as Topic, Microsatellite Repeats, Alleles, DNA Damage, Forensic Genetics, Genotype, Humans, Polymerase Chain Reaction
ABSTRACT
Samples containing low copy numbers of DNA are routinely encountered in casework. The signal acquired from these sample types can be difficult to interpret, as these samples do not always contain all of the genotypic information from each contributor, where the loss of genetic information is associated with sampling and detection effects. The present work focuses on developing a validation scheme to aid in mitigating the effects of the latter. We establish a scheme designed to simultaneously improve signal resolution and detection rates without costly large-scale experimental validation studies by applying a combined simulation- and experiment-based approach. Specifically, we parameterize an in silico DNA pipeline with experimental data acquired from the laboratory and use this to evaluate multifarious scenarios in a cost-effective manner. Metrics such as single-copy signal-to-noise resolution and false positive and false negative signal detection rates are used to select tenable laboratory parameters that result in high-fidelity signal in the single-copy regime. We demonstrate that the metrics acquired from simulation are consistent with experimental data obtained from two capillary electrophoresis platforms and various injection parameters. Once good resolution is obtained, analytical thresholds can be determined using detection error tradeoff analysis, if necessary. Decreasing the limit of detection of the forensic process to one copy of DNA is a powerful mechanism by which to increase the information content on minor components of a mixture, which is particularly important for probabilistic system inference. If the forensic pipeline is engineered such that high-fidelity electropherogram signal is obtained, then the likelihood ratio (LR) of a true contributor increases and the probability that the LR of a randomly chosen person is greater than one decreases. This is, potentially, the first step towards standardization of the analytical pipeline across operational laboratories.
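A conceptual sketch of the detection error tradeoff idea mentioned above: sweep candidate analytical thresholds and record, for each, the fraction of noise peaks that survive (false positives) and the fraction of true single-copy allelic peaks that are lost (false negatives). The peak-height distributions and threshold values are synthetic placeholders, not data from the study.

```python
# Sweep candidate analytical thresholds (AT) and tabulate the error tradeoff.
import numpy as np

rng = np.random.default_rng(1)
noise_peaks = rng.lognormal(2.5, 0.5, 5000)      # RFU of non-allelic peaks (synthetic)
allele_peaks = rng.lognormal(4.5, 0.6, 5000)     # RFU of single-copy allelic peaks (synthetic)

def det_curve(noise, signal, thresholds):
    rows = []
    for t in thresholds:
        fpr = np.mean(noise >= t)      # noise surviving the threshold
        fnr = np.mean(signal < t)      # allelic signal lost to the threshold
        rows.append((t, fpr, fnr))
    return rows

for t, fpr, fnr in det_curve(noise_peaks, allele_peaks, [10, 20, 30, 50, 80]):
    print(f"AT={t:>3} RFU  FPR={fpr:.3f}  FNR={fnr:.3f}")
```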
Subject(s)
DNA Fingerprinting/standards, Capillary Electrophoresis, Humans, Likelihood Functions, Limit of Detection, Microsatellite Repeats, Monte Carlo Method, Reproducibility of Results
ABSTRACT
In forensic DNA casework, the interpretation of an evidentiary profile may be dependent upon the assumption on the number of individuals from whom the evidence arose. Three methods of inferring the number of contributors (NOCIt, the maximum likelihood estimator, and maximum allele count) were evaluated using 100 test samples consisting of one to five contributors and 0.5-0.016 ng template DNA amplified with Identifiler® Plus and PowerPlex® 16 HS. Results indicate that NOCIt was the most accurate method of the three, requiring 0.07 ng template DNA from any one contributor to consistently estimate the true number of contributors. Additionally, NOCIt returned repeatable results for 91% of samples analyzed in quintuplicate, while 50 single-source standards proved sufficient to calibrate the software. The data indicate that computational methods that employ a quantitative, probabilistic approach provide improved accuracy and additional pertinent information such as the uncertainty associated with the inferred number of contributors.
Subject(s)
DNA Fingerprinting/methods, DNA/genetics, Alleles, DNA/analysis, Gene Frequency, Humans, Likelihood Functions, Microsatellite Repeats, Monte Carlo Method, Polymerase Chain Reaction, Reproducibility of Results
ABSTRACT
Short tandem repeat (STR) profiling from DNA samples has long been the bedrock of human identification. The laboratory process is composed of multiple procedures that include quantification, sample dilution, PCR, electrophoresis, and fragment analysis. The end product is a short tandem repeat electropherogram composed of signal from alleles, artifacts, and instrument noise. In order to optimize or alter laboratory protocols, a large number of validation samples must be created at significant expense. As a tool to support that process and to enable the exploration of complex scenarios without costly sample creation, a mechanistic stochastic model that incorporates each of the aforementioned processing features is described herein. The model allows rapid in silico simulation of electropherograms from multicontributor samples and enables detailed investigations of involved scenarios. An implementation of the model that is parameterized by extensive laboratory data is publicly available. To illustrate its utility, the model was employed to evaluate the effects of sample dilutions, injection time, and cycle number on peak height, and the nature of stutter ratios at low template. We verify the model's findings by comparison with experimentally generated data.
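In the spirit of the mechanistic model described above, the toy forward simulation below amplifies a starting template copy number cycle by cycle, diverts a small fraction of new copies into a one-repeat-shorter stutter product, and converts molecule counts to RFU. It is a simplified sketch of the idea, not the published model; every parameter value (cycle count, efficiency, stutter probability, RFU scaling) is an illustrative placeholder.

```python
# Toy stochastic PCR/CE simulation for one heterozygous-free locus.
import numpy as np

rng = np.random.default_rng(2)

def simulate_locus(copies, cycles=29, eff=0.85, stutter_p=0.005, rfu_per_molecule=2e-6):
    allele, stutter = copies, 0
    for _ in range(cycles):
        new = rng.binomial(allele, eff)          # successful allele copies this cycle
        slip = rng.binomial(new, stutter_p)      # copies that slip one repeat (back stutter)
        allele += new - slip
        stutter += rng.binomial(stutter, eff) + slip
    return allele * rfu_per_molecule, stutter * rfu_per_molecule

peak, stutter_peak = simulate_locus(copies=15)
print(f"allele ~{peak:.0f} RFU, back stutter ~{stutter_peak:.0f} RFU")
```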
Subject(s)
Computer Simulation, DNA Copy Number Variations, DNA/analysis, Capillary Electrophoresis/methods, Polymerase Chain Reaction/methods, Alleles, DNA Fingerprinting, Humans, Microsatellite Repeats, Sensitivity and Specificity
ABSTRACT
In forensic DNA interpretation, the likelihood ratio (LR) is often used to convey the strength of a match. Expanding on binary and semi-continuous methods that do not use all of the quantitative data contained in an electropherogram, fully continuous methods to calculate the LR have been created. These fully continuous methods utilize all of the information captured in the electropherogram, including the peak heights. Recently, methods that calculate the distribution of the LR using semi-continuous methods have also been developed. The LR distribution has been proposed as a way of studying the robustness of the LR, which varies depending on the probabilistic model used for its calculation. For example, the LR distribution can be used to calculate the p-value, which is the probability that a randomly chosen individual results in a LR greater than the LR obtained from the person-of-interest (POI). Hence, the p-value is a statistic that is different from, but related to, the LR; and it may be interpreted as the false positive rate resulting from a binary hypothesis test between the prosecution and defense hypotheses. Here, we present CEESIt, a method that combines the twin features of a fully continuous model to calculate the LR and its distribution, conditioned on the defense hypothesis, along with an associated p-value. CEESIt incorporates dropout, noise and stutter (reverse and forward) in its calculation. As calibration data, CEESIt uses single source samples with known genotypes and calculates a LR for a specified POI on a question sample, along with the LR distribution and a p-value. The method was tested on 303 files representing 1-, 2- and 3-person samples injected using three injection times containing between 0.016 and 1 ng of template DNA. Our data allow us to evaluate changes in the LR and p-value with respect to the complexity of the sample and to facilitate discussions regarding complex DNA mixture interpretation. We observed that the amount of template DNA from the contributor impacted the LR: small LRs resulted from contributors with low template masses. Moreover, as expected, we observed a decrease of p-values as the LR increased. A p-value of 10^-9 or lower was achieved in all the cases where the LR was greater than 10^8. We tested the repeatability of CEESIt by running all samples in duplicate and found the results to be repeatable.
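A schematic Monte Carlo rendering of the p-value defined above: simulate LRs for randomly drawn non-contributor genotypes and report the fraction that meets or exceeds the PoI's LR. The single-locus model, allele frequencies, and LR function below are toy stand-ins, not CEESIt's continuous model.

```python
# Monte Carlo estimate of P(LR of a random non-contributor >= LR of the POI).
import numpy as np

def p_value(lr_poi, lr_of, sample_genotype, n_sim=20_000, seed=0):
    """Fraction of random genotypes whose LR meets or exceeds the PoI's LR."""
    rng = np.random.default_rng(seed)
    hits = sum(lr_of(sample_genotype(rng)) >= lr_poi for _ in range(n_sim))
    return hits / n_sim

# Toy single-locus illustration: alleles drawn from population frequencies, and
# an "LR" that is large only when the random genotype matches the evidence.
freqs = {"11": 0.2, "12": 0.5, "13": 0.3}
alleles, p = list(freqs), list(freqs.values())
evidence = ("12", "13")

def sample_genotype(rng):
    return tuple(sorted(rng.choice(alleles, size=2, p=p)))

def lr_of(g):
    return 1.0 / (2 * freqs["12"] * freqs["13"]) if g == evidence else 1e-3

print(p_value(lr_of(evidence), lr_of, sample_genotype))
```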
Subject(s)
Complex Mixtures/analysis, Complex Mixtures/genetics, DNA Fingerprinting/methods, DNA/analysis, DNA/genetics, Microsatellite Repeats, Genotype, Humans, Likelihood Functions, Genetic Models, Statistical Models
ABSTRACT
Impacts of validation design on DNA signal were explored, and the level of variation introduced by injection, capillary changes, amplification, and kit lot was surveyed by examining a set of replicate samples ranging in mass from 0.25 to 0.008 ng. The variations in peak height, heterozygous balance, dropout probabilities, and baseline noise were compared using common statistical techniques. Data indicate that amplification is the source of the majority of the variation observed in the peak heights, followed by capillary lots. The use of different amplification kit lots did not introduce variability into the peak heights, heterozygous balance, dropout, or baseline. Thus, if data from case samples run over a significant time period are not available during validation, the validation must be designed to, at a minimum, include the amplification of multiple samples of varying quantity, with known genotype, amplified and run over an extended period of time using multiple pipettes and capillaries.
Subject(s)
DNA/genetics, Specimen Handling/methods, DNA Fingerprinting, Humans, Reproducibility of Results, DNA Sequence Analysis/methods
ABSTRACT
There are three dominant contributing factors that distort short tandem repeat profile measurements, two of which, stutter and variations in the allelic peak heights, have been described extensively. Here we characterise the remaining component, baseline noise. A probabilistic characterisation of the non-allelic noise peaks is not only inherently useful for statistical inference but is also significant for establishing a detection threshold. We do this by analysing the data from 643 single person profiles for the Identifiler Plus kit and 303 for the PowerPlex 16 HS kit. This investigation reveals that although the dye colour is a significant factor, it is not sufficient to have a per-dye colour description of the noise. Furthermore, we show that at a per-locus basis, out of the Gaussian, log-normal, and gamma distribution classes, baseline noise is best described by log-normal distributions and provide a methodology for setting an analytical threshold based on that deduction. In the PowerPlex 16 HS kit, we observe evidence of significant stutter at two repeat units shorter than the allelic peak, which has implications for the definition of baseline noise and signal interpretation. In general, the DNA input mass has an influence on the noise distribution. Thus, it is advisable to study noise and, consequently, to infer quantities like the analytical threshold from data with a DNA input mass comparable to the DNA input mass of the samples to be analysed.
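As a concrete illustration of the thresholding approach suggested above, the sketch below fits a log-normal distribution to baseline-noise peak heights for a single locus and places the analytical threshold at an upper quantile of the fit. The simulated noise values and the choice of the 99.9th percentile are illustrative assumptions, not values from the study.

```python
# Fit a log-normal to per-locus baseline noise and set the analytical
# threshold (AT) at a chosen upper quantile of the fitted distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
noise_rfu = rng.lognormal(mean=2.2, sigma=0.35, size=2000)   # one locus, one dye (synthetic)

shape, loc, scale = stats.lognorm.fit(noise_rfu, floc=0)     # MLE fit, location fixed at 0
at = stats.lognorm.ppf(0.999, shape, loc=loc, scale=scale)   # 99.9th percentile -> AT
print(f"fitted sigma={shape:.2f}, median={scale:.1f} RFU, analytical threshold ~= {at:.1f} RFU")
```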