Abstract
The number of contributors (NOC) to (complex) autosomal STR profiles cannot be determined with absolute certainty due to complicating factors such as allele sharing and allelic drop-out. The precision of NOC estimations can be improved by increasing the number of (highly polymorphic) markers, using massively parallel sequencing instead of capillary electrophoresis, and/or using more profile information than the allele counts alone. In this study, we focussed on machine learning approaches in order to make maximum use of the profile information. To this end, a set of 590 PowerPlex® Fusion 6C profiles with one to five contributors was generated from a total of 1174 different donors. This set varied in DNA template amount, mixture proportion, and levels of allele sharing, allelic drop-out and degradation. The dataset was labelled with the known NOC and was split into training, test and hold-out sets. The training set was used to optimize ten different algorithms with selection of profile characteristics. Per profile, over 250 characteristics, denoted 'features', were calculated. These features were based on allele counts, peak heights and allele frequencies. The features that were most related to the NOC were selected based on partial correlation using the training set. Next, the performance of each model (= combination of features plus algorithm) was examined using the test set. A random forest classifier with 19 features, denoted the 'RFC19-model', showed the best performance and was selected for further validation. Results showed improved accuracy compared to the conventional maximum allele count approach and an in-house nC-tool based on the total allele count. The method is extremely fast and regarded as useful for application in forensic casework.
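For orientation, the conventional maximum allele count approach that the abstract uses as a baseline can be sketched as follows. This is a minimal illustration, not the RFC19-model or the in-house nC-tool; the function name, the profile layout and the example alleles are hypothetical. It assumes each diploid contributor carries at most two alleles per locus, so the estimate is a lower bound on the NOC.

```python
from math import ceil


def max_allele_count_noc(profile):
    """Return the minimum NOC implied by the maximum allele count.

    `profile` maps a locus name to the allele labels observed at that locus.
    Assumption: every contributor is diploid, so one contributor explains
    at most two distinct alleles per locus.
    """
    max_alleles = max(
        (len(set(alleles)) for alleles in profile.values()), default=0
    )
    return ceil(max_alleles / 2)


# Hypothetical three-locus profile (illustrative values only).
example_profile = {
    "D3S1358": ["14", "15", "16", "17", "18"],
    "vWA": ["16", "17", "18"],
    "FGA": ["20", "22", "23", "24"],
}
estimated_noc = max_allele_count_noc(example_profile)
```

Because allele sharing and drop-out both reduce the number of distinct alleles seen, this count tends to underestimate the NOC for higher-order mixtures, which is the limitation the feature-based models aim to address.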
Subjects
DNA Fingerprinting/methods , DNA/genetics , Machine Learning , Microsatellite Repeats , Algorithms , Alleles , DNA Degradation, Necrotic , Gene Frequency , Humans
Abstract
Searching a national DNA database with complex and incomplete profiles usually yields very large numbers of possible matches that can present many candidate suspects to be further investigated by the forensic scientist and/or police. Current practice in most forensic laboratories consists of ordering these 'hits' based on the number of matching alleles with the searched profile. Thus, candidate profiles that share the same number of matching alleles are not differentiated, and due to the lack of other ranking criteria for the candidate list it may be difficult to discern a true match from the false positives or to notice that all candidates are in fact false positives. SmartRank was developed to put forward only relevant candidates and rank them accordingly. The SmartRank software computes a likelihood ratio (LR) for the searched profile and each profile in the DNA database and ranks database entries above a defined LR threshold according to the calculated LR. In this study, we examined for mixed DNA profiles of variable complexity whether the true donors are retrieved, what the number of false positives above an LR threshold is and the ranking position of the true donors. Using 343 mixed DNA profiles, over 750 SmartRank searches were performed. In addition, the performance of SmartRank and CODIS was compared regarding DNA database searches, and SmartRank was found to be complementary to CODIS. We also describe the applicable domain of SmartRank and provide guidelines. The SmartRank software is open-source and freely available. Using the best practice guidelines, SmartRank enables obtaining investigative leads in criminal cases lacking a suspect.
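The ranking step described above (keep database entries whose LR exceeds a threshold, then order them by LR) can be sketched as follows. This is not SmartRank code: the function name, the candidate identifiers, the LR values and the threshold of 1000 are all hypothetical, and the LR computation itself is assumed to have been done elsewhere.

```python
def rank_candidates(lr_by_profile, lr_threshold=1000.0):
    """Return (profile_id, lr) pairs with lr above the threshold, highest first.

    `lr_by_profile` maps a database profile identifier to a precomputed LR.
    The default threshold is an illustrative assumption, not a recommendation.
    """
    kept = [
        (profile_id, lr)
        for profile_id, lr in lr_by_profile.items()
        if lr > lr_threshold
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical search result: LRs for three database entries.
candidate_lrs = {"entry_A": 5.0e6, "entry_B": 12.0, "entry_C": 2.0e4}
ranked = rank_candidates(candidate_lrs)
```

Unlike ordering by the number of matching alleles, candidates with the same allele overlap generally receive different LRs, and an empty result signals that no entry passed the threshold.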
Subjects
DNA Fingerprinting , Databases, Nucleic Acid , Likelihood Functions , Software , Forensic Genetics , Humans
Abstract
A number of new computer programs have recently been developed to facilitate the interpretation and statistical weighting of complex DNA profiles in forensic casework. Acceptance of such software in the user community, and subsequent acceptance by the court, relies heavily upon their validation. To date, few guidelines exist that describe the appropriate and sufficient validation of such software used in forensic DNA casework. In this paper, we discuss general principles of software validation and how they could be applied to the interpretation software now being introduced into the forensic community. Importantly, we clarify the relationship between a statistical model and its implementation via software. We use the LRmix program to provide specific examples of how these principles can be implemented.
Subjects
DNA Fingerprinting , Genotype , Likelihood Functions , Software , Humans
Abstract
Minute amounts of DNA, representing only a few diploid cells, may be interrogated using enhanced DNA profiling, which will be accompanied by stochastic amplification effects. Notwithstanding, a weight of evidence statistic may be calculated using current interpretation software. In this study, we profiled single-donor, two-person and three-person samples having only 3 pg to 12 pg of DNA per contributor using both standard and enhanced capillary electrophoresis (CE) injection settings. Likelihood ratios (LRs) were computed using LRmix Studio, compared for both types of profiles and examined in relation to the amount of DNA, drop-out level, number of detected alleles, peak heights and reproducibility of alleles. Especially for DNA profiles that were generated using enhanced CE, the obtained LRs could indicate strong evidence in favour of the prosecution (log10(LR) > 6), even when the amount of DNA represented about half of a diploid cell equivalent in the amplification. These results illustrate that an assessment of the criminalistic relevance of a sample carrying minute amounts of DNA is essential prior to applying enhanced interrogation techniques and/or calculating a weight of evidence statistic.
Subjects
DNA Fingerprinting/methods , DNA/analysis , Microsatellite Repeats , DNA/genetics , Gene Frequency , Humans , Likelihood Functions
Abstract
Interpretation of DNA mixtures with three or more contributors, defined here as high order mixtures, is difficult because of the inevitability of allele sharing. Allele sharing complicates the estimation of the number of contributors, which is an important parameter to assess the probative value. Consequently, these mixtures may not be deemed suitable for interpretation and reporting. In this study, we generated three-, four- and five-person mixtures with little or no drop-out and with varying levels of allele sharing. For these DNA mixtures we computed likelihood ratios (LRs) using the LRmix model, and always using persons of interest that are true contributors. We assessed the influence of different scenarios on the LR, and used (1) the true or an incorrect number of contributors, (2) zero, one or two anchored individuals and (3) an equal number of contributors under Hp and Hd or an extra contributor under Hd. It was shown that the LR varied considerably when the hypotheses used an incorrect number of contributors, especially when individuals were anchored under the hypotheses. Overall, when analysing high order mixtures, there may occur a transition from LR greater than one to less than one if an incorrect number of contributors is conditioned. This is a result of allele sharing among the multiple contributors rather than allele drop-out, since this study only utilised samples with little or no drop-out.
Subjects
DNA/genetics , Alleles , Humans , Likelihood Functions , Microsatellite Repeats/genetics
Abstract
The introduction of Short Tandem Repeat (STR) DNA was a revolution within a revolution that transformed forensic DNA profiling into a tool that could be used, for the first time, to create National DNA databases. This transformation would not have been possible without the concurrent development of fluorescent automated sequencers, combined with the ability to multiplex several loci together. Use of the polymerase chain reaction (PCR) increased the sensitivity of the method to enable the analysis of a handful of cells. The first multiplexes were simple: 'the quad', introduced by the defunct UK Forensic Science Service (FSS) in 1994, rapidly followed by a more discriminating 'six-plex' (Second Generation Multiplex) in 1995 that was used to create the world's first national DNA database. The success of the database rapidly outgrew the functionality of the original system - by the year 2000 a new ten-locus multiplex was introduced to reduce the chance of adventitious matches. The technology was adopted world-wide, albeit with different loci. The political requirement to introduce pan-European databases encouraged standardisation - the development of the European Standard Set (ESS) of markers, comprising twelve loci, is the latest iteration. Although development has been impressive, the methods used to interpret evidence have lagged behind. For example, the theory to interpret complex DNA profiles (low-level mixtures) was developed fifteen years ago, but only in the past year or so have the concepts started to be widely adopted. A plethora of different models (some commercial and others non-commercial) have appeared. This has led to a confusing 'debate' about the 'best' to use. The different models available are described along with their advantages and disadvantages. A section discusses the development of national DNA databases, along with details of an associated controversy to estimate the strength of evidence of matches. Current methodology is limited to searches of complete profiles - another example where the interpretation of matches has not kept pace with development of theory. STRs have also transformed the area of Disaster Victim Identification (DVI), which frequently requires kinship analysis. However, genotyping efficiency is complicated by complex, degraded DNA profiles. Finally, there is now a detailed understanding of the causes of stochastic effects that cause DNA profiles to exhibit the phenomena of drop-out and drop-in, along with artefacts such as stutters. The phenomena discussed include: heterozygote balance; stutter; degradation; the effect of decreasing quantities of DNA; the dilution effect.
Subjects
DNA Fingerprinting/methods , DNA/genetics , Forensic Genetics/methods , Genotyping Techniques/trends , Microsatellite Repeats , Multiplex Polymerase Chain Reaction/methods , DNA/analysis , DNA Fingerprinting/trends , Databases, Nucleic Acid , Forensic Genetics/trends , Genetics, Population , Humans , Multiplex Polymerase Chain Reaction/trends , White People
Abstract
The interpretation of mixed DNA profiles obtained from low template DNA samples has proven to be a particularly difficult task in forensic casework. Newly developed likelihood ratio (LR) models that account for PCR-related stochastic effects, such as allelic drop-out, drop-in and stutters, have enabled the analysis of complex cases that would otherwise have been reported as inconclusive. In such samples, there are uncertainties about the number of contributors, and the correct sets of propositions to consider. Using experimental samples, where the genotypes of the donors are known, we evaluated the feasibility and the relevance of the interpretation of high order mixtures, of three, four and five donors. The relative risks of analyzing high order mixtures of three, four, and five donors, were established by comparison of a 'gold standard' LR, to the LR that would be obtained in casework. The 'gold standard' LR is the ideal LR: since the genotypes and number of contributors are known, it follows that the parameters needed to compute the LR can be determined per contributor. The 'casework LR' was calculated as used in standard practice, where unknown donors are assumed; the parameters were estimated from the available data. Both LRs were calculated using the basic standard model, also termed the drop-out/drop-in model, implemented in the LRmix module of the R package Forensim. We show how our results furthered the understanding of the relevance of analyzing high order mixtures in a forensic context. Limitations are highlighted, and it is illustrated how our study serves as a guide to implement likelihood ratio interpretation of complex DNA profiles in forensic casework.
Subjects
Complex Mixtures/analysis , DNA/analysis , Forensic Genetics/methods , Complex Mixtures/genetics , DNA/blood , DNA/genetics , DNA/isolation & purification , Humans , Likelihood Functions , Male , Models, Genetic , Models, Statistical , Probability
Abstract
If complex DNA profiles, conditioned on multiple individuals, are evaluated, it may be difficult to assess the strength of the evidence based on the likelihood ratio. A likelihood ratio does not give information about the relative weights that are provided by separate contributors. Alternatively, the observed likelihood ratio can be evaluated with respect to the distribution of the likelihood ratio under the defense hypothesis. We present an efficient algorithm to compute an exact distribution of likelihood ratios that can be applied to any LR-based model. The distribution may have several applications, but is used here to compute a p-value that corresponds to the observed likelihood ratio. The p-value is the probability that a profile under the defense hypothesis, substituted for a questioned contributor (e.g. the suspect), would attain a likelihood ratio of at least the same magnitude as that observed. The p-value can be thought of as a scaled version of the likelihood ratio, giving a quantitative measure of the strength of the evidence relative to the specified hypotheses and the model used for the analysis. The algorithm is demonstrated on examples based on real data. R code for the algorithm is freely available in the R package euroMix.
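The p-value definition above can be illustrated with a brute-force sketch. This is not the efficient algorithm of the paper nor the euroMix implementation: it simply enumerates every genotype combination, so it is only practical for a handful of loci. It assumes that loci are independent, that the overall LR is the product of per-locus LRs, and that the per-locus genotype probabilities under the defense hypothesis and the matching per-locus LRs are already available. The function name and all numbers are hypothetical.

```python
from itertools import product
from math import prod


def lr_p_value(per_locus, observed_lr):
    """Probability that a random profile attains LR >= observed_lr.

    `per_locus` is a list with one entry per locus; each entry is a list of
    (genotype_probability, locus_lr) pairs covering all genotypes at that
    locus, with probabilities taken under the defense hypothesis.
    """
    p_value = 0.0
    for combination in product(*per_locus):
        probability = prod(p for p, _ in combination)
        combined_lr = prod(lr for _, lr in combination)
        if combined_lr >= observed_lr:
            p_value += probability
    return p_value


# Hypothetical two-locus example (illustrative values only).
example_loci = [
    [(0.5, 0.5), (0.3, 2.0), (0.2, 5.0)],
    [(0.6, 0.1), (0.4, 4.0)],
]
example_p = lr_p_value(example_loci, observed_lr=8.0)
```

The cost grows with the product of the number of genotypes per locus, which is why an efficient exact method is needed for realistic multi-locus profiles.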
Subjects
Algorithms , DNA Fingerprinting , Likelihood Functions , Models, Genetic , Gene Frequency , Genotype , Humans
Abstract
Often in forensic cases, the profile of at least one of the contributors to a DNA evidence sample is unknown and a database search is needed to discover possible perpetrators. In this article we consider two types of search strategies to extract suspects from a database using methods based on probability arguments. The performance of the proposed match scores is demonstrated by carrying out a study of each match score relative to the level of allele drop-out in the crime sample, simulating low-template DNA. The efficiency was measured by random man simulation and we compared the performance using the SGM Plus kit and the ESX 17 kit for the Norwegian population, demonstrating that the latter has greatly enhanced power to discover perpetrators of crime in large national DNA databases. The code for the database extraction strategies will be prepared for release in the R-package forensim.
Subjects
DNA Fingerprinting , DNA/analysis , Databases, Nucleic Acid , Information Storage and Retrieval/methods , Models, Genetic , Gene Frequency , Genotype , Humans , Likelihood Functions , Microsatellite Repeats
Abstract
Forensic analysis of low template (LT) DNA mixtures is particularly complicated when (1) LT components concur with high template components, (2) more than three contributors are present, or (3) contributors are related. In this study, we generated a set of such complex LT mixtures and examined two methods to assist in DNA profile analysis and interpretation: the "n/2" consensus method (Benschop et al. 2011) and the pool profile approach. N/2 consensus profiles include alleles that are reproducibly amplified in at least half of the replications. Pool profiles are generated by injecting a blend of independently amplified PCR products on a capillary electrophoresis instrument. Both approaches resulted in a similar increase in the percentage of detected alleles compared to individual profiles, and both rarely included drop-in alleles in case mixtures of pristine DNAs were used. Interestingly, the consensus and the pool profiles often showed differences for the actual alleles detected for the LT component(s). We estimated the number of contributors using different methods. Better approximations were obtained with data in the consensus and pool profiles compared to the data of the individual profiles. Consensus profiles contain allele calls only, while pool profiles consist of both allele calls and peak height information, which can be of use in (statistical) profile analysis. All advantages and limitations of the various types of profiles were assessed, and based on the results we infer that both consensus and pool profiles (or a combination thereof) are helpful in the interpretation of complex LT DNA mixtures.
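The "n/2" consensus rule described above (keep alleles that are reproducibly amplified in at least half of the replicates) can be sketched as follows. The function name, the data layout and the example replicates are hypothetical, and only allele calls are used, consistent with the statement that consensus profiles carry no peak height information.

```python
from collections import Counter


def consensus_profile(replicates):
    """Build an n/2 consensus profile from replicate amplifications.

    `replicates` is a list of dicts mapping locus name -> allele labels
    detected in that replicate. An allele is kept when it is seen in at
    least half of all replicates.
    """
    n_replicates = len(replicates)
    loci = set().union(*(replicate.keys() for replicate in replicates))
    consensus = {}
    for locus in sorted(loci):
        # Count each allele at most once per replicate.
        counts = Counter(
            allele
            for replicate in replicates
            for allele in set(replicate.get(locus, ()))
        )
        consensus[locus] = sorted(
            allele for allele, seen in counts.items() if 2 * seen >= n_replicates
        )
    return consensus


# Hypothetical four-replicate example at a single locus.
example_replicates = [
    {"vWA": ["14", "15"]},
    {"vWA": ["15"]},
    {"vWA": ["14", "15", "16"]},
    {"vWA": ["15"]},
]
example_consensus = consensus_profile(example_replicates)
```

Here allele 16 appears in only one of four replicates and is excluded, which is how the rule filters sporadic drop-in while retaining alleles that drop out in some replicates.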
Subjects
DNA Fingerprinting/methods , DNA/analysis , DNA/genetics , Alleles , Electrophoresis, Capillary , Humans , Microsatellite Repeats , Multiplex Polymerase Chain Reaction
Abstract
The autosomal short tandem repeat (STR) kits that are currently used in forensic science have a high discrimination power. However, this discrimination power is sometimes not sufficient for complex kinship analyses or decreases when alleles are missing due to degradation of the DNA. The Investigator HDplex kit contains nine STRs that are additional to the commonly used forensic markers, and we validated this kit to assist human identification. With the increasing number of markers it becomes inevitable that forensic and kinship analyses include two or more STRs present on the same chromosome. To examine whether such markers can be regarded as independent, we evaluated the 30 STRs present in NGM, Identifiler and HDplex. Among these 30 markers, 17 syntenic STR pairs can be formed. Allelic association between these pairs was examined using 335 Dutch reference samples and no linkage disequilibrium was detected, which makes it possible to use the product rule for profile probability calculations in unrelated individuals. Linkage between syntenic STRs was studied by determining the recombination fraction between them in five three-generation CEPH families. The recombination fractions were compared to the physical and genetic distances between the markers. For most types of pedigrees, the kinship analyses can be performed using the product rule, and for those cases that require an alternative calculation method (Gill et al., Forensic Sci Int Genet 6:477-486, 2011), the recombination fractions as determined in this study can be used. Finally, we calculated the (combined) match probabilities, for the supplementary genotyping results of HDplex, NGM and Identifiler.
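The product rule mentioned above can be illustrated with a short sketch. It is an assumption-laden simplification rather than the calculation used in the study: it assumes Hardy-Weinberg proportions within each locus, independence between loci (no linkage disequilibrium), unrelated individuals and no subpopulation correction. The function names and the allele frequencies are hypothetical.

```python
from math import prod


def genotype_probability(genotype, allele_freqs):
    """Hardy-Weinberg genotype probability: p^2 if homozygous, 2pq otherwise."""
    allele_a, allele_b = genotype
    p, q = allele_freqs[allele_a], allele_freqs[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


def profile_probability(profile, freqs_by_locus):
    """Multiply per-locus genotype probabilities (the product rule)."""
    return prod(
        genotype_probability(genotype, freqs_by_locus[locus])
        for locus, genotype in profile.items()
    )


# Hypothetical two-locus profile and allele frequencies.
example_freqs = {
    "D10S1248": {"14": 0.1, "15": 0.2},
    "D22S1045": {"16": 0.3},
}
example_genotypes = {"D10S1248": ("14", "15"), "D22S1045": ("16", "16")}
example_probability = profile_probability(example_genotypes, example_freqs)
```

For pedigrees in which linkage between syntenic markers matters, this simple multiplication no longer applies and the recombination fractions have to enter the calculation, as the abstract notes.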
Subjects
Alleles , DNA Fingerprinting/methods , Forensic Genetics/methods , Genetic Markers/genetics , Genetics, Population/methods , Heteroduplex Analysis/methods , Microsatellite Repeats/genetics , Adult , Aged , Amelogenin/genetics , Child , Female , Gene Frequency , Genetic Loci/genetics , Genotype , HapMap Project , Humans , Linkage Disequilibrium , Male , Netherlands
Abstract
Complex DNA mixtures with low template (LT) components provide the most challenging cases to interpret and report. In this study, we designed such mixtures and we describe how reporting officers (ROs) at the Netherlands Forensic Institute (NFI) assess these when embedded in a mock case setting. DNA mixtures containing LT DNA from two to four contributors, sporadic contamination (mimicked by adding 6 pg of DNA, which represents one cell equivalent) and/or DNA of relatives (brothers), were amplified four-fold using the AmpFlSTR® NGM™ PCR Amplification Kit. Consensus profiles were then generated which included the alleles detected in at least half of the replicates. Four mock cases were created by including reference profiles of a hypothetical victim and suspect. The mock cases were assessed by eight ROs following the stepwise interpretation approach currently in use at the NFI. With this approach, the results of the comparisons between the DNA profiles of the evidentiary trace and the reference profiles are classified into four categories of evidential value [1]. The interpretations by the ROs were compared to the likelihood ratios (LRs) obtained from a probabilistic model that allows a calculation of LRs to assist the interpretation of LT DNA evidence, and both were compared to the true composition of the designed mixtures.
Subjects
Alleles , DNA Fingerprinting/methods , DNA/analysis , DNA/genetics , Likelihood Functions , Female , Humans , Male , Microsatellite Repeats , Models, Genetic , Multiplex Polymerase Chain Reaction
Abstract
Determining the number of contributors to a forensic DNA mixture using maximum allele count is a common practice in many forensic laboratories. In this paper, we compare this method to a maximum likelihood estimator, previously proposed by Egeland et al., that we extend to the cases of multiallelic loci and population subdivision. We compared both methods' efficiency for identifying mixtures of two to five individuals in the case of uncertainty about the population allele frequencies and partial profiles. The proportion of correctly resolved mixtures was >90% for both estimators for two- and three-person mixtures, while likelihood maximization yielded success rates 2- to 15-fold higher for four- and five-person mixtures. Comparable results were obtained in the cases of uncertain allele frequencies and partial profiles. Our results support the use of the maximum likelihood estimator to report the number of contributors when dealing with complex DNA mixtures.
Subjects
DNA Fingerprinting/methods , Gene Frequency , Likelihood Functions , Humans , Racial Groups/genetics
Abstract
Forensim is a new package for the R statistical software that is dedicated to forensic DNA evidence interpretation. As far as we know, forensim is the first open-source tool that allows for the simulation of data encountered in forensic genetics studies. The package also implements common statistical methods used for reporting the weight of DNA evidence. Forensim is written in the R language and is freely available from http://forensim.r-forge.r-project.org. This paper presents an overview of the software's functionalities.