ABSTRACT
Forensic genetic investigations typically rely on analysis of DNA for attribution purposes. There are times, however, when the amount and/or the quality of the DNA is limited, and thus little or no information can be obtained regarding the source of the sample. An alternative biochemical target that also contains genetic signatures is protein. One class of genetic signatures is protein polymorphisms that are a direct consequence of simple/single/short nucleotide polymorphisms (SNPs) in DNA. However, to interpret protein polymorphisms in a forensic context, certain complexities must be understood and addressed. These complexities include: 1) a SNP can generate zero, one, or arbitrarily many polymorphisms in a polypeptide; and 2) because protein expression is modulated by alleles, genes, and interactions with the environment, a given protein may be present or absent in a given sample. To address these issues, a novel approach was taken: the expected protein alleles in a reference sample were generated from whole genome (or exome) sequence data, and the significance of the evidence was assessed using a haplotype-based semi-continuous likelihood algorithm that leverages whole proteome data. Converting the genomic information into proteomic information allows the zero-to-many relationship between SNPs and genetically variable peptides (GVPs) to be abstracted away. When viewed as a haplotype, multiple GVPs that correspond to the same SNP are equivalent to multiple SNPs in perfect linkage disequilibrium (LD). As long as the likelihood formulation correctly accounts for LD, the correspondence between the SNP and the proteome can be safely neglected. Tests were performed on simulated samples, including single-source samples and two-person mixtures, and the power of a classical semi-continuous likelihood was compared with that of one adapted to neglect drop-out. Additionally, summary statistics and a rudimentary set of decision guidelines were introduced to help identify mixtures from protein data.
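The zero-to-many relationship between a SNP and its peptide products can be illustrated with a minimal sketch. It assumes a simplified trypsin rule (cleavage after K or R unless the next residue is P), no missed cleavages, and a made-up toy sequence; the function names `tryptic_digest` and `variant_peptides` are hypothetical and are not taken from the study.

```python
def tryptic_digest(protein):
    """Split a protein after each K or R, except when the next residue is P.

    Simplified rule, assumed for illustration; no missed cleavages are generated.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def variant_peptides(reference, position, alt_residue):
    """Return peptides produced by the variant protein but not by the reference.

    `position` is 1-based. A synonymous SNP is modelled by passing the
    reference residue as `alt_residue`.
    """
    variant = reference[:position - 1] + alt_residue + reference[position:]
    return set(tryptic_digest(variant)) - set(tryptic_digest(reference))


REFERENCE = "MAGKLLSERTTVK"  # hypothetical toy sequence, not a real protein

print(sorted(variant_peptides(REFERENCE, 7, "S")))  # synonymous: []
print(sorted(variant_peptides(REFERENCE, 7, "T")))  # ['LLTER']
print(sorted(variant_peptides(REFERENCE, 8, "K")))  # new cleavage site: ['LLSK', 'R']
```

In this toy case a single substitution yields zero, one, or two variant peptides. Every peptide in a returned set traces back to the same SNP, which is why such peptides behave like markers in perfect LD.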
Subjects
Proteome, Proteomics, DNA/genetics, Genotype, High-Throughput Nucleotide Sequencing/methods, Humans, Peptides/analysis, Peptides/genetics, Polymorphism, Single Nucleotide, Proteome/genetics, Sequence Analysis, DNA
ABSTRACT
We present an efficient protein extraction and in-solution enzymatic digestion protocol optimized for mass spectrometry-based proteomics studies of human skin samples. Human skin cells are a proteinaceous matrix that can enable forensic identification of individuals. We performed a systematic optimization of proteomic sample preparation for a protein-based human forensic identification application. Digestion parameters, including incubation duration, temperature, and the type and concentration of surfactant, were systematically varied to maximize digestion completeness. Through replicate digestions, parameter optimization was performed to maximize repeatability and increase the number of identified peptides and proteins. Final digestion conditions were selected based on the parameters that yielded the greatest percentage of peptides with zero missed tryptic cleavages, which benefits the analysis of genetically variable peptides (GVPs). We evaluated the final digestion conditions for identification of GVPs by applying MS-based proteomics to a mixed-donor sample. The results were searched against a human proteome database appended with a database of GVPs constructed from known non-synonymous single nucleotide polymorphisms (SNPs) that occur at known population frequencies. The aim of this study was to demonstrate the potential of our proteomics sample preparation for future implementation of GVP analysis by forensic laboratories to facilitate human identification.
SIGNIFICANCE: Genetically variable peptides (GVPs) can provide forensic evidence that is complementary to traditional DNA profiling and can potentially be used for human identification. An efficient protein extraction and reproducible digestion method for skin proteins is a key contributor to downstream analysis of GVPs and to further development of this technology in forensic applications. In this study, we optimized the enzymatic digestion conditions, such as incubation time and temperature, for skin samples. Our study is among the first attempts to optimize proteomics sample preparation for protein-based skin identification in forensic applications such as touch samples. Our digestion method employs RapiGest (an acid-labile surfactant), trypsin enzymatic digestion, and an incubation time of 16 h at 37 °C.
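The selection metric described above, the percentage of identified peptides with zero missed tryptic cleavages, can be sketched as follows. This is an illustrative calculation only: it assumes the same simplified trypsin rule (cleavage after K or R unless followed by P), the peptide list is invented, and the function names are hypothetical rather than part of the published workflow.

```python
def missed_cleavages(peptide):
    """Count internal K/R residues that trypsin could have cleaved but did not.

    The C-terminal residue is excluded, and K/R followed by P is not counted
    (simplified rule, assumed for illustration).
    """
    return sum(
        1
        for i in range(len(peptide) - 1)
        if peptide[i] in "KR" and peptide[i + 1] != "P"
    )


def percent_fully_cleaved(peptides):
    """Percentage of peptides with zero missed cleavages."""
    if not peptides:
        raise ValueError("peptide list is empty")
    fully_cleaved = sum(1 for p in peptides if missed_cleavages(p) == 0)
    return 100.0 * fully_cleaved / len(peptides)


# Hypothetical identifications from one digestion condition.
identified = ["LLSER", "MAGKLLSER", "TTVK", "AKPLR"]
print(percent_fully_cleaved(identified))  # 75.0
```

Comparing this percentage across replicate digestions at each condition is one simple way to rank conditions by digestion completeness.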
Subjects
Peptides, Proteomics, Forensic Medicine, Humans, Mass Spectrometry, Proteome, Trypsin
ABSTRACT
For the past three decades, forensic genetic investigations have focused on elucidating DNA signatures. While DNA has a number of desirable properties (e.g., presence in most biological materials, an amenable chemistry for analysis, and well-developed statistics), it also has limitations. DNA may be present in low quantity in some tissues, such as hair, and in some tissues it may degrade more readily than its protein counterparts. Recent research efforts have shown the feasibility of performing protein-based human identification in cases in which recovery of DNA is challenging; however, the methods involved in assessing the rarity of a given protein profile have not been addressed adequately. In this paper, an algorithm is proposed for computing a random match probability (RMP) from a genetically variable peptide signature. The approach described herein explicitly models proteomic error and genetic linkage, makes no assumptions as to allelic drop-out, and maps the observed proteomic alleles to their expected protein products from DNA, which, in turn, permits standard corrections for population structure and finite database sizes. To assess the feasibility of this approach, RMPs were estimated from peptide profiles of skin samples from 25 individuals of European ancestry. A total of 126 common peptide alleles were used in this approach, yielding a mean RMP of approximately 10^-2.
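As a point of reference for the population-structure correction mentioned above, the following minimal sketch computes a theta-corrected RMP using the widely used Balding-Nichols genotype match probabilities. It is not the algorithm proposed in the paper: it assumes fully observed genotypes at unlinked loci and ignores proteomic error, linkage, and drop-out. The allele frequencies, the value of `theta`, and the function names are illustrative assumptions.

```python
import math


def genotype_match_probability(p, q=None, theta=0.01):
    """Theta-corrected match probability for one locus (Balding-Nichols form).

    p, q  : allele frequencies; pass q=None for a homozygous genotype.
    theta : coancestry coefficient (assumed value, for illustration only).
    """
    denominator = (1 + theta) * (1 + 2 * theta)
    if q is None:  # homozygote
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denominator
    # heterozygote
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denominator


def random_match_probability(genotypes, theta=0.01):
    """Product of per-locus match probabilities, assuming independent loci."""
    return math.prod(genotype_match_probability(p, q, theta) for p, q in genotypes)


# Hypothetical profile: one homozygous locus and one heterozygous locus.
profile = [(0.2, None), (0.2, 0.3)]
print(random_match_probability(profile, theta=0.0))   # reduces to p^2 * 2pq
print(random_match_probability(profile, theta=0.01))  # larger, i.e. more conservative
```

With `theta=0` the expressions reduce to the Hardy-Weinberg proportions, and increasing `theta` raises the match probability for this profile. Treating linked peptide alleles as independent would overstate the rarity of a profile, which is why the approach in the abstract models linkage explicitly.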