ABSTRACT
Investigating DNA methylation (DNAm) in cardiac tissues is vital for epigenetic research in cardiovascular diseases (CVDs). During cardiac surgery, biopsies may not be immediately stored due to a lack of human or technical resources at the collection site. Assessing DNAm stability in cardiac samples left in suboptimal conditions is crucial for applying DNAm analysis. We investigated the stability of DNAm in human cardiac tissues kept at 4 °C and 22 °C for periods of 1, 7, 14, and 28 days (exposed samples) using the Illumina Infinium MethylationEPIC v1.0 BeadChip Array. We observed high correlations between samples analysed immediately after tissue collection and exposed ones (R² > 0.992). Methylation levels were measured as β-values, and median absolute β-value differences (|Δβ|) ranged from 0.0093 to 0.0119 in all exposed samples. Pairwise differentially methylated position (DMP) analysis revealed no DMPs under 4 °C (fridge temperature) exposure for up to 28 days and 22 °C (room temperature) exposure for one day, while 3,437, 6,918, and 3,824 DMPs were observed for 22 °C samples at 7, 14, and 28 days, respectively. This study provides insights into the stability of genome-wide DNAm, showing that cardiac tissue can be used for reliable DNAm analysis even when stored suboptimally after surgery.
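The two summary statistics quoted in this abstract, the squared correlation between reference and exposed samples and the median absolute β-value difference, can be illustrated with a minimal sketch. The function name `compare_beta_values` and the toy β-values are assumptions made for illustration; they are not taken from the study, and the authors' actual pipeline is not described here.

```python
import numpy as np

def compare_beta_values(beta_reference, beta_exposed):
    """Return (R squared, median absolute beta difference) for two samples.

    Both inputs are beta-values (methylation fractions in [0, 1]) measured
    at the same CpG sites, in the same order.
    """
    ref = np.asarray(beta_reference, dtype=float)
    exp = np.asarray(beta_exposed, dtype=float)
    r = np.corrcoef(ref, exp)[0, 1]            # Pearson correlation
    median_abs_delta = float(np.median(np.abs(exp - ref)))
    return float(r ** 2), median_abs_delta

# Toy values invented for illustration only (not study data).
reference = np.array([0.10, 0.50, 0.90, 0.30, 0.70])
exposed = np.array([0.11, 0.49, 0.92, 0.30, 0.69])
r_squared, median_delta = compare_beta_values(reference, exposed)
```

On a real array the same call would run over several hundred thousand CpG sites per sample pair.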
Subjects
DNA Methylation , Myocardium , Temperature , Humans , Myocardium/metabolism , Male , Time Factors , Epigenesis, Genetic , Female , Aged , Middle Aged
ABSTRACT
Linear regression (LR) is widely used in data analysis for continuous outcomes in biomedicine and epidemiology. Despite its popularity, LR is incompatible with missing data, which frequently occur in health sciences. For parameter estimation, this shortcoming is usually resolved by complete-case analysis or imputation. Both work-arounds, however, are inadequate for prediction, since they either fail to predict on incomplete records or ignore missingness-induced reduction in prediction accuracy and rely on (unrealistic) assumptions about the missing mechanism. Here, we derive the adaptive predictor-set linear model (aps-lm), capable of making predictions for incomplete data without the need for imputation. It is derived by using a predictor-selection operation, the Moore-Penrose pseudoinverse, and the reduced QR decomposition. aps-lm is an LR generalization that inherently handles missing values. It is applied to a reference data set, where complete predictors and outcome are available, and yields a set of privacy-preserving parameters. In a second stage, these are shared for making predictions of the outcome on external data sets with missing entries for predictors without imputation. Moreover, aps-lm computes prediction errors that account for the pattern of missing values even under extreme missingness. We benchmark aps-lm in a simulation study. aps-lm showed greater prediction accuracy and reduced bias compared to popular imputation strategies under a wide range of scenarios including variation of sample size, goodness of fit, missing value type, and covariance structure. Finally, as a proof-of-principle, we apply aps-lm in the context of epigenetic aging clocks, linear models that predict a person's biological age from epigenetic data with promising clinical applications.
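The general idea of predicting from whichever predictors happen to be observed can be sketched as follows. This is not the authors' aps-lm derivation (which also involves the reduced QR decomposition and prediction errors); it is a simplified illustration that assumes the shared parameters are the means and covariance matrix of the reference data, and that refits the coefficients on the observed predictor subset with the Moore-Penrose pseudoinverse. The names `fit_reference` and `predict_incomplete` are hypothetical.

```python
import numpy as np

def fit_reference(X, y):
    """Summarise complete reference data by means and covariance only.

    Assumption for this sketch: these moments play the role of the
    shareable parameters; no individual-level records are kept.
    """
    Z = np.column_stack([X, y])          # outcome stored as last column
    return Z.mean(axis=0), np.cov(Z, rowvar=False)

def predict_incomplete(x_new, mu, cov):
    """Predict the outcome for one record; NaN marks a missing predictor."""
    p = len(mu) - 1                      # index of the outcome column
    obs = np.flatnonzero(~np.isnan(x_new))
    if obs.size == 0:
        return float(mu[p])              # nothing observed: fall back to mean
    S_xx = cov[np.ix_(obs, obs)]         # covariance of observed predictors
    S_xy = cov[obs, p]                   # their covariance with the outcome
    beta = np.linalg.pinv(S_xx) @ S_xy   # pseudoinverse tolerates collinearity
    return float(mu[p] + (x_new[obs] - mu[obs]) @ beta)
```

For a record with no missing entries this reduces to the ordinary least-squares prediction with intercept; for an incomplete record it matches an ordinary least-squares fit restricted to the observed columns, so no imputed values enter the prediction.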
Subjects
Biometry , Linear Models , Biometry/methods , Humans
ABSTRACT
Tobacco smoking is a frequent habit sustained by >1.3 billion people in 2020 and the leading preventable factor for health risk and premature mortality worldwide. In the forensic context, predicting smoking habits from biological samples may allow broadening DNA phenotyping. In this study, we aimed to implement previously published smoking habit classification models based on blood DNA methylation at 13 CpGs. First, we developed a matching lab tool based on bisulfite conversion and multiplex PCR followed by amplification-free library preparation and targeted paired-end massively parallel sequencing (MPS). Analysis of six technical duplicates revealed high reproducibility of methylation measurements (Pearson correlation of 0.983). Artificially methylated standards uncovered marker-specific amplification bias, which we corrected via bi-exponential models. We then applied our MPS tool to 232 blood samples from Europeans of a wide age range, of which 90 were current, 71 former and 71 never smokers. On average, we obtained 189,000 reads/sample and 15,000 reads/CpG, without marker drop-out. Methylation distributions per smoking category roughly corresponded to previous microarray analysis, showcasing large inter-individual variation but with technology-driven bias. Methylation at 11 out of 13 smoking-CpGs correlated with daily cigarettes in current smokers, while only one was weakly correlated with time since cessation in former smokers. Interestingly, eight smoking-CpGs correlated with age, and one displayed weak but significant sex-associated methylation differences. Using bias-uncorrected MPS data, smoking habits were relatively accurately predicted using both the two- (current/non-current) and three- (never/former/current) category models, but bias correction resulted in worse prediction performance for both models.
Finally, to account for technology-driven variation, we built new, joint models with inter-technology corrections, which resulted in improved prediction results for both models, with or without PCR bias correction (e.g. MPS cross-validation F1-score > 0.8; 2-categories). Overall, our novel assay takes us one step closer to the forensic application of viable smoking habit prediction from blood traces. However, future research is needed towards forensically validating the assay, especially in terms of sensitivity. We also need to shed further light on the employed biomarkers, particularly on the mechanisms, tissue specificity and putative confounders of smoking epigenetic signatures.
Subjects
DNA Methylation , Smoking , Humans , Reproducibility of Results , Smoking/genetics , Polymerase Chain Reaction , High-Throughput Nucleotide Sequencing , CpG Islands/genetics
ABSTRACT
Human microbiome research is moving from characterization and association studies to translational applications in medical research, clinical diagnostics, and others. One of these applications is the prediction of human traits, where machine learning (ML) methods are often employed, but face practical challenges. Class imbalance in available microbiome data is one of the major problems, which, if unaccounted for, leads to spurious prediction accuracies and limits the classifier's generalization. Here, we investigated the predictability of smoking habits from class-imbalanced saliva microbiome data by combining data augmentation techniques to account for class imbalance with ML methods for prediction. We collected publicly available saliva 16S rRNA gene sequencing data and smoking habit metadata demonstrating a serious class imbalance problem, i.e., 175 current vs. 1,070 non-current smokers. Three data augmentation techniques (synthetic minority over-sampling technique, adaptive synthetic, and tree-based associative data augmentation) were applied together with seven ML methods: logistic regression, k-nearest neighbors, support vector machine with linear and radial kernels, decision trees, random forest, and extreme gradient boosting. K-fold nested cross-validation was used with the different augmented data types and baseline non-augmented data to validate the prediction outcome. Combining data augmentation with ML generally outperformed baseline methods in our dataset. The final prediction model combined tree-based associative data augmentation and support vector machine with linear kernel, and achieved a classification performance expressed as Matthews correlation coefficient of 0.36 and AUC of 0.81. Our method successfully addresses the problem of class imbalance in microbiome data for reliable prediction of smoking habits.
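Why unaccounted class imbalance produces spurious accuracies, and why the Matthews correlation coefficient (MCC) is reported instead, can be shown with the class sizes given in this abstract (175 current vs. 1,070 non-current smokers). The sketch below is illustrative only: the "classifier" is a hypothetical one that labels every sample as non-current, and returning 0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; 0.0 if the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical classifier that predicts "non-current" for every sample:
# all 175 current smokers are missed, all 1,070 non-current are correct.
tp, fn = 0, 175
tn, fp = 1070, 0

trivial_accuracy = accuracy(tp, tn, fp, fn)      # about 0.86
trivial_mcc = matthews_corrcoef(tp, tn, fp, fn)  # 0.0
```

The trivial classifier reaches roughly 86% accuracy while carrying no information about smoking status, which the MCC of 0 makes explicit.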
ABSTRACT
Pseudoclostridium thermosuccinogenes is a thermophilic bacterium capable of producing succinate from lignocellulosic-derived sugars and has the potential to be exploited as a platform organism. However, exploitation of P. thermosuccinogenes has been limited, partly due to its genetic inaccessibility and the lack of genome engineering tools. In this study, we established the genetic accessibility of P. thermosuccinogenes DSM 5809. By overcoming restriction barriers, transformation efficiencies of 10² CFU/µg plasmid DNA were achieved. To this end, the plasmid DNA was methylated in vivo when transformed into an engineered E. coli HST04 strain expressing three native methylation systems of the thermophile. This protocol was used to introduce a ThermodCas9-based CRISPRi tool targeting the gene encoding malic enzyme in P. thermosuccinogenes, demonstrating the principle of gene silencing. This resulted in 75% downregulation of its expression and had an impact on the strain's fermentation profile. Although the details of the functioning of the restriction modification systems require further study, in vivo methylation can already be applied to improve transformation efficiency of P. thermosuccinogenes. Making use of the ThermodCas9-based CRISPRi, this is the first example demonstrating that genetic engineering in P. thermosuccinogenes is feasible, paving the way for metabolic engineering of this bacterium.
ABSTRACT
DNA methylation has become one of the most useful biomarkers for age prediction and body fluid identification in the forensic field. Therefore, several assays have been developed to detect age-associated and body fluid-specific DNA methylation changes. Among the many methods developed, SNaPshot-based assays should be particularly useful in forensic laboratories, as they permit multiplex analysis and use the same capillary electrophoresis instrumentation as STR analysis. However, technical validation of any developed assays is crucial for their proper integration into routine forensic workflow. In the present collaborative exercise, two SNaPshot multiplex assays for age prediction and a SNaPshot multiplex for body fluid identification were tested in twelve laboratories. The experimental set-up of the exercise was designed to reflect the entire workflow of SNaPshot-based methylation analysis and involved four increasingly complex tasks designed to detect potential factors influencing methylation measurements. The results of body fluid identification from each laboratory provided sufficient information to determine appropriate age prediction methods in subsequent analysis. In age prediction, systematic measurement differences resulting from the type of genetic analyzer used were identified as the biggest cause of DNA methylation variation between laboratories. Also, the use of a buffer that ensures a high ratio of specific to non-specific primer binding resulted in changes in DNA methylation measurement, especially when using degenerate primers in the PCR reaction. In addition, high input volumes of bisulfite-converted DNA often caused PCR failure, presumably due to carry-over of PCR inhibitors from the bisulfite conversion reaction. The proficiency of the analysts and experimental conditions for efficient SNaPshot reactions were also important for consistent DNA methylation measurement. 
Several bisulfite conversion kits were used for this study, but differences resulting from the use of any specific kit were not clearly discerned. Even though different experimental settings were used in each laboratory, a positive outcome of the study was a mean absolute age prediction error across participants' data of only 2.7 years for semen, 5.0 years for blood and 3.8 years for saliva.
Subjects
Body Fluids , DNA Methylation , Child, Preschool , CpG Islands/genetics , Forensic Genetics/methods , Humans , Saliva
ABSTRACT
Over the last few years, advances in massively parallel sequencing technologies (also referred to as next-generation sequencing) and bioinformatics analysis tools have boosted our knowledge of the human microbiome. Such insights have brought new perspectives and possibilities to apply human microbiome analysis in many areas, particularly in medicine. In the forensic field, the use of microbial DNA obtained from human materials is still in its infancy but has been suggested as a potential alternative in situations when other human (non-microbial) approaches present limitations. More specifically, DNA analysis of a wide variety of microorganisms that live in and on the human body promises to answer various forensically relevant questions, such as post-mortem interval estimation, individual identification, and tissue/body fluid identification, among others. However, human microbiome analysis currently faces significant challenges that need to be considered and overcome via future forensically oriented human microbiome research to provide the necessary solutions. In this perspective article, we discuss the most relevant biological, technical and data-related issues and propose future solutions that will pave the way towards the integration of human microbiome analysis in the forensic toolkit.
Subjects
Microbiota , Computational Biology , DNA/genetics , Forensic Medicine , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
Single nucleotide polymorphism (SNP) data generated with microarray technologies have been used to solve murder cases via investigative leads obtained from identifying relatives of the unknown perpetrator included in accessible genomic databases, an approach referred to as investigative genetic genealogy (IGG). However, SNP microarrays were developed for relatively high input DNA quantity and quality, while DNA typically obtainable from crime scene stains is of low DNA quantity and quality, and SNP microarray data obtained from compromised DNA are largely missing. By applying the Illumina Global Screening Array (GSA) to 264 DNA samples with systematically altered quantity and quality, we empirically tested the impact of SNP microarray analysis of compromised DNA on kinship classification success, as relevant in IGG. Reference data from manufacturer-recommended input DNA quality and quantity were used to estimate genotype accuracy in the compromised DNA samples and for simulating data of different degree relatives. Although stepwise decrease of input DNA amount from 200 ng to 6.25 pg led to decreased SNP call rates and increased genotyping errors, kinship classification success did not decrease down to 250 pg for siblings and 1st cousins, 1 ng for 2nd cousins, while at 25 pg and below kinship classification success was zero. Stepwise decrease of input DNA quality via increased DNA fragmentation resulted in the decrease of genotyping accuracy as well as kinship classification success, which went down to zero at the average DNA fragment size of 150 base pairs. Combining decreased DNA quantity and quality in mock casework and skeletal samples further highlighted possibilities and limitations. 
Overall, GSA analysis achieved maximal kinship classification success from 800 to 200 times lower input DNA quantities than manufacturer-recommended, although DNA quality plays a key role too, while compromised DNA produced false negative kinship classifications rather than false positive ones.
Subjects
DNA Fingerprinting , DNA , DNA/genetics , Genotype , Humans , Microarray Analysis , Pedigree , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Information on long-term alcohol consumption is relevant for medical and public health research, disease therapy, and other areas. Recently, DNA methylation-based inference of alcohol consumption from blood was reported with high accuracy, but these results were based on employing the same dataset for model training and testing, which can lead to accuracy overestimation. Moreover, only subsets of alcohol consumption categories were used, which makes it impossible to extrapolate such models to the general population. By using data from eight population-based European cohorts (N = 4677), we internally and externally validated the previously reported biomarkers and models for epigenetic inference of alcohol consumption from blood and developed new models comprising all data from all categories. RESULTS: By employing data from six European cohorts (N = 2883), we empirically tested the reproducibility of the previously suggested biomarkers and prediction models via ten-fold internal cross-validation. In contrast to previous findings, all seven models based on 144 CpGs yielded lower mean AUCs compared to the models with fewer CpGs. For instance, the 144-CpG heavy versus non-drinkers model gave an AUC of 0.78 ± 0.06, while the 5 and 23 CpG models achieved 0.83 ± 0.05, respectively. The transportability of the models was empirically tested via external validation in three independent European cohorts (N = 1794), revealing high AUC variance between datasets within models. For instance, the 144-CpG heavy versus non-drinkers model yielded AUCs ranging from 0.60 to 0.84 between datasets. The newly developed models that considered data from all categories showed low AUCs but gave low AUC variation in the external validation. For instance, the 144-CpG heavy and at-risk versus light and non-drinkers model achieved AUCs of 0.67 ± 0.02 in the internal cross-validation and 0.61-0.66 in the external validation datasets.
CONCLUSIONS: The outcomes of our internal and external validation demonstrate that the previously reported prediction models suffer from both overfitting and accuracy overestimation. Our results show that the previously proposed biomarkers are not yet sufficient for accurate and robust inference of alcohol consumption from blood. Overall, our findings imply that DNA methylation prediction biomarkers and models need to be improved considerably before epigenetic inference of alcohol consumption from blood can be considered for practical applications.
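The AUC values compared throughout this abstract can be read as the probability that a randomly chosen member of the positive class (for example, a heavy drinker) receives a higher model score than a randomly chosen member of the negative class. A minimal sketch of that pairwise definition follows; the function name `pairwise_auc` and the toy scores are assumptions for illustration and do not reproduce the study's models or data.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Runs in O(n_pos * n_neg) time,
    which is fine for a small illustration.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy model scores invented for illustration only.
toy_auc = pairwise_auc([0.8, 0.4], [0.6, 0.2])   # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which is why external-validation values near 0.60 indicate limited transportability.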
Subjects
Alcohol Drinking/blood , Biomarkers/analysis , Epigenesis, Genetic/genetics , Alcohol Drinking/genetics , Area Under Curve , Biomarkers/blood , DNA Methylation , Epigenesis, Genetic/physiology , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Humans , ROC Curve , Reproducibility of Results
ABSTRACT
BACKGROUND: Illumina DNA methylation microarrays enable epigenome-wide analysis and are widely used for the discovery of novel DNA methylation variation in health and disease. However, the microarrays' probe design cannot fully consider the vast human genetic diversity, leading to genetic artifacts. Distinguishing genuine from artifactual genetic influence is of particular relevance in the study of DNA methylation heritability and methylation quantitative trait loci. But despite its importance, current strategies to account for genetic artifacts are lagging due to a limited mechanistic understanding of how such artifacts operate. RESULTS: To address this, we develop and benchmark UMtools, an R-package containing novel methods for the quantification and qualification of genetic artifacts based on fluorescence intensity signals. With our approach, we model and validate known SNPs/indels on a genetically controlled dataset of monozygotic twins, and we estimate minor allele frequency from DNA methylation data and empirically detect variants not included in dbSNP. Moreover, we identify examples where genetic artifacts interact with each other or with imprinting, X-inactivation, or tissue-specific regulation. Finally, we propose a novel strategy based on co-methylation that can discern between genetic artifacts and genuine genomic influence. CONCLUSIONS: We provide an atlas to navigate through the huge diversity of genetic artifacts encountered on DNA methylation microarrays. Overall, our study sets the ground for a paradigm shift in the study of the genetic component of epigenetic variation in DNA methylation microarrays.
Subjects
Artifacts , DNA Methylation , Oligonucleotide Array Sequence Analysis , Software , Fluorescent Dyes , Humans , INDEL Mutation , Introns , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Twins, Monozygotic/genetics
ABSTRACT
Information on the time when a stain was deposited at a crime scene can be valuable in forensic investigations. It can link a DNA-identified stain donor with a crime or provide a post-mortem interval estimation in cases with cadavers. The available methods for estimating stain deposition time have limitations of different types and magnitudes. In this proof-of-principle study we investigated for the first time the use of microbial DNA for this purpose in human saliva stains. First, we identified the most abundant and frequent bacterial species in saliva using publicly available 16S rRNA gene next generation sequencing (NGS) data from 1,848 samples. Next, we assessed time-dependent changes in 15 identified species using de-novo 16S rRNA gene NGS in the saliva stains of two individuals exposed to indoor conditions for up to 1 year. We selected four bacterial species, i.e., Fusobacterium periodonticum, Haemophilus parainfluenzae, Veillonella dispar, and Veillonella parvula, showing significant time-dependent changes and developed a 4-plex qPCR assay for their targeted analysis. Then, we analyzed the saliva stains of 15 individuals exposed to indoor conditions for up to 1 month. Bacterial counts generally increased with time and explained 54.9% of the variation (p < 2.2E-16). Time since deposition explained ≥86.5% and ≥88.9% of the variation in each individual and species, respectively (p < 2.2E-16). Finally, based on sample duplicates we built and tested multiple linear regression models for predicting the stain deposition time at an individual level, resulting in an average mean absolute error (MAE) of 5 days (ranging 3.3-7.8 days). Overall, the deposition time of 181 (81.5%) stains was correctly predicted within 1 week. Prediction models were also assessed in stains exposed to similar conditions up to 1 month 7 months later, resulting in an average MAE of 8.8 days (ranging 3.9-16.9 days).
Our proof-of-principle study suggests the potential of the DNA profiling of human commensal bacteria as a method of estimating the time since deposition of saliva stains in the forensic scenario, which may be expanded to other forensically relevant tissues. The study considers practical applications of this novel approach, but various forensic developmental validation and implementation criteria will need to be met in more dedicated studies in the future.
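The two accuracy summaries used in this abstract, the mean absolute error (MAE) in days and the share of stains predicted within 1 week, can be computed as in the sketch below. The function name `deposition_time_errors`, the toy values, and the reading of "within 1 week" as an absolute error of at most 7 days are assumptions for illustration; the abstract does not state the exact criterion.

```python
import numpy as np

def deposition_time_errors(true_days, predicted_days, tolerance_days=7.0):
    """Return (MAE in days, fraction of predictions within the tolerance)."""
    true = np.asarray(true_days, dtype=float)
    pred = np.asarray(predicted_days, dtype=float)
    abs_err = np.abs(pred - true)
    mae = float(abs_err.mean())
    within = float((abs_err <= tolerance_days).mean())
    return mae, within

# Toy deposition times in days, invented for illustration only.
mae_days, share_within_week = deposition_time_errors(
    true_days=[1, 7, 14, 28],
    predicted_days=[3, 6, 20, 37],
)
```

Reporting both numbers is useful because a few large errors can inflate the MAE even when most stains are dated to within the tolerance.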
ABSTRACT
Although DNA methylation variation of autosomal CpGs provides robust age predictive biomarkers, no male-specific age predictor exists based on Y-CpGs yet. Since sex chromosomes play an important role in aging, a Y-chromosome-based age predictor would allow studying male-specific aging effects and would also be useful in forensics. Here, we used blood-based DNA methylation microarray data of 1,057 males from six cohorts aged 15-87 and identified 75 Y-CpGs with an interquartile range of ≥0.1. Of these, 22 and six were significantly hyper- and hypomethylated with age (p(cor)<0.05, Bonferroni), respectively. Amongst several machine learning algorithms, a model based on support vector machines with radial kernel performed best in male-specific age prediction. We achieved a mean absolute deviation (MAD) between true and predicted age of 7.54 years (cor=0.81, validation) when using all 75 Y-CpGs, and a MAD of 8.46 years (cor=0.73, validation) based on the most predictive 19 Y-CpGs. The accuracies of both age predictors did not worsen with increased age, in contrast to autosomal CpG-based age predictors that are known to predict age with reduced accuracy in the elderly. Overall, we introduce the first-of-its-kind male-specific epigenetic age predictor for future applications in aging research and forensics.
Subjects
Aging/genetics , Chromosomes, Human, Y , DNA Methylation , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , CpG Islands , Epigenesis, Genetic , Humans , Machine Learning , Male , Middle Aged , Models, Genetic , Support Vector Machine , Young Adult
ABSTRACT
BACKGROUND: Although the genomes of monozygotic twins are practically identical, their methylomes may evolve divergently throughout their lifetime as a consequence of factors such as the environment or aging. Particularly for young and healthy monozygotic twins, DNA methylation divergence, if any, may be restricted to stochastic processes occurring post-twinning during embryonic development and early life. However, to what extent such stochastic mechanisms can systematically provide a stable source of inter-individual epigenetic variation remains uncertain until now. RESULTS: We enriched for inter-individual stochastic variation by using an equivalence testing-based statistical approach on whole blood methylation microarray data from healthy adolescent monozygotic twins. As a result, we identified 333 CpGs displaying similarly large methylation variation between monozygotic co-twins and unrelated individuals. Although their methylation variation surpasses measurement error and is stable in a short timescale, susceptibility to aging is apparent in the long term. Additionally, 46% of these CpGs were replicated in adipose tissue. The identified sites are significantly enriched at the clustered protocadherin loci, known for stochastic methylation in developing neurons. We also confirmed an enrichment in monozygotic twin DNA methylation discordance at these loci in whole genome bisulfite sequencing data from blood and adipose tissue. CONCLUSIONS: We have isolated a component of stochastic methylation variation, distinct from genetic influence, measurement error, and epigenetic drift. Biomarkers enriched in this component may serve in the future as the basis for universal epigenetic fingerprinting, relevant for instance in the discrimination of monozygotic twin individuals in forensic applications, currently impossible with standard DNA profiling.
Subjects
DNA Methylation , Epigenesis, Genetic , Twins, Monozygotic/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Child , CpG Islands , Female , Genome, Human , Humans , Male , Middle Aged , Young Adult
ABSTRACT
Forensic DNA phenotyping is gaining interest as the number of applications increases within the forensic genetics community. The possibility of providing investigative leads in addition to conventional DNA profiling for human identification provides new insights into otherwise "cold" police investigations. The ability to report on the bio-geographical ancestry (BGA), appearance characteristics and age based on DNA obtained from a crime scene sample of an unknown donor makes the exploration of such markers and the development of new methods meaningful for criminal investigations. The VISible Attributes through GEnomics (VISAGE) Consortium aims to disseminate and broaden the use of predictive markers and develop fully optimized and validated prototypes for forensic casework implementation. Here, we report the development, performance and validation of the first VISAGE appearance and ancestry tool. A total of 153 SNPs (96.84% assay conversion rate) were successfully incorporated into a single multiplex reaction using the AmpliSeq™ design pipeline, and applied for massively parallel sequencing with the Ion S5 platform. A collaborative effort involving six VISAGE laboratory partners was devised to perform all validation tests. An extensive validation plan was carefully organized to explore the assay's overall performance with optimum and low-input samples, as well as with challenging and casework mock samples. In addition, forensic validation studies such as concordance and mixture tests using the Coriell sample set with known genotypes were performed. Finally, inhibitor tolerance and specificity were also evaluated. Results showed a robust, highly sensitive assay with good overall concordance between laboratories.
Subjects
DNA Fingerprinting , DNA/genetics , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Racial Groups/genetics , Software , Genetic Markers , Humans , Phenotype , Polymerase Chain Reaction , Reproducibility of Results , Sequence Analysis, DNA
ABSTRACT
Human blood traces are amongst the most commonly encountered biological stains collected at crime scenes. Identifying the body site of origin of a forensic blood trace can provide crucial information in many cases, such as in sexual and violent assaults. However, means for reliably and accurately identifying from which body site a forensic blood trace originated are missing, but would be highly valuable in crime scene investigations. With this study, we introduce a taxonomy-independent deep neural network approach based on massively parallel microbiome sequencing, which delivers accurate body site of origin classification of forensically-relevant blood samples, such as menstrual, nasal, fingerprick, and venous blood. A total of 50 deep neural networks were trained using a large 16S rRNA gene sequencing dataset from 773 reference samples, including 220 female urogenital tract, 190 nasal cavity, 213 skin, and 150 venous blood samples. Validation was performed with de-novo generated 16S rRNA gene massively parallel sequencing (MPS) data from 94 blood test samples of four different body sites, and achieved high classification accuracy with AUC values of 0.992 for menstrual blood (N = 23), 0.978 for nasal blood (N = 16), 0.978 for fingerprick blood (N = 30), and 0.990 for venous blood (N = 25). The obtained highly accurate classification of menstrual blood was independent of the day of the menses, as established in an additional 86 menstrual blood test samples. Accurate body site of origin classification was also revealed for 45 fresh and aged mock casework blood samples from all four body sites. Our novel microbiome approach works based on the assumption that a sample is from blood, as can be obtained in forensic practise from prior presumptive blood testing, and provides accurate information on the specific body source of blood, with high potential for future forensic applications.
Subjects
Blood/microbiology , Fingers/microbiology , Microbiota/genetics , Nasal Mucosa/microbiology , Vagina/microbiology , Epithelium/microbiology , Female , Forensic Genetics/methods , High-Throughput Nucleotide Sequencing , Humans , Menstruation , Neural Networks, Computer , RNA, Ribosomal, 16S , Skin/microbiology , Veins
ABSTRACT
Inferring a person's smoking habit and history from blood is relevant for complementing or replacing self-reports in epidemiological and public health research, and for forensic applications. However, a finite DNA methylation marker set and a validated statistical model based on a large dataset are not yet available. Employing 14 epigenome-wide association studies for marker discovery, and using data from six population-based cohorts (N = 3764) for model building, we identified 13 CpGs most suitable for inferring smoking versus non-smoking status from blood with a cumulative Area Under the Curve (AUC) of 0.901. Internal fivefold cross-validation yielded an average AUC of 0.897 ± 0.137, while external model validation in an independent population-based cohort (N = 1608) achieved an AUC of 0.911. These 13 CpGs also provided accurate inference of current (average cross-validation AUC 0.925 ± 0.021, external validation AUC 0.914), former (0.766 ± 0.023, 0.699) and never smoking (0.830 ± 0.019, 0.781) status, allowed inferring pack-years in current smokers (10 pack-years 0.800 ± 0.068, 0.796; 15 pack-years 0.767 ± 0.102, 0.752) and inferring smoking cessation time in former smokers (5 years 0.774 ± 0.024, 0.760; 10 years 0.766 ± 0.033, 0.764; 15 years 0.767 ± 0.020, 0.754). Model application to children revealed highly accurate inference of the true non-smoking status (6 years of age: accuracy 0.994, N = 355; 10 years: 0.994, N = 309), suggesting that prenatal and passive smoking exposure have no impact on model applications in adults. The finite set of DNA methylation markers allows accurate inference of smoking habit, with accuracy comparable to plasma cotinine use, and smoking history from blood, which we envision becoming useful in epidemiology and public health research, and in medical and forensic applications.
Subjects
Cotinine/blood, DNA Methylation, DNA/blood, Epigenomics/methods, Smoking/adverse effects, Adult, Area Under Curve, Biomarkers/blood, Female, Humans, Male, Reproducibility of Results, Sensitivity and Specificity, Smoking/genetics, Smoking Cessation
ABSTRACT
Correct identification of different human epithelial materials, such as those of skin, saliva and vaginal origin, is relevant in forensic casework as it provides crucial information for crime reconstruction. However, the overlap in human cell type composition between these three epithelial materials poses challenges for their differentiation and identification when using previously proposed human cell biomarkers, while their microbiota composition largely differs. By using validated 16S rRNA gene massively parallel sequencing data from the Human Microbiome Project of 1636 skin, oral and vaginal samples, 50 taxonomy-independent deep learning networks were trained to classify these three tissues. Validation testing was performed on de novo generated high-throughput 16S rRNA gene sequencing data, produced with the Ion Torrent™ Personal Genome Machine from 110 test samples: 56 hand skin, 31 saliva and 23 vaginal secretion specimens. Body-site classification accuracy of these test samples was very high, as indicated by AUC values of 0.99 for skin, 0.99 for oral, and 1 for vaginal secretion. Misclassifications were limited to 3 (5%) skin samples. Additional forensic validation testing was performed in mock casework samples by de novo high-throughput sequencing of 19 freshly prepared samples and 22 samples aged for 1 to 7.6 years. All 19 fresh and 20 (91%) of the 22 aged mock casework samples were correctly tissue-type classified. Moreover, comparing the microbiome results with outcomes from previous human mRNA-based tissue identification testing in the same 16 aged mock casework samples reveals that our microbiome approach performs better in 12 (75%), similarly in 2 (12.5%), and worse in 2 (12.5%) of the samples. Our results demonstrate that this new microbiome approach allows for accurate tissue-type classification of three human epithelial materials of skin, oral and vaginal origin, which is highly relevant for future forensic investigations.
Subjects
Deep Learning, High-Throughput Nucleotide Sequencing, Microbiota, 16S Ribosomal RNA/genetics, RNA Sequence Analysis, Female, Forensic Genetics/methods, Humans, Male, Saliva/microbiology, Skin/microbiology, Vagina/microbiology
ABSTRACT
Forensic epigenetics, i.e., investigating epigenetic variation to resolve forensically relevant questions unanswerable with standard forensic DNA profiling, has been gaining substantial ground over the last few years. Differential DNA methylation among tissues and individuals has been proposed as a useful resource for three forensic applications: i) determining the tissue type of a human biological trace, ii) estimating the age of an unknown trace donor, and iii) differentiating between monozygotic twins. Thus far, forensic epigenetic investigations have used a wide range of methods for CpG marker discovery, prediction modelling and targeted DNA methylation analysis, all of which come with advantages and disadvantages for forensic trace analysis. In this review, we summarize the most recent literature on these three main topics of current forensic epigenetic investigations and discuss limitations and practical considerations in experimental design and data interpretation, such as technical and biological biases. Moreover, we provide future perspectives with regard to new research questions, new epigenetic markers and recent technological advances that, as we envision, will move the field towards forensic epigenomics in the near future.
Subjects
DNA Methylation, Genetic Epigenesis, Epigenomics, Forensic Genetics, Aging/genetics, Body Fluids/chemistry, CpG Islands/genetics, Genetic Markers, Genotype, High-Throughput Nucleotide Sequencing, Humans, Mass Spectrometry, Statistical Models, Polymerase Chain Reaction, Single Nucleotide Polymorphism, DNA Sequence Analysis, Monozygotic Twins/genetics
ABSTRACT
Monozygotic (MZ) twins are typically indistinguishable via forensic DNA profiling. Recently, we demonstrated that epigenetic differentiation of MZ twins is feasible; however, proportions of twin differentially methylated CpG sites (tDMSs) identified in reference-type blood DNA were not replicated in trace-type blood DNA. Here we investigated buccal swabs as typical forensic reference material, and saliva and cigarette butts as commonly encountered forensic trace materials. As an analog to a forensic case, we analyzed one MZ twin pair. Epigenome-wide microarray analysis in reference-type buccal DNA revealed 25 candidate tDMSs with >0.5 twin-to-twin differences. MethyLight quantitative PCR (qPCR) of 22 selected tDMSs in trace-type DNA revealed in saliva DNA that six tDMSs (27.3%) had >0.1 twin-to-twin differences, seven (31.8%) had smaller (<0.1) but robustly detected differences, whereas for nine (40.9%) the differences were in the opposite direction relative to the microarray data; for cigarette butt DNA, results were 50%, 22.7%, and 27.3%, respectively. The discrepancies between reference-type and trace-type DNA outcomes can be explained by cell composition differences, method-to-method variation, and other technical reasons including bisulfite conversion inefficiency. Our study highlights the importance of the DNA source and that careful characterization of biological and technical effects is needed before epigenetic MZ twin differentiation is applicable in forensic casework.