ABSTRACT
Human skin is colonized by a microbiota that includes commensal bacteria, fungi, arthropods, archaea and viruses. The composition of the microbiota varies between anatomical locations according to differences in body temperature, pH, humidity/hydration or sebum content. A homeostatic skin microbiota is crucial to maintain epithelial barrier functions, to protect from invading pathogens and to interact with the immune system. Because dysbiosis, or disruption of the homeostatic skin microbiota, is associated with skin inflammation, maintaining homeostasis is a promising and achievable goal for microbiome-directed treatment strategies as well as for prophylactic strategies to prevent the development of skin diseases. A healthy skin microbiome is likely modulated by genetic as well as environmental and lifestyle factors. In this review, we aim to provide a complete overview of the lifestyle and environmental factors that can contribute to keeping the skin microbiome healthy. Awareness of these factors could be the basis for a prophylactic strategy to prevent the development of skin diseases, or could be used as part of a therapeutic approach.
ABSTRACT
Human microbiome research is moving from characterization and association studies to translational applications in medical research, clinical diagnostics, and other fields. One of these applications is the prediction of human traits, where machine learning (ML) methods are often employed but face practical challenges. Class imbalance in available microbiome data is one of the major problems; if unaccounted for, it leads to spurious prediction accuracies and limits the classifier's generalization. Here, we investigated the predictability of smoking habits from class-imbalanced saliva microbiome data by combining data augmentation techniques that account for class imbalance with ML methods for prediction. We collected publicly available saliva 16S rRNA gene sequencing data and smoking habit metadata, which demonstrated a serious class imbalance problem, i.e., 175 current vs. 1,070 non-current smokers. Three data augmentation techniques (synthetic minority over-sampling technique, adaptive synthetic, and tree-based associative data augmentation) were applied together with seven ML methods: logistic regression, k-nearest neighbors, support vector machine with linear and radial kernels, decision trees, random forest, and extreme gradient boosting. K-fold nested cross-validation was used with the different augmented data types and baseline non-augmented data to validate the prediction outcome. Combining data augmentation with ML generally outperformed baseline methods in our dataset. The final prediction model combined tree-based associative data augmentation and support vector machine with linear kernel, and achieved a classification performance, expressed as Matthews correlation coefficient, of 0.36 and an AUC of 0.81. Our method successfully addresses the problem of class imbalance in microbiome data for reliable prediction of smoking habits.
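To illustrate why plain accuracy can be spurious under the class imbalance described above, the following minimal Python sketch compares accuracy with the Matthews correlation coefficient (MCC) for a trivial classifier that labels every sample as a non-current smoker. Only the class sizes (175 vs. 1,070) come from the abstract; the function names and the trivial classifier are illustrative assumptions and are not the authors' pipeline.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Class sizes from the abstract: 175 current (1) vs. 1,070 non-current (0) smokers.
y_true = [1] * 175 + [0] * 1070

# Hypothetical trivial classifier: always predict the majority class.
y_pred_majority = [0] * len(y_true)

majority_counts = confusion_counts(y_true, y_pred_majority)
print(f"accuracy = {accuracy(*majority_counts):.3f}")  # accuracy = 0.859
print(f"MCC      = {mcc(*majority_counts):.3f}")       # MCC      = 0.000
```

The majority-class predictor reaches about 86% accuracy without identifying a single current smoker, while its MCC is 0, which is one reason a balanced measure such as MCC is reported alongside AUC.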
ABSTRACT
Over the last few years, advances in massively parallel sequencing technologies (also referred to as next-generation sequencing) and bioinformatics analysis tools have boosted our knowledge of the human microbiome. Such insights have brought new perspectives and possibilities for applying human microbiome analysis in many areas, particularly in medicine. In the forensic field, the use of microbial DNA obtained from human materials is still in its infancy but has been suggested as a potential alternative in situations where other human (non-microbial) approaches present limitations. More specifically, DNA analysis of the wide variety of microorganisms that live in and on the human body promises to answer various forensically relevant questions, such as post-mortem interval estimation, individual identification, and tissue/body fluid identification, among others. However, human microbiome analysis currently faces significant challenges that need to be considered and overcome through future forensically oriented human microbiome research. In this perspective article, we discuss the most relevant biological, technical and data-related issues and propose future solutions that will pave the way towards the integration of human microbiome analysis into the forensic toolkit.
Subjects
Microbiota, Computational Biology, DNA/genetics, Forensic Medicine, High-Throughput Nucleotide Sequencing, Humans
ABSTRACT
Information on the time when a stain was deposited at a crime scene can be valuable in forensic investigations. It can link a DNA-identified stain donor with a crime or provide a post-mortem interval estimation in cases with cadavers. The available methods for estimating stain deposition time have limitations of different types and magnitudes. In this proof-of-principle study we investigated for the first time the use of microbial DNA for this purpose in human saliva stains. First, we identified the most abundant and frequent bacterial species in saliva using publicly available 16S rRNA gene next-generation sequencing (NGS) data from 1,848 samples. Next, we assessed time-dependent changes in 15 identified species using de-novo 16S rRNA gene NGS in the saliva stains of two individuals exposed to indoor conditions for up to 1 year. We selected four bacterial species showing significant time-dependent changes, i.e., Fusobacterium periodonticum, Haemophilus parainfluenzae, Veillonella dispar, and Veillonella parvula, and developed a 4-plex qPCR assay for their targeted analysis. Then, we analyzed the saliva stains of 15 individuals exposed to indoor conditions for up to 1 month. Bacterial counts generally increased with time and explained 54.9% of the variation (p < 2.2E-16). Time since deposition explained ≥86.5% and ≥88.9% of the variation in each individual and species, respectively (p < 2.2E-16). Finally, based on sample duplicates we built and tested multiple linear regression models for predicting the stain deposition time at an individual level, resulting in an average mean absolute error (MAE) of 5 days (range 3.3-7.8 days). Overall, the deposition time of 181 (81.5%) stains was correctly predicted within 1 week. Prediction models were also assessed in stains prepared 7 months later and exposed to similar conditions for up to 1 month, resulting in an average MAE of 8.8 days (range 3.9-16.9 days).
Our proof-of-principle study suggests the potential of DNA profiling of human commensal bacteria as a method for estimating the time since deposition of saliva stains in the forensic scenario, which may be expanded to other forensically relevant tissues. The study considers practical applications of this novel approach, but various forensic developmental validation and implementation criteria will need to be met in more dedicated studies in the future.
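The two error summaries reported above, mean absolute error (MAE) and the share of stains predicted within 1 week, can be sketched in a few lines of Python. The deposition times below are made-up illustrative values, not study data, and treating "within 1 week" as an absolute error of at most 7 days is an assumption about how the abstract's criterion was applied.

```python
def mean_absolute_error(true_days, predicted_days):
    """Average absolute difference between true and predicted deposition times."""
    if not true_days or len(true_days) != len(predicted_days):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_days, predicted_days))
    return total / len(true_days)


def fraction_within(true_days, predicted_days, tolerance_days=7):
    """Share of predictions whose absolute error is at most tolerance_days."""
    if not true_days or len(true_days) != len(predicted_days):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for t, p in zip(true_days, predicted_days) if abs(t - p) <= tolerance_days
    )
    return hits / len(true_days)


# Hypothetical stains: true vs. model-predicted time since deposition (days).
true_days = [0, 7, 14, 21, 28]
predicted_days = [2, 5, 20, 30, 27]

mae = mean_absolute_error(true_days, predicted_days)
within_week = fraction_within(true_days, predicted_days)
print(f"MAE = {mae:.1f} days")                    # MAE = 4.0 days
print(f"within 1 week = {within_week:.0%}")       # within 1 week = 80%
```

Reporting both values is useful because a low average error can hide a few stains with large errors, which the within-tolerance share makes visible.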
ABSTRACT
Human blood traces are amongst the most commonly encountered biological stains collected at crime scenes. Identifying the body site of origin of a forensic blood trace can provide crucial information in many cases, such as in sexual and violent assaults. However, means for reliably and accurately identifying the body site from which a forensic blood trace originated are missing, but would be highly valuable in crime scene investigations. With this study, we introduce a taxonomy-independent deep neural network approach based on massively parallel microbiome sequencing, which delivers accurate body site of origin classification of forensically relevant blood samples, such as menstrual, nasal, fingerprick, and venous blood. A total of 50 deep neural networks were trained using a large 16S rRNA gene sequencing dataset from 773 reference samples, including 220 female urogenital tract, 190 nasal cavity, 213 skin, and 150 venous blood samples. Validation was performed with de-novo generated 16S rRNA gene massively parallel sequencing (MPS) data from 94 blood test samples of four different body sites, and achieved high classification accuracy with AUC values of 0.992 for menstrual blood (N = 23), 0.978 for nasal blood (N = 16), 0.978 for fingerprick blood (N = 30), and 0.990 for venous blood (N = 25). The highly accurate classification of menstrual blood was independent of the day of the menses, as established in an additional 86 menstrual blood test samples. Accurate body site of origin classification was also obtained for 45 fresh and aged mock casework blood samples from all four body sites. Our novel microbiome approach works on the assumption that a sample is from blood, as can be established in forensic practice by prior presumptive blood testing, and provides accurate information on the specific body source of blood, with high potential for future forensic applications.
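The per-body-site AUC values quoted above are one-vs-rest measures: for each class, AUC is the probability that a randomly chosen sample of that class receives a higher classifier score than a randomly chosen sample of any other class. A minimal Python sketch of that calculation follows; the labels and scores are made-up illustrative values, and the function names are assumptions rather than part of the published method.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def one_vs_rest_auc(labels, scores, target):
    """AUC for one class against all others, given that class's score per sample."""
    positives = [s for lab, s in zip(labels, scores) if lab == target]
    negatives = [s for lab, s in zip(labels, scores) if lab != target]
    return roc_auc(positives, negatives)


# Hypothetical test samples with a classifier's "menstrual blood" score for each.
labels = ["menstrual", "menstrual", "menstrual",
          "nasal", "venous", "fingerprick", "nasal"]
menstrual_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = one_vs_rest_auc(labels, menstrual_scores, "menstrual")
print(f"AUC (menstrual vs. rest) = {auc:.3f}")  # AUC (menstrual vs. rest) = 0.917
```

In this toy example, 11 of the 12 menstrual/non-menstrual pairs are ranked correctly, giving an AUC of 11/12.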
Subjects
Blood/microbiology, Fingers/microbiology, Microbiota/genetics, Nasal Mucosa/microbiology, Vagina/microbiology, Epithelium/microbiology, Female, Forensic Genetics/methods, High-Throughput Nucleotide Sequencing, Humans, Menstruation, Computer Neural Networks, 16S Ribosomal RNA, Skin/microbiology, Veins
ABSTRACT
Correct identification of different human epithelial materials, such as those of skin, saliva and vaginal origin, is relevant in forensic casework as it provides crucial information for crime reconstruction. However, the overlap in human cell type composition between these three epithelial materials poses challenges for their differentiation and identification when using previously proposed human cell biomarkers, while their microbiota composition largely differs. Using validated 16S rRNA gene massively parallel sequencing data from the Human Microbiome Project of 1636 skin, oral and vaginal samples, 50 taxonomy-independent deep learning networks were trained to classify these three tissues. Validation testing was performed in de-novo generated high-throughput 16S rRNA gene sequencing data, produced with the Ion Torrent™ Personal Genome Machine, from 110 test samples: 56 hand skin, 31 saliva and 23 vaginal secretion specimens. Body-site classification accuracy of these test samples was very high, as indicated by AUC values of 0.99 for skin, 0.99 for oral, and 1 for vaginal secretion. Misclassifications were limited to 3 (5%) skin samples. Additional forensic validation testing was performed in mock casework samples by de-novo high-throughput sequencing of 19 freshly prepared samples and 22 samples aged for 1 to 7.6 years. All of the 19 fresh and 20 (91%) of the 22 aged mock casework samples were correctly tissue-type classified. Moreover, comparing the microbiome results with outcomes from previous human mRNA-based tissue identification testing in the same 16 aged mock casework samples reveals that our microbiome approach performs better in 12 (75%), similarly in 2 (12.5%), and worse in 2 (12.5%) of the samples. Our results demonstrate that this new microbiome approach allows for accurate tissue-type classification of three human epithelial materials of skin, oral and vaginal origin, which is highly relevant for future forensic investigations.
Subjects
Deep Learning, High-Throughput Nucleotide Sequencing, Microbiota, 16S Ribosomal RNA/genetics, RNA Sequence Analysis, Female, Forensic Genetics/methods, Humans, Male, Saliva/microbiology, Skin/microbiology, Vagina/microbiology
ABSTRACT
Monozygotic (MZ) twins share the same STR profile, which poses a practical problem in forensic casework. DNA methylation has provided a suitable resource for MZ twin differentiation; however, studies addressing the forensic feasibility are lacking. Here, we investigated epigenetic MZ twin differentiation from blood under a forensic scenario comprising i) the discovery of candidate markers in reference-type blood DNA via genome-wide analysis, ii) the technical validation of candidate markers in reference-type blood DNA using a suitable targeted method, and iii) the analysis of the validated markers in trace-type DNA. Genome-wide methylation analysis in blood DNA from 10 MZ twin pairs resulted in 19-111 twin-differentially methylated sites (tDMSs) per pair with >0.3 twin-to-twin differences. Considering the top three candidate tDMSs of every pair in the technical validation based on methylation-specific qPCR, 67.85% generated >0.1 twin-to-twin differences. Of the validated tDMSs, 68.4% showed >0.1 twin-to-twin differences with qPCR in trace-type DNA across 8 pairs. Using an updated marker selection strategy, 8 additional candidate tDMSs were obtained for an example MZ pair, of which 7 showed >0.1 twin-to-twin differences in both reference- and trace-type DNA. Lastly, we introduce a high-resolution melting curve analysis of the entire fragment that can complement the proposed approach. Overall, our study demonstrates the general feasibility of epigenetic twin differentiation in the forensic context and highlights that the number of informative tDMSs in the final trace DNA analysis is crucial, as some candidate markers identified in reference DNA were shown not to be informative in the trace DNA for various reasons, including technical ones. Future studies will need to address the optimal number of epigenetic markers required for reliable identification of MZ twin individuals, including statistical considerations.
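The thresholding step described above, in which sites are kept when the twin-to-twin methylation difference exceeds a cut-off such as 0.3 or 0.1, can be sketched as follows. This is a minimal illustration under stated assumptions: methylation levels are taken to be fractions between 0 and 1, the site names and values are invented, and the function name is hypothetical rather than the authors' marker selection procedure.

```python
def twin_differential_sites(levels_twin_a, levels_twin_b, threshold=0.3):
    """Return sites whose absolute twin-to-twin difference exceeds the threshold.

    Both arguments map a site identifier to a methylation level in [0, 1].
    Only sites measured in both twins are compared; results are ordered by
    decreasing difference so the strongest candidates come first.
    """
    shared_sites = levels_twin_a.keys() & levels_twin_b.keys()
    differences = {
        site: abs(levels_twin_a[site] - levels_twin_b[site]) for site in shared_sites
    }
    selected = [site for site, diff in differences.items() if diff > threshold]
    return sorted(selected, key=lambda site: differences[site], reverse=True)


# Hypothetical methylation levels for one MZ twin pair (illustrative values only).
twin_a = {"site_1": 0.80, "site_2": 0.45, "site_3": 0.10, "site_4": 0.55}
twin_b = {"site_1": 0.40, "site_2": 0.50, "site_3": 0.45, "site_4": 0.50}

candidates = twin_differential_sites(twin_a, twin_b, threshold=0.3)
print(candidates)  # ['site_1', 'site_3']
```

Ordering candidates by decreasing difference mirrors the idea of carrying only the top-ranked sites of each pair forward to targeted validation.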