ABSTRACT
Environmental DNA (eDNA) metabarcoding is a powerful approach for biomonitoring and impact assessments. Amplicon-based eDNA sequence data characteristically vary widely in sequencing depth (total reads per sample), influenced, inter alia, by the number of samples analyzed simultaneously per sequencing run. The random forest (RF) machine learning algorithm has been successfully employed to accurately classify unknown samples into monitoring categories. To apply RF to eDNA data while avoiding sequencing-depth artifacts, sequence data across samples are normalized using rarefaction, a process that inherently loses information. The aim of this study was to inform future sampling designs in terms of the relationship between sampling depth and RF accuracy. We analyzed three published bacterial amplicon datasets and one new one, using an RF based initially on the maximal rarefied data available (minimum mean of > 30,000 reads across all datasets) to establish baseline performance. We then evaluated RF classification success on increasingly rarefied datasets. We found that extreme to moderate rarefaction (50-5000 sequences per sample) was sufficient to achieve prediction performance commensurate with that of the full data, depending on the classification task. We did not find that the number of classes, the data balance across classes, or the total number of sequences or samples was associated with predictive accuracy. We identified the ability of the training data to adequately characterize the classes being mapped as the most important criterion, and we discuss how this finding can inform future sampling design for eDNA-based biomonitoring to reduce costs and computation time.
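The rarefaction step mentioned above can be illustrated with a minimal sketch. It assumes the sequence data are held as a samples-by-taxa table of integer read counts. The function name `rarefy`, the fixed seed, and the policy of dropping samples that fall below the target depth are assumptions made for illustration and are not taken from the study's pipeline.

```python
import numpy as np


def rarefy(counts, depth, seed=0):
    """Subsample every sample (row) to exactly `depth` reads, without replacement.

    Samples with fewer than `depth` reads are dropped (an assumed policy).
    Returns the rarefied table and a boolean mask of the samples kept.
    """
    counts = np.asarray(counts, dtype=np.int64)
    rng = np.random.default_rng(seed)
    # Keep only samples deep enough to be subsampled to the target depth.
    keep = counts.sum(axis=1) >= depth
    # Drawing reads without replacement from a sample is a draw from a
    # multivariate hypergeometric distribution over its taxa counts.
    rows = [rng.multivariate_hypergeometric(row, depth) for row in counts[keep]]
    rarefied = np.array(rows, dtype=np.int64).reshape(-1, counts.shape[1])
    return rarefied, keep


# Toy table: 3 samples x 3 taxa, with very different sequencing depths.
table = [[10, 0, 5], [1, 1, 1], [100, 50, 50]]
rarefied, keep = rarefy(table, depth=10)
print(keep)                  # which samples survived the depth threshold
print(rarefied.sum(axis=1))  # every kept sample now has exactly 10 reads
```

Repeating this at decreasing values of `depth` and retraining a classifier on each rarefied table would give the kind of depth-versus-accuracy comparison the abstract describes.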
ABSTRACT
Genetic diversity creation is a core technology in directed evolution, where a high-quality mutant library is crucial to success. Owing to its importance, the technology for genetic diversity creation has developed rapidly over the years, and its application has diversified into other fields of scientific research. The advances in molecular cloning and mutagenesis since 2008 were reviewed. Specifically, new cloning techniques were classified by their underlying principles (complementary overhangs, homologous sequences, overlapping PCR and megaprimers), and the advantages, drawbacks and performance of these methods were highlighted. New mutagenesis methods developed for random mutagenesis, focused mutagenesis and DNA recombination were surveyed. The technical requirements of these methods and their mutational spectra were compared and discussed with reference to commonly used techniques. The trends in mutant library preparation were summarised. Challenges in genetic diversity creation were discussed, with emphasis on creating "smart" libraries, controlling the mutagenesis spectrum and the specific challenges in each group of mutagenesis methods. The wider applications of genetic diversity creation were outlined, including genome engineering, viral evolution, metagenomics and the study of protein functions. The review ends with an outlook for genetic diversity creation and the prospective developments that could have a future impact on this field.