ABSTRACT
Root system architecture (RSA), the distribution of roots in soil, plays a major role in plant survival. RSA is shaped by multiple developmental processes that are largely governed by the phytohormone auxin, suggesting that auxin regulates responses of roots that are important for local adaptation. However, auxin has a central role in numerous processes, and it is unclear which molecular mechanisms contribute to the variation in RSA for environmental adaptation. Using natural variation in Arabidopsis, we identify EXOCYST70A3 as a modulator of the auxin system that causes variation in RSA by acting on PIN4 protein distribution. Allelic variation and genetic perturbation of EXOCYST70A3 lead to alteration of root gravitropic responses, resulting in a different RSA depth profile and drought resistance. Overall, our findings suggest that the local modulation of the pleiotropic auxin pathway can give rise to distinct RSAs that can be adaptive in specific environments.
Subjects
Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Indoleacetic Acids/metabolism, Alleles, Apomorphine/analogs & derivatives, Apomorphine/pharmacology, Arabidopsis/genetics, Arabidopsis Proteins/genetics, Droughts, Exocytosis, Plant Gene Expression Regulation/drug effects, Genome-Wide Association Study, Membrane Transport Proteins/metabolism, Mutation, Plant Roots/drug effects, Plant Roots/growth & development, Plant Roots/metabolism
ABSTRACT
Cell dormancy is a widespread mechanism used by bacteria to evade environmental threats, including antibiotics. Here we monitored bacterial antibiotic tolerance and regrowth at the single-cell level and found that each surviving cell shows a different "dormancy depth," which in turn regulates the lag time for cell resuscitation after removal of the antibiotic. We further established that the protein aggresome, a collection of endogenous protein aggregates, is an important indicator of bacterial dormancy depth, and that its formation is promoted by a decreased cellular ATP level. For cells to leave the dormant state and resuscitate, clearance of the protein aggresome and recovery of proteostasis are required. We revealed that the ability to recruit functional DnaK-ClpB machineries, which facilitate protein disaggregation in an ATP-dependent manner, determines the lag time for bacterial regrowth. Better understanding of the key factors regulating bacterial regrowth after surviving antibiotic attack could lead to new therapeutic strategies for combating bacterial antibiotic tolerance.
Subjects
Adenosine Triphosphate/metabolism, Anti-Bacterial Agents/pharmacology, Bacterial Drug Resistance, Energy Metabolism/drug effects, Escherichia coli Proteins/metabolism, Escherichia coli/drug effects, Protein Aggregates, Endopeptidase Clp/genetics, Endopeptidase Clp/metabolism, Escherichia coli/genetics, Escherichia coli/growth & development, Escherichia coli/metabolism, Escherichia coli Proteins/genetics, HSP70 Heat-Shock Proteins/genetics, HSP70 Heat-Shock Proteins/metabolism, Heat-Shock Proteins/genetics, Heat-Shock Proteins/metabolism, Hydrogen-Ion Concentration, Microbial Viability/drug effects, Single-Cell Analysis, Time Factors
ABSTRACT
DNA variants that arise after conception can show mosaicism, varying in presence and extent among tissues. Mosaic variants have been reported in Mendelian diseases, but further investigation is necessary to broadly understand their incidence, transmission, and clinical impact. A mosaic pathogenic variant in a disease-related gene may cause an atypical phenotype in terms of severity, clinical features, or timing of disease onset. Using high-depth sequencing, we studied results from one million unrelated individuals referred for genetic testing for almost 1,900 disease-related genes. We observed 5,939 mosaic sequence or intragenic copy number variants distributed across 509 genes in nearly 5,700 individuals, constituting approximately 2% of molecular diagnoses in the cohort. Cancer-related genes had the most mosaic variants and showed age-specific enrichment, in part reflecting clonal hematopoiesis in older individuals. We also observed many mosaic variants in genes related to early-onset conditions. Additional mosaic variants were observed in genes analyzed for reproductive carrier screening or associated with dominant disorders with low penetrance, posing challenges for interpreting their clinical significance. When we controlled for the potential involvement of clonal hematopoiesis, most mosaic variants were enriched in younger individuals and were present at higher levels than in older individuals. Furthermore, individuals with mosaicism showed later disease onset or milder phenotypes than individuals with non-mosaic variants in the same genes. Collectively, the large compendium of variants, disease correlations, and age-specific results identified in this study expand our understanding of the implications of mosaic DNA variation for diagnosis and genetic counseling.
Subjects
DNA Copy Number Variations, Mosaicism, DNA Copy Number Variations/genetics, Genetic Testing, Phenotype, High-Throughput Nucleotide Sequencing/methods, Mutation
ABSTRACT
Sequences derived from organisms sharing common evolutionary origins exhibit similarity, while unique sequences, absent in related organisms, are good candidates for diagnostic markers. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which make computation and parsing difficult. To address this, we have developed NAUniSeq, a biologically inspired universal algorithm that finds unique sequences for microorganism diagnosis by traveling through the phylogeny of life. Mapping through a phylogenetic tree keeps cross-contamination and false positives low. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI-Refseq) and, with the help of NetworkX, created a phylogenetic tree. Sequences were assigned to the graph nodes, k-mers were created for target and non-target nodes, and the graph was searched using the depth-first search algorithm. In a memory-efficient alternative NoSQL approach, we created a collection of Refseq sequences in a MongoDB database using the tax-id and path of the FASTA files, and queried the MongoDB collection for the target and non-target sequences. In both approaches, we used an alignment-free, sliding-window, k-mer-based procedure that quickly compares k-mers of target and non-target sequences and returns unique sequences that are not present in the non-target. We validated our algorithm with the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.
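The alignment-free comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the NAUniSeq implementation: the function names, the toy sequences, and the choice of k = 5 are assumptions made for illustration, and details a real pipeline would need (reverse complements, ambiguous bases, the tree traversal that selects the non-target nodes) are left out.

```python
def kmers(seq, k):
    """Yield every k-mer from a sliding window of width k over seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_kmers(target_seqs, non_target_seqs, k):
    """Return target k-mers (in first-seen order) absent from all non-target sequences."""
    # Index every non-target k-mer once so each lookup is a set membership test.
    non_target = set()
    for seq in non_target_seqs:
        non_target.update(kmers(seq, k))

    unique, seen = [], set()
    for seq in target_seqs:
        for kmer in kmers(seq, k):
            if kmer not in non_target and kmer not in seen:
                seen.add(kmer)
                unique.append(kmer)
    return unique


# Toy sequences (hypothetical, not real genomes).
target = ["ACGTACGGA"]
non_target = ["ACGTACGTT", "TTACGTAC"]
print(unique_kmers(target, non_target, k=5))  # prints ['TACGG', 'ACGGA']
```

Holding the non-target k-mers in a set keeps each comparison constant-time, which is the property that lets this kind of procedure avoid multiple sequence alignment.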
Subjects
Algorithms, Phylogeny, Computational Biology/methods, Humans, Bacteria/genetics, Bacteria/classification, Software, Sequence Alignment, DNA Sequence Analysis/methods
ABSTRACT
Single-cell sequencing has revolutionized our ability to dissect the heterogeneity within tumor populations. In this study, we present LoRA-TV (Low Rank Approximation with Total Variation), a novel method for clustering tumor cells based on the read depth profiles derived from single-cell sequencing data. Traditional analysis pipelines process read depth profiles of each cell individually. By aggregating shared genomic signatures distributed among individual cells using low-rank optimization and robust smoothing, the proposed method enhances clustering performance. Results from analyses of both simulated and real data demonstrate its effectiveness compared with state-of-the-art alternatives, as supported by improvements in the adjusted Rand index and computational efficiency.
Subjects
Neoplasms, Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Neoplasms/genetics, Neoplasms/pathology, Cluster Analysis, Algorithms, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, Genomics/methods
ABSTRACT
The paucity of investigations of carbon (C) dynamics through the soil profile with warming makes it challenging to evaluate the terrestrial C feedback to climate change. Soil microbes are important engines driving terrestrial biogeochemical cycles; their carbon use efficiency (CUE), defined as the proportion of metabolized organic C allocated to microbial biomass, is a key regulator controlling the fate of soil C. It has been theorized that microbial CUE should decline with warming; however, empirical evidence for this response is scarce, and data from deeper soils are particularly scarce. Here, based on soil samples from a whole-soil-profile warming experiment (0 to 1 m, +4 °C) and an 18O tracing approach, we examined the vertical variation of microbial CUE and its response to ~3.3-y experimental warming in an alpine grassland on the Qinghai-Tibetan Plateau. Microbial CUE decreased with soil depth, a trend that was primarily controlled by soil C availability. However, warming had limited effects on microbial CUE regardless of soil depth. Similarly, warming had no significant effect on soil C availability, as characterized by extractable organic C, enzyme-based lignocellulose index, and lignin phenol-based ratios of vanillyls, syringyls, and cinnamyls. Collectively, our work suggests that short-term warming does not alter microbial CUE in either surface or deep soils, and emphasizes the regulatory role of soil C availability on microbial CUE.
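The definition of CUE given in this abstract can be written as a small worked calculation. The sketch below assumes that "metabolized organic C" is the sum of C allocated to growth and C respired, which is a common convention but is not spelled out in the abstract; the function name and the example numbers are hypothetical.

```python
def carbon_use_efficiency(growth_c, respired_c):
    """CUE = C allocated to biomass / total C metabolized.

    Assumption: total metabolized C = growth C + respired C,
    with both terms in the same units (e.g. ug C per g soil per hour).
    """
    if growth_c < 0 or respired_c < 0:
        raise ValueError("carbon fluxes must be non-negative")
    metabolized = growth_c + respired_c
    if metabolized == 0:
        raise ValueError("no carbon was metabolized")
    return growth_c / metabolized


# Hypothetical fluxes: 3 units to biomass, 7 units respired.
print(carbon_use_efficiency(3.0, 7.0))  # prints 0.3
```

Under this definition CUE always lies between 0 and 1, so a decline with depth means a larger share of metabolized C is lost as respiration rather than retained in biomass.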
Subjects
Grassland, Soil, Soil/chemistry, Carbon/metabolism, Soil Microbiology, Climate Change
ABSTRACT
Despite achievements in suppressing dendrites and regulating Zn crystal growth, secondary aqueous Zn batteries are still rare in the market. Existing strategies mainly focus on electrode modification and electrolyte optimization, while the essential role of ion concentration in liquid-to-solid electrodeposition has long been neglected. Herein, the mechanism of concentration regulation in Zn electrodeposition is investigated in depth by combining electrochemical tests, post hoc characterization, and multiscale simulations. First, initial Zn electrodeposition is thermodynamically controlled epitaxial growth, whereas with the rapid depletion of ions, the concentration overpotential overrides the thermodynamic influence and growth shifts to kinetic control. Then, the evolution of the morphology from 2D sheets to 1D whiskers due to the concentration change is revealed by morphological characterization and phase-field modeling. Furthermore, the depth of discharge (DOD) results in large concentration differences at the electrode-electrolyte interface, with a mild concentration distribution at lower DOD generating (002) crystal plane 2D sheets and a heavily varied concentration distribution at higher DOD yielding arbitrarily oriented 3D blocks. As a proof of concept, relaxation is introduced into two systems to homogenize the concentration distribution, revalidating the essential role of concentration in regulating electrodeposition. Two vital factors affecting the relaxation time, i.e., current density and electrode distance, are investigated in detail, demonstrating that the relaxation time is positively related to both and is more sensitive to the electrode distance. This work contributes to a renewed understanding of aqueous batteries that undergo phase transitions and reveals a missing piece of the puzzle in regulating Zn electrodeposition.
ABSTRACT
A model for intermediate-depth earthquakes of subduction zones is evaluated based on shear localization, shear heating, and runaway creep within thin carbonate layers in an altered downgoing oceanic plate and the overlying mantle wedge. Thermal shear instabilities in carbonate lenses add to potential mechanisms for intermediate-depth seismicity, which are based on serpentine dehydration and embrittlement of altered slabs or viscous shear instabilities in narrow fine-grained olivine shear zones. Peridotites in subducting plates and the overlying mantle wedge may be altered by reactions with CO2-bearing fluids sourced from seawater or the deep mantle, to form carbonate minerals, in addition to hydrous silicates. Effective viscosities of magnesian carbonates are higher than those for antigorite serpentine and they are markedly lower than those for H2O-saturated olivine. However, magnesian carbonates may extend to greater mantle depths than hydrous silicates at temperatures and pressures of subduction zones. Strain rates within altered downgoing mantle peridotites may be localized within carbonated layers following slab dehydration. A simple model of shear heating and temperature-sensitive creep of carbonate horizons, based on experimentally determined creep laws, predicts conditions of stable and unstable shear with strain rates up to 10/s, comparable to seismic velocities of frictional fault surfaces. Applied to intermediate-depth earthquakes of the Tonga subduction zone and the double Wadati-Benioff zone of NE Japan, this mechanism provides an alternative to the generation of earthquakes by dehydration embrittlement, beyond the stability of antigorite serpentine in subduction zones.
ABSTRACT
Young breast and bowel cancers (e.g., those diagnosed before age 40 or 50 years) have far greater morbidity and mortality in terms of years of life lost, and are increasing in incidence, but have been less studied. For breast and bowel cancers, the familial relative risks, and therefore the familial variances in age-specific log(incidence), are much greater at younger ages, but little of these familial variances has been explained. Studies of families and twins can address questions not easily answered by studies of unrelated individuals alone. We describe existing and emerging family and twin data that can provide special opportunities for discovery. We present designs and statistical analyses, including novel ideas such as the VALID (Variance in Age-specific Log Incidence Decomposition) model for causes of variation in risk, the DEPTH (DEPendency of association on the number of Top Hits) and other approaches to analyse genome-wide association study data, and the within-pair, ICE FALCON (Inference about Causation from Examining FAmiliaL CONfounding) and ICE CRISTAL (Inference about Causation from Examining Changes in Regression coefficients and Innovative STatistical AnaLysis) approaches to causation and familial confounding. Example applications to breast and colorectal cancer are presented. Motivated by the availability of the resources of the Breast and Colon Cancer Family Registries, we also present some ideas for future studies that could be applied to, and compared with, cancers diagnosed at older ages and address the challenges posed by young breast and bowel cancers.
ABSTRACT
The endoplasmic reticulum (ER) is organized into ordered regions enriched in cholesterol and sphingomyelin, and disordered microdomains characterized by more fluidity. Rabbit CYP1A1 and CYP1A2 localize into disordered and ordered microdomains, respectively. Previously, a CYP1A2 chimera containing the first 109 amino acids of CYP1A1 showed altered microdomain localization. The goal of this study was to identify specific residues responsible for CYP1A microdomain localization. Thus, CYP1A2 chimeras containing substitutions from homologous regions of CYP1A1 were expressed in HEK 293T/17 cells, and the localization was examined after solubilization with Brij 98. A CYP1A2 mutant with the three amino acids from CYP1A1 (VAG) at positions 27-29 of CYP1A2 was generated that showed a distribution pattern similar to those of CYP1A1/1A2 chimeras containing both the first 109 amino acids and the first 31 amino acids of CYP1A1 followed by remaining amino acids of CYP1A2. Similarly, the reciprocal substitution of three amino acids from CYP1A2 (AVR) into CYP1A1 resulted in a partial redistribution of the chimera into ordered microdomains. Molecular dynamics simulations indicate that the positive charges of the CYP1A1 and CYP1A2 linker regions between the N-termini and catalytic domains resulted in different depths of immersion of the N-termini in the membrane. The overlap of the distribution of positively charged residues in CYP1A2 (AVR) and negatively charged phospholipids was higher in the ordered than in the disordered microdomain. These findings identify three residues in the CYP1A N-terminus as a novel microdomain-targeting motif of the P450s and provide a mechanistic explanation for the differential microdomain localization of CYP1A.
ABSTRACT
The human genome contains tens of thousands of large tandem repeats and hundreds of genes that show common and highly variable copy-number changes. Due to their large size and repetitive nature, these variable number tandem repeats (VNTRs) and multicopy genes are generally recalcitrant to standard genotyping approaches and, as a result, this class of variation is poorly characterized. However, several recent studies have demonstrated that copy-number variation of VNTRs can modify local gene expression, epigenetics, and human traits, indicating that many have a functional role. Here, using read depth from whole-genome sequencing to profile copy number, we report results of a phenome-wide association study (PheWAS) of VNTRs and multicopy genes in a discovery cohort of ~35,000 samples, identifying 32 traits associated with copy number of 38 VNTRs and multicopy genes at 1% FDR. We replicated many of these signals in an independent cohort and observed that VNTRs showing trait associations were significantly enriched for expression QTLs with nearby genes, providing strong support for our results. Fine-mapping studies indicated that in the majority (~90%) of cases, the VNTRs and multicopy genes we identified represent the causal variants underlying the observed associations. Furthermore, several lie in regions where prior SNV-based GWASs have failed to identify any significant associations with these traits. Our study indicates that copy number of VNTRs and multicopy genes contributes to diverse human traits and suggests that complex structural variants potentially explain some of the so-called "missing heritability" of SNV-based GWASs.
Subjects
DNA Copy Number Variations, Minisatellite Repeats, DNA Copy Number Variations/genetics, Human Genome, Genome-Wide Association Study, Humans, Minisatellite Repeats/genetics, Phenotype
ABSTRACT
Neisseria meningitidis protects itself from complement-mediated killing by binding complement factor H (FH). Previous studies associated susceptibility to meningococcal disease (MD) with variation in CFH, but the causal variants and underlying mechanism remained unknown. Here we attempted to define the association more accurately by sequencing the CFH-CFHR locus and imputing missing genotypes in previously obtained GWAS datasets of MD-affected individuals of European ancestry and matched controls. We identified a CFHR3 SNP that provides protection from MD (rs75703017, p value = 1.1 × 10⁻¹⁶) by decreasing the concentration of FH in the blood (p value = 1.4 × 10⁻¹¹). We subsequently used dual-luciferase studies and CRISPR gene editing to establish that deletion of rs75703017 increased FH expression in hepatocytes by preventing promoter inhibition. Our data suggest that reduced concentrations of FH in the blood confer protection from MD; with reduced access to FH, N. meningitidis is less able to shield itself from complement-mediated killing.
Subjects
Complement Factor H, Meningococcal Infections, Blood Proteins/genetics, Complement Factor H/genetics, Complement System Proteins/genetics, Genetic Predisposition to Disease, Genotype, Humans, Meningococcal Infections/genetics
ABSTRACT
Duplex sequencing technology has been widely used to detect low-frequency mutations in circulating tumor deoxyribonucleic acid (DNA), but how to determine the sequencing depth and other experimental parameters that ensure stable detection of low-frequency mutations remains an open problem. The mutation detection rules of duplex sequencing constrain not only the number of mutated templates but also the number of mutation-supportive reads corresponding to each forward and reverse strand of the mutated templates. To tackle this problem, we proposed a Depth Estimation model for stable detection of Low-Frequency MUTations in duplex sequencing (DELFMUT), which models the identity correspondence and quantitative relationships between templates and reads using the zero-truncated negative binomial distribution without considering the sequences composed of bases. The results of DELFMUT were verified by real duplex sequencing data. Given a known mutation frequency and mutation detection rule, DELFMUT can recommend combinations of DNA input and sequencing depth that guarantee the stable detection of mutations, and it has great application value in guiding the experimental parameter setting of duplex sequencing technology.
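For readers unfamiliar with the zero-truncated negative binomial distribution named in this abstract, the following Python sketch shows its probability mass function and a tail probability. It is a generic illustration, not the DELFMUT model: the parametrization (size r, success probability p), the function names, and the reading of the count as "reads per observed template strand" are assumptions, since the abstract does not state how the distribution is parametrized.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial P(X = k) with size r > 0 and success probability 0 < p < 1."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1.0 - p))


def ztnb_pmf(k, r, p):
    """Zero-truncated version: condition the count on being at least 1."""
    if k < 1:
        return 0.0
    # P(X = 0) for the untruncated distribution is p ** r.
    return nb_pmf(k, r, p) / (1.0 - p ** r)


def prob_at_least(n, r, p):
    """P(X >= n | X >= 1), e.g. the chance a strand has at least n supporting reads."""
    return 1.0 - sum(ztnb_pmf(k, r, p) for k in range(1, n))


# Hypothetical parameters, chosen only to exercise the functions.
print(prob_at_least(2, r=2.0, p=0.3))
```

Truncating at zero reflects the fact that a template with no reads is never observed, so the count distribution has to be renormalized over k ≥ 1.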
Subjects
High-Throughput Nucleotide Sequencing, Neoplasms, Humans, High-Throughput Nucleotide Sequencing/methods, Mutation, Neoplasms/genetics, Mutation Rate, DNA
ABSTRACT
Community structure, including relationships between and within groups, is foundational to our understanding of the world around us. For dissimilarity-based data, leveraging social concepts of conflict and alignment, we provide an approach for capturing meaningful structural information resulting from induced local comparisons. In particular, a measure of local (community) depth is introduced that leads directly to a probabilistic partitioning conveying locally interpreted closeness (or cohesion). A universal choice of threshold for distinguishing strongly and weakly cohesive pairs permits consideration of both local and global structure. Cases in which one might benefit from use of the approach include data with varying density such as that arising as snapshots of complex processes in which differing mechanisms drive evolution locally. The inherent recalibrating in response to density allows one to sidestep the need for localizing parameters, common to many existing methods. Mathematical results together with applications in linguistics, cultural psychology, and genetics, as well as to benchmark clustering data have been included. Together, these demonstrate how meaningful community structure can be identified without additional inputs (e.g., number of clusters or neighborhood size), optimization criteria, iterative procedures, or distributional assumptions.
Subjects
Theoretical Models, Residence Characteristics, Social Sciences, Algorithms, Humans
ABSTRACT
SIGNIFICANCE: The exothermic metamorphic reaction in orthopyroxene (Opx), a major component of oceanic lithospheric mantle, is shown to trigger brittle failure in laboratory deformation experiments under conditions where garnet exsolution takes place. The reaction product is an extremely fine-grained material, forming narrow reaction zones that are mechanically weak, thereby facilitating macroscopic faulting. Oceanic subduction zones are characterized by two separate bands of seismicity, known as the double seismic zone. The upper band of seismicity, located in the oceanic crust, is well explained by dehydration-induced mechanical instability. Our newly discovered metamorphism-induced mechanical instability provides an alternative physical mechanism for earthquakes in the lower band of seismicity (located in the oceanic lithospheric mantle), with no requirement of hydration/dehydration processes.
ABSTRACT
Redox flow batteries (RFBs) are attractive large-scale energy storage technologies whose performance has improved remarkably over the last decades. Nevertheless, an in-depth understanding of their reaction mechanism remains challenging because of their unique operating mechanism, in which electrochemistry and hydrodynamics simultaneously govern battery performance. Thus, to elucidate the precise reactions occurring in RFB systems, an appropriate analysis technique that enables the real-time observation of electrokinetic phenomena is indispensable. Herein, we report in operando visualization and analytical study of RFBs by employing a membrane-free microfluidic platform, that is, a membrane-free microfluidic RFB. Using this platform, electrokinetic investigations were carried out for the 5,10-bis(2-methoxyethyl)-5,10-dihydrophenazine (BMEPZ) catholyte, which has recently been proposed as a high-performance multiredox organic molecule. Taking advantage of the inherent colorimetric property of BMEPZ, we unravel the intrinsic electrochemical properties in terms of charge and mass transfer kinetics during the multiredox reaction through in operando visualization, which enables theoretical study of physicochemical hydrodynamics in electrochemical systems. Based on insights into the electrokinetic limitations in RFBs, we verify the validity of an electrode geometry design that can suppress the range of the depletion region, leading to enhanced cell performance.
ABSTRACT
The novel depth-sensing system presented here revolutionizes structured light (SL) technology by employing metasurfaces and photonic crystal surface-emitting lasers (PCSELs) for efficient facial recognition in monocular depth-sensing. Unlike conventional dot projectors relying on diffractive optical elements (DOEs) and collimators, our system projects approximately 45,700 infrared dots from a compact metasurface 297 µm in dimension, that is, 1.43 times more spots from a device 233 times smaller than the DOE-based dot projector in an iPhone. With a measured field-of-view (FOV) of 158° and a 0.611° dot sampling angle, the system is lens-free and lightweight and consumes 5-10 times less power than vertical-cavity surface-emitting laser (VCSEL) arrays. Utilizing a GaAs-based metasurface and a simplified optical architecture, this innovation not only addresses the drawbacks of traditional SL depth-sensing but also opens avenues for compact integration into wearable devices, offering remarkable advantages in size, power efficiency, and potential for widespread adoption.
ABSTRACT
BACKGROUND: In high-throughput sequencing studies, sequencing depth, which quantifies the total number of reads, varies across samples. Unequal sequencing depth can obscure true biological signals of interest and prevent direct comparisons between samples. To remove variability due to differential sequencing depth, taxa counts are usually normalized before downstream analysis. However, most existing normalization methods scale counts using size factors that are sample specific but not taxa specific, which can result in over- or under-correction for some taxa. RESULTS: We developed TaxaNorm, a novel normalization method based on a zero-inflated negative binomial model. This method assumes that the effects of sequencing depth on mean and dispersion vary across taxa. Incorporating the zero-inflation part can better capture the nature of microbiome data. We also propose two corresponding diagnostic tests on the varying sequencing depth effect for validation. We find that TaxaNorm achieves performance comparable to existing methods in most simulation scenarios in downstream analysis and reaches higher power in some cases. Specifically, it balances power and false discovery control well. When applied to a real dataset, TaxaNorm shows improved performance in correcting technical bias. CONCLUSION: TaxaNorm corrects for both sample- and taxon-specific bias by introducing an appropriate regression framework for microbiome data, which aids in data interpretation and visualization. The 'TaxaNorm' R package is freely available through the CRAN repository https://CRAN.R-project.org/package=TaxaNorm and the source code can be downloaded at https://github.com/wangziyue57/TaxaNorm .
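To make the contrast in the BACKGROUND concrete, here is a minimal Python sketch of the conventional approach the abstract describes: one size factor per sample, applied identically to every taxon. This is the baseline being criticized, not TaxaNorm itself; the function names and the toy count table are hypothetical, and total-sum scaling is only one of several sample-level schemes.

```python
def size_factors(counts):
    """One factor per sample: its sequencing depth relative to the mean depth.

    counts is a list of samples, each a list of per-taxon read counts.
    """
    depths = [sum(sample) for sample in counts]
    if any(depth == 0 for depth in depths):
        raise ValueError("every sample needs at least one read")
    mean_depth = sum(depths) / len(depths)
    return [depth / mean_depth for depth in depths]


def normalize(counts):
    """Divide every taxon in a sample by that sample's single size factor."""
    factors = size_factors(counts)
    return [[count / factor for count in sample]
            for sample, factor in zip(counts, factors)]


# Hypothetical table: sample 2 was sequenced twice as deeply as sample 1.
table = [[10, 30, 60],
         [20, 60, 120]]
for row in normalize(table):
    print([round(value, 6) for value in row])
```

Because the same factor divides every taxon in a sample, this scheme cannot represent a depth effect that differs between taxa, which is the limitation the abstract says a taxon-specific model is meant to address.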
Subjects
High-Throughput Nucleotide Sequencing, Microbiota, Microbiota/genetics, High-Throughput Nucleotide Sequencing/methods, Humans, Algorithms
ABSTRACT
BACKGROUND: Structural variations (SVs) are widespread across the genome and have a great impact on evolution, disease, and phenotypic diversity. Numerous bioinformatic tools, commonly referred to as SV callers, have been developed to detect SVs from whole genome sequence (WGS) data using diverse algorithms, but their performance requires rigorous evaluation with real data and validated SVs. Moreover, a considerable proportion of these tools have been primarily designed and optimized using human genome data. Consequently, their applicability and performance in avian species, characterized by smaller genomes and distinct genomic architectures, remain inadequately assessed. RESULTS: We performed a comprehensive assessment of the performance of ten widely used SV callers using population-level real genomic data with five common types of validated SVs. The performance of SV callers varies with the types and sizes of SVs. Compared with other tools, GRIDSS, Lumpy, Wham, and Manta show better detection accuracy. Pindel can detect more small SVs than others. CNVnator and CNVkit can detect more medium and large copy number variations. Given the poor consistency among different SV callers, the combination calling strategy is not recommended. All tools show poor ability in the detection of insertions (especially with size > 150 bp). At least 50× read depth is required to detect more than 80% of the SVs for most tools. CONCLUSIONS: This study highlights the importance and necessity of using real sequencing data, rather than simulated data only, with validated SVs for SV caller evaluation. Some practical guidance and suggestions are provided for SV detection in future research.
Subjects
Chickens, Whole Genome Sequencing, Animals, Chickens/genetics, Whole Genome Sequencing/methods, Genomics/methods, Algorithms, Genomic Structural Variation, Software, DNA Copy Number Variations, Computational Biology/methods, Genome
ABSTRACT
BACKGROUND: Parameters adversely affecting the contiguity and accuracy of the assemblies from Illumina next-generation sequencing (NGS) are well described. However, past studies generally focused on their additive effects, overlooking their potential interactions possibly exacerbating one another's effects in a multiplicative manner. To investigate whether or not they act interactively on de novo genome assembly quality, we simulated sequencing data for 13 bacterial reference genomes, with varying levels of error rate, sequencing depth, PCR and optical duplicate ratios. RESULTS: We assessed the quality of assemblies from the simulated sequencing data with a number of contiguity and accuracy metrics, which we used to quantify both additive and multiplicative effects of the four parameters. We found that the tested parameters are engaged in complex interactions, exerting multiplicative, rather than additive, effects on assembly quality. Also, the ratio of non-repeated regions and GC% of the original genomes can shape how the four parameters affect assembly quality. CONCLUSIONS: We provide a framework for consideration in future studies using de novo genome assembly of bacterial genomes, e.g. in choosing the optimal sequencing depth, balancing between its positive effect on contiguity and negative effect on accuracy due to its interaction with error rate. Furthermore, the properties of the genomes to be sequenced also should be taken into account, as they might influence the effects of error sources themselves.