ABSTRACT
MOTIVATION: Somatic mosaicism has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1-6 bp and comprise >1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion disorders and is highly prevalent in cancers exhibiting microsatellite instability. While a variety of tools have been developed to genotype germline variation at STRs, a method for systematically identifying mosaic STRs is lacking. RESULTS: We introduce prancSTR, a novel method for detecting mosaic STRs from individual high-throughput sequencing datasets. prancSTR is designed to detect loci characterized by a single high-frequency mosaic allele, but can also detect loci with multiple mosaic alleles. Unlike many existing mosaicism detection methods for other variant types, prancSTR does not require a matched control sample as input. We show that prancSTR accurately identifies mosaic STRs in simulated data, demonstrate its feasibility by identifying candidate mosaic STRs in Illumina whole genome sequencing data derived from lymphoblastoid cell lines for individuals sequenced by the 1000 Genomes Project, and evaluate the use of prancSTR on Element and PacBio data. In addition to prancSTR, we present simTR, a novel simulation framework which simulates raw sequencing reads with realistic error profiles at STRs. AVAILABILITY AND IMPLEMENTATION: prancSTR and simTR are freely available at https://github.com/gymrek-lab/trtools. Detailed documentation is available at https://trtools.readthedocs.io/.
Subjects
Human Genome, High-Throughput Nucleotide Sequencing, Microsatellite Repeats, Mosaicism, Humans, High-Throughput Nucleotide Sequencing/methods, Alleles, Software
ABSTRACT
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long-read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979.
Subjects
Genetic Variation, Human Genome, Tandem Repeat Sequences, Humans, Software, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Nanopore Sequencing/methods
ABSTRACT
Tandem repeats (TRs) are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long-read sequencing technologies have the potential to greatly improve TR analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr.
ABSTRACT
Motivation: Somatic mosaicism, in which a mutation occurs post-zygotically, has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1-6 bp and comprise more than 1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion disorders and is highly prevalent in cancers exhibiting microsatellite instability. While a variety of tools have been developed to genotype germline variation at STRs, a method for systematically identifying mosaic STRs (mSTRs) is lacking. Results: We introduce prancSTR, a novel method for detecting mSTRs from individual high-throughput sequencing datasets. Unlike many existing mosaicism detection methods for other variant types, prancSTR does not require a matched control sample as input. We show that prancSTR accurately identifies mSTRs in simulated data and demonstrate its feasibility by identifying candidate mSTRs in whole genome sequencing (WGS) data derived from lymphoblastoid cell lines for individuals sequenced by the 1000 Genomes Project. Our analysis identified an average of 76 non-homopolymer and 577 homopolymer mSTRs per cell line, as well as multiple cell lines with outlier mSTR counts more than 6 times the population average, suggesting that a subset of cell lines have particularly high STR instability rates. Availability: prancSTR is freely available at https://github.com/gymrek-lab/trtools. Documentation: Detailed documentation is available at https://trtools.readthedocs.io/.
ABSTRACT
Fundamental restoration ecology and community ecology theories can help us better understand the underlying mechanisms of fecal microbiota transplantation (FMT) and better design future microbial therapeutics for recurrent Clostridioides difficile infections (rCDI) and other dysbiosis-related conditions. In this study, stool samples were collected from donors and rCDI patients one week prior to FMT (pre-FMT), as well as from patients one week following FMT (post-FMT). Metagenomic sequencing and machine learning suggested that FMT outcome depends not only on the ecological structure of the recipients but also on the interactions between the donor and recipient microbiomes at the taxonomic and functional levels. We observed that the presence of specific bacteria in donors (Clostridioides spp., Desulfovibrio spp., Odoribacter spp. and Oscillibacter spp.) and the absence of fungi (Yarrowia spp.) and bacteria (Wigglesworthia spp.) in recipients prior to FMT could predict FMT success. Our results also suggested a series of interlocked mechanisms for FMT success, including the repair of the disturbed gut ecosystem by transient colonization of nexus species followed by secondary succession of bile acid metabolizers, sporulators, and short-chain fatty acid producers.