1.
BMC Bioinformatics ; 21(1): 518, 2020 Nov 11.
Article in English | MEDLINE | ID: mdl-33176676

ABSTRACT

BACKGROUND: DNBSEQ™ platforms are new massively parallel sequencing (MPS) platforms that use DNA nanoball technology. Use of data generated from DNBSEQ™ platforms to detect single nucleotide variants (SNVs) and small insertions and deletions (indels) has proven to be quite effective, while the feasibility of copy number variant (CNV) detection is unclear. RESULTS: Here, we first benchmarked different CNV detection tools based on Illumina whole-genome sequencing (WGS) data of NA12878 and then assessed these tools in CNV detection based on DNBSEQ™ sequencing data from the same sample. When the same tool was used, the CNVs detected based on DNBSEQ™ and Illumina data were similar in quantity, length and distribution, whereas results differed greatly between tools, even on data from a single platform. We further estimated the CNV detection power based on available CNV benchmarks of NA12878 and found similar precision and sensitivity between the DNBSEQ™ and Illumina platforms. Using Pindel, DELLY and LUMPY, we also found higher precision for CNVs shorter than 1 kbp on DNBSEQ™ platforms than on Illumina platforms. We carefully compared these two available benchmarks and found that a large proportion of CNVs were specific to one benchmark or the other. Thus, we constructed a more complete CNV benchmark of NA12878 containing 3512 CNV regions. CONCLUSIONS: We assessed and benchmarked CNV detection based on WGS with DNBSEQ™ platforms and provide guidelines for future studies.


Subjects
High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism , DNA Copy Number Variations , Genetic Databases , Human Genome , Humans , Whole Genome Sequencing
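The abstract above reports precision and sensitivity of CNV calls against a benchmark but does not say how a call is matched to a benchmark region. The sketch below shows one common way to compute both figures, assuming a 50% reciprocal-overlap matching rule. The rule, the function names and the interval representation are illustrative assumptions, not the authors' method.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical representation.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if a and b overlap by at least min_frac of BOTH interval lengths."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a[2] - a[1])
            and overlap >= min_frac * (b[2] - b[1]))


def precision_sensitivity(calls: List[Interval], truth: List[Interval],
                          min_frac: float = 0.5) -> Tuple[float, float]:
    """Precision = matched calls / all calls; sensitivity = matched truth / all truth."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls)
    matched_truth = sum(
        any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth)
    precision = matched_calls / len(calls) if calls else 0.0
    sensitivity = matched_truth / len(truth) if truth else 0.0
    return precision, sensitivity
```

With three calls of which one matches, and two benchmark regions of which one is recovered, the function returns a precision of 1/3 and a sensitivity of 1/2.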
2.
BMC Bioinformatics ; 21(1): 463, 2020 Oct 19.
Article in English | MEDLINE | ID: mdl-33076827

ABSTRACT

BACKGROUND: Repetitive sequences account for a large proportion of eukaryotic genomes. Identification of repetitive sequences plays a significant role in many applications, such as structural variation detection and genome assembly. Many existing de novo repeat identification pipelines or tools make use of assembly of the high-frequency k-mers to obtain repeats. However, a certain degree of sequence coverage is required for assemblers to get the desired assemblies. On the other hand, assemblers cut the reads into shorter k-mers for assembly, which may destroy the structure of the repetitive regions. For the above reasons, it is difficult to obtain complete and accurate repetitive regions in the genome by using existing tools. RESULTS: In this study, we present a new method called RepAHR for de novo repeat identification by assembly of the high-frequency reads. Firstly, RepAHR scans next-generation sequencing (NGS) reads to find the high-frequency k-mers. Secondly, RepAHR filters the high-frequency reads from whole NGS reads according to certain rules based on the high-frequency k-mer. Finally, the high-frequency reads are assembled to generate repeats by using SPAdes, which is considered an outstanding genome assembler for NGS sequences. CONCLUSIONS: We test RepAHR on five data sets, and the experimental results show that RepAHR outperforms RepARK and REPdenovo for detecting repeats in terms of N50, reference alignment ratio, coverage ratio of reference, mask ratio of Repbase and some other metrics.


Subjects
High-Throughput Nucleotide Sequencing/methods , Repetitive Nucleic Acid Sequences/genetics , Software , Animals , Base Sequence , Genetic Databases , Drosophila melanogaster/genetics , Gene Library , Human Genome , Humans , Mice , Reference Standards , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Sequence Alignment , DNA Sequence Analysis/methods , Statistics as Topic , Time Factors
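The first two steps in the abstract above (find high-frequency k-mers, then keep the reads that carry them) can be sketched as follows. The abstract only says reads are filtered "according to certain rules", so the rule used here is an assumption made for illustration: a read is kept when a minimum fraction of its k-mers are high-frequency. The function names and thresholds are also hypothetical. This is not RepAHR's implementation.

```python
from collections import Counter
from typing import Iterable, List


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def high_frequency_reads(reads: List[str], k: int, min_count: int,
                         min_fraction: float) -> List[str]:
    """Keep reads whose share of high-frequency k-mers is >= min_fraction.

    A k-mer is 'high-frequency' if it occurs at least min_count times
    (assumed rule; the paper's actual criteria are not given in the abstract).
    """
    counts = count_kmers(reads, k)
    frequent = {kmer for kmer, n in counts.items() if n >= min_count}
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:  # read shorter than k
            continue
        hits = sum(kmer in frequent for kmer in kmers)
        if hits / len(kmers) >= min_fraction:
            kept.append(read)
    return kept
```

The retained reads would then be handed to an assembler; that step is outside the scope of this sketch.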
3.
Eur J Endocrinol ; 183(5): 497-504, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33107440

ABSTRACT

Background: Hypophosphataemic rickets (HR) comprise a clinically and genetically heterogeneous group of conditions, defined by renal-tubular phosphate wasting and consecutive loss of bone mineralisation. X-linked hypophosphataemia (XLH) is the most common form, caused by inactivating dominant mutations in PHEX, a gene encompassing 22 exons located at Xp22.1. XLH is treatable by anti-Fibroblast Growth Factor 23 antibody, while for other forms of HR such therapy may not be indicated. Therefore, a genetic differentiation of HR is recommended. Objective: To develop and validate a next-generation sequencing panel for HR with special focus on PHEX. Design and methods: We designed an AmpliSeq gene panel for the IonTorrent PGM next-generation platform for PHEX and ten other HR-related genes. For validation of PHEX sequencing, 50 DNA samples from XLH patients, in whom 42 different mutations in PHEX and 1 structural variation had been proven before, were blinded, anonymised and investigated with the NGS panel. In addition, we analyzed one known homozygous DMP1 mutation and two samples of HR patients in whom no pathogenic PHEX mutation had been detected by conventional sequencing. Results: The panel detected all 42 pathogenic missense/nonsense/splice-site/indel PHEX mutations and, in one sample, the known homozygous DMP1 mutation. In the remaining two patients, we revealed a somatic mosaicism of a PHEX mutation in one, as well as two variations in DMP1 and a very rare compound heterozygous variation in ENPP1 in the second patient. Conclusions: This developed NGS panel is a reliable tool with high sensitivity and specificity for the diagnosis of XLH and related forms of HR.


Subjects
Familial Hypophosphatemic Rickets/diagnosis , High-Throughput Nucleotide Sequencing/methods , Kidney Diseases/diagnosis , PHEX Phosphate Regulating Neutral Endopeptidase/analysis , Phosphorus Metabolism Disorders/diagnosis , Extracellular Matrix Proteins/analysis , Familial Hypophosphatemic Rickets/genetics , Female , X-Linked Genetic Diseases , Humans , Kidney Diseases/genetics , Male , Mutation , Phosphoproteins/analysis , Phosphorus Metabolism Disorders/genetics , Reproducibility of Results , Sensitivity and Specificity , DNA Sequence Analysis
4.
Viruses ; 12(10)2020 10 14.
Article in English | MEDLINE | ID: mdl-33066701

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease 2019 (COVID-19). Sequencing the viral genome as the outbreak progresses is important, particularly to identify emerging isolates with different pathogenic potential and to determine whether nucleotide changes in the genome will impair clinical diagnostic tools such as real-time PCR assays. Although single nucleotide polymorphisms and point mutations occur during the replication of coronaviruses, one of the biggest drivers in genetic change is recombination. This can manifest itself in insertions and/or deletions in the viral genome. Therefore, sequencing strategies that underpin molecular epidemiology and inform virus biology in patients should take these factors into account. A long amplicon/read length-based RT-PCR sequencing approach focused on the Oxford Nanopore MinION/GridION platforms was developed to identify and sequence the SARS-CoV-2 genome in samples from patients with or suspected of having COVID-19. The protocol, termed Rapid Sequencing Long Amplicons (RSLAs), used random primers to generate cDNA from RNA purified from a sample from a patient, followed by single or multiplex PCRs to generate longer amplicons of the viral genome. The base protocol was used to identify SARS-CoV-2 in a variety of clinical samples and proved sensitive in identifying viral RNA in samples from patients that had been declared negative using other nucleic acid-based assays (false negatives). Sequencing the amplicons revealed that a number of patients had a proportion of viral genomes with deletions.


Subjects
Betacoronavirus/genetics , Coronavirus Infections/virology , Viral Pneumonia/virology , Betacoronavirus/isolation & purification , Clinical Laboratory Techniques , Coronavirus Infections/diagnosis , Complementary DNA/analysis , Complementary DNA/genetics , Viral DNA/analysis , Viral DNA/genetics , Viral Genome , High-Throughput Nucleotide Sequencing/methods , Humans , Molecular Epidemiology , Multiplex Polymerase Chain Reaction , Pandemics , Viral Pneumonia/diagnosis , Viral RNA/analysis , Viral RNA/genetics , Real-Time Polymerase Chain Reaction , Sequence Analysis
5.
BMC Bioinformatics ; 21(1): 429, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-33004007

ABSTRACT

BACKGROUND: PacBio sequencing is an incredibly valuable third-generation DNA sequencing method due to very long read lengths, ability to detect methylated bases, and its real-time sequencing methodology. Yet, hitherto no tool was available for analyzing the quality of, subsampling, and filtering PacBio data. RESULTS: Here we present SequelTools, a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells producing multiple statistics and publication-quality plots describing the quality of the data including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool allows the user to subsample reads by one or more of the following criteria: longest subreads per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by filtering out certain low-quality scraps reads and/or by minimum CLR length. SequelTools is implemented in bash, R, and Python using only standard libraries and packages and is platform independent. CONCLUSIONS: SequelTools is a program that provides the only free, fast, and easy-to-use quality control tool, and the only program providing this kind of read subsampling and read filtering for PacBio Sequel raw sequence data, and is available at https://github.com/ISUgenomics/SequelTools .


Subjects
High-Throughput Nucleotide Sequencing/methods , Software , Arabidopsis/genetics , Benchmarking , High-Throughput Nucleotide Sequencing/standards , Quality Control
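N50, one of the statistics named in the abstract above, is the length L such that reads of length at least L together contain at least half of all sequenced bases. The following is a minimal sketch of that standard definition; the function name is illustrative and the code is not taken from SequelTools.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of read (or contig) lengths."""
    if not lengths:
        raise ValueError("n50 is undefined for an empty collection")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")
```

For read lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest reads already cover 11 bases, so the N50 is 5.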
6.
Nucleic Acids Res ; 48(19): e114, 2020 11 04.
Article in English | MEDLINE | ID: mdl-33035301

ABSTRACT

The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs), that is, sequence differences between paralogous sequences, to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners including Minimap2 (74.3-90.6%) and BLASR (82.9-90.7%) while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8-21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio circular consensus sequencing reads, an additional 8.9 Mb of DNA sequence was mappable, variant calling achieved a higher F1 score and 14 713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlaps with variants and adversely impacts short-read variant calling.


Subjects
Human Genome , High-Throughput Nucleotide Sequencing/methods , Genomic Segmental Duplications , DNA Sequence Analysis/methods , Software , Algorithms , Genetic Databases , Datasets as Topic , Humans
7.
Gigascience ; 9(10)2020 10 15.
Article in English | MEDLINE | ID: mdl-33057676

ABSTRACT

BACKGROUND: Metagenomic next-generation sequencing (mNGS) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mNGS data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. Existing mNGS data analysis tools typically require bioinformatics expertise and access to local server-class hardware resources. For many research laboratories, this presents an obstacle, especially in resource-limited environments. FINDINGS: We present IDseq, an open source cloud-based metagenomics pipeline and service for global pathogen detection and monitoring (https://idseq.net). The IDseq Portal accepts raw mNGS data, performs host and quality filtration steps, then executes an assembly-based alignment pipeline, which results in the assignment of reads and contigs to taxonomic categories. The taxonomic relative abundances are reported and visualized in an easy-to-use web application to facilitate data interpretation and hypothesis generation. Furthermore, IDseq supports environmental background model generation and automatic internal spike-in control recognition, providing statistics that are critical for data interpretation. IDseq was designed with the specific intent of detecting novel pathogens. Here, we benchmark novel virus detection capability using both synthetically evolved viral sequences and real-world samples, including IDseq analysis of a nasopharyngeal swab sample acquired and processed locally in Cambodia from a tourist from Wuhan, China, infected with the recently emergent SARS-CoV-2. CONCLUSION: The IDseq Portal reduces the barrier to entry for mNGS data analysis and enables bench scientists, clinicians, and bioinformaticians to gain insight from mNGS datasets for both known and novel pathogens.


Subjects
Betacoronavirus/genetics , Cloud Computing , Coronavirus Infections/virology , Metagenome , Metagenomics/methods , Viral Pneumonia/virology , Betacoronavirus/pathogenicity , Coronavirus Infections/diagnosis , Genetic Databases , High-Throughput Nucleotide Sequencing/methods , Humans , Pandemics , Viral Pneumonia/diagnosis , Software
8.
BMC Bioinformatics ; 21(Suppl 14): 369, 2020 Sep 30.
Article in English | MEDLINE | ID: mdl-32998686

ABSTRACT

BACKGROUND: Chromosome conformation capture-based methods, especially Hi-C, enable scientists to detect genome-wide chromatin interactions and study the spatial organization of chromatin, which plays important roles in gene expression regulation, DNA replication and repair etc. Thus, developing computational methods to unravel patterns behind the data becomes critical. Existing computational methods focus on intrachromosomal interactions and ignore interchromosomal interactions partly because there is no prior knowledge for interchromosomal interactions and the frequency of interchromosomal interactions is much lower while the search space is much larger. With the development of single-cell technologies, the advent of single-cell Hi-C makes interrogating the spatial structure of chromatin at single-cell resolution possible. It also brings a new type of frequency information, the number of single cells with chromatin interactions between two disjoint chromosome regions. RESULTS: Considering the lack of computational methods on interchromosomal interactions and the unsurprisingly frequent intrachromosomal interactions along the diagonal of a chromatin contact map, we propose a computational method dedicated to analyzing interchromosomal interactions of single-cell Hi-C with this new frequency information. To the best of our knowledge, our proposed tool is the first to identify regions with statistically frequent interchromosomal interactions at single-cell resolution. We demonstrate that the tool utilizing networks and binomial statistical tests can identify interesting structural regions through visualization, comparison and enrichment analysis and it also supports different configurations to provide users with flexibility. CONCLUSIONS: It will be a useful tool for analyzing single-cell Hi-C interchromosomal interactions.


Subjects
Chromosomes/metabolism , Single-Cell Analysis/methods , Animals , Chromatin/metabolism , G1 Phase , High-Throughput Nucleotide Sequencing/methods , Mice , Mouse Embryonic Stem Cells/cytology , Mouse Embryonic Stem Cells/metabolism , Oocytes/cytology , Oocytes/metabolism , S Phase , Zygote/cytology , Zygote/metabolism
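The abstract above describes binomial tests on "the number of single cells with chromatin interactions between two disjoint chromosome regions" but does not give the null model. As a hedged illustration, the sketch below computes a one-sided binomial p-value for seeing an interaction in at least `k` of `n` cells when each cell shows it independently with background probability `p`. The choice of `p`, the independence assumption and the function name are illustrative assumptions, not the published tool.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the one-sided enrichment p-value."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))
```

A region pair would be flagged as frequently interacting when this p-value falls below a chosen significance threshold, ideally after correcting for the number of region pairs tested.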
9.
BMC Bioinformatics ; 21(1): 468, 2020 Oct 20.
Article in English | MEDLINE | ID: mdl-33081690

ABSTRACT

BACKGROUND: Current taxonomic classification tools use exact string matching algorithms that are effective on data from next-generation sequencing technology. However, the unique error patterns in the third generation sequencing (TGS) technologies could reduce the accuracy of these programs. RESULTS: We developed a Classification tool using Discriminative K-mers and Approximate Matching algorithm (CDKAM). This approximate matching method was used for searching k-mers, which included two phases, a quick mapping phase and a dynamic programming phase. Simulated datasets as well as real TGS datasets have been tested to compare the performance of CDKAM with existing methods. We showed that CDKAM performed better in many aspects, especially when classifying TGS data with average length 1000-1500 bases. CONCLUSIONS: CDKAM is an effective program with higher accuracy and lower memory requirement for TGS metagenome sequence classification. It achieves high species-level accuracy.


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Humans
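The abstract above mentions a two-phase k-mer search: a quick mapping phase followed by a dynamic programming phase. The details are not given, so the sketch below is only a generic illustration of that idea. It tries an exact set lookup first and falls back to an edit-distance comparison computed by dynamic programming. The function names, the linear scan over reference k-mers and the error threshold are assumptions, not CDKAM's algorithm.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def match_kmer(query: str, reference_kmers: Iterable[str],
               max_errors: int = 1) -> Optional[str]:
    """Return the best reference k-mer within max_errors edits, else None."""
    reference = set(reference_kmers)
    if query in reference:            # quick phase: exact hit
        return query
    best, best_distance = None, max_errors + 1
    for ref in sorted(reference):     # slow phase: DP against each candidate
        distance = edit_distance(query, ref)
        if distance < best_distance:
            best, best_distance = ref, distance
    return best
```

A real classifier would restrict the slow phase to a small candidate set instead of scanning every reference k-mer; the scan is kept here for brevity.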
10.
Zool Res ; 41(6): 705-708, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33045776

ABSTRACT

Since the first reported severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in December 2019, coronavirus disease 2019 (COVID-19) has become a global pandemic, spreading to more than 200 countries and regions worldwide. With continued research progress and virus detection, SARS-CoV-2 genomes and sequencing data have been reported and accumulated at an unprecedented rate. To meet the need for fast analysis of these genome sequences, the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB) has established an online coronavirus analysis platform, which includes de novo assembly, BLAST alignment, genome annotation, variant identification, and variant annotation modules. The online analysis platform can be freely accessed at the 2019 Novel Coronavirus Resource (2019nCoVR) (https://bigd.big.ac.cn/ncov/online/tools).


Subjects
Betacoronavirus/genetics , Computational Biology/methods , Coronavirus Infections/diagnosis , Viral Genome/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Viral Pneumonia/diagnosis , Animals , Betacoronavirus/classification , Betacoronavirus/physiology , China , Computational Biology/organization & administration , Coronavirus Infections/virology , Genetic Variation , Humans , Internet , Molecular Sequence Annotation , Pandemics , Viral Pneumonia/virology
11.
Nat Protoc ; 15(10): 3264-3283, 2020 10.
Article in English | MEDLINE | ID: mdl-32913232

ABSTRACT

We recently introduced Cleavage Under Targets & Tagmentation (CUT&Tag), an epigenomic profiling strategy in which antibodies are bound to chromatin proteins in situ in permeabilized nuclei. These antibodies are then used to tether the cut-and-paste transposase Tn5. Activation of the transposase simultaneously cleaves DNA and adds adapters ('tagmentation') for paired-end DNA sequencing. Here, we introduce a streamlined CUT&Tag protocol that suppresses DNA accessibility artefacts to ensure high-fidelity mapping of the antibody-targeted protein and improves the signal-to-noise ratio over current chromatin profiling methods. Streamlined CUT&Tag can be performed in a single PCR tube, from cells to amplified libraries, providing low-cost genome-wide chromatin maps. By simplifying library preparation, CUT&Tag requires less than a day at the bench, from live cells to sequencing-ready barcoded libraries. As a result of low background levels, barcoded and pooled CUT&Tag libraries can be sequenced for as little as $25 per sample. This enables routine genome-wide profiling of chromatin proteins and modifications and requires no special skills or equipment.


Subjects
Chromatin/genetics , Chromosome Mapping/methods , Epigenomics/methods , Base Sequence , DNA/genetics , Gene Library , High-Throughput Nucleotide Sequencing/methods , Histones/metabolism , DNA Sequence Analysis/methods , Single-Cell Analysis/methods , Transposases/genetics , Transposases/metabolism
12.
PLoS One ; 15(9): e0238984, 2020.
Article in English | MEDLINE | ID: mdl-32966312

ABSTRACT

Garcinia kola (Heckel) is a versatile tree indigenous to West and Central Africa. All parts of the tree have value in traditional medicine. Natural populations of the species have declined over the years due to overexploitation. Assessment of genetic diversity and population structure of G. kola is important for its management and conservation. The present study investigates the genetic diversity and population structure of G. kola populations in Benin using ultra-high-throughput diversity array technology (DArT) single nucleotide polymorphism (SNP) markers. From the 102 accessions sampled, two were excluded from the final dataset owing to poor genotyping coverage. A total of 43,736 SNPs were reported, of which 12,585 were used for analyses after screening with quality control parameters including minor allele frequency (≥ 0.05), call rate (≥ 80%), reproducibility (≥ 95%), and polymorphic information content (≥ 1%). Analysis revealed low genetic diversity with expected heterozygosity per population ranging from 0.196 to 0.228. Pairwise F-statistics (FST) revealed low levels of genetic differentiation between populations while an analysis of molecular variance (AMOVA) indicated that the majority of variation (97.86%) was within populations. Population structure analysis through clustering and discriminant analysis of principal components revealed two admixed clusters, implying little genetic structure. However, the model-based maximum likelihood in Admixture indicated only one genetic cluster. The present study indicated low genetic diversity of G. kola, and interventions need to be tailored towards its conservation.


Subjects
Garcinia kola/genetics , Alleles , Benin , Gene Frequency/genetics , Genetic Variation/genetics , Population Genetics/methods , Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Phylogeny , Single Nucleotide Polymorphism/genetics , Reproducibility of Results
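Two of the quality-control thresholds quoted in the abstract above (minor allele frequency ≥ 0.05 and call rate ≥ 80%), together with expected heterozygosity, can be computed as sketched below for a single biallelic SNP. The genotype encoding (0, 1 or 2 copies of the alternate allele, `None` for a missing call) and the function names are assumptions made for illustration. Expected heterozygosity uses the textbook formula 2p(1 − p) without any sample-size correction, which may differ from the estimator the authors used.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0, 1 or 2 alternate-allele copies; None = missing call


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def alt_allele_frequency(genotypes: List[Genotype]) -> float:
    """Frequency of the alternate allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(called) / (2 * len(called))


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    p = alt_allele_frequency(genotypes)
    return min(p, 1.0 - p)


def expected_heterozygosity(genotypes: List[Genotype]) -> float:
    """He = 2p(1 - p) for a biallelic locus (uncorrected)."""
    p = alt_allele_frequency(genotypes)
    return 2.0 * p * (1.0 - p)


def passes_qc(genotypes: List[Genotype], min_maf: float = 0.05,
              min_call_rate: float = 0.80) -> bool:
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)
```

The reproducibility and polymorphic information content filters also listed in the abstract are not covered by this sketch.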
13.
BMC Med Genet ; 21(1): 173, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32867697

ABSTRACT

BACKGROUND: Alström syndrome is a rare recessively inherited disorder caused by variants in the ALMS1 gene. It is characterized by multiple organ dysfunction, including cone-rod retinal dystrophy, dilated cardiomyopathy, hearing loss, obesity, insulin resistance, hyperinsulinemia, type 2 diabetes mellitus and systemic fibrosis. Heterogeneity and age-dependent development of clinical manifestations make it difficult to obtain a clear diagnosis, especially in pediatric patients. CASE PRESENTATION: Here we report the case of a girl with Alström syndrome. Genetic examination was proposed at age 22 months when suspected macular degeneration was the only major finding. Next generation sequencing of a panel of genes linked to eye-related pathologies revealed two compound heterozygous variants in the ALMS1 gene. Frameshift variants c.1196_1202del, p.(Thr399Lysfs*11), rs761292021 and c.11310_11313del, p.(Glu3771Trpfs*18), rs747272625 were detected in exons 5 and 16, respectively. Both variants cause frameshifts and generation of a premature stop-codon that probably leads to mRNA nonsense-mediated decay. Validation and segregation of ALMS1 variants were confirmed by Sanger sequencing. CONCLUSIONS: Genetic testing makes it possible, even in childhood, to increase the number of correct diagnoses of patients who have ambiguous phenotypes caused by rare genetic variants. The development of high-throughput sequencing technologies offers an exceptionally valuable screening tool for clear genetic diagnoses and ensures early multidisciplinary management and treatment of the emerging symptoms.


Subjects
Alstrom Syndrome/genetics , Cell Cycle Proteins/genetics , Early Diagnosis , High-Throughput Nucleotide Sequencing/methods , Mutation , Alstrom Syndrome/diagnosis , Nonsense Codon , Female , Frameshift Mutation , Heterozygote , Humans , Infant
14.
PLoS One ; 15(9): e0238893, 2020.
Article in English | MEDLINE | ID: mdl-32956361

ABSTRACT

Utilization of murine models remains a valuable tool in biomedical research, yet the disease phenotype of mice can vary considerably across studies. With advances in next generation sequencing, it is increasingly recognized that inconsistencies in host phenotype can be attributed, at least in part, to differences in gut bacterial composition. Research with inbred murine strains demonstrates that housing conditions play a significant role in variations of gut bacterial composition; however, few studies have assessed whether observed variation influences host phenotype in response to an intervention. Our study initially sought to examine the effects of a long-term (9-month) dietary intervention (i.e., diets with distinct fatty acid compositions) on the metabolic health, in particular glucose homeostasis, of genetically outbred male and female CD-1 mice. However, mice were shipped from two different husbandry facilities of the same commercial vendor (Cohort A and B, respectively), and we observed throughout the study that diet, sex, and aging differentially influenced the metabolic phenotype of mice depending on their husbandry facility of origin. Examination of the colonic bacteria of mice revealed distinct bacterial compositions, including 23 differentially abundant genera and an enhanced alpha diversity in mice of Cohort B compared to Cohort A. We also observed that a distinct metabolic phenotype was linked with these differentially abundant bacteria and indices of alpha diversity. Our findings support the view that metabolic phenotypic variation of mice of the same strain but shipped from different husbandry facilities may be influenced by their colonic bacterial community structure. Our work is an important precautionary note for future research of metabolic diseases via mouse models, particularly those that seek to examine factors such as diet, sex, and aging.


Subjects
Bacteria/classification , Diet/adverse effects , Feces/microbiology , Glucose/metabolism , High-Throughput Nucleotide Sequencing/methods , Inbred Strain Mice/genetics , Animal Husbandry , Animals , Bacteria/drug effects , Bacteria/genetics , Bacteria/isolation & purification , Female , Gastrointestinal Microbiome/drug effects , Male , Mice , Animal Models , Phenotype , Phylogeny , DNA Sequence Analysis
15.
PLoS One ; 15(9): e0239850, 2020.
Article in English | MEDLINE | ID: mdl-32986766

ABSTRACT

Massively parallel sequencing (MPS) has revolutionised clinical genetics and research within human genetics by enabling the detection of variants in multiple genes in several samples at the same time. Today, multiple approaches for MPS of DNA are available, including targeted gene sequencing (TGS) panels, whole exome sequencing (WES), and whole genome sequencing (WGS). As MPS is becoming an integrated part of the work in genetic laboratories, it is important to investigate the variant detection performance of the various MPS methods. We compared the results of single nucleotide variant (SNV) detection of three MPS methods: WGS, WES, and HaloPlex target enrichment sequencing (HES) using matched DNA of 10 individuals. The detection performance was investigated in 100 genes associated with cardiomyopathies and channelopathies. The results showed that WGS overall performed better than WES and HES. WGS had a more uniform and widespread coverage of the investigated regions compared to WES and HES, which both had a right-skewed coverage distribution and difficulties in covering regions and genes with high GC-content. WGS and WES showed roughly the same high sensitivities for detection of SNVs, whereas HES showed a lower sensitivity due to a higher number of false negative results.


Subjects
High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism , Whole Exome Sequencing/methods , Alleles , Cardiomyopathies/genetics , Channelopathies/genetics , Exome , Human Genome , Genotype , Humans , Sensitivity and Specificity , DNA Sequence Analysis/methods
16.
BMC Infect Dis ; 20(1): 648, 2020 Sep 03.
Article in English | MEDLINE | ID: mdl-32883215

ABSTRACT

BACKGROUND: Due to the frequent reassortment and zoonotic potential of influenza A viruses, rapid gain of sequence information is crucial. Alongside established next-generation sequencing protocols, the MinION sequencing device (Oxford Nanopore Technologies) has become a serious competitor for routine whole-genome sequencing. Here, we established a novel, rapid and high-throughput MinION multiplexing workflow based on a universal RT-PCR. METHODS: Twelve representative influenza A virus samples of multiple subtypes were universally amplified in a one-step RT-PCR and subsequently sequenced on the MinION instrument in conjunction with a barcoding library preparation kit from the rapid family and the MinIT performing live base-calling. The identical PCR products were sequenced on an IonTorrent platform and, after final consensus assembly, all data were compared for validation. To prove the practicability of the MinION-MinIT method in human and veterinary diagnostics, we sequenced recent and historical influenza strains for further benchmarking. RESULTS: The MinION-MinIT combination generated over two million reads for twelve samples in a six-hour sequencing run, of which 72% were classified as quality-screened, trimmed and mapped influenza reads to produce full genome sequences. Identities between the datasets of > 99.9% were achieved, with 100% coverage of all segments alongside a sufficient confidence and 4492-fold mean depth. From RNA extraction to finished sequences, only 14 h were required. CONCLUSIONS: Overall, we developed and validated a novel and rapid multiplex workflow for influenza A virus sequencing. This protocol suits both clinical and academic settings, aiding in real-time diagnostics and passive surveillance.


Subjects
High-Throughput Nucleotide Sequencing/methods , Influenza A virus/genetics , Nanopore Sequencing/methods , Humans , Polymerase Chain Reaction , Reproducibility of Results , Workflow
17.
PLoS Comput Biol ; 16(9): e1008173, 2020 09.
Article in English | MEDLINE | ID: mdl-32946435

ABSTRACT

Single-cell Hi-C (scHi-C) interrogates genome-wide chromatin interaction in individual cells, allowing us to gain insights into 3D genome organization. However, the extremely sparse nature of scHi-C data poses a significant barrier to analysis, limiting our ability to tease out hidden biological information. In this work, we approach this problem by applying topic modeling to scHi-C data. Topic modeling is well-suited for discovering latent topics in a collection of discrete data. For our analysis, we generate nine different single-cell combinatorial indexed Hi-C (sci-Hi-C) libraries from five human cell lines (GM12878, H1Esc, HFF, IMR90, and HAP1), consisting of over 19,000 cells. We demonstrate that topic modeling is able to successfully capture cell type differences from sci-Hi-C data in the form of "chromatin topics." We further show enrichment of particular compartment structures associated with locus pairs in these topics.


Subjects
Chromatin , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods , Cell Line , Chromatin/chemistry , Chromatin/genetics , Cluster Analysis , Gene Library , Humans , Natural Language Processing
18.
BMC Bioinformatics ; 21(Suppl 13): 388, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32938392

ABSTRACT

BACKGROUND: In Overlap-Layout-Consensus (OLC) based de novo assembly, all reads must be compared with every other read to find overlaps. This makes the process rather slow and limits the practicality of using de novo assembly methods at a large scale in the field. Darwin is a fast and accurate read overlapper that can be used for de novo assembly of state-of-the-art third generation long DNA reads. Darwin is designed to be hardware-friendly and can be accelerated on specialized computer system hardware to achieve higher performance. RESULTS: This work accelerates Darwin on GPUs. Using real PacBio data, our GPU implementation on a Tesla K40 has shown a speedup of 109x vs 8 CPU threads of an Intel Xeon machine and 24x vs 64 threads of an IBM Power8 machine. The GPU implementation supports both linear and affine gap scoring models. The results show that the GPU implementation can achieve the same high speedup for different scoring schemes. CONCLUSIONS: The GPU implementation proposed in this work shows significant improvement in performance compared to the CPU version, thereby making it accessible for utilization as a practical read overlapper in a DNA assembly pipeline. Furthermore, our GPU acceleration can also be used for performing fast Smith-Waterman alignment between long DNA reads. GPU hardware has become commonly available in the field today, making the proposed acceleration accessible to a larger public. The implementation is available at https://github.com/Tongdongq/darwin-gpu .
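The difference between the linear and affine gap scoring models mentioned in this abstract can be illustrated with a minimal CPU sketch of Smith-Waterman local alignment scoring. This is not the Darwin or darwin-gpu code; the function name and the scoring values (match, mismatch, gap_open, gap_extend) are assumptions chosen for illustration. Under the affine model a gap of length k costs gap_open + (k - 1) * gap_extend, and the linear model is the special case gap_open == gap_extend.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return the best local alignment score of a and b (score only, no traceback).

    Hypothetical illustration: the scoring parameters are arbitrary defaults,
    not values taken from the Darwin paper.
    """
    neg_inf = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)        # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * (m + 1)  # F: best score ending at (i-1, j) with a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # E: best score ending at (i, j) with a horizontal gap
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are clamped at zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    # Affine model: one gap opening is penalized by gap_open.
    print(smith_waterman_affine("ACGTACGT", "ACGTTACGT"))
    # Linear model: every gap position costs the same.
    print(smith_waterman_affine("ACGTACGT", "ACGTTACGT", gap_open=-1, gap_extend=-1))
```

The sketch keeps only two rows of the dynamic-programming matrix, so memory grows with the length of one read rather than with the product of both lengths.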


Subjects
Algorithms , DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Humans
19.
BMC Bioinformatics ; 21(Suppl 8): 299, 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-32938362

ABSTRACT

BACKGROUND: The development of Next Generation Sequencing (NGS) has had a major impact on the study of genetic sequences. Among problems that researchers in the field have to face, one of the most challenging is the taxonomic classification of metagenomic reads, i.e., identifying the microorganisms that are present in a sample collected directly from the environment. The analysis of environmental samples (metagenomes) is particularly important for determining the microbial composition of different ecosystems and is used in a wide variety of fields: for instance, metagenomic studies in agriculture can help in understanding the interactions between plants and microbes, and in ecology they can provide valuable insights into the functions of environmental communities. RESULTS: In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform, and we sequentially scan the required data structures, so that we can analyze unknown sequences of large collections using little internal memory. The tool LiME (Lightweight Metagenomics via eBWT) is available at https://github.com/veronicaguerrini/LiME . CONCLUSIONS: In order to assess the reliability of our approach, we ran several experiments on NGS data from two simulated metagenomes among those provided in benchmarking analysis and on a real metagenome from the Human Microbiome Project. The experimental results on the simulated data show that LiME is competitive with widely used taxonomic classifiers. It achieves high levels of precision and specificity - e.g. 99.9% of the positive control reads are correctly assigned and the percentage of classified reads of the negative control is less than 0.01% - while keeping a high sensitivity. On the real metagenome, we show that LiME is able to deliver classification results comparable to those of MagicBlast. Overall, the experiments confirm the effectiveness of our method and its high accuracy even in negative control samples.
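For readers unfamiliar with the transform this abstract builds on, the following is a minimal sketch of the plain Burrows-Wheeler transform of a single string and its inversion. It is not LiME's implementation and it does not implement the extended transform (eBWT) for string collections that the abstract refers to; the function names are hypothetical, and the sketch assumes a sentinel character "$" that does not occur in the text and sorts before every other character.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via explicitly sorted rotations.

    Illustrative only: quadratic memory, unlike the lightweight
    sequential scans described in the abstract.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last):
    """Recover the original text (without the sentinel) from its BWT.

    Assumes the sentinel is the smallest character, so row 0 of the
    sorted rotation matrix starts with it.
    """
    first = sorted(last)
    # A stable sort pairs the k-th character of the first column with the
    # same character occurrence in the last column.
    order = sorted(range(len(last)), key=lambda i: last[i])
    out = []
    row = 0
    for _ in range(len(last) - 1):
        row = order[row]          # move to the row starting with the next text character
        out.append(first[row])
    return "".join(out)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)
    print(inverse_bwt(transformed))
```

The transform tends to group equal characters that share a following context, which is the property that BWT-based indexes and classifiers exploit.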


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Humans , Reproducibility of Results
20.
Commun Biol ; 3(1): 538, 2020 Sep 29.
Article in English | MEDLINE | ID: mdl-32994472

ABSTRACT

The advent of portable nanopore sequencing devices has enabled DNA and RNA sequencing to be performed in the field or the clinic. However, advances in in situ genomics require parallel development of portable, offline solutions for the computational analysis of sequencing data. Here we introduce Genopo, a mobile toolkit for nanopore sequencing analysis. Genopo compacts popular bioinformatics tools to an Android application, enabling fully portable computation. To demonstrate its utility for in situ genome analysis, we use Genopo to determine the complete genome sequence of the human coronavirus SARS-CoV-2 in nine patient isolates sequenced on a nanopore device, with Genopo executing this workflow in less than 30 min per sample on a range of popular smartphones. We further show how Genopo can be used to profile DNA methylation in a human genome sample, illustrating a flexible, efficient architecture that is suitable to run many popular bioinformatics tools and accommodate small or large genomes. As the first ever smartphone application for nanopore sequencing analysis, Genopo enables the genomics community to harness this cheap, ubiquitous computational resource.


Subjects
Betacoronavirus/genetics , Computational Biology/methods , Genome, Human , Genome, Viral , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Betacoronavirus/pathogenicity , Cell Phone/instrumentation , Computational Biology/instrumentation , Coronavirus Infections/diagnosis , Coronavirus Infections/virology , DNA Methylation , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Nanopores , Pandemics , Pneumonia, Viral/diagnosis , Pneumonia, Viral/virology , Whole Genome Sequencing/instrumentation