Results 1 - 20 of 41
1.
Nat Med ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38689063

ABSTRACT

Despite substantial progress in cancer microbiome research, recognized confounders and advances in absolute microbiome quantification remain underused; this raises concerns regarding potential spurious associations. Here we study the fecal microbiota of 589 patients at different colorectal cancer (CRC) stages and compare observations with up to 15 published studies (4,439 patients and controls total). Using quantitative microbiome profiling based on 16S ribosomal RNA amplicon sequencing, combined with rigorous confounder control, we identified transit time, fecal calprotectin (intestinal inflammation) and body mass index as primary microbial covariates, superseding variance explained by CRC diagnostic groups. Well-established microbiome CRC targets, such as Fusobacterium nucleatum, did not significantly associate with CRC diagnostic groups (healthy, adenoma and carcinoma) when controlling for these covariates. In contrast, the associations of Anaerococcus vaginalis, Dialister pneumosintes, Parvimonas micra, Peptostreptococcus anaerobius, Porphyromonas asaccharolytica and Prevotella intermedia remained robust, highlighting their future target potential. Finally, control individuals (age 22-80 years, mean 57.7 years, standard deviation 11.3) meeting criteria for colonoscopy (for example, through a positive fecal immunochemical test) but without colonic lesions are enriched for the dysbiotic Bacteroides2 enterotype, emphasizing uncertainties in defining healthy controls in cancer microbiome research. Together, these results indicate the importance of quantitative microbiome profiling and covariate control for biomarker identification in CRC microbiome studies.

2.
Proc Natl Acad Sci U S A ; 120(31): e2301536120, 2023 08.
Article in English | MEDLINE | ID: mdl-37487069

ABSTRACT

Colorectal cancers (CRCs) form a heterogenous group classified into epigenetic and transcriptional subtypes. The basis for the epigenetic subtypes, exemplified by varying degrees of promoter DNA hypermethylation, and its relation to the transcriptional subtypes is not well understood. We link cancer-specific transcription factor (TF) expression alterations to methylation alterations near TF-binding sites at promoter and enhancer regions in CRCs and their premalignant precursor lesions to provide mechanistic insights into the origins and evolution of the CRC molecular subtypes. A gradient of TF expression changes forms a basis for the subtypes of abnormal DNA methylation, termed CpG-island promoter DNA methylation phenotypes (CIMPs), in CRCs and other cancers. CIMP is tightly correlated with cancer-specific hypermethylation at enhancers, which we term CpG-enhancer methylation phenotype (CEMP). Coordinated promoter and enhancer methylation appears to be driven by downregulation of TFs with common binding sites at the hypermethylated enhancers and promoters. The altered expression of TFs related to hypermethylator subtypes occurs early during CRC development, detectable in premalignant adenomas. TF-based profiling further identifies patients with worse overall survival. Importantly, altered expression of these TFs discriminates the transcriptome-based consensus molecular subtypes (CMS), thus providing a common basis for CIMP and CMS subtypes.


Subjects
Colorectal Neoplasms , Precancerous Conditions , Humans , Transcription Factors , Gene Expression Regulation , DNA Methylation , Epigenesis, Genetic
3.
Chem Res Toxicol ; 36(7): 1028-1036, 2023 07 17.
Article in English | MEDLINE | ID: mdl-37327474

ABSTRACT

The search for chemical hit material is a lengthy and increasingly expensive drug discovery process. To improve it, ligand-based quantitative structure-activity relationship models have been broadly applied to optimize primary and secondary compound properties. Although these models can be deployed as early as the stage of molecule design, they have a limited applicability domain: if the structures of interest differ substantially from the chemical space on which the model was trained, a reliable prediction will not be possible. Image-informed ligand-based models partly solve this shortcoming by focusing on the phenotype of a cell caused by small molecules, rather than on their structure. While this enables chemical diversity expansion, it limits the application to compounds physically available and imaged. Here, we employ an active learning approach to capitalize on both of these methods' strengths and boost the model performance of a mitochondrial toxicity assay (Glu/Gal). Specifically, we used a phenotypic Cell Painting screen to build a chemistry-independent model and adopted the results as the main factor in selecting compounds for experimental testing. With the additional Glu/Gal annotation for selected compounds we were able to dramatically improve the chemistry-informed ligand-based model with respect to the increased recognition of compounds from a 10% broader chemical space.


Subjects
Deep Learning , Quantitative Structure-Activity Relationship , Ligands , Drug Discovery/methods
4.
Arthritis Rheumatol ; 75(5): 673-684, 2023 05.
Article in English | MEDLINE | ID: mdl-36409582

ABSTRACT

OBJECTIVE: CD4+ T cells are implicated in rheumatoid arthritis (RA) pathology from the strong association between RA and certain HLA class II gene variants. This study was undertaken to examine the synovial T cell receptor (TCR) repertoire, T cell phenotypes, and T cell specificities in small joints of RA patients at time of diagnosis before therapeutic intervention. METHODS: Sixteen patients, of whom 11 patients were anti-citrullinated protein antibody (ACPA)-positive and 5 patients were ACPA-, underwent ultrasound-guided synovial biopsy of a small joint (n = 13) or arthroscopic synovial biopsy of a large joint (n = 3), followed by direct sorting of single T cells for paired sequencing of the αβ TCR together with flow cytometry analysis. TCRs from expanded CD4+ T cell clones of 4 patients carrying an HLA-DRB1*04:01 allele were artificially reexpressed to study antigen specificity. RESULTS: T cell analysis demonstrated CD4+ dominance and the presence of peripheral helper T-like cells in both patient groups. We identified >4,000 unique TCR sequences, as well as 225 clonal expansions. Additionally, T cells with double α-chains were a recurring feature. We identified a biased gene usage of the Vβ chain segment TRBV20-1 in CD4+ cells from ACPA+ patients. In vitro stimulation of T cell lines expressing selected TCRs with an extensive panel of citrullinated and viral peptides identified several different virus-specific TCRs (e.g., human cytomegalovirus and human herpesvirus 2). Still, the majority of clones remained orphans with unknown specificity. CONCLUSION: Minimally invasive biopsies of the RA synovium allow for single-cell TCR sequencing and phenotyping. Clonally expanded, viral-reactive T cells account for part of the diverse CD4+ T cell repertoire. TRBV20-1 bias in ACPA+ patients suggests recognition of common antigens.


Subjects
Arthritis, Rheumatoid , Humans , Synovial Membrane/pathology , CD4-Positive T-Lymphocytes , Receptors, Antigen, T-Cell/genetics , HLA-DRB1 Chains/genetics
5.
Genome Biol ; 23(1): 55, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35172874

ABSTRACT

BACKGROUND: Multiplexing of samples in single-cell RNA-seq studies allows a significant reduction of the experimental costs, straightforward identification of doublets, increased cell throughput, and reduction of sample-specific batch effects. Recently published multiplexing techniques using oligo-conjugated antibodies or -lipids allow barcoding sample-specific cells, a process called "hashing." RESULTS: Here, we compare the hashing performance of TotalSeq-A and -C antibodies, custom synthesized lipids and MULTI-seq lipid hashes in four cell lines, both for single-cell RNA-seq and single-nucleus RNA-seq. We also compare TotalSeq-B antibodies with CellPlex reagents (10x Genomics) on human PBMCs and TotalSeq-B with different lipids on primary mouse tissues. Hashing efficiency was evaluated using the intrinsic genetic variation of the cell lines and mouse strains. Antibody hashing was further evaluated on clinical samples using PBMCs from healthy and SARS-CoV-2 infected patients, where we demonstrate a more affordable approach for large single-cell sequencing clinical studies, while simultaneously reducing batch effects. CONCLUSIONS: Benchmarking of different hashing strategies and computational pipelines indicates that correct demultiplexing can be achieved with both lipid- and antibody-hashed human cells and nuclei, with MULTISeqDemux as the preferred demultiplexing function and antibody-based hashing as the most efficient protocol on cells. On nuclei datasets, lipid hashing delivers the best results. Lipid hashing also outperforms antibodies on cells isolated from mouse brain. However, antibodies demonstrate better results on tissues like spleen or lung.


Subjects
COVID-19/blood , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Animals , Antibodies/chemistry , Case-Control Studies , Cell Line, Tumor , Cell Nucleus/chemistry , Humans , Lipids/chemistry , Mice, Inbred BALB C , Mice, Inbred C57BL , Neutrophils/chemistry , Neutrophils/immunology , Neutrophils/virology
6.
Nat Protoc ; 15(7): 2247-2276, 2020 07.
Article in English | MEDLINE | ID: mdl-32561888

ABSTRACT

This protocol explains how to perform a fast SCENIC analysis alongside standard best-practice steps on single-cell RNA-sequencing data using software containers and Nextflow pipelines. SCENIC reconstructs regulons (i.e., transcription factors and their target genes), assesses the activity of these discovered regulons in individual cells, and uses these cellular activity patterns to find meaningful clusters of cells. Here we present an improved version of SCENIC with several advances. SCENIC has been refactored and reimplemented in Python (pySCENIC), resulting in a tenfold increase in speed, and has been packaged into containers for ease of use. It is now also possible to use epigenomic track databases, as well as motifs, to refine regulons. In this protocol, we explain the different steps of SCENIC: the workflow starts from the count matrix depicting the gene abundances for all cells and consists of three stages. First, coexpression modules are inferred using a per-target regression approach (GRNBoost2). Next, the indirect targets are pruned from these modules using cis-regulatory motif discovery (cisTarget). Lastly, the activity of these regulons is quantified via an enrichment score for the regulon's target genes (AUCell). Nonlinear projection methods can be used to display visual groupings of cells based on the cellular activity patterns of these regulons. The results can be exported as a loom file and visualized in the SCope web application. This protocol is illustrated on two use cases: a peripheral blood mononuclear cell data set and a panel of single-cell RNA-sequencing cancer experiments. For a data set of 10,000 genes and 50,000 cells, the pipeline runs in <2 h.


Subjects
Gene Regulatory Networks , Single-Cell Analysis/methods , Workflow , Animals , Cell Line, Tumor , Humans , Mice
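
The last stage described in the SCENIC abstract above scores, per cell, how strongly a regulon's target genes are enriched. The following is only a toy sketch of a rank-based recovery score in that spirit; it is not the pySCENIC or AUCell implementation, and the function name `recovery_auc`, the `top_fraction` parameter, and the example expression values are all assumptions made for illustration.

```python
def recovery_auc(expression, gene_set, top_fraction=0.5):
    """Toy enrichment score: how early do gene_set members appear
    in one cell's expression ranking? Returns a value in [0, 1]."""
    # Rank genes from highest to lowest expression in this cell.
    ranked = sorted(expression, key=expression.get, reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))

    hits = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve at this rank

    # Best possible area: all gene_set members ranked first.
    best = min(len(gene_set), cutoff)
    max_area = sum(min(i, best) for i in range(1, cutoff + 1))
    return area / max_area if max_area else 0.0


# Hypothetical single cell with four genes.
cell = {"A": 9.0, "B": 7.0, "C": 5.0, "D": 1.0}
score_high = recovery_auc(cell, {"A", "B"}, top_fraction=1.0)
score_low = recovery_auc(cell, {"C", "D"}, top_fraction=1.0)
```

A gene set whose members sit at the top of the ranking scores close to 1, and one whose members sit at the bottom scores lower, which is the qualitative behaviour the abstract attributes to the enrichment step.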
8.
Stem Cell Reports ; 11(2): 363-379, 2018 08 14.
Article in English | MEDLINE | ID: mdl-30057263

ABSTRACT

Tauopathies such as frontotemporal dementia (FTD) remain incurable to date, partially due to the lack of translational in vitro disease models. The MAPT gene, encoding the microtubule-associated protein tau, has been shown to play an important role in FTD pathogenesis. Therefore, we used zinc finger nucleases to introduce two MAPT mutations into healthy donor induced pluripotent stem cells (iPSCs). The IVS10+16 mutation increases the expression of 4R tau, while the P301S mutation is pro-aggregant. Whole-transcriptome analysis of MAPT IVS10+16 neurons reveals neuronal subtype differences, reduced neural progenitor proliferation potential, and aberrant WNT/SHH signaling. Notably, these neurodevelopmental phenotypes could be recapitulated in neurons from patients carrying the MAPT IVS10+16 mutation. Moreover, the additional pro-aggregant P301S mutation revealed additional phenotypes, such as an increased calcium burst frequency, reduced lysosomal acidity, tau oligomerization, and neurodegeneration. This series of iPSCs could serve as a platform to unravel a potential link between pathogenic 4R tau and FTD.

9.
Genome Med ; 9(1): 80, 2017 08 30.
Article in English | MEDLINE | ID: mdl-28854983

ABSTRACT

The identification of functional non-coding mutations is a key challenge in the field of genomics. Here we introduce µ-cisTarget to filter, annotate and prioritize cis-regulatory mutations based on their putative effect on the underlying "personal" gene regulatory network. We validated µ-cisTarget by re-analyzing the TAL1 and LMO1 enhancer mutations in T-ALL, and the TERT promoter mutation in melanoma. Next, we re-sequenced the full genomes of ten cancer cell lines and used matched transcriptome data and motif discovery to identify master regulators with de novo binding sites that result in the up-regulation of nearby oncogenic drivers. µ-cisTarget is available from http://mucistarget.aertslab.org .


Subjects
DNA Mutational Analysis/methods , Gene Regulatory Networks , Genes, Neoplasm , Mutation , Neoplasms/genetics , Regulatory Sequences, Nucleic Acid , Algorithms , Binding Sites , Cell Line, Tumor , Female , Gene Expression Profiling , Genomics/methods , Humans , Male , Neoplasms/metabolism , Precision Medicine/methods , Transcription Factors/metabolism
10.
BMC Bioinformatics ; 18(1): 273, 2017 May 25.
Article in English | MEDLINE | ID: mdl-28545391

ABSTRACT

BACKGROUND: Alternative gene splicing is a common phenomenon in which a single gene gives rise to multiple transcript isoforms. The process is strictly guided and involves a multitude of proteins and regulatory complexes. Unfortunately, aberrant splicing events do occur and have been linked to genetic disorders, such as several types of cancer and neurodegenerative diseases (Fan et al., Theor Biol Med Model 3:19, 2006). Therefore, understanding the mechanism of alternative splicing and identifying the difference in splicing events between diseased and healthy tissue is crucial in biomedical research with the potential of applications in personalized medicine as well as in drug development. RESULTS: We propose a linear mixed model, Random Effects for the Identification of Differential Splicing (REIDS), for the identification of alternative splicing events. Based on a set of scores, an exon score and an array score, a decision regarding alternative splicing can be made. The model makes it possible to distinguish a differentially expressed gene from a differentially spliced exon. The proposed model was applied to three case studies concerning both exon and HTA arrays. CONCLUSION: The REIDS model provides a workflow for the identification of alternative splicing events relying on the established linear mixed model. The model can be applied to different types of arrays.


Subjects
Alternative Splicing , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods , Transcriptome , Area Under Curve , Colonic Neoplasms/genetics , Colonic Neoplasms/metabolism , Colonic Neoplasms/pathology , Exons , Humans , LIM Domain Proteins/genetics , Microfilament Proteins/genetics , Protein Isoforms/genetics , ROC Curve
11.
PLoS One ; 12(3): e0174575, 2017.
Article in English | MEDLINE | ID: mdl-28358893

ABSTRACT

Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.


Subjects
High-Throughput Nucleotide Sequencing/methods , RNA/genetics , Software , Transcriptome/genetics , Algorithms , Computational Biology , Genomics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
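
The runtimes quoted in the Halvade-RNA abstract above (roughly 28 h single-threaded versus roughly 2 h on two 20-core machines) can be turned into a speedup and a parallel efficiency. The abstract does not report these derived figures; the sketch below simply applies the usual definitions to the approximate numbers, and the function names are chosen here for illustration.

```python
def speedup(t_serial_h, t_parallel_h):
    """Ratio of serial runtime to parallel runtime."""
    return t_serial_h / t_parallel_h


def parallel_efficiency(t_serial_h, t_parallel_h, cores):
    """Speedup divided by the number of cores used."""
    return speedup(t_serial_h, t_parallel_h) / cores


# Approximate values taken from the abstract: ~28 h vs ~2 h on 2 x 20 cores.
s = speedup(28, 2)
e = parallel_efficiency(28, 2, cores=2 * 20)
print(f"speedup ~{s:.0f}x, efficiency ~{e:.0%}")
```

Because both runtimes are approximate, the result should be read as an order-of-magnitude estimate rather than a benchmark figure.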
12.
BMC Bioinformatics ; 16: 379, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26554718

ABSTRACT

BACKGROUND: Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. RESULTS: For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. CONCLUSIONS: We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.


Subjects
HIV Infections/genetics , HIV-1/genetics , Hepacivirus/genetics , Hepatitis C/genetics , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics , Software , Algorithms , Cluster Analysis , Computer Simulation , Genome, Viral , HIV Infections/virology , Hepatitis C/virology , Humans , Plasmids/genetics , Regression Analysis
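
The three calling modes named in the QQ-SNV abstract above differ only in the probability cutoff and in an extra frequency filter. The sketch below illustrates that decision logic under stated assumptions: it is not the QQ-SNV code, the record fields (`pos`, `prob`, `freq`) and function names are hypothetical, the comparison operators (`>=`) are a guess where the abstract is silent, and the percentile uses a simple nearest-rank rule that the abstract does not specify.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (assumed method, pct as an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def call_snvs(candidates, mode="D", error_freqs=None):
    """Return positions called under mode 'D', 'HS' or 'HS-P80'."""
    # Cutoffs as given in the abstract: 0.5 (default) or 0.0001 (high sensitivity).
    cutoff = 0.5 if mode == "D" else 0.0001
    called = [c for c in candidates if c["prob"] >= cutoff]

    if mode == "HS-P80":
        if not error_freqs:
            raise ValueError("HS-P80 needs a distribution of error frequencies")
        p80 = percentile_nearest_rank(error_freqs, 80)
        # Overrule calls whose frequency falls below the 80th percentile.
        called = [c for c in called if c["freq"] >= p80]

    return [c["pos"] for c in called]


# Hypothetical candidates and error frequencies, for illustration only.
candidates = [
    {"pos": 10, "prob": 0.9, "freq": 0.02},
    {"pos": 20, "prob": 0.01, "freq": 0.006},
    {"pos": 30, "prob": 0.001, "freq": 0.001},
    {"pos": 40, "prob": 0.00001, "freq": 0.05},
]
error_freqs = [0.001, 0.002, 0.003, 0.004, 0.005]
```

Lowering the cutoff admits more candidates (higher sensitivity), and the percentile filter then removes the low-frequency ones that look like sequencing errors, which matches the trade-off the abstract describes.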
13.
PLoS One ; 10(7): e0132868, 2015.
Article in English | MEDLINE | ID: mdl-26182406

ABSTRACT

elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost.


Subjects
Algorithms , Exome , Genome, Human , Sequence Alignment/economics , Software , Benchmarking , Contig Mapping , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data
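
The elPrep abstract above attributes its speed to making a single pass through the data regardless of how many preparation steps are configured. The following sketch shows that general idea only: per-record steps are chained inside one loop instead of each step reading and writing the whole file. It is not elPrep's architecture or API; the record layout (plain dicts) and the step names are hypothetical, and steps that need global state, such as sorting or duplicate marking, are deliberately left out.

```python
def run_single_pass(records, steps):
    """Apply every step to each record in one loop over the input."""
    kept = []
    for record in records:
        for step in steps:
            record = step(record)
            if record is None:  # a filtering step dropped this record
                break
        if record is not None:
            kept.append(record)
    return kept


def drop_unmapped(record):
    """Filter step: 0x4 is the SAM flag bit for an unmapped read."""
    return None if record["flag"] & 0x4 else record


def add_read_group(record):
    """Transform step: attach a (hypothetical) read-group label."""
    return {**record, "rg": "sample1"}


reads = [{"name": "r1", "flag": 0}, {"name": "r2", "flag": 4}]
prepared = run_single_pass(reads, [drop_unmapped, add_read_group])
```

Adding a further step extends the inner loop but does not add another pass over `reads`, which is the property the abstract highlights.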
14.
J Virol Methods ; 221: 29-38, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-25917877

ABSTRACT

Massively parallel sequencing (MPS) technology has opened new avenues to study viral dynamics and treatment-induced resistance mechanisms of infections such as human immunodeficiency virus (HIV) and hepatitis C virus (HCV). Whereas the Roche/454 platform has been used widely for the detection of low-frequency drug-resistant variants, more recently developed short-read MPS technologies have the advantage of delivering a higher sequencing depth at a lower cost per sequenced base. This study assesses the performance characteristics of Illumina MPS technology for the characterization of genetic variability in viral populations by deep sequencing. The reported results from MPS experiments comprising HIV and HCV plasmids demonstrate that a 0.5-1% lower limit of detection can be achieved readily with Illumina MPS while retaining good accuracy also at low frequencies. Deep sequencing of a set of clinical samples (12 HIV and 9 HCV patients), designed at a similar budget for both MPS platforms, reveals a comparable lower limit of detection for Illumina and Roche/454. Finally, this study shows that Illumina's paired-end sequencing can be applied as a strategy to assess linkage between different mutations identified in individual viral subspecies. These results support the use of Illumina as another MPS platform of choice for deep sequencing of viral minority species.


Subjects
Genetic Variation , HIV/classification , HIV/genetics , Hepacivirus/classification , Hepacivirus/genetics , High-Throughput Nucleotide Sequencing/methods , Virology/methods , HIV/isolation & purification , HIV Infections/virology , Hepacivirus/isolation & purification , Hepatitis C/virology , Humans
15.
BMC Bioinformatics ; 16: 59, 2015 Feb 22.
Article in English | MEDLINE | ID: mdl-25887734

ABSTRACT

BACKGROUND: Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology-associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. RESULTS: Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. CONCLUSIONS: ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently a lot of information is already contained in the quality scores enabling the model based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection.


Subjects
Algorithms , Genetic Variation/genetics , Hepacivirus/genetics , Hepatitis C/genetics , High-Throughput Nucleotide Sequencing/methods , Mutation/genetics , Software , Cluster Analysis , Genome, Viral , Genomics/methods , Hepatitis C/virology , Humans , Sensitivity and Specificity , Sequence Analysis, DNA/methods
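
The ViVaMBC abstract above describes how a base call follows from a quadruplet of channel intensities: the highest intensity gives the called base and the second highest gives the "second best base". A minimal sketch of just that selection step is shown below; the function name and the intensity values are hypothetical, and nothing here reflects the model-based clustering itself.

```python
def call_bases(intensities):
    """Return (called base, second best base) from per-channel intensities.

    `intensities` maps each nucleotide (A, C, G, T) to its channel intensity.
    """
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    return ranked[0], ranked[1]


# Hypothetical intensity quadruplet for one sequencing cycle.
best, second = call_bases({"A": 120.0, "C": 15.0, "G": 980.0, "T": 310.0})
```

When the gap between the two highest intensities is small, the second best base is a plausible alternative call, which is why the authors explored it as an extra source of information.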
16.
Bioinformatics ; 31(15): 2482-8, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25819078

ABSTRACT

MOTIVATION: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. RESULTS: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading.


Subjects
Sequence Analysis, DNA/methods , Software , Genome, Human , Humans
17.
Bioinformatics ; 31(1): 94-101, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25178459

ABSTRACT

MOTIVATION: In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. RESULTS: A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is embedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. AVAILABILITY: VirVarSeq is available, together with a user's guide and test data, at SourceForge: http://sourceforge.net/projects/virtools/?source=directory.


Subjects
Algorithms , Genetic Variation/genetics , Genomics/methods , Hepacivirus/genetics , Hepatitis C/genetics , High-Throughput Nucleotide Sequencing/methods , Software , Genome, Viral , Hepatitis C/virology , Humans
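
The Q-cpileup abstract above combines two ideas: filtering on nucleotide quality scores and counting variants per codon so that linkage within a triplet is kept. The sketch below illustrates both with a fixed quality threshold. It is not the Q-cpileup implementation: the abstract describes an adaptive, per-sample threshold that is not reproduced here, and the function name, the `min_q` parameter, and the example observations are assumptions.

```python
from collections import Counter


def codon_frequencies(observations, min_q=20):
    """Frequency of each codon at one position after quality filtering.

    `observations` is a list of (codon, [q1, q2, q3]) pairs, one per read.
    A read's codon is kept only if all three bases reach `min_q`.
    """
    kept = [codon for codon, quals in observations if min(quals) >= min_q]
    total = len(kept)
    if total == 0:
        return {}
    return {codon: n / total for codon, n in Counter(kept).items()}


# Hypothetical reads covering one codon position.
reads = [
    ("AAA", [30, 30, 30]),
    ("AAA", [35, 32, 31]),
    ("AGA", [30, 30, 30]),
    ("AGA", [30, 5, 30]),   # dropped: middle base has low quality
    ("AAA", [30, 30, 30]),
]
freqs = codon_frequencies(reads)
```

Counting whole codons rather than single bases means two substitutions in the same triplet are reported together, which is what allows the direct amino-acid-level interpretation mentioned in the abstract.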
18.
Nat Commun ; 5: 4767, 2014 Sep 03.
Article in English | MEDLINE | ID: mdl-25182477

ABSTRACT

The HEK293 human cell lineage is widely used in cell biology and biotechnology. Here we use whole-genome resequencing of six 293 cell lines to study the dynamics of this aneuploid genome in response to the manipulations used to generate common 293 cell derivatives, such as transformation and stable clone generation (293T); suspension growth adaptation (293S); and cytotoxic lectin selection (293SG). Remarkably, we observe that copy number alteration detection could identify the genomic region that enabled cell survival under selective conditions (in this case, ricin selection). Furthermore, we present methods to detect human/vector genome breakpoints and a user-friendly visualization tool for the 293 genome data. We also establish that the genome structure composition is in steady state for most of these cell lines when standard cell culturing conditions are used. This resource enables novel and more informed studies with 293 cells, and we will distribute the sequenced cell lines to this effect.


Subjects
Cryopreservation , DNA Copy Number Variations , Genome, Human , Transcriptome , Adaptation, Physiological/genetics , Base Sequence , Cell Proliferation , Cell Survival/genetics , Clone Cells , Gene Expression Profiling , Genomic Instability , HEK293 Cells , High-Throughput Nucleotide Sequencing , Humans , Karyotype , Molecular Sequence Data , Plasmids/chemistry , Plasmids/metabolism , Transformation, Genetic
19.
Elife ; 3: e02725, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-25085081

ABSTRACT

DNA replication errors that persist as mismatch mutations make up the molecular fingerprint of mismatch repair (MMR)-deficient tumors and convey them with resistance to standard therapy. Using whole-genome and whole-exome sequencing, we here confirm an MMR-deficient mutation signature that is distinct from other tumor genomes, but surprisingly similar to germ-line DNA, indicating that a substantial fraction of human genetic variation arises through mutations escaping MMR. Moreover, we identify a large set of recurrent indels that may serve to detect microsatellite instability (MSI). Indeed, using endometrial tumors with immunohistochemically proven MMR deficiency, we optimize a novel marker set capable of detecting MSI and show it to have greater specificity and selectivity than standard MSI tests. Additionally, we show that recurrent indels are enriched for the 'DNA double-strand break repair by homologous recombination' pathway. Consequently, DSB repair is reduced in MMR-deficient tumors, triggering a dose-dependent sensitivity of MMR-deficient tumor cultures to DSB inducers.


Subjects
Biomarkers, Tumor/genetics , DNA Breaks, Double-Stranded , Endometrial Neoplasms/genetics , INDEL Mutation , Microsatellite Repeats , Ovarian Neoplasms/genetics , Base Pair Mismatch , DNA Fingerprinting , DNA Mismatch Repair , Endometrial Neoplasms/diagnosis , Endometrial Neoplasms/pathology , Female , Homologous Recombination , Humans , Microsatellite Instability , Neoplasm Staging , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/pathology , Sensitivity and Specificity
20.
Genome Med ; 5(11): 106, 2013.
Article in English | MEDLINE | ID: mdl-24286536

ABSTRACT

BACKGROUND: Tumor cells in the blood of patients with metastatic carcinomas are associated with poor survival. Knowledge of the cells' genetic make-up can help to guide targeted therapy. We evaluated the efficiency and quality of isolation and amplification of DNA from single circulating tumor cells (CTC). METHODS: The efficiency of the procedure was determined by spiking blood with SKBR-3 cells, enrichment with the CellSearch system, followed by single cell sorting by fluorescence-activated cell sorting (FACS) and whole genome amplification. A selection of single cell DNA from fixed and unfixed SKBR-3 cells was exome sequenced and the DNA quality analyzed. Single CTC from patients with lung cancer were used to demonstrate the potential of single CTC molecular characterization. RESULTS: The overall efficiency of the procedure from spiked cell to amplified DNA was approximately 20%. Losses attributed to the CellSearch system were around 20%, transfer to FACS around 25%, sorting around 5% and DNA amplification around 25%. Exome sequencing revealed that the quality of the DNA was affected by the fixation of the cells, amplification, and the low starting quantity of DNA. A single fixed cell had an average coverage at 20× depth of 30% when sequencing to an average of 40× depth, whereas a single unfixed cell had 45% coverage. GenomiPhi-amplified genomic DNA had a coverage of 72% versus a coverage of 87% of genomic DNA. Twenty-one percent of the CTC from patients with lung cancer identified by the CellSearch system could be isolated individually and amplified. CONCLUSIONS: CTC enriched by the CellSearch system were sorted by FACS, and DNA retrieved and amplified with an overall efficiency of 20%. Analysis of the sequencing data showed that this DNA could be used for variant calling, but not for quantitative measurements such as copy number detection. Close to 55% of the exome of single SKBR-3 cells was successfully sequenced to 20× depth making it possible to call 72% of the variants. The overall coverage was reduced to 30% at 20× depth, making it possible to call 56% of the variants in CellSave-fixed cells.
