Results 1 - 15 of 15
1.
BMC Med Genomics ; 13(1): 156, 2020 10 15.
Article in English | MEDLINE | ID: mdl-33059707

ABSTRACT

BACKGROUND: Treating cancer depends in part on identifying the mutations driving each patient's disease. Many clinical laboratories are adopting high-throughput sequencing for assaying patients' tumours, applying targeted panels to formalin-fixed paraffin-embedded tumour tissues to detect clinically-relevant mutations. While there have been some benchmarking and best practices studies of this scenario, much variant calling work focuses on whole-genome or whole-exome studies, with fresh or fresh-frozen tissue. Thus, definitive guidance on best choices for sequencing platforms, sequencing strategies, and variant calling for clinical variant detection is still being developed. METHODS: Because ground truth for clinical specimens is rarely known, we used the well-characterized Coriell cell lines GM12878 and GM12877 to generate data. We prepared samples to mimic as closely as possible clinical biopsies, including formalin fixation and paraffin embedding. We evaluated two well-known targeted sequencing panels, Illumina's TruSight 170 hybrid-capture panel and the amplification-based Oncomine Focus panel. Sequencing was performed on an Illumina NextSeq500 and an Ion Torrent PGM respectively. We performed multiple replicates of each assay, to test reproducibility. Finally, we applied four different freely-available somatic single-nucleotide variant (SNV) callers to the data, along with the vendor-recommended callers for each sequencing platform. RESULTS: We did not observe major differences in variant calling success within the regions that each panel covers, but there were substantial differences between callers. All had high sensitivity for true SNVs, but numerous and non-overlapping false positives. Overriding certain default parameters to make them consistent between callers substantially reduced discrepancies, but still resulted in high false positive rates. 
Intersecting results from multiple replicates or from different variant callers eliminated most false positives, while maintaining sensitivity. CONCLUSIONS: Reproducibility and accuracy of targeted clinical sequencing results depend less on sequencing platform and panel than on variability between replicates and downstream bioinformatics. Differences in variant callers' default parameters are a greater influence on algorithm disagreement than other differences between the algorithms. Contrary to typical clinical practice, we recommend employing multiple variant calling pipelines and/or analyzing replicate samples, as this greatly decreases false positive calls.
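
The intersection strategy recommended in this abstract can be illustrated with a minimal sketch. It assumes each caller's output has already been reduced to a set of `(chromosome, position, ref, alt)` tuples; the function name `consensus_variants`, the `min_support` parameter, and the toy calls are hypothetical and are not taken from the paper.

```python
from collections import Counter

def consensus_variants(call_sets, min_support=None):
    """Keep variants reported by at least `min_support` call sets (default: all of them)."""
    sets = [set(calls) for calls in call_sets]
    if min_support is None:
        min_support = len(sets)  # strict intersection
    support = Counter(v for calls in sets for v in calls)
    return {v for v, n in support.items() if n >= min_support}

# Made-up calls from three hypothetical callers (or three replicates).
caller_a = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}
caller_b = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")}
caller_c = {("chr1", 100, "A", "G"), ("chr5", 500, "G", "T")}

strict = consensus_variants([caller_a, caller_b, caller_c])
majority = consensus_variants([caller_a, caller_b, caller_c], min_support=2)
print(sorted(strict))
print(sorted(majority))
```

Calls seen by only one caller are dropped in both modes; relaxing `min_support` trades some false-positive filtering for sensitivity.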


Subjects
Algorithms , Biomarkers, Tumor/genetics , DNA Mutational Analysis/methods , Mutation , Neoplasms/genetics , Neoplasms/pathology , Polymorphism, Single Nucleotide , Computational Biology , Formaldehyde , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Paraffin Embedding , Reproducibility of Results , Tumor Cells, Cultured
2.
Nucleic Acids Res ; 46(14): 7221-7235, 2018 08 21.
Article in English | MEDLINE | ID: mdl-30016497

ABSTRACT

Muscle-specific transcription factor MyoD orchestrates the myogenic gene expression program by binding to short DNA motifs called E-boxes within myogenic cis-regulatory elements (CREs). Genome-wide analyses of the MyoD cistrome by chromatin immunoprecipitation sequencing show that MyoD-bound CREs contain multiple E-boxes of various sequences. However, how E-box numbers, sequences and their spatial arrangement within CREs collectively regulate the binding affinity and transcriptional activity of MyoD remains largely unknown. Here, by an integrative analysis of the MyoD cistrome combined with genome-wide analysis of key regulatory histones and gene expression data, we show that the affinity landscape of MyoD is driven by multiple E-boxes, and that the overall binding affinity, and the associated nucleosome positioning and epigenetic features of the CREs, crucially depend on the variant sequences and positioning of the E-boxes within the CREs. By comparative genomic analysis of single nucleotide polymorphisms (SNPs) across publicly available data from 17 strains of laboratory mice, we show that variant sequences within the MyoD-bound motifs, but not their genome-wide counterparts, are under selection. Finally, we show that the quantitative regulatory effect of MyoD binding on the nearby genes can, in part, be predicted by the motif composition of the CREs to which it binds. Taken together, our data suggest that motif numbers, sequences and their spatial arrangement within the myogenic CREs are important determinants of the cis-regulatory code of myogenic CREs.
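
As a rough illustration of what counting E-box numbers, sequences and positions within a CRE involves, the sketch below scans a DNA string for the canonical E-box consensus CANNTG. The function name `find_eboxes` and the example sequence are made up for illustration; this is not the analysis pipeline used in the paper.

```python
import re

# Canonical E-box consensus CANNTG; the lookahead lets overlapping matches be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(sequence):
    """Return (0-based start, motif) for every E-box occurrence in `sequence`."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(sequence.upper())]

# Invented sequence containing two different E-box variants.
cre = "TTCAGCTGAACACCTGTT"
hits = find_eboxes(cre)
print(hits)
```

The list of hits gives the number of motifs, their variant sequences, and their spacing, which are the three features the abstract relates to binding affinity.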


Subjects
E-Box Elements/genetics , Muscle Development/genetics , MyoD Protein/genetics , MyoD Protein/metabolism , Transcription, Genetic/genetics , Transcriptional Activation/genetics , Animals , Base Sequence/genetics , Chromatin Immunoprecipitation , DNA-Binding Proteins/genetics , Gene Expression/genetics , Gene Expression Regulation , Genome-Wide Association Study , Mice , Muscle Development/physiology , Nucleotide Motifs/genetics , Polymorphism, Single Nucleotide/genetics , Promoter Regions, Genetic/genetics
3.
Cell Discov ; 4: 21, 2018.
Article in English | MEDLINE | ID: mdl-29736258

ABSTRACT

Polycomb repressive complex 2 (PRC2) accessory proteins play substoichiometric, tissue-specific roles to recruit PRC2 to specific genomic loci or increase enzymatic activity, while PRC2 core proteins are required for complex stability and global levels of trimethylation of histone 3 at lysine 27 (H3K27me3). Here, we demonstrate a role for the classical PRC2 accessory protein Mtf2/Pcl2 in the hematopoietic system that is more akin to that of a core PRC2 protein. Mtf2-/- erythroid progenitors demonstrate markedly decreased core PRC2 protein levels and a global loss of H3K27me3 at promoter-proximal regions. The resulting de-repression of transcriptional and signaling networks blocks definitive erythroid development, culminating in Mtf2-/- embryos dying by e15.5 due to severe anemia. Gene regulatory network (GRN) analysis demonstrated Mtf2 directly regulates Wnt signaling in erythroblasts, leading to activated canonical Wnt signaling in Mtf2-deficient erythroblasts, while chemical inhibition of canonical Wnt signaling rescued Mtf2-deficient erythroblast differentiation in vitro. Using a combination of in vitro, in vivo and systems analyses, we demonstrate that Mtf2 is a critical epigenetic regulator of Wnt signaling during erythropoiesis and recast the role of polycomb accessory proteins in a tissue-specific context.

4.
Article in English | MEDLINE | ID: mdl-26388941

ABSTRACT

BACKGROUND: Unraveling transcriptional regulatory networks is a central problem in molecular biology and, in this quest, chromatin immunoprecipitation and sequencing (ChIP-seq) technology has given us the unprecedented ability to identify sites of protein-DNA binding and histone modification genome wide. However, multiple systemic and procedural biases hinder harnessing the full potential of this technology. Previous studies have addressed this problem, but a thorough characterization of different, interacting biases on ChIP-seq signals is still lacking. RESULTS: Here, we present a novel framework where the genome-wide ChIP-seq signal is viewed as being quantifiably influenced by different, measurable sources of bias, which can then be computationally subtracted away. We use a compendium of 123 human ENCODE ChIP-seq datasets to build regression models that tell us how much of a ChIP-seq signal can be attributed to mappability, GC-content, chromatin accessibility, and factors represented in input DNA and IgG controls. When we use the model to separate out these non-binding influences from the ChIP-seq signal, we obtain a purified signal that associates better with TF-DNA-binding motifs than do other measures of peak significance. We also carry out a multiscale analysis that reveals how ChIP-seq signal biases differ across different scales. Finally, we investigate previously reported associations between gene expression and ChIP-seq signals at transcription start sites. We show that our model can be used to discriminate ChIP-seq signals that are truly related to gene expression from those that are merely correlated by virtue of bias, in particular chromatin accessibility bias, which shows up in ChIP-seq signals and also relates to gene expression. CONCLUSIONS: Our study provides new insights into the behavior of ChIP-seq signal biases and proposes a novel mitigation framework that improves results compared to existing techniques. 
With ChIP-seq now being the central technology for studying transcriptional regulation, it is most crucial to accurately characterize, quantify, and adjust for the genome-wide effects of biases affecting ChIP-seq. Our study also emphasizes that properly accounting for confounders in ChIP-seq data is of paramount importance for obtaining biologically accurate insights into the workings of the complex regulatory mechanisms in living organisms. R and MATLAB packages implementing the framework can be obtained from http://www.perkinslab.ca/Software.html.
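
The idea of regressing a ChIP-seq signal on a measurable bias and keeping the residual can be sketched as follows. This is a deliberately reduced illustration with a single covariate (GC content per bin) and ordinary least squares; the published framework uses several covariates and its own model, and the function names and numbers here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def remove_bias(signal, covariate):
    """Subtract the part of `signal` explained linearly by `covariate`."""
    slope, intercept = fit_line(covariate, signal)
    return [s - (slope * c + intercept) for s, c in zip(signal, covariate)]

# Invented per-bin values: the signal tracks GC content, with extra reads in bin 2.
gc_content = [0.30, 0.40, 0.50, 0.60, 0.70]
chip_signal = [3.0, 4.0, 10.0, 6.0, 7.0]

residual = remove_bias(chip_signal, gc_content)
print([round(r, 3) for r in residual])
```

After the GC-driven trend is removed, the bin with the extra reads stands out as the only positive residual, which is the sense in which the remaining signal is "purified" in this toy setting.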

5.
PLoS One ; 8(11): e79894, 2013.
Article in English | MEDLINE | ID: mdl-24278209

ABSTRACT

Alpha-solenoids are flexible protein structural domains formed by ensembles of alpha-helical repeats (Armadillo and HEAT repeats among others). While homology can be used to detect many of these repeats, some alpha-solenoids have very little sequence homology to proteins of known structure and we expect that many remain undetected. We previously developed a method for detection of alpha-helical repeats based on a neural network trained on a dataset of protein structures. Here we improved the detection algorithm and updated the training dataset using recently solved structures of alpha-solenoids. Unexpectedly, we identified occurrences of alpha-solenoids in solved protein structures that had escaped attention, for example within the core of the catalytic subunit of PI3KC. Our results expand the current set of known alpha-solenoids. Application of our tool to the protein universe allowed us to detect their significant enrichment in proteins interacting with many proteins, confirming that alpha-solenoids are generally involved in protein-protein interactions. We then studied the taxonomic distribution of alpha-solenoids to discuss an evolutionary scenario for the emergence of this type of domain, speculating that alpha-solenoids have emerged in multiple taxa in independent events by convergent evolution. We observe a higher rate of alpha-solenoids in eukaryotic genomes and in some prokaryotic families, such as Cyanobacteria and Planctomycetes, which could be associated with increased cellular complexity. The method is available at http://cbdm.mdc-berlin.de/~ard2/.


Subjects
Genomics , Proteins/physiology , Protein Conformation , Proteins/chemistry , Proteins/genetics
6.
Nat Protoc ; 8(8): 1525-34, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23845964

ABSTRACT

Chromatin immunoprecipitation coupled with ultra-high-throughput sequencing (ChIP-seq) is a widely used method for mapping the interactions of proteins with DNA. However, the requirements for ChIP-grade antibodies impede wider application of this method, and variations in results can be high owing to differences in affinity and cross-reactivity of antibodies. Therefore, we developed chromatin tandem affinity purification (ChTAP) as an effective alternative to ChIP. Through the use of affinity tags and reagents that are identical for all proteins investigated, ChTAP enables one to directly compare the binding between different transcription factors and to directly assess the background in control experiments. Thus, ChTAP-seq can be used to rapidly map the genome-wide binding of multiple DNA-binding proteins in a wide range of cell types. ChTAP can be completed in 3-4 d, starting from cross-linking of chromatin to purification of ChIP DNA.


Subjects
Chromatin/metabolism , Chromatography, Affinity/methods , DNA-Binding Proteins/metabolism , Binding Sites , Chromatin Immunoprecipitation , HEK293 Cells , Humans , Sonication
7.
Bioinformatics ; 29(4): 444-50, 2013 Feb 15.
Article in English | MEDLINE | ID: mdl-23300135

ABSTRACT

MOTIVATION: Reliable estimation of the mean fragment length for next-generation short-read sequencing data is an important step in next-generation sequencing analysis pipelines, most notably because of its impact on the accuracy of the enriched regions identified by peak-calling algorithms. Although many peak-calling algorithms include a fragment-length estimation subroutine, the problem has not been adequately solved, as demonstrated by the variability of the estimates returned by different algorithms. RESULTS: In this article, we investigate the use of strand cross-correlation to estimate mean fragment length of single-end data and show that traditional estimation approaches have mixed reliability. We observe that the mappability of different parts of the genome can introduce an artificial bias into cross-correlation computations, resulting in incorrect fragment-length estimates. We propose a new approach, called mappability-sensitive cross-correlation (MaSC), which removes this bias and allows for accurate and reliable fragment-length estimation. We analyze the computational complexity of this approach, and evaluate its performance on a test suite of NGS datasets, demonstrating its superiority to traditional cross-correlation analysis. AVAILABILITY: An open-source Perl implementation of our approach is available at http://www.perkinslab.ca/Software.html.
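
For orientation, the traditional strand cross-correlation estimate that this abstract takes as its starting point can be sketched as below. This is the naive version only, not MaSC: it ignores mappability, which the abstract identifies as a source of bias. The function names and read positions are invented, and the score is an unnormalized count rather than a correlation coefficient.

```python
from collections import Counter

def cross_correlation(forward, reverse, shift):
    """Count forward-strand read starts at x paired with a reverse-strand start at x + shift."""
    reverse_counts = Counter(reverse)
    return sum(n * reverse_counts[x + shift] for x, n in Counter(forward).items())

def estimate_fragment_length(forward, reverse, max_shift):
    """Return the shift in 0..max_shift with the highest cross-correlation score."""
    return max(range(max_shift + 1),
               key=lambda d: cross_correlation(forward, reverse, d))

# Invented 5' read-start positions; reverse reads sit about 150 bp downstream.
forward_starts = [100, 250, 400, 405]
reverse_starts = [250, 400, 550, 555, 700]

estimate = estimate_fragment_length(forward_starts, reverse_starts, max_shift=300)
print(estimate)
```

On real data, peaks in this score at shifts unrelated to fragment length (for example near the read length) are the kind of artifact a mappability-aware correction is meant to suppress.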


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Chromosome Mapping , Data Interpretation, Statistical , Genomics , Humans , Reproducibility of Results
8.
BioData Min ; 5(1): 14, 2012 Sep 07.
Article in English | MEDLINE | ID: mdl-22958760

ABSTRACT

BACKGROUND: Reviewer and editor selection for peer review is getting harder for authors and publishers because the progressive growth of the body of knowledge has driven specialization into ever narrower areas of research. Examination of the literature facilitates finding appropriate reviewers but is time consuming and complicated by author name ambiguities. RESULTS: We have developed a method called peer2ref to support authors and editors in selecting suitable reviewers for scientific manuscripts. Peer2ref works from a text input, usually the abstract of the manuscript, from which important concepts are extracted as keywords using a fuzzy binary relations approach. The keywords are searched on indexed profiles of words constructed from the bibliography attributed to authors in MEDLINE. The names of these scientists have been previously disambiguated by coauthors identified across the whole of MEDLINE. The methods have been implemented in a web server that automatically suggests experts for peer review among scientists who have authored manuscripts published during the last decade in more than 3,800 journals indexed in MEDLINE. CONCLUSION: The peer2ref web server is publicly available at http://www.ogic.ca/projects/peer2ref/.

9.
Dev Cell ; 22(6): 1208-20, 2012 Jun 12.
Article in English | MEDLINE | ID: mdl-22609161

ABSTRACT

Pax3 and Pax7 regulate stem cell function in skeletal myogenesis. However, molecular insight into their distinct roles has remained elusive. Using gene expression data combined with genome-wide binding-site analysis, we show that both Pax3 and Pax7 bind identical DNA motifs and jointly activate a large panel of genes involved in muscle stem cell function. Surprisingly, in adult myoblasts Pax3 binds a subset (6.4%) of Pax7 targets. Despite a significant overlap in their transcriptional network, Pax7 regulates distinct panels of genes involved in the promotion of proliferation and inhibition of myogenic differentiation. We show that Pax7 has a higher binding affinity to the homeodomain-binding motif relative to Pax3, suggesting that intrinsic differences in DNA binding contribute to the observed functional difference between Pax3 and Pax7 binding in myogenesis. Together, our data demonstrate distinct attributes of Pax7 function and provide mechanistic insight into the nonredundancy of Pax3 and Pax7 in muscle development.


Subjects
Amino Acid Motifs/physiology , Homeodomain Proteins/metabolism , Muscle Development/physiology , Muscle, Skeletal/metabolism , PAX7 Transcription Factor/metabolism , Transcription, Genetic , Animals , Cell Differentiation , Cell Proliferation , Gene Expression Profiling , Mice , PAX3 Transcription Factor , Paired Box Transcription Factors/metabolism
10.
PLoS One ; 5(10): e13431, 2010 Oct 27.
Article in English | MEDLINE | ID: mdl-21048949

ABSTRACT

BACKGROUND: In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC-content-changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, the usage of G/C-ending codons generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns. PRINCIPAL FINDINGS: In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and human respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and human respectively. 
CONCLUSIONS: The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.
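
To see how a GC-biased substitution process can make a G-ending codon such as AGG lose ground as GC bias rises, consider the toy model below. It is an assumption made for illustration and not the model from the paper: synonymous single-nucleotide substitutions occur at a rate proportional to the equilibrium frequency of the incoming nucleotide, with G and C each at frequency gc/2 and A and T each at (1 - gc)/2. Such a chain satisfies detailed balance, so the stationary usage of each codon within a connected synonymous family is proportional to the product of its nucleotide frequencies.

```python
from math import prod  # Python 3.8+

ARGININE = ["CGA", "CGC", "CGG", "CGT", "AGA", "AGG"]

def nucleotide_freqs(gc):
    """Equilibrium nucleotide frequencies for a GC bias `gc`, with 0 < gc < 1."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}

def stationary_usage(codons, gc):
    """Stationary codon usage within one synonymous family under the toy model."""
    freqs = nucleotide_freqs(gc)
    weights = {c: prod(freqs[n] for n in c) for c in codons}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

for gc in (0.3, 0.5, 0.7, 0.9):
    usage = stationary_usage(ARGININE, gc)
    print(gc, round(usage["AGG"], 3), round(usage["CGC"], 3))
```

In this toy setting the share of AGG works out to gc(1 - gc)/(1 + gc), which peaks at moderate GC bias and then declines, while the share of CGC rises steadily. The G-ending codon loses out because its first position is A, which a high GC bias disfavors.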


Subjects
Codon , Models, Theoretical , Mutation , Base Composition , Cytosine/chemistry , Guanine/chemistry , Markov Chains
11.
J Biomed Discov Collab ; 5: 1-6, 2010 Jan 25.
Article in English | MEDLINE | ID: mdl-20333611

ABSTRACT

The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas may be gained by chronological analysis of the 18 million records currently in the database; however, such analysis is generally complex and time consuming. The authors' MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and historical variations in term usage intensity. MLTrends may be used at: http://www.ogic.ca/mltrends.

12.
Mech Ageing Dev ; 131(1): 9-20, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19913570

ABSTRACT

Skeletal muscle ageing is characterized by faulty degenerative/regenerative processes that promote the decline of its mass, strength, and endurance. In this study, we used a transcriptional profiling method to better understand the molecular pathways and factors that contribute to these processes. To more appropriately contrast the differences in regenerative capacity of old muscle, we compared it with young muscle, where robust growth and efficient myogenic differentiation are ongoing. Notably, in old mice, we found a severe deficit in satellite cell activation. We performed expression analyses on RNA from the gastrocnemius muscle of young (3-week-old) and old (24-month-old) mice. The differential expression highlighted genes that are involved in the efficient functioning of satellite cells. Indeed, the greatest number of up-regulated genes in young mice encoded components of the extracellular matrix required for the maintenance of the satellite cell niche. Moreover, other genes included Wnt inhibitors (Wif1 and Sfrp2) and a Notch activator (Dner), which are putatively involved in the interconnected signalling networks that control satellite cell function. The widespread expression differences for inhibitors of TGFbeta signalling further emphasize the shortcomings in satellite cell performance. Therefore, we draw attention to the breakdown of features required to maintain satellite cell integrity during the ageing process.


Subjects
Aging/genetics , Cellular Senescence/genetics , Gene Expression Profiling/methods , Muscle Development/genetics , Muscle, Skeletal/metabolism , Oligonucleotide Array Sequence Analysis , Satellite Cells, Skeletal Muscle/metabolism , Age Factors , Animals , Cells, Cultured , Gene Expression Regulation , Male , Mice , Mice, Inbred C57BL , Muscle, Skeletal/growth & development , Polymerase Chain Reaction , Reproducibility of Results , Signal Transduction/genetics
13.
PLoS Comput Biol ; 5(3): e1000304, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19282972

ABSTRACT

A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them suitable for protein interaction. Although they most likely originate by tandem duplication of a two-helix unit, their detection using sequence similarity between repeats is poor. Here, we show that alpha-rod repeats can be detected using a neural network. The network detects more repeats than are identified by domain databases using multiple profiles, with a low level of false positives (<10%). We identify alpha-rod repeats in approximately 0.4% of proteins in eukaryotic genomes. We then investigate the results for all human proteins, identifying alpha-rod repeats for the first time in six protein families, including proteins STAG1-3, SERAC1, and PSMD1-2 & 5. We also characterize a short version of these repeats in eight protein families of Archaeal, Bacterial, and Fungal species. Finally, we demonstrate the utility of these predictions in directing experimental work to demarcate three alpha-rods in huntingtin, a protein mutated in Huntington's disease. Using yeast two-hybrid analysis and an immunoprecipitation technique, we show that the huntingtin fragments containing alpha-rods associate with each other. This is the first definition of domains in huntingtin and the first validation of predicted interactions between fragments of huntingtin, which sets up directions toward functional characterization of this protein. An implementation of the repeat detection algorithm is available as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard. This can be further visualized using BiasViz, a graphic tool for representation of multiple sequence alignments.


Subjects
Models, Chemical , Models, Molecular , Nerve Tissue Proteins/analysis , Nerve Tissue Proteins/chemistry , Neural Networks, Computer , Nuclear Proteins/analysis , Nuclear Proteins/chemistry , Pattern Recognition, Automated/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Binding Sites , Computer Simulation , Huntingtin Protein , Molecular Sequence Data , Protein Binding , Repetitive Sequences, Amino Acid
14.
BMC Res Notes ; 2: 39, 2009 Mar 10.
Article in English | MEDLINE | ID: mdl-19284540

ABSTRACT

BACKGROUND: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web-interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. FINDINGS: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or slices of data, based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. CONCLUSION: StemBase can be used to study gene expression in human and murine stem cells and is available at http://www.stembase.ca.

15.
Methods Mol Biol ; 407: 137-48, 2007.
Article in English | MEDLINE | ID: mdl-18453254

ABSTRACT

StemBase is a database of gene expression data obtained from stem cells and their derivatives, mainly from mouse and human, using DNA microarrays and Serial Analysis of Gene Expression. Here, we describe this database and indicate ways to use it to study the expression of particular genes in stem cells or to search for genes with particular expression profiles in stem cells, which could be associated with stem cell function or used as stem cell markers.


Subjects
Biomarkers/analysis , Databases, Genetic , Gene Expression Profiling/methods , Gene Expression/physiology , Oligonucleotide Array Sequence Analysis/methods , Stem Cells/physiology , Animals , Humans , Mice