1.
Bioinformatics ; 39(3), 2023 03 01.
Article in English | MEDLINE | ID: mdl-36794911

ABSTRACT

SUMMARY: The BioPlex project has created two proteome-scale, cell-line-specific protein-protein interaction (PPI) networks: the first in 293T cells, including 120k interactions among 15k proteins; and the second in HCT116 cells, including 70k interactions between 10k proteins. Here, we describe programmatic access to the BioPlex PPI networks and integration with related resources from within R and Python. Besides PPI networks for 293T and HCT116 cells, this includes access to CORUM protein complex data, PFAM protein domain data, PDB protein structures, and transcriptome and proteome data for the two cell lines. The implemented functionality serves as a basis for integrative downstream analysis of BioPlex PPI data with domain-specific R and Python packages, including efficient execution of maximum scoring sub-network analysis, protein domain-domain association analysis, mapping of PPIs onto 3D protein structures and analysis of BioPlex PPIs at the interface of transcriptomic and proteomic data. AVAILABILITY AND IMPLEMENTATION: The BioPlex R package is available from Bioconductor (bioconductor.org/packages/BioPlex), and the BioPlex Python package is available from PyPI (pypi.org/project/bioplexpy). Applications and downstream analyses are available from GitHub (github.com/ccb-hms/BioPlexAnalysis).


Subjects
Proteome , Software , Humans , Proteomics , Protein Interaction Maps , Transcriptome
2.
Public Health Nutr ; 24(10): 2952-2963, 2021 07.
Article in English | MEDLINE | ID: mdl-32597744

ABSTRACT

OBJECTIVE: To characterise dietary habits, their temporal and spatial patterns and associations with BMI in the 23andMe study population. DESIGN: We present a large-scale cross-sectional analysis of self-reported dietary intake data derived from the web-based National Health and Nutrition Examination Survey 2009-2010 dietary screener. Survey-weighted estimates for each food item were characterised by age, sex, race/ethnicity, education and BMI. Temporal patterns were plotted over a 2-year time period, and average consumption for select food items was mapped by state. Finally, dietary intake variables were tested for association with BMI. SETTING: US-based adults 20-85 years of age participating in the 23andMe research programme. PARTICIPANTS: Participants were 23andMe customers who consented to participate in research (n 526 774) and completed web-based surveys on demographic and dietary habits. RESULTS: Survey-weighted estimates show very few participants met federal recommendations for fruit: 2·6 %, vegetables: 5·9 % and dairy intake: 2·8 %. Between 2017 and 2019, fruit, vegetables and milk intake frequency declined, while total dairy remained stable and added sugars increased. Seasonal patterns in reporting were most pronounced for ice cream, chocolate, fruits and vegetables. Dietary habits varied across the USA, with higher intake of sugar and energy dense foods characterising areas with higher average BMI. In multivariate-adjusted models, BMI was directly associated with the intake of processed meat, red meat, dairy and inversely associated with consumption of fruit, vegetables and whole grains. CONCLUSIONS: 23andMe research participants have created an opportunity for rapid, large-scale, real-time nutritional data collection, informing demographic, seasonal and spatial patterns with broad geographical coverage across the USA.


Subjects
Diet , Vegetables , Adult , Cross-Sectional Studies , Demography , Eating , Energy Intake , Feeding Behavior , Fruit , Humans , Nutrition Surveys
3.
Nat Methods ; 12(2): 115-21, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25633503

ABSTRACT

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.


Subjects
Computational Biology , Gene Expression Profiling , Genomics/methods , High-Throughput Screening Assays/methods , Software , Programming Languages , User-Computer Interface
4.
Bioinformatics ; 33(20): 3311-3313, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-29028267

ABSTRACT

MOTIVATION: Variant calling is the complex task of separating real polymorphisms from errors. The appropriate strategy will depend on characteristics of the sample, the sequencing methodology and on the questions of interest. RESULTS: We present VariantTools, an extensible framework for developing and testing variant callers. There are facilities for reproducibly tallying, filtering, flagging and annotating variants. The tools are extensible, modular and flexible, so that they are tunable to particular use cases, and they interoperate with existing analysis software so that they can be embedded in established work flows. AVAILABILITY AND IMPLEMENTATION: VariantTools is available from http://www.bioconductor.org/. CONTACT: michafla@gene.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genotyping Techniques/methods , Polymorphism, Genetic , Sequence Analysis, DNA/methods , Software , Genomics/methods
5.
Nature ; 488(7413): 660-4, 2012 Aug 30.
Article in English | MEDLINE | ID: mdl-22895193

ABSTRACT

Identifying and understanding changes in cancer genomes is essential for the development of targeted therapeutics. Here we analyse systematically more than 70 pairs of primary human colon tumours by applying next-generation sequencing to characterize their exomes, transcriptomes and copy-number alterations. We have identified 36,303 protein-altering somatic changes that include several new recurrent mutations in the Wnt pathway gene TCF7L2, chromatin-remodelling genes such as TET2 and TET3 and receptor tyrosine kinases including ERBB3. Our analysis for significantly mutated cancer genes identified 23 candidates, including the cell cycle checkpoint kinase ATM. Copy-number and RNA-seq data analysis identified amplifications and corresponding overexpression of IGF2 in a subset of colon tumours. Furthermore, using RNA-seq data we identified multiple fusion transcripts including recurrent gene fusions involving R-spondin family members RSPO2 and RSPO3 that together occur in 10% of colon tumours. The RSPO fusions were mutually exclusive with APC mutations, indicating that they probably have a role in the activation of Wnt signalling and tumorigenesis. Consistent with this we show that the RSPO fusion proteins were capable of potentiating Wnt signalling. The R-spondin gene fusions and several other gene mutations identified in this study provide new potential opportunities for therapeutic intervention in colon cancer.


Subjects
Colonic Neoplasms/genetics , Gene Fusion/genetics , Genes, Neoplasm/genetics , Intercellular Signaling Peptides and Proteins/genetics , Thrombospondins/genetics , Ataxia Telangiectasia Mutated Proteins , Base Sequence , Cell Cycle Proteins/genetics , Colonic Neoplasms/metabolism , Colonic Neoplasms/pathology , DNA Copy Number Variations/genetics , DNA-Binding Proteins/genetics , Dioxygenases/genetics , Exome/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic/genetics , Genes, APC , Humans , Insulin-Like Growth Factor II/genetics , Molecular Sequence Data , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Serine-Threonine Kinases/genetics , Proto-Oncogene Proteins/genetics , Receptor, ErbB-3/genetics , Sequence Analysis, RNA , Signal Transduction/genetics , Transcription Factor 7-Like 2 Protein/genetics , Tumor Suppressor Proteins/genetics , Wnt Proteins/metabolism
6.
BMC Genomics ; 17: 61, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26768488

ABSTRACT

BACKGROUND: RNA-editing is a tightly regulated and essential cellular process for a properly functioning brain. Dysfunction of A-to-I RNA editing can have catastrophic effects, particularly in the central nervous system. Thus, understanding how the process of RNA-editing is regulated has important implications for human health. However, at present, very little is known about the regulation of editing across tissues and individuals. RESULTS: Here we present an analysis of RNA-editing patterns from 9 different tissues harvested from a single mouse. For comparison, we also analyzed data for 5 of these tissues harvested from 15 additional animals. We find that tissue specificity of editing largely reflects differential expression of substrate transcripts across tissues. We identified a surprising enrichment of editing in intronic regions of brain transcripts that could account for previously reported higher levels of editing in brain. There exists a small but remarkable amount of editing which is tissue-specific, despite comparable expression levels of the edit site across multiple tissues. Expression levels of editing enzymes and their isoforms can explain some, but not all, of this variation. CONCLUSIONS: Together, these data suggest a complex regulation of the RNA-editing process beyond transcript expression levels.


Subjects
Adenosine Deaminase/genetics , Organ Specificity/genetics , RNA Editing/genetics , RNA-Binding Proteins/genetics , Adenosine Deaminase/biosynthesis , Animals , Brain/growth & development , Brain/metabolism , Gene Expression Regulation , Humans , Introns/genetics , Mice , Protein Isoforms/genetics , RNA-Binding Proteins/biosynthesis , Transcription, Genetic
7.
Nature ; 465(7297): 473-7, 2010 May 27.
Article in English | MEDLINE | ID: mdl-20505728

ABSTRACT

Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines. Here we present the complete sequences of a primary lung tumour (60x coverage) and adjacent normal tissue (46x). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated 17.7 per megabase genome-wide somatic mutation rate. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.


Subjects
Carcinoma, Non-Small-Cell Lung/genetics , Genome, Human/genetics , Lung Neoplasms/genetics , Point Mutation/genetics , DNA Mutational Analysis , Humans , Male , Middle Aged , Models, Biological , Proto-Oncogene Mas , Selection, Genetic/genetics
8.
EMBO J ; 30(3): 494-509, 2011 Feb 02.
Article in English | MEDLINE | ID: mdl-21179004

ABSTRACT

TAL1/SCL is a master regulator of haematopoiesis whose expression promotes opposite outcomes depending on the cell type: differentiation in the erythroid lineage or oncogenesis in the T-cell lineage. Here, we used a combination of ChIP sequencing and gene expression profiling to compare the function of TAL1 in normal erythroid and leukaemic T cells. Analysis of the genome-wide binding properties of TAL1 in these two haematopoietic lineages revealed new insight into the mechanism by which transcription factors select their binding sites in alternate lineages. Our study shows limited overlap in the TAL1-binding profile between the two cell types with an unexpected preference for ETS and RUNX motifs adjacent to E-boxes in the T-cell lineage. Furthermore, we show that TAL1 interacts with RUNX1 and ETS1, and that these transcription factors are critically required for TAL1 binding to genes that modulate T-cell differentiation. Thus, our findings highlight a critical role of the cellular environment in modulating transcription factor binding, and provide insight into the mechanism by which TAL1 inhibits differentiation leading to oncogenesis in the T-cell lineage.


Subjects
Basic Helix-Loop-Helix Transcription Factors/genetics , Cell Differentiation/genetics , Cell Transformation, Neoplastic/genetics , Hematopoiesis/genetics , Leukemia, T-Cell/metabolism , Proto-Oncogene Proteins/genetics , T-Lymphocytes/metabolism , Base Sequence , Basic Helix-Loop-Helix Transcription Factors/metabolism , Binding Sites/genetics , Cells, Cultured , Chromatin Immunoprecipitation , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , Gene Expression Profiling , Hematopoiesis/physiology , Humans , Jurkat Cells , Leukemia, T-Cell/genetics , Microarray Analysis , Molecular Sequence Data , Proto-Oncogene Protein c-ets-1/genetics , Proto-Oncogene Protein c-ets-1/metabolism , Proto-Oncogene Proteins/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, DNA , T-Cell Acute Lymphocytic Leukemia Protein 1 , T-Lymphocytes/cytology
9.
Genome Res ; 22(4): 593-601, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22267523

ABSTRACT

Hepatitis B virus (HBV) infection is a leading risk factor for hepatocellular carcinoma (HCC). HBV integration into the host genome has been reported, but its scale, impact and contribution to HCC development are not clear. Here, we sequenced the tumor and nontumor genomes (>80× coverage) and transcriptomes of four HCC patients and identified 255 HBV integration sites. Increased sequencing to 240× coverage revealed a proportionally higher number of integration sites. Clonal expansion of HBV-integrated hepatocytes was found specifically in tumor samples. We observe a diverse collection of genomic perturbations near viral integration sites, including direct gene disruption, viral promoter-driven human transcription, viral-human transcript fusion, and DNA copy number alteration. Thus, we report the most comprehensive characterization of HBV integration in hepatocellular carcinoma patients. Such widespread random viral integration will likely increase carcinogenic opportunities in HBV-infected individuals.


Subjects
Carcinoma, Hepatocellular/genetics , Genome, Human/genetics , Hepatitis B virus/genetics , Hepatitis B/genetics , Liver Neoplasms/genetics , Virus Integration/genetics , Base Sequence , Binding Sites/genetics , Carcinoma, Hepatocellular/virology , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Hepatitis B/virology , Hepatitis B virus/physiology , Host-Pathogen Interactions/genetics , Humans , Liver Neoplasms/virology , Male , Molecular Sequence Data , Mutation , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA/methods , Transcriptome/genetics
10.
Genome Res ; 22(12): 2315-27, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23033341

ABSTRACT

Lung cancer is a highly heterogeneous disease in terms of both underlying genetic lesions and response to therapeutic treatments. We performed deep whole-genome sequencing and transcriptome sequencing on 19 lung cancer cell lines and three lung tumor/normal pairs. Overall, our data show that cell line models exhibit similar mutation spectra to human tumor samples. Smoker and never-smoker cancer samples exhibit distinguishable patterns of mutations. A number of epigenetic regulators, including KDM6A, ASH1L, SMARCA4, and ATAD2, are frequently altered by mutations or copy number changes. A systematic survey of splice-site mutations identified 106 splice site mutations associated with cancer specific aberrant splicing, including mutations in several known cancer-related genes. RAC1b, an isoform of the RAC1 GTPase that includes one additional exon, was found to be preferentially up-regulated in lung cancer. We further show that its expression is significantly associated with sensitivity to a MAP2K (MEK) inhibitor PD-0325901. Taken together, these data present a comprehensive genomic landscape of a large number of lung cancer samples and further demonstrate that cancer-specific alternative splicing is a widespread phenomenon that has potential utility as therapeutic biomarkers. The detailed characterizations of the lung cancer cell lines also provide genomic context to the vast amount of experimental data gathered for these lines over the decades, and represent highly valuable resources for cancer biology.


Subjects
Alternative Splicing , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Lung Neoplasms/genetics , Mutation , Transcriptome , ATPases Associated with Diverse Cellular Activities , Adenosine Triphosphatases/genetics , Adenosine Triphosphatases/metabolism , Cell Line, Tumor , DNA Copy Number Variations , DNA Helicases/genetics , DNA Helicases/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Epigenomics , Exons , Genetic Markers , Heterozygote , Histone Demethylases/genetics , Histone Demethylases/metabolism , Histone-Lysine N-Methyltransferase , Humans , Karyotyping/methods , Lung Neoplasms/pathology , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Polymorphism, Single Nucleotide , Reproducibility of Results , Sequence Analysis, RNA , Transcription Factors/genetics , Transcription Factors/metabolism , Up-Regulation , rac1 GTP-Binding Protein/genetics , rac1 GTP-Binding Protein/metabolism
11.
Bioinformatics ; 30(1): 127-8, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24132929

ABSTRACT

Connections between disease phenotypes and drug effects can be made by identifying commonalities in the associated patterns of differential gene expression. Searchable databases that record the impacts of chemical or genetic perturbations on the transcriptome (here referred to as 'connectivity maps') permit discovery of such commonalities. We describe two R packages, gCMAP and gCMAPWeb, which provide a complete framework to construct and query connectivity maps assembled from user-defined collections of differential gene expression data. Microarray or RNAseq data are processed in a standardized way, and results can be interrogated using various well-established gene set enrichment methods. The packages also feature an easy-to-deploy web application that facilitates reproducible research through automatic generation of graphical and tabular reports. AVAILABILITY AND IMPLEMENTATION: The gCMAP and gCMAPWeb R packages are freely available for UNIX, Windows and Mac OS X operating systems at Bioconductor (http://www.bioconductor.org).


Subjects
Oligonucleotide Array Sequence Analysis/methods , User-Computer Interface , Animals , Cell Line , Gene Expression Profiling/methods , Humans , Internet
12.
Bioinformatics ; 30(6): 775-83, 2014 Mar 15.
Article in English | MEDLINE | ID: mdl-24162561

ABSTRACT

MOTIVATION: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. RESULTS: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. AVAILABILITY: The motifRG package is publicly available via the Bioconductor repository. CONTACT: yzizhen@fhcrc.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , DNA/genetics , Humans , Transcription Factors/genetics
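
The idea in entry 12 of contrasting a foreground sequence set against a background set, instead of against a naïve background distribution, can be illustrated with a toy k-mer comparison. This is a minimal Python sketch under stated assumptions and is not the motifRG algorithm: the function names `kmer_presence` and `differential_kmers`, and the simple difference-of-fractions score, are hypothetical choices made only for illustration.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        # A set ensures each k-mer is counted once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def differential_kmers(foreground, background, k, top=5):
    """Rank k-mers by (fraction of foreground) - (fraction of background).

    Illustrative score only; a real tool would use a statistical model
    and control for sequence composition biases.
    """
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, fg_count in fg.items():
        fg_frac = fg_count / len(foreground)
        bg_frac = bg.get(kmer, 0) / len(background)
        scores[kmer] = fg_frac - bg_frac
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    peaks = ["ACGTGA", "TTCGTG", "CGTGCC"]       # made-up foreground
    controls = ["AAAAAA", "TTTTTT", "ACACAC"]    # made-up background
    print(differential_kmers(peaks, controls, k=4, top=3))
```

In this made-up example the 4-mer `CGTG` occurs in every foreground sequence and in no background sequence, so it ranks first.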
14.
PLoS Comput Biol ; 9(8): e1003118, 2013.
Article in English | MEDLINE | ID: mdl-23950696

ABSTRACT

We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.


Subjects
Databases, Genetic , Genomics/methods , Software , Algorithms , Animals , Genomics/standards , Humans , Mice , Sequence Alignment , Sequence Analysis, DNA
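
Two of the range operations named in entry 14, overlap detection and coverage calculation, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IRanges or GenomicRanges implementation: ranges are taken to be closed, 1-based `(start, end)` tuples on a single sequence, and the function names `find_overlaps` and `coverage` are hypothetical. The overlap search below is deliberately simple and is not the efficient algorithm the packages provide.

```python
from bisect import bisect_right


def find_overlaps(query, subject):
    """Return (query_index, subject_index) pairs of overlapping closed ranges."""
    # Sort subject ranges by start so candidates can be cut off early.
    order = sorted(range(len(subject)), key=lambda i: subject[i][0])
    starts = [subject[i][0] for i in order]
    hits = []
    for qi, (q_start, q_end) in enumerate(query):
        # Only subjects that start at or before q_end can overlap the query.
        limit = bisect_right(starts, q_end)
        for pos in range(limit):
            si = order[pos]
            if subject[si][1] >= q_start:
                hits.append((qi, si))
    return hits


def coverage(ranges, length):
    """Per-position depth for positions 1..length, via a difference array."""
    delta = [0] * (length + 2)
    for start, end in ranges:
        delta[start] += 1      # depth rises at the range start
        delta[end + 1] -= 1    # and falls just past the range end
    depth, running = [], 0
    for pos in range(1, length + 1):
        running += delta[pos]
        depth.append(running)
    return depth


if __name__ == "__main__":
    reads = [(5, 10), (20, 25)]
    genes = [(1, 5), (8, 12), (30, 40), (24, 24)]
    print(find_overlaps(reads, genes))   # [(0, 0), (0, 1), (1, 3)]
    print(coverage([(1, 3), (2, 5)], 6)) # [1, 2, 2, 1, 1, 0]
```

The difference-array approach to coverage runs in time proportional to the number of ranges plus the sequence length, which is one common way to avoid touching every position of every range.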
15.
Nucleic Acids Res ; 40(2): 499-510, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21917857

ABSTRACT

Although microRNAs (miRNAs) are important regulators of gene expression, the transcriptional regulation of miRNAs themselves is not well understood. We employed an integrative computational pipeline to dissect the transcription factors (TFs) responsible for altered miRNA expression in ovarian carcinoma. Using experimental data and computational predictions to define miRNA promoters across the human genome, we identified TFs with binding sites significantly overrepresented among miRNA genes overexpressed in ovarian carcinoma. This pipeline nominated TFs of the p53/p63/p73 family as candidate drivers of miRNA overexpression. Analysis of data from an independent set of 253 ovarian carcinomas in The Cancer Genome Atlas showed that p73 and p63 expression is significantly correlated with expression of miRNAs whose promoters contain p53/p63/p73 family binding sites. In experimental validation of specific miRNAs predicted by the analysis to be regulated by p73 and p63, we found that p53/p63/p73 family binding sites modulate promoter activity of miRNAs of the miR-200 family, which are known regulators of cancer stem cells and epithelial-mesenchymal transitions. Furthermore, in chromatin immunoprecipitation studies both p73 and p63 directly associated with the miR-200b/a/429 promoter. This study delineates an integrative approach that can be applied to discover transcriptional regulatory mechanisms in other biological settings where analogous genomic data are available.


Subjects
DNA-Binding Proteins/metabolism , Genomics/methods , MicroRNAs/genetics , Nuclear Proteins/metabolism , Transcription Factors/metabolism , Tumor Suppressor Proteins/metabolism , Binding Sites , Carcinoma/genetics , Carcinoma/metabolism , Cell Line, Tumor , Female , Genome, Human , Humans , MicroRNAs/biosynthesis , Molecular Sequence Annotation , Ovarian Neoplasms/genetics , Ovarian Neoplasms/metabolism , Promoter Regions, Genetic , Transcription Initiation Site , Transcriptional Activation , Tumor Protein p73
16.
Database (Oxford) ; 2024, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38625809

ABSTRACT

The National Health and Nutrition Examination Survey provides comprehensive data on demographics, sociology, health and nutrition. Conducted in 2-year cycles since 1999, most of its data are publicly accessible, making it pivotal for research areas like studying social determinants of health or tracking trends in health metrics such as obesity or diabetes. Assembling the data and analyzing it presents a number of technical and analytic challenges. This paper introduces the nhanesA R package, which is designed to assist researchers in data retrieval and analysis and to enable the sharing and extension of prior research efforts. We believe that fostering community-driven activity in data reproducibility and sharing of analytic methods will greatly benefit the scientific community and propel scientific advancements. Database URL: https://github.com/cjendres1/nhanes.


Subjects
Information Storage and Retrieval , Nutrition Surveys , Reproducibility of Results , Databases, Factual
17.
Stat Appl Genet Mol Biol ; 11(2), 2012 Jan 06.
Article in English | MEDLINE | ID: mdl-22499690

ABSTRACT

The advent of high-throughput biotechnologies, which can efficiently measure gene expression on a global basis, has led to the creation and population of correspondingly rich databases and compendia. Such repositories have the potential to add enormous scientific value beyond that provided by individual studies which, due largely to cost considerations, are typified by small sample sizes. Accordingly, substantial effort has been invested in devising analysis schemes for utilizing gene-expression repositories. Here, we focus on one such scheme, the Connectivity Map (cmap), that was developed with the express purpose of identifying drugs with putative efficacy against a given disease, where the disease in question is characterized by a (differential) gene-expression signature. Initial claims surrounding cmap intimated that such tools might lead to new, previously unanticipated applications of existing drugs. However, further application suggests that its primary utility is in connecting a disease condition whose biology is largely unknown to a drug whose mechanisms of action are well understood, making cmap a tool for enhancing biological knowledge. The success of the Connectivity Map is belied by its simplicity. The aforementioned signature serves as an unordered query which is applied to a customized database of (differential) gene-expression experiments designed to elicit response to a wide range of drugs, across a spectrum of concentrations, durations, and cell lines. Such application is effected by computing a per experiment score that measures "closeness" between the signature and the experiment. Top-scoring experiments, and the attendant drug(s), are then deemed relevant to the disease underlying the query. Inference supporting such elicitations is pursued via re-sampling. In this paper, we revisit two key aspects of the Connectivity Map implementation. Firstly, we develop new approaches to measuring closeness for the common scenario wherein the query constitutes an ordered list. These involve using metrics proposed for analyzing partially ranked data, these being of interest in their own right and not widely used. Secondly, we advance an alternate inferential approach based on generating empirical null distributions that exploit the scope, and capture dependencies, embodied by the database. Using these refinements we undertake a comprehensive re-evaluation of Connectivity Map findings that, in general terms, reveal that accommodating ordered queries is less critical than the mode of inference.


Subjects
Data Mining/methods , Databases, Genetic , Gene Expression Profiling , Algorithms , Computational Biology/methods , Estrogens/pharmacology , Gene Expression/drug effects , Genetic Predisposition to Disease , Genomics/methods , Histone Deacetylase Inhibitors/pharmacology , Humans , Limonins/pharmacology
18.
Nucleic Acids Res ; 39(Database issue): D7-10, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097465

ABSTRACT

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.


Subjects
Databases, Factual/standards , Information Dissemination
19.
Proc Natl Acad Sci U S A ; 107(21): 9546-51, 2010 May 25.
Article in English | MEDLINE | ID: mdl-20460310

ABSTRACT

With high-dimensional data, variable-by-variable statistical testing is often used to select variables whose behavior differs across conditions. Such an approach requires adjustment for multiple testing, which can result in low statistical power. A two-stage approach that first filters variables by a criterion independent of the test statistic, and then only tests variables which pass the filter, can provide higher power. We show that use of some filter/test statistics pairs presented in the literature may, however, lead to loss of type I error control. We describe other pairs which avoid this problem. In an application to microarray data, we found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50%. We also show that this particular statistic pair induces a lower bound on fold-change among the set of discoveries. Independent filtering, using filter/test pairs that are independent under the null hypothesis but correlated under the alternative, is a general approach that can substantially increase the efficiency of experiments.


Subjects
Biometry/methods , Algorithms , Computational Biology , Models, Genetic
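
The two-stage idea in entry 19, filtering first and then adjusting only the surviving tests, can be made concrete with a short Python sketch. It is written under stated assumptions: the Benjamini-Hochberg adjustment is used here as one familiar multiple-testing correction (the abstract does not name a specific procedure), the helper names `bh_adjust` and `filter_then_adjust` are hypothetical, and the p-values and filter statistics are made-up numbers. The sketch does not check the independence condition the abstract describes, which is what makes such filtering valid.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def filter_then_adjust(filter_stat, pvalues, keep_fraction):
    """Keep the variables with the largest filter statistic, then adjust.

    Returns a dict mapping the original index of each retained variable
    to its adjusted p-value. The filter statistic (for example, overall
    variance) is assumed to be independent of the test statistic under
    the null hypothesis; that assumption is not verified here.
    """
    m = len(pvalues)
    n_keep = max(1, int(round(m * keep_fraction)))
    ranked = sorted(range(m), key=lambda i: filter_stat[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    adjusted = bh_adjust([pvalues[i] for i in kept])
    return dict(zip(kept, adjusted))


if __name__ == "__main__":
    pvals = [0.01, 0.6, 0.02, 0.8]        # made-up per-variable p-values
    variances = [5.0, 0.1, 4.0, 0.2]      # made-up filter statistics
    print(bh_adjust(pvals))
    print(filter_then_adjust(variances, pvals, keep_fraction=0.5))
```

With these made-up inputs, the two high-variance variables have adjusted p-values of about 0.04 when all four tests are corrected together and about 0.02 when only the two retained tests are corrected, which is the power gain the abstract refers to.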
20.
Proc Natl Acad Sci U S A ; 105(30): 10513-8, 2008 Jul 29.
Article in English | MEDLINE | ID: mdl-18663219

ABSTRACT

Improved approaches for the detection of common epithelial malignancies are urgently needed to reduce the worldwide morbidity and mortality caused by cancer. MicroRNAs (miRNAs) are small (approximately 22 nt) regulatory RNAs that are frequently dysregulated in cancer and have shown promise as tissue-based markers for cancer classification and prognostication. We show here that miRNAs are present in human plasma in a remarkably stable form that is protected from endogenous RNase activity. miRNAs originating from human prostate cancer xenografts enter the circulation, are readily measured in plasma, and can robustly distinguish xenografted mice from controls. This concept extends to cancer in humans, where serum levels of miR-141 (a miRNA expressed in prostate cancer) can distinguish patients with prostate cancer from healthy controls. Our results establish the measurement of tumor-derived miRNAs in serum or plasma as an important approach for the blood-based detection of human cancer.


Subjects
Biomarkers, Tumor/genetics , Gene Expression Regulation, Neoplastic , MicroRNAs/blood , MicroRNAs/genetics , Animals , Cloning, Molecular , Gene Expression Profiling , Humans , Male , Mice , Neoplasm Transplantation , Neoplasms/metabolism , Prostatic Neoplasms/blood , Prostatic Neoplasms/genetics , RNA, Neoplasm/blood , RNA, Neoplasm/metabolism , Ribonucleases/metabolism , Sensitivity and Specificity