Results 1 - 20 of 75
1.
Bioinformatics ; 39(3), 2023 03 01.
Article in English | MEDLINE | ID: mdl-36794911

ABSTRACT

SUMMARY: The BioPlex project has created two proteome scale, cell-line-specific protein-protein interaction (PPI) networks: the first in 293T cells, including 120k interactions among 15k proteins; and the second in HCT116 cells, including 70k interactions between 10k proteins. Here, we describe programmatic access to the BioPlex PPI networks and integration with related resources from within R and Python. Besides PPI networks for 293T and HCT116 cells, this includes access to CORUM protein complex data, PFAM protein domain data, PDB protein structures, and transcriptome and proteome data for the two cell lines. The implemented functionality serves as a basis for integrative downstream analysis of BioPlex PPI data with domain-specific R and Python packages, including efficient execution of maximum scoring sub-network analysis, protein domain-domain association analysis, mapping of PPIs onto 3D protein structures and analysis of BioPlex PPIs at the interface of transcriptomic and proteomic data. AVAILABILITY AND IMPLEMENTATION: The BioPlex R package is available from Bioconductor (bioconductor.org/packages/BioPlex), and the BioPlex Python package is available from PyPI (pypi.org/project/bioplexpy). Applications and downstream analyses are available from GitHub (github.com/ccb-hms/BioPlexAnalysis).


Subject(s)
Proteome , Software , Humans , Proteomics , Protein Interaction Maps , Transcriptome
2.
Public Health Nutr ; 24(10): 2952-2963, 2021 07.
Article in English | MEDLINE | ID: mdl-32597744

ABSTRACT

OBJECTIVE: To characterise dietary habits, their temporal and spatial patterns and associations with BMI in the 23andMe study population. DESIGN: We present a large-scale cross-sectional analysis of self-reported dietary intake data derived from the web-based National Health and Nutrition Examination Survey 2009-2010 dietary screener. Survey-weighted estimates for each food item were characterised by age, sex, race/ethnicity, education and BMI. Temporal patterns were plotted over a 2-year time period, and average consumption for select food items was mapped by state. Finally, dietary intake variables were tested for association with BMI. SETTING: US-based adults 20-85 years of age participating in the 23andMe research programme. PARTICIPANTS: Participants were 23andMe customers who consented to participate in research (n 526 774) and completed web-based surveys on demographic and dietary habits. RESULTS: Survey-weighted estimates show very few participants met federal recommendations for fruit: 2·6 %, vegetables: 5·9 % and dairy intake: 2·8 %. Between 2017 and 2019, fruit, vegetables and milk intake frequency declined, while total dairy remained stable and added sugars increased. Seasonal patterns in reporting were most pronounced for ice cream, chocolate, fruits and vegetables. Dietary habits varied across the USA, with higher intake of sugar and energy dense foods characterising areas with higher average BMI. In multivariate-adjusted models, BMI was directly associated with the intake of processed meat, red meat, dairy and inversely associated with consumption of fruit, vegetables and whole grains. CONCLUSIONS: 23andMe research participants have created an opportunity for rapid, large-scale, real-time nutritional data collection, informing demographic, seasonal and spatial patterns with broad geographical coverage across the USA.


Subject(s)
Diet , Vegetables , Adult , Cross-Sectional Studies , Demography , Eating , Energy Intake , Feeding Behavior , Fruit , Humans , Nutrition Surveys
3.
Nat Methods ; 12(2): 115-21, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25633503

ABSTRACT

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.


Subject(s)
Computational Biology , Gene Expression Profiling , Genomics/methods , High-Throughput Screening Assays/methods , Software , Programming Languages , User-Computer Interface
4.
Bioinformatics ; 33(20): 3311-3313, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-29028267

ABSTRACT

MOTIVATION: Variant calling is the complex task of separating real polymorphisms from errors. The appropriate strategy will depend on characteristics of the sample, the sequencing methodology and on the questions of interest. RESULTS: We present VariantTools, an extensible framework for developing and testing variant callers. There are facilities for reproducibly tallying, filtering, flagging and annotating variants. The tools are extensible, modular and flexible, so that they are tunable to particular use cases, and they interoperate with existing analysis software so that they can be embedded in established work flows. AVAILABILITY AND IMPLEMENTATION: VariantTools is available from http://www.bioconductor.org/. CONTACT: michafla@gene.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genotyping Techniques/methods , Polymorphism, Genetic , Sequence Analysis, DNA/methods , Software , Genomics/methods
5.
Nature ; 488(7413): 660-4, 2012 Aug 30.
Article in English | MEDLINE | ID: mdl-22895193

ABSTRACT

Identifying and understanding changes in cancer genomes is essential for the development of targeted therapeutics. Here we analyse systematically more than 70 pairs of primary human colon tumours by applying next-generation sequencing to characterize their exomes, transcriptomes and copy-number alterations. We have identified 36,303 protein-altering somatic changes that include several new recurrent mutations in the Wnt pathway gene TCF7L2, chromatin-remodelling genes such as TET2 and TET3 and receptor tyrosine kinases including ERBB3. Our analysis for significantly mutated cancer genes identified 23 candidates, including the cell cycle checkpoint kinase ATM. Copy-number and RNA-seq data analysis identified amplifications and corresponding overexpression of IGF2 in a subset of colon tumours. Furthermore, using RNA-seq data we identified multiple fusion transcripts including recurrent gene fusions involving R-spondin family members RSPO2 and RSPO3 that together occur in 10% of colon tumours. The RSPO fusions were mutually exclusive with APC mutations, indicating that they probably have a role in the activation of Wnt signalling and tumorigenesis. Consistent with this we show that the RSPO fusion proteins were capable of potentiating Wnt signalling. The R-spondin gene fusions and several other gene mutations identified in this study provide new potential opportunities for therapeutic intervention in colon cancer.


Subject(s)
Colonic Neoplasms/genetics , Gene Fusion/genetics , Genes, Neoplasm/genetics , Intercellular Signaling Peptides and Proteins/genetics , Thrombospondins/genetics , Ataxia Telangiectasia Mutated Proteins , Base Sequence , Cell Cycle Proteins/genetics , Colonic Neoplasms/metabolism , Colonic Neoplasms/pathology , DNA Copy Number Variations/genetics , DNA-Binding Proteins/genetics , Dioxygenases/genetics , Exome/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic/genetics , Genes, APC , Humans , Insulin-Like Growth Factor II/genetics , Molecular Sequence Data , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Serine-Threonine Kinases/genetics , Proto-Oncogene Proteins/genetics , Receptor, ErbB-3/genetics , Sequence Analysis, RNA , Signal Transduction/genetics , Transcription Factor 7-Like 2 Protein/genetics , Tumor Suppressor Proteins/genetics , Wnt Proteins/metabolism
6.
BMC Genomics ; 17: 61, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26768488

ABSTRACT

BACKGROUND: RNA-editing is a tightly regulated and essential cellular process for a properly functioning brain. Dysfunction of A-to-I RNA editing can have catastrophic effects, particularly in the central nervous system. Thus, understanding how the process of RNA-editing is regulated has important implications for human health. However, at present, very little is known about the regulation of editing across tissues and individuals. RESULTS: Here we present an analysis of RNA-editing patterns from 9 different tissues harvested from a single mouse. For comparison, we also analyzed data for 5 of these tissues harvested from 15 additional animals. We find that tissue specificity of editing largely reflects differential expression of substrate transcripts across tissues. We identified a surprising enrichment of editing in intronic regions of brain transcripts that could account for previously reported higher levels of editing in brain. There exists a small but remarkable amount of editing which is tissue-specific, despite comparable expression levels of the edit site across multiple tissues. Expression levels of editing enzymes and their isoforms can explain some, but not all, of this variation. CONCLUSIONS: Together, these data suggest a complex regulation of the RNA-editing process beyond transcript expression levels.


Subject(s)
Adenosine Deaminase/genetics , Organ Specificity/genetics , RNA Editing/genetics , RNA-Binding Proteins/genetics , Adenosine Deaminase/biosynthesis , Animals , Brain/growth & development , Brain/metabolism , Gene Expression Regulation , Humans , Introns/genetics , Mice , Protein Isoforms/genetics , RNA-Binding Proteins/biosynthesis , Transcription, Genetic
7.
Nature ; 465(7297): 473-7, 2010 May 27.
Article in English | MEDLINE | ID: mdl-20505728

ABSTRACT

Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines. Here we present the complete sequences of a primary lung tumour (60x coverage) and adjacent normal tissue (46x). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated 17.7 per megabase genome-wide somatic mutation rate. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , Genome, Human/genetics , Lung Neoplasms/genetics , Point Mutation/genetics , DNA Mutational Analysis , Humans , Male , Middle Aged , Models, Biological , Proto-Oncogene Mas , Selection, Genetic/genetics
8.
EMBO J ; 30(3): 494-509, 2011 Feb 02.
Article in English | MEDLINE | ID: mdl-21179004

ABSTRACT

TAL1/SCL is a master regulator of haematopoiesis whose expression promotes opposite outcomes depending on the cell type: differentiation in the erythroid lineage or oncogenesis in the T-cell lineage. Here, we used a combination of ChIP sequencing and gene expression profiling to compare the function of TAL1 in normal erythroid and leukaemic T cells. Analysis of the genome-wide binding properties of TAL1 in these two haematopoietic lineages revealed new insight into the mechanism by which transcription factors select their binding sites in alternate lineages. Our study shows limited overlap in the TAL1-binding profile between the two cell types with an unexpected preference for ETS and RUNX motifs adjacent to E-boxes in the T-cell lineage. Furthermore, we show that TAL1 interacts with RUNX1 and ETS1, and that these transcription factors are critically required for TAL1 binding to genes that modulate T-cell differentiation. Thus, our findings highlight a critical role of the cellular environment in modulating transcription factor binding, and provide insight into the mechanism by which TAL1 inhibits differentiation leading to oncogenesis in the T-cell lineage.


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/genetics , Cell Differentiation/genetics , Cell Transformation, Neoplastic/genetics , Hematopoiesis/genetics , Leukemia, T-Cell/metabolism , Proto-Oncogene Proteins/genetics , T-Lymphocytes/metabolism , Base Sequence , Basic Helix-Loop-Helix Transcription Factors/metabolism , Binding Sites/genetics , Cells, Cultured , Chromatin Immunoprecipitation , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , Gene Expression Profiling , Hematopoiesis/physiology , Humans , Jurkat Cells , Leukemia, T-Cell/genetics , Microarray Analysis , Molecular Sequence Data , Proto-Oncogene Protein c-ets-1/genetics , Proto-Oncogene Protein c-ets-1/metabolism , Proto-Oncogene Proteins/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, DNA , T-Cell Acute Lymphocytic Leukemia Protein 1 , T-Lymphocytes/cytology
9.
Genome Res ; 22(4): 593-601, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22267523

ABSTRACT

Hepatitis B virus (HBV) infection is a leading risk factor for hepatocellular carcinoma (HCC). HBV integration into the host genome has been reported, but its scale, impact and contribution to HCC development are not clear. Here, we sequenced the tumor and nontumor genomes (>80× coverage) and transcriptomes of four HCC patients and identified 255 HBV integration sites. Increased sequencing to 240× coverage revealed a proportionally higher number of integration sites. Clonal expansion of HBV-integrated hepatocytes was found specifically in tumor samples. We observe a diverse collection of genomic perturbations near viral integration sites, including direct gene disruption, viral promoter-driven human transcription, viral-human transcript fusion, and DNA copy number alteration. Thus, we report the most comprehensive characterization of HBV integration in hepatocellular carcinoma patients. Such widespread random viral integration will likely increase carcinogenic opportunities in HBV-infected individuals.


Subject(s)
Carcinoma, Hepatocellular/genetics , Genome, Human/genetics , Hepatitis B virus/genetics , Hepatitis B/genetics , Liver Neoplasms/genetics , Virus Integration/genetics , Base Sequence , Binding Sites/genetics , Carcinoma, Hepatocellular/virology , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Hepatitis B/virology , Hepatitis B virus/physiology , Host-Pathogen Interactions/genetics , Humans , Liver Neoplasms/virology , Male , Molecular Sequence Data , Mutation , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA/methods , Transcriptome/genetics
10.
Genome Res ; 22(12): 2315-27, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23033341

ABSTRACT

Lung cancer is a highly heterogeneous disease in terms of both underlying genetic lesions and response to therapeutic treatments. We performed deep whole-genome sequencing and transcriptome sequencing on 19 lung cancer cell lines and three lung tumor/normal pairs. Overall, our data show that cell line models exhibit similar mutation spectra to human tumor samples. Smoker and never-smoker cancer samples exhibit distinguishable patterns of mutations. A number of epigenetic regulators, including KDM6A, ASH1L, SMARCA4, and ATAD2, are frequently altered by mutations or copy number changes. A systematic survey of splice-site mutations identified 106 splice site mutations associated with cancer specific aberrant splicing, including mutations in several known cancer-related genes. RAC1b, an isoform of the RAC1 GTPase that includes one additional exon, was found to be preferentially up-regulated in lung cancer. We further show that its expression is significantly associated with sensitivity to a MAP2K (MEK) inhibitor PD-0325901. Taken together, these data present a comprehensive genomic landscape of a large number of lung cancer samples and further demonstrate that cancer-specific alternative splicing is a widespread phenomenon that has potential utility as therapeutic biomarkers. The detailed characterizations of the lung cancer cell lines also provide genomic context to the vast amount of experimental data gathered for these lines over the decades, and represent highly valuable resources for cancer biology.


Subject(s)
Alternative Splicing , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Lung Neoplasms/genetics , Mutation , Transcriptome , ATPases Associated with Diverse Cellular Activities , Adenosine Triphosphatases/genetics , Adenosine Triphosphatases/metabolism , Cell Line, Tumor , DNA Copy Number Variations , DNA Helicases/genetics , DNA Helicases/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Epigenomics , Exons , Genetic Markers , Heterozygote , Histone Demethylases/genetics , Histone Demethylases/metabolism , Histone-Lysine N-Methyltransferase , Humans , Karyotyping/methods , Lung Neoplasms/pathology , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Polymorphism, Single Nucleotide , Reproducibility of Results , Sequence Analysis, RNA , Transcription Factors/genetics , Transcription Factors/metabolism , Up-Regulation , rac1 GTP-Binding Protein/genetics , rac1 GTP-Binding Protein/metabolism
11.
Bioinformatics ; 30(1): 127-8, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24132929

ABSTRACT

Connections between disease phenotypes and drug effects can be made by identifying commonalities in the associated patterns of differential gene expression. Searchable databases that record the impacts of chemical or genetic perturbations on the transcriptome (here referred to as 'connectivity maps') permit discovery of such commonalities. We describe two R packages, gCMAP and gCMAPWeb, which provide a complete framework to construct and query connectivity maps assembled from user-defined collections of differential gene expression data. Microarray or RNAseq data are processed in a standardized way, and results can be interrogated using various well-established gene set enrichment methods. The packages also feature an easy-to-deploy web application that facilitates reproducible research through automatic generation of graphical and tabular reports. AVAILABILITY AND IMPLEMENTATION: The gCMAP and gCMAPWeb R packages are freely available for UNIX, Windows and Mac OS X operating systems at Bioconductor (http://www.bioconductor.org).


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , User-Computer Interface , Animals , Cell Line , Gene Expression Profiling/methods , Humans , Internet
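
The abstract above mentions querying a connectivity map with gene set enrichment methods but does not name one. As a minimal sketch, assuming a one-sided hypergeometric (Fisher-type) test stands in for those methods, the query step might look like the following. The function names, gene identifiers and experiment names are all hypothetical and are not part of the gCMAP API.

```python
from math import comb


def hypergeom_upper_tail(overlap, set_size, n_drawn, universe):
    """P(X >= overlap) when n_drawn genes are drawn from a universe
    that contains set_size members of the query set."""
    total = comb(universe, n_drawn)
    max_overlap = min(set_size, n_drawn)
    favourable = sum(
        comb(set_size, k) * comb(universe - set_size, n_drawn - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / total


def query_map(query_genes, experiments, universe_size):
    """Rank experiments by enrichment of the query set among their changed genes."""
    results = []
    for name, changed in experiments.items():
        overlap = len(query_genes & changed)
        p = hypergeom_upper_tail(overlap, len(query_genes), len(changed), universe_size)
        results.append((name, overlap, p))
    # Smallest p-value (strongest enrichment) first.
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: gene and experiment names are placeholders.
experiments = {
    "drug_A": {"g1", "g2", "g3"},
    "drug_B": {"g7", "g8", "g9"},
}
ranking = query_map({"g1", "g2", "g3"}, experiments, universe_size=10)
```

In this toy map the query set matches `drug_A` completely, so that experiment ranks first; a real analysis would also correct the p-values for the number of experiments queried.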
12.
Bioinformatics ; 30(6): 775-83, 2014 Mar 15.
Article in English | MEDLINE | ID: mdl-24162561

ABSTRACT

MOTIVATION: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. RESULTS: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. AVAILABILITY: The motifRG package is publicly available via the Bioconductor repository. CONTACT: yzizhen@fhcrc.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , DNA/genetics , Humans , Transcription Factors/genetics
14.
PLoS Comput Biol ; 9(8): e1003118, 2013.
Article in English | MEDLINE | ID: mdl-23950696

ABSTRACT

We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.


Subject(s)
Databases, Genetic , Genomics/methods , Software , Algorithms , Animals , Genomics/standards , Humans , Mice , Sequence Alignment , Sequence Analysis, DNA
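
Two of the range operations named in the abstract above, overlap detection and coverage calculation, can be illustrated with a short sketch. This is not the IRanges or GenomicRanges implementation; it assumes closed, 1-based (start, end) intervals on a single sequence, and the function names are hypothetical.

```python
from bisect import bisect_right


def find_overlaps(query, subject):
    """Return (query_index, subject_index) pairs for overlapping closed ranges."""
    # Sort subject indices by start so a binary search can skip ranges
    # that begin after the query ends.
    order = sorted(range(len(subject)), key=lambda j: subject[j][0])
    starts = [subject[j][0] for j in order]
    hits = []
    for i, (q_start, q_end) in enumerate(query):
        upto = bisect_right(starts, q_end)
        for j in order[:upto]:
            if subject[j][1] >= q_start:
                hits.append((i, j))
    return hits


def coverage(ranges, length):
    """Per-position depth on a sequence of the given length.

    Assumes 1 <= start <= end <= length for every range.
    """
    diff = [0] * (length + 2)
    for start, end in ranges:
        diff[start] += 1       # depth rises at the first covered position
        diff[end + 1] -= 1     # and falls just past the last one
    depth, out = 0, []
    for pos in range(1, length + 1):
        depth += diff[pos]
        out.append(depth)
    return out


# Hypothetical read alignments on a 12-base sequence.
reads = [(1, 5), (3, 8), (10, 12)]
depth = coverage(reads, 12)
hits = find_overlaps([(4, 6), (9, 9)], reads)
```

The difference-array approach computes coverage in time linear in the number of ranges plus the sequence length; the overlap search here is still quadratic in the worst case, which is the problem that dedicated index structures are designed to avoid.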
15.
Nucleic Acids Res ; 40(2): 499-510, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21917857

ABSTRACT

Although microRNAs (miRNAs) are important regulators of gene expression, the transcriptional regulation of miRNAs themselves is not well understood. We employed an integrative computational pipeline to dissect the transcription factors (TFs) responsible for altered miRNA expression in ovarian carcinoma. Using experimental data and computational predictions to define miRNA promoters across the human genome, we identified TFs with binding sites significantly overrepresented among miRNA genes overexpressed in ovarian carcinoma. This pipeline nominated TFs of the p53/p63/p73 family as candidate drivers of miRNA overexpression. Analysis of data from an independent set of 253 ovarian carcinomas in The Cancer Genome Atlas showed that p73 and p63 expression is significantly correlated with expression of miRNAs whose promoters contain p53/p63/p73 family binding sites. In experimental validation of specific miRNAs predicted by the analysis to be regulated by p73 and p63, we found that p53/p63/p73 family binding sites modulate promoter activity of miRNAs of the miR-200 family, which are known regulators of cancer stem cells and epithelial-mesenchymal transitions. Furthermore, in chromatin immunoprecipitation studies both p73 and p63 directly associated with the miR-200b/a/429 promoter. This study delineates an integrative approach that can be applied to discover transcriptional regulatory mechanisms in other biological settings where analogous genomic data are available.


Subject(s)
DNA-Binding Proteins/metabolism , Genomics/methods , MicroRNAs/genetics , Nuclear Proteins/metabolism , Transcription Factors/metabolism , Tumor Suppressor Proteins/metabolism , Binding Sites , Carcinoma/genetics , Carcinoma/metabolism , Cell Line, Tumor , Female , Genome, Human , Humans , MicroRNAs/biosynthesis , Molecular Sequence Annotation , Ovarian Neoplasms/genetics , Ovarian Neoplasms/metabolism , Promoter Regions, Genetic , Transcription Initiation Site , Transcriptional Activation , Tumor Protein p73
16.
Database (Oxford) ; 2024, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38625809

ABSTRACT

The National Health and Nutrition Examination Survey provides comprehensive data on demographics, sociology, health and nutrition. Conducted in 2-year cycles since 1999, most of its data are publicly accessible, making it pivotal for research areas like studying social determinants of health or tracking trends in health metrics such as obesity or diabetes. Assembling the data and analyzing it presents a number of technical and analytic challenges. This paper introduces the nhanesA R package, which is designed to assist researchers in data retrieval and analysis and to enable the sharing and extension of prior research efforts. We believe that fostering community-driven activity in data reproducibility and sharing of analytic methods will greatly benefit the scientific community and propel scientific advancements. Database URL: https://github.com/cjendres1/nhanes.


Subject(s)
Information Storage and Retrieval , Nutrition Surveys , Reproducibility of Results , Databases, Factual
17.
Stat Appl Genet Mol Biol ; 11(2), 2012 Jan 06.
Article in English | MEDLINE | ID: mdl-22499690

ABSTRACT

The advent of high-throughput biotechnologies, which can efficiently measure gene expression on a global basis, has led to the creation and population of correspondingly rich databases and compendia. Such repositories have the potential to add enormous scientific value beyond that provided by individual studies which, due largely to cost considerations, are typified by small sample sizes. Accordingly, substantial effort has been invested in devising analysis schemes for utilizing gene-expression repositories. Here, we focus on one such scheme, the Connectivity Map (cmap), that was developed with the express purpose of identifying drugs with putative efficacy against a given disease, where the disease in question is characterized by a (differential) gene-expression signature. Initial claims surrounding cmap intimated that such tools might lead to new, previously unanticipated applications of existing drugs. However, further application suggests that its primary utility is in connecting a disease condition whose biology is largely unknown to a drug whose mechanisms of action are well understood, making cmap a tool for enhancing biological knowledge. The success of the Connectivity Map is belied by its simplicity. The aforementioned signature serves as an unordered query which is applied to a customized database of (differential) gene-expression experiments designed to elicit response to a wide range of drugs, across a spectrum of concentrations, durations, and cell lines. Such application is effected by computing a per experiment score that measures "closeness" between the signature and the experiment. Top-scoring experiments, and the attendant drug(s), are then deemed relevant to the disease underlying the query. Inference supporting such elicitations is pursued via re-sampling. In this paper, we revisit two key aspects of the Connectivity Map implementation. Firstly, we develop new approaches to measuring closeness for the common scenario wherein the query constitutes an ordered list. These involve using metrics proposed for analyzing partially ranked data, these being of interest in their own right and not widely used. Secondly, we advance an alternate inferential approach based on generating empirical null distributions that exploit the scope, and capture dependencies, embodied by the database. Using these refinements we undertake a comprehensive re-evaluation of Connectivity Map findings that, in general terms, reveal that accommodating ordered queries is less critical than the mode of inference.


Subject(s)
Data Mining/methods , Databases, Genetic , Gene Expression Profiling , Algorithms , Computational Biology/methods , Estrogens/pharmacology , Gene Expression/drug effects , Genetic Predisposition to Disease , Genomics/methods , Histone Deacetylase Inhibitors/pharmacology , Humans , Limonins/pharmacology
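
The scheme summarized in the abstract above (a per-experiment closeness score for an unordered signature, followed by re-sampling for inference) can be sketched in a few lines. The score used here, a normalized mean rank, is an illustrative stand-in and is not the statistic used by the Connectivity Map or by the paper; all names and data are hypothetical.

```python
import random


def closeness(signature, ranked_genes):
    """Score in [0, 1]: 1.0 if signature genes sit at the top of the ranking."""
    n = len(ranked_genes)
    position = {gene: i for i, gene in enumerate(ranked_genes)}
    hits = [position[g] for g in signature if g in position]
    if n < 2 or not hits:
        return 0.0
    return 1.0 - (sum(hits) / len(hits)) / (n - 1)


def resampling_p_value(signature, ranked_genes, n_resamples=2000, seed=0):
    """Fraction of random gene sets of the same size that score at least as high."""
    rng = random.Random(seed)
    observed = closeness(signature, ranked_genes)
    present = set(ranked_genes)
    k = sum(1 for g in signature if g in present)
    if k == 0:
        return 1.0
    exceed = sum(
        closeness(rng.sample(ranked_genes, k), ranked_genes) >= observed
        for _ in range(n_resamples)
    )
    # Add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (n_resamples + 1)


# Hypothetical experiment: 100 genes ranked from most to least up-regulated.
ranked = [f"g{i}" for i in range(100)]
signature = {"g0", "g1", "g2"}
score = closeness(signature, ranked)
p_value = resampling_p_value(signature, ranked)
```

Drawing random gene sets treats genes as exchangeable, which is the simplest null model; the abstract's second refinement replaces this kind of null with empirical distributions built from the database itself.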
18.
Nucleic Acids Res ; 39(Database issue): D7-10, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097465

ABSTRACT

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.


Subject(s)
Databases, Factual/standards , Information Dissemination
19.
Proc Natl Acad Sci U S A ; 107(21): 9546-51, 2010 May 25.
Article in English | MEDLINE | ID: mdl-20460310

ABSTRACT

With high-dimensional data, variable-by-variable statistical testing is often used to select variables whose behavior differs across conditions. Such an approach requires adjustment for multiple testing, which can result in low statistical power. A two-stage approach that first filters variables by a criterion independent of the test statistic, and then only tests variables which pass the filter, can provide higher power. We show that use of some filter/test statistics pairs presented in the literature may, however, lead to loss of type I error control. We describe other pairs which avoid this problem. In an application to microarray data, we found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50%. We also show that this particular statistic pair induces a lower bound on fold-change among the set of discoveries. Independent filtering, using filter/test pairs that are independent under the null hypothesis but correlated under the alternative, is a general approach that can substantially increase the efficiency of experiments.


Subject(s)
Biometry/methods , Algorithms , Computational Biology , Models, Genetic
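
The two-stage procedure described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: per-gene p-values (for example from a t-test) are supplied as input rather than computed, the multiple-testing adjustment is Benjamini-Hochberg, and the function names, the `keep_fraction` parameter and the data are hypothetical.

```python
from statistics import variance


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_test(values_by_gene, p_by_gene, keep_fraction=0.5, alpha=0.05):
    """Stage 1: keep the highest-variance genes. Stage 2: adjust only those."""
    # Overall variance ignores the group labels, which is what lets the
    # filter be independent of the test statistic under the null.
    ranked = sorted(values_by_gene, key=lambda g: variance(values_by_gene[g]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:n_keep]
    adjusted = benjamini_hochberg([p_by_gene[g] for g in kept])
    return [g for g, q in zip(kept, adjusted) if q <= alpha]


# Hypothetical expression values (two groups of two samples) and p-values.
values = {
    "a": [1.0, 1.0, 9.0, 9.0],
    "b": [2.0, 2.0, 8.0, 8.0],
    "c": [5.0, 5.0, 5.1, 5.1],
    "d": [4.0, 4.0, 4.2, 4.2],
}
p_values = {"a": 0.02, "b": 0.03, "c": 0.5, "d": 0.9}

with_filter = filter_then_test(values, p_values, keep_fraction=0.5)
without_filter = filter_then_test(values, p_values, keep_fraction=1.0)
```

With all four genes tested, neither "a" nor "b" survives adjustment at 0.05; after the low-variance genes are filtered out, both do, because the adjustment is spread over fewer tests.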
20.
Proc Natl Acad Sci U S A ; 105(30): 10513-8, 2008 Jul 29.
Article in English | MEDLINE | ID: mdl-18663219

ABSTRACT

Improved approaches for the detection of common epithelial malignancies are urgently needed to reduce the worldwide morbidity and mortality caused by cancer. MicroRNAs (miRNAs) are small (approximately 22 nt) regulatory RNAs that are frequently dysregulated in cancer and have shown promise as tissue-based markers for cancer classification and prognostication. We show here that miRNAs are present in human plasma in a remarkably stable form that is protected from endogenous RNase activity. miRNAs originating from human prostate cancer xenografts enter the circulation, are readily measured in plasma, and can robustly distinguish xenografted mice from controls. This concept extends to cancer in humans, where serum levels of miR-141 (a miRNA expressed in prostate cancer) can distinguish patients with prostate cancer from healthy controls. Our results establish the measurement of tumor-derived miRNAs in serum or plasma as an important approach for the blood-based detection of human cancer.


Subject(s)
Biomarkers, Tumor/genetics , Gene Expression Regulation, Neoplastic , MicroRNAs/blood , MicroRNAs/genetics , Animals , Cloning, Molecular , Gene Expression Profiling , Humans , Male , Mice , Neoplasm Transplantation , Neoplasms/metabolism , Prostatic Neoplasms/blood , Prostatic Neoplasms/genetics , RNA, Neoplasm/blood , RNA, Neoplasm/metabolism , Ribonucleases/metabolism , Sensitivity and Specificity