Results 1 - 20 of 24
1.
Clin Epigenetics ; 12(1): 34, 2020 02 19.
Article in English | MEDLINE | ID: mdl-32075680

ABSTRACT

BACKGROUND: Obesity and diabetes mellitus are directly implicated in many adverse health consequences in adults as well as in the offspring of obese and diabetic mothers. Hispanic Americans are particularly at risk for obesity, diabetes, and end-stage renal disease. Maternal obesity and/or diabetes through prenatal programming may alter the fetal epigenome, increasing the risk of metabolic disease in their offspring. The aims of this study were to determine if maternal obesity or diabetes mellitus during pregnancy results in a change in infant methylation of CpG islands adjacent to targeted genes specific for obesity or diabetes disease pathways in a largely Hispanic population. METHODS: Methylation levels in the cord blood of 69 newborns were determined using the Illumina Infinium MethylationEPIC BeadChip. Over 850,000 different probe sites were analyzed to determine whether maternal obesity and/or diabetes mellitus directly contributed to differential methylation; epigenome-wide and regional analyses were performed for significant CpG sites. RESULTS: Following quality control, agranular leukocyte samples from 69 newborns (23 normal term (NT), 14 diabetes (DM), 23 obese (OB), 9 DM/OB) were analyzed for over 850,000 different probe sites. Contrasts between the NT, DM, OB, and DM/OB were considered. After correction for multiple testing, 15 CpGs showed differential methylation from the NT, associated with 10 differentially methylated genes between the diabetic and non-diabetic subgroups, CCDC110, KALRN, PAG1, GNRH1, SLC2A9, CSRP2BP, HIVEP1, RALGDS, DHX37, and SCNN1D. The effects of diabetes were partly mediated by the altered methylation of HOOK2, LCE3C, and TMEM63B. The effects of obesity were partly mediated by the differential methylation of LTF and DUSP22. CONCLUSIONS: The presented data highlight the associated altered methylation patterns potentially mediated by maternal diabetes and/or obesity. Larger studies are warranted to investigate the role of both the identified differentially methylated loci and the effects on newborn body composition and future health risk factors for metabolic disease. Additional future consideration should be targeted to the role of Hispanic inheritance. Potential future targeting of transgenerational propagation and developmental programming may reduce population obesity and diabetes risk.


Subjects
DNA Methylation , Diabetes, Gestational/genetics , Epigenomics/methods , Fetal Blood/chemistry , Hispanic or Latino/genetics , Obesity/genetics , Adult , CpG Islands , Diabetes, Gestational/ethnology , Epigenesis, Genetic , Female , Gene Regulatory Networks , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Infant, Newborn , Maternal Age , Maternal-Fetal Exchange , Obesity/ethnology , Pregnancy , Prospective Studies , Young Adult
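
The abstract above reports results "after correction for multiple testing" across more than 850,000 probes but does not name the procedure. As an illustration only, the sketch below assumes a Benjamini-Hochberg false discovery rate adjustment, which is one common choice for epigenome-wide scans; the function names and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here for illustration; the abstract does not
    state which multiple-testing correction the authors applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_sites(pvalues, alpha=0.05):
    """Indices of probes whose adjusted p-value is at most alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(adjusted) if q <= alpha]
```

Calling `significant_sites(pvalues, alpha)` on a list of per-probe p-values returns the positions of the probes that survive the adjustment at the chosen threshold.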
2.
BMC Genomics ; 17(1): 628, 2016 08 12.
Article in English | MEDLINE | ID: mdl-27519264

ABSTRACT

BACKGROUND: The continuous and non-synchronous nature of postnatal male germ-cell development has impeded stage-specific resolution of molecular events of mammalian meiotic prophase in the testis. Here the juvenile onset of spermatogenesis in mice is analyzed by combining cytological and transcriptomic data in a novel computational analysis that allows decomposition of the transcriptional programs of spermatogonia and meiotic prophase substages. RESULTS: Germ cells from testes of individual mice were obtained at two-day intervals from 8 to 18 days post-partum (dpp), prepared as surface-spread chromatin and immunolabeled for meiotic stage-specific protein markers (STRA8, SYCP3, phosphorylated H2AFX, and HISTH1T). Eight stages were discriminated cytologically by combinatorial antibody labeling, and RNA-seq was performed on the same samples. Independent principal component analyses of cytological and transcriptomic data yielded similar patterns for both data types, providing strong evidence for substage-specific gene expression signatures. A novel permutation-based maximum covariance analysis (PMCA) was developed to map co-expressed transcripts to one or more of the eight meiotic prophase substages, thereby linking distinct molecular programs to cytologically defined cell states. Expression of meiosis-specific genes is not substage-limited, suggesting regulation of substage transitions at other levels. CONCLUSIONS: This integrated analysis provides a general method for resolving complex cell populations. Here it revealed not only features of meiotic substage-specific gene expression, but also a network of substage-specific transcription factors and relationships to potential target genes.


Subjects
Meiosis , RNA/metabolism , Spermatocytes/metabolism , Animals , Cells, Cultured , Chromatin/metabolism , Gene Regulatory Networks , Germ Cells/cytology , Male , Mice , Mice, Inbred C57BL , Principal Component Analysis , RNA/chemistry , RNA/isolation & purification , Real-Time Polymerase Chain Reaction , Sequence Analysis, RNA , Spermatocytes/cytology , Spermatogenesis , Testis/cytology , Transcription Factors/metabolism , Transcriptome
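
The abstract above describes a permutation-based maximum covariance analysis (PMCA) without giving its details. The sketch below is not PMCA; it only illustrates the underlying idea of a permutation test on covariance, here between one cytological variable (for example, the fraction of cells in a substage per sample) and one transcript's abundance across the same samples. All names, the number of permutations, and the seed are assumptions made for the example.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def permutation_pvalue(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the covariance of xs and ys.

    The sample labels of ys are shuffled to build a null distribution;
    this is a generic test, not the published PMCA procedure.
    """
    rng = random.Random(seed)
    observed = abs(covariance(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(covariance(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the transcript's abundance tracks the substage fraction more closely than expected if samples were exchangeable.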
3.
Nature ; 526(7571): 112-7, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26367794

ABSTRACT

The extent to which low-frequency (minor allele frequency (MAF) between 1% and 5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is mainly unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10); a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., P_meta = 2 × 10^-14), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10^-11; n_cases = 98,742 and n_controls = 409,511). Using an En1(cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., P_meta = 1 × 10^-11). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population.


Subjects
Bone Density/genetics , Fractures, Bone/genetics , Genome, Human/genetics , Homeodomain Proteins/genetics , Animals , Bone and Bones/metabolism , Disease Models, Animal , Europe/ethnology , Exome/genetics , Female , Gene Frequency/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genomics , Genotype , Humans , Mice , Sequence Analysis, DNA , White People/genetics , Wnt Proteins/genetics
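
The abstract above sorts variants by minor allele frequency: rare at MAF ≤ 1% and low-frequency between 1% and 5%. A minimal sketch of that binning follows. It assumes that the upper bound of 5% is inclusive and that anything above it counts as common; the abstract does not state how the boundaries are handled, and the function names are hypothetical.

```python
def minor_allele_frequency(alt_count, total_alleles):
    """MAF from the count of one allele among all observed alleles."""
    freq = alt_count / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1.0 - freq)


def frequency_class(maf):
    """Bin a MAF (as a fraction, 0 to 0.5) using the abstract's thresholds.

    Assumption: 5% is treated as inclusive for the low-frequency class.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf <= 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"
```

Under these thresholds, both reported variants (MAF = 1.6% near EN1 and MAF = 1.2% near WNT16) fall in the low-frequency class.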
4.
Article in English | MEDLINE | ID: mdl-26351520

ABSTRACT

BACKGROUND: Genetic recombination plays an important role in evolution, facilitating the creation of new, favorable combinations of alleles and the removal of deleterious mutations by unlinking them from surrounding sequences. In most mammals, the placement of genetic crossovers is determined by the binding of PRDM9, a highly polymorphic protein with a long zinc finger array, to its cognate binding sites. It is one of over 800 genes encoding proteins with zinc finger domains in the human genome. RESULTS: We report a novel technique, Affinity-seq, that for the first time both identifies the genome-wide binding sites of DNA-binding proteins and quantitates their relative affinities. We have applied this in vitro technique to PRDM9, the zinc-finger protein that activates genetic recombination, obtaining new information on the regulation of hotspots, whose locations and activities determine the recombination landscape. We identified 31,770 binding sites in the mouse genome for the PRDM9(Dom2) variant. Comparing these results with hotspot usage in vivo, we find that fewer than half of potential PRDM9 binding sites are utilized in vivo. We show that hotspot usage is increased in actively transcribed genes and decreased in genomic regions containing H3K9me2/3 histone marks or bound to the nuclear lamina. CONCLUSIONS: These results show that major factors determining whether a binding site will become an active hotspot, and what its activity will be, are the constraints imposed by prior chromatin modifications on the ability of PRDM9 to bind to DNA in vivo. These constraints lead to the presence of long genomic regions depleted of recombination.

5.
Genetics ; 198(1): 59-73, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25236449

ABSTRACT

Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations.


Subjects
Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Transcriptome , Animals , Female , Genome , Male , Mice , Quantitative Trait Loci
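
The abstract above describes building individualized diploid genomes so that reads align to sequences carrying each individual's own variants. The sketch below illustrates the idea for single-nucleotide variants only. It is not Seqnature's implementation: the function names and data layout are hypothetical, and insertions and deletions, which shift coordinates, are deliberately left out.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with single-base variants applied.

    `snps` maps a 0-based position to a (ref_base, alt_base) pair.
    Hypothetical helper for illustration; indels are not handled.
    """
    bases = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        # Guard against variants called on a different reference build.
        if bases[pos] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        bases[pos] = alt_base
    return "".join(bases)


def individualized_diploid(reference, maternal_snps, paternal_snps):
    """Build one sequence per parental haplotype from a shared reference."""
    return (
        apply_snps(reference, maternal_snps),
        apply_snps(reference, paternal_snps),
    )
```

Aligning reads to both haplotype sequences, rather than to the reference alone, is what makes allele-specific read counts directly observable.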
6.
PLoS Genet ; 10(6): e1004423, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24945404

ABSTRACT

Heritability of bone mineral density (BMD) varies across skeletal sites, reflecting different relative contributions of genetic and environmental influences. To quantify the degree to which common genetic variants tag and environmental factors influence BMD at different sites, we estimated the genetic (r_g) and residual (r_e) correlations between BMD measured at the upper limbs (UL-BMD), lower limbs (LL-BMD) and skull (SK-BMD), using total-body DXA scans of ~4,890 participants recruited by the Avon Longitudinal Study of Parents and their Children (ALSPAC). Point estimates of r_g indicated that appendicular sites have a greater proportion of shared genetic architecture (LL-/UL-BMD r_g = 0.78) between them, than with the skull (UL-/SK-BMD r_g = 0.58 and LL-/SK-BMD r_g = 0.43). Likewise, the residual correlation between BMD at appendicular sites (r_e = 0.55) was higher than the residual correlation between SK-BMD and BMD at appendicular sites (r_e = 0.20-0.24). To explore the basis for the observed differences in r_g and r_e, genome-wide association meta-analyses were performed (n ~ 9,395), combining data from ALSPAC and the Generation R Study identifying 15 independent signals from 13 loci associated at a genome-wide significant level across different skeletal regions. Results suggested that previously identified BMD-associated variants may exert site-specific effects (i.e. differ in the strength of their association and magnitude of effect across different skeletal sites). In particular, variants at CPED1 exerted a larger influence on SK-BMD and UL-BMD when compared to LL-BMD (P = 2.01 × 10^-37), whilst variants at WNT16 influenced UL-BMD to a greater degree when compared to SK- and LL-BMD (P = 2.31 × 10^-14). In addition, we report a novel association between RIN3 (previously associated with Paget's disease) and LL-BMD (rs754388: β = 0.13, SE = 0.02, P = 1.4 × 10^-10). Our results suggest that BMD at different skeletal sites is under a mixture of shared and specific genetic and environmental influences. Allowing for these differences by performing genome-wide association at different skeletal sites may help uncover new genetic influences on BMD.


Subjects
Bone Density/genetics , Carrier Proteins/genetics , Guanine Nucleotide Exchange Factors/genetics , Wnt Proteins/genetics , Adult , Bone Development , Bone and Bones/physiology , Child , Cohort Studies , Female , Genome-Wide Association Study , Humans , Longitudinal Studies , Lower Extremity/growth & development , Lower Extremity/physiology , Male , Osteoporosis/epidemiology , Polymorphism, Single Nucleotide , Pregnancy , Prospective Studies , Skull/growth & development , Skull/physiology , Upper Extremity/growth & development , Upper Extremity/physiology , Young Adult
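
The abstract above combines association results from two cohorts (ALSPAC and the Generation R Study) by meta-analysis but does not say which model was used. As a hedged illustration, the sketch below assumes a fixed-effect, inverse-variance-weighted combination of per-cohort effect sizes, a standard approach for GWAS meta-analysis. The function name and the example inputs are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Pool per-study effect estimates by inverse-variance weighting.

    Assumption: a fixed-effect model; the abstract does not specify
    the meta-analysis method actually applied.
    """
    # Each study is weighted by the reciprocal of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se
```

With two equally precise studies the pooled effect is the simple average of the two estimates, and the pooled standard error shrinks by a factor of the square root of two.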
7.
Stem Cells ; 32(5): 1161-72, 2014 May.
Article in English | MEDLINE | ID: mdl-24307629

ABSTRACT

Embryonic stem cells (ESCs), characterized by their ability to both self-renew and differentiate into multiple cell lineages, are a powerful model for biomedical research and developmental biology. Human and mouse ESCs share many features, yet have distinctive aspects, including fundamental differences in the signaling pathways and cell cycle controls that support self-renewal. Here, we explore the molecular basis of human ESC self-renewal using Bayesian network machine learning to integrate cell-type-specific, high-throughput data for gene function discovery. We integrated high-throughput ESC data from 83 human studies (~1.8 million data points collected under 1,100 conditions) and 62 mouse studies (~2.4 million data points collected under 1,085 conditions) into separate human and mouse predictive networks focused on ESC self-renewal to analyze shared and distinct functional relationships among protein-coding gene orthologs. Computational evaluations show that these networks are highly accurate, literature validation confirms their biological relevance, and reverse transcriptase polymerase chain reaction (RT-PCR) validation supports our predictions. Our results reflect the importance of key regulatory genes known to be strongly associated with self-renewal and pluripotency in both species (e.g., POU5F1, SOX2, and NANOG), identify metabolic differences between species (e.g., threonine metabolism), clarify differences between human and mouse ESC developmental signaling pathways (e.g., leukemia inhibitory factor (LIF)-activated JAK/STAT in mouse; NODAL/ACTIVIN-A-activated fibroblast growth factor in human), and reveal many novel genes and pathways predicted to be functionally associated with self-renewal in each species. These interactive networks are available online at www.StemSight.org for stem cell researchers to develop new hypotheses, discover potential mechanisms involving sparsely annotated genes, and prioritize genes of interest for experimental validation.


Subjects
Cell Differentiation , Cell Proliferation , Embryonic Stem Cells/cytology , Systems Biology/methods , Algorithms , Animals , Bayes Theorem , Cell Lineage , Computational Biology/methods , Embryonic Stem Cells/metabolism , Gene Regulatory Networks , Humans , Mice , Reproducibility of Results , Signal Transduction
8.
PLoS One ; 8(2): e56810, 2013.
Article in English | MEDLINE | ID: mdl-23468881

ABSTRACT

Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.


Subjects
Embryonic Stem Cells/cytology , Embryonic Stem Cells/physiology , Animals , Bayes Theorem , Cell Differentiation , Computational Biology , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Mice , Open Reading Frames , Protein Interaction Maps , Proteome , Reproducibility of Results
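
The abstract above integrates many high-throughput datasets with a Bayesian network to score functional relationships between genes, without describing the network structure. The sketch below assumes the simplest such structure, a naive Bayes model in which each dataset contributes an independent likelihood ratio for a gene pair. The function name, the prior, and the likelihood ratios are hypothetical and are not values from the study.

```python
def posterior_probability(prior, likelihood_ratios):
    """Posterior probability that a gene pair is functionally related.

    Assumption: naive Bayes, so evidence from each dataset is treated as
    conditionally independent and enters as a likelihood ratio
    P(evidence | related) / P(evidence | unrelated).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    # Convert posterior odds back to a probability.
    return odds / (1.0 + odds)
```

Ratios above 1 raise the posterior, ratios below 1 lower it, and a dataset with a ratio of exactly 1 leaves it unchanged, which is how uninformative datasets drop out of the integration.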
9.
PLoS Comput Biol ; 8(9): e1002694, 2012.
Article in English | MEDLINE | ID: mdl-23028291

ABSTRACT

Integrated analyses of functional genomics data have enormous potential for identifying phenotype-associated genes. Tissue-specificity is an important aspect of many genetic diseases, reflecting the potentially different roles of proteins and pathways in diverse cell lineages. Accounting for tissue specificity in global integration of functional genomics data is challenging, as "functionality" and "functional relationships" are often not resolved for specific tissue types. We address this challenge by generating tissue-specific functional networks, which can effectively represent the diversity of protein function for more accurate identification of phenotype-associated genes in the laboratory mouse. Specifically, we created 107 tissue-specific functional relationship networks through integration of genomic data utilizing knowledge of tissue-specific gene expression patterns. Cross-network comparison revealed significantly changed genes enriched for functions related to specific tissue development. We then utilized these tissue-specific networks to predict genes associated with different phenotypes. Our results demonstrate that prediction performance is significantly improved through using the tissue-specific networks as compared to the global functional network. We used a testis-specific functional relationship network to predict genes associated with male fertility and spermatogenesis phenotypes, and experimentally confirmed one top prediction, Mybl1. We then focused on a less-common genetic disease, ataxia, and identified candidates uniquely predicted by the cerebellum network, which are supported by both literature and experimental evidence. Our systems-level, tissue-specific scheme advances over traditional global integration and analyses and establishes a prototype to address the tissue-specific effects of genetic perturbations, diseases and drugs.


Subjects
Genetic Predisposition to Disease/genetics , Models, Biological , Organ Specificity/genetics , Protein Interaction Mapping/methods , Proteome/genetics , Proteome/metabolism , Signal Transduction/genetics , Animals , Computer Simulation , Humans , Mice , Tissue Distribution
10.
PLoS One ; 7(3): e33720, 2012.
Article in English | MEDLINE | ID: mdl-22448268

ABSTRACT

RNA editing is a process that modifies RNA nucleotides and changes the efficiency and fidelity of the central dogma. Enzymes that catalyze RNA editing are required for life, and defects in RNA editing are associated with many diseases. Recent advances in sequencing have enabled the genome-wide identification of RNA editing sites in mammalian transcriptomes. Here, we demonstrate that canonical RNA editing (A-to-I and C-to-U) occurs in liver, white adipose, and bone tissues of the laboratory mouse, and we show that apparent non-canonical editing (all other possible base substitutions) is an artifact of current high-throughput sequencing technology. Further, we report that high-confidence canonical RNA editing sites can cause non-synonymous amino acid changes and are significantly enriched in 3' UTRs, specifically at microRNA target sites, suggesting both regulatory and functional consequences for RNA editing.


Subjects
3' Untranslated Regions/genetics , Adipose Tissue, White/metabolism , Bone and Bones/metabolism , Liver/metabolism , MicroRNAs/genetics , RNA Editing/genetics , Animals , Base Sequence , Female , Mice , Mice, Inbred C57BL , Molecular Sequence Data , Polymerase Chain Reaction , Polymorphism, Restriction Fragment Length
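
The abstract above separates canonical RNA editing (A-to-I and C-to-U) from all other base substitutions. In sequencing reads, inosine is read as G and uracil appears as T in cDNA, so the two canonical classes show up as A>G and C>T mismatches against the reference. The sketch below classifies a single mismatch on that basis. It assumes both bases are given on the transcript strand; the names are hypothetical, and this is not the authors' pipeline.

```python
# Canonical edits as they appear in cDNA-based reads:
# inosine pairs like G, and uracil is reported as T.
CANONICAL = {("A", "G"): "A-to-I", ("C", "T"): "C-to-U"}


def classify_substitution(ref_base, read_base):
    """Label a reference/read base pair by RNA editing class.

    Assumption: both bases are reported on the transcript strand.
    """
    ref_base, read_base = ref_base.upper(), read_base.upper()
    if ref_base == read_base:
        return "no change"
    return CANONICAL.get((ref_base, read_base), "non-canonical")
```

For transcripts on the reverse strand, the same edits would appear as T>C and G>A relative to the genome, so reads would need to be oriented before this check is applied.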
11.
Bonekey Rep ; 1: 98, 2012.
Article in English | MEDLINE | ID: mdl-23951485

ABSTRACT

Osteoporosis, the progressive loss of bone mass resulting in fragility fractures, affects ~75 million people in the United States, Europe and Japan. Bone mineral density (BMD) correlates with fracture risk and is widely used in clinical settings to predict fracture. Numerous studies have demonstrated that peak bone mass is highly heritable, and consequently a number of genome-wide association studies (GWASs) have been conducted to identify the genes that regulate BMD. Traditional intercross mapping in the mouse has met with limited success in the field of skeletal biology. With the advent of human GWAS, questions have arisen about the continued need for mouse models in genetics research. However, significant advances have been made in the field of mouse genetics, including new genetics resource populations and locus mapping techniques, which enable gene-level mapping resolution. In this review, we discuss the need for mouse models to help understand the skeletal biology underlying novel human GWAS findings, how loci discovered in the mouse can be used to complement GWAS analysis, and highlight the recent advances made in the field of skeletal biology from the use of these new and developing resources. We conclude this paper with a discussion of the need for systems-level approaches in the skeletal biology field, with an emphasis on the need for pathway and network analyses.

12.
PLoS Comput Biol ; 6(11): e1000991, 2010 Nov 11.
Article in English | MEDLINE | ID: mdl-21085640

ABSTRACT

An ultimate goal of genetic research is to understand the connection between genotype and phenotype in order to improve the diagnosis and treatment of diseases. The quantitative genetics field has developed a suite of statistical methods to associate genetic loci with diseases and phenotypes, including quantitative trait loci (QTL) linkage mapping and genome-wide association studies (GWAS). However, each of these approaches has technical and biological shortcomings. For example, the amount of heritable variation explained by GWAS is often surprisingly small and the resolution of many QTL linkage mapping studies is poor. The predictive power and interpretation of QTL and GWAS results are consequently limited. In this study, we propose a complementary approach to quantitative genetics by interrogating the vast amount of high-throughput genomic data in model organisms to functionally associate genes with phenotypes and diseases. Our algorithm combines the genome-wide functional relationship network for the laboratory mouse and a state-of-the-art machine learning method. We demonstrate the superior accuracy of this algorithm through predicting genes associated with each of 1157 diverse phenotype ontology terms. Comparison between our prediction results and a meta-analysis of quantitative genetic studies reveals both overlapping candidates and distinct, accurate predictions uniquely identified by our approach. Focusing on bone mineral density (BMD), a phenotype related to osteoporotic fracture, we experimentally validated two of our novel predictions (not observed in any previous GWAS/QTL studies) and found significant bone density defects for both Timp2 and Abcg8 deficient mice. Our results suggest that the integration of functional genomics data into networks, which itself is informative of protein function and interactions, can successfully be utilized as a complementary approach to quantitative genetics to predict disease risks. All supplementary material is available at http://cbfg.jax.org/phenotype.


Subjects
Chromosome Mapping , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Genomics/methods , ATP Binding Cassette Transporter, Subfamily G, Member 8 , ATP-Binding Cassette Transporters/genetics , Algorithms , Animals , Artificial Intelligence , Bayes Theorem , Bone Density , Cluster Analysis , Databases, Genetic , Disease Models, Animal , Lipoproteins/genetics , Mice , Mice, Transgenic , Osteoporosis/genetics , Phenotype , Quantitative Trait Loci , Reproducibility of Results , Risk Factors , Tissue Inhibitor of Metalloproteinase-2/genetics
13.
Nat Methods ; 7(3 Suppl): S56-68, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20195258

ABSTRACT

High-throughput studies of biological systems are rapidly accumulating a wealth of 'omics'-scale data. Visualization is a key aspect of both the analysis and understanding of these data, and users now have many visualization methods and tools to choose from. The challenge is to create clear, meaningful and integrated visualizations that give biological insight, without being overwhelmed by the intrinsic complexity of the data. In this review, we discuss how visualization tools are being used to help interpret protein interaction, gene expression and metabolic profile data, and we highlight emerging new directions.


Subjects
Genomics , Image Processing, Computer-Assisted , Metabolomics , Proteomics , Systems Biology , Mass Spectrometry , Nuclear Magnetic Resonance, Biomolecular , Protein Binding
14.
Methods Mol Biol ; 548: 273-93, 2009.
Article in English | MEDLINE | ID: mdl-19521830

ABSTRACT

Modern experimental techniques have produced a wealth of high-throughput data that has enabled the ongoing genomic revolution. As the field continues to integrate experimental and computational analyses of these data, it is essential that performance evaluations of high-throughput results be carried out in a consistent and biologically informative manner. Here, we present an overview of evaluation techniques for high-throughput experimental data and computational methods, and we discuss a number of potential pitfalls in this process. These primarily involve the biological diversity of genomic data, which can be masked or misrepresented in overly simplified global evaluations. We describe systems for preserving information about biological context during dataset evaluation, which can help to ensure that multiple different evaluations are more directly comparable. This biological variety in high-throughput data can also be taken advantage of computationally through data integration and process specificity to produce richer systems-level predictions of cellular function. An awareness of these considerations can greatly improve the evaluation and analysis of any high-throughput experimental dataset.


Subjects
Genome, Fungal , Proteome , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Computational Biology , Data Interpretation, Statistical , Databases, Genetic/standards , Databases, Genetic/statistics & numerical data , Databases, Protein/standards , Databases, Protein/statistics & numerical data , Genomics/standards , Genomics/statistics & numerical data , Proteomics/standards , Proteomics/statistics & numerical data , Systems Biology
15.
Bioinformatics ; 25(18): 2404-10, 2009 Sep 15.
Article in English | MEDLINE | ID: mdl-19561015

ABSTRACT

MOTIVATION: Rapidly expanding repositories of highly informative genomic data have generated increasing interest in methods for protein function prediction and inference of biological networks. The successful application of supervised machine learning to these tasks requires a gold standard for protein function: a trusted set of correct examples, which can be used to assess performance through cross-validation or other statistical approaches. Since gene annotation is incomplete for even the best studied model organisms, the biological reliability of such evaluations may be called into question. RESULTS: We address this concern by constructing and analyzing an experimentally based gold standard through comprehensive validation of protein function predictions for mitochondrion biogenesis in Saccharomyces cerevisiae. Specifically, we determine that (i) current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and (ii) incomplete functional annotations adversely affect the evaluation of machine learning performance. While computational approaches performed better than predicted in the face of incomplete data, relative comparison of competing approaches, even those employing the same training data, is problematic with a sparse gold standard. Incomplete knowledge causes individual methods' performances to be differentially underestimated, resulting in misleading performance evaluations. We provide a benchmark gold standard for yeast mitochondria to complement current databases and an analysis of our experimental results in the hopes of mitigating these effects in future comparative evaluations. AVAILABILITY: The mitochondrial benchmark gold standard, as well as experimental results and additional data, is available at http://function.princeton.edu/mitochondria.


Subjects
Computational Biology/methods , Proteins/metabolism , Algorithms , Databases, Protein , Mitochondria/metabolism , Proteins/chemistry , Saccharomyces cerevisiae/metabolism
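
The abstract above argues that an incomplete gold standard causes a method's performance to be underestimated, because correct predictions that are not yet annotated are scored as errors. The toy sketch below shows the effect on precision. The gene labels and both gold-standard sets are invented for illustration and do not come from the study.

```python
def precision(predicted, gold_standard):
    """Fraction of predicted genes that appear in the gold standard."""
    predicted = set(predicted)
    if not predicted:
        raise ValueError("no predictions to evaluate")
    return len(predicted & set(gold_standard)) / len(predicted)


# Hypothetical example: three of four predictions are truly correct,
# but only one of them has been annotated so far.
predicted_genes = {"g1", "g2", "g3", "g4"}
complete_standard = {"g1", "g2", "g3", "g5"}
incomplete_standard = {"g1", "g5"}

true_precision = precision(predicted_genes, complete_standard)
apparent_precision = precision(predicted_genes, incomplete_standard)
```

Here the apparent precision against the sparse annotations is well below the true precision, and because the size of that gap depends on which predictions happen to be annotated, two methods can be penalized by different amounts.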
16.
PLoS Genet ; 5(3): e1000407, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19300474

ABSTRACT

Mitochondria are central to many cellular processes including respiration, ion homeostasis, and apoptosis. Using computational predictions combined with traditional quantitative experiments, we have identified 100 proteins whose deficiency alters mitochondrial biogenesis and inheritance in Saccharomyces cerevisiae. In addition, we used computational predictions to perform targeted double-mutant analysis detecting another nine genes with synthetic defects in mitochondrial biogenesis. This represents an increase of about 25% over previously known participants. Nearly half of these newly characterized proteins are conserved in mammals, including several orthologs known to be involved in human disease. Mutations in many of these genes demonstrate statistically significant mitochondrial transmission phenotypes more subtle than could be detected by traditional genetic screens or high-throughput techniques, and 47 have not been previously localized to mitochondria. We further characterized a subset of these genes using growth profiling and dual immunofluorescence, which identified genes specifically required for aerobic respiration and an uncharacterized cytoplasmic protein required for normal mitochondrial motility. Our results demonstrate that by leveraging computational analysis to direct quantitative experimental assays, we have characterized mutants with subtle mitochondrial defects whose phenotypes were undetected by high-throughput methods.


Subjects
Mitochondria/genetics , Proteins/physiology , Saccharomyces cerevisiae/ultrastructure , Cell Respiration/genetics , Cytoplasm/chemistry , Genes, Mitochondrial , Mitochondrial Proteins , Mutant Proteins , Mutation , Proteins/genetics , Proteomics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development
17.
PLoS Comput Biol ; 5(3): e1000322, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19300515

ABSTRACT

Computational approaches have promised to organize collections of functional genomics data into testable predictions of gene and protein involvement in biological processes and pathways. However, few such predictions have been experimentally validated on a large scale, leaving many bioinformatic methods unproven and underutilized in the biology community. Further, it remains unclear what biological concerns should be taken into account when using computational methods to drive real-world experimental efforts. To investigate these concerns and to establish the utility of computational predictions of gene function, we experimentally tested hundreds of predictions generated from an ensemble of three complementary methods for the process of mitochondrial organization and biogenesis in Saccharomyces cerevisiae. The biological data with respect to the mitochondria are presented in a companion manuscript published in PLoS Genetics (doi:10.1371/journal.pgen.1000407). Here we analyze and explore the results of this study that are broadly applicable for computationalists applying gene function prediction techniques, including a new experimental comparison with 48 genes representing the genomic background. Our study leads to several conclusions that are important to consider when driving laboratory investigations using computational prediction approaches. While most genes in yeast are already known to participate in at least one biological process, we confirm that genes with known functions can still be strong candidates for annotation of additional gene functions. We find that different analysis techniques and different underlying data can both greatly affect the types of functional predictions produced by computational methods. This diversity allows an ensemble of techniques to substantially broaden the biological scope and breadth of predictions. We also find that performing prediction and validation steps iteratively allows us to more completely characterize a biological area of interest. While this study focused on a specific functional area in yeast, many of these observations may be useful in the contexts of other processes and organisms.


Subjects
Biology/methods , Mitochondria/physiology , Mitochondrial Proteins/metabolism , Models, Biological , Research Design , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/physiology , Signal Transduction/physiology , Computer Simulation
18.
Genome Res ; 19(6): 1093-106, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19246570

ABSTRACT

Human genomic data of many types are readily available, but the complexity and scale of human molecular biology make it difficult to integrate this body of data, understand it from a systems level, and apply it to the study of specific pathways or genetic disorders. An investigator could best explore a particular protein, pathway, or disease if given a functional map summarizing the data and interactions most relevant to his or her area of interest. Using a regularized Bayesian integration system, we provide maps of functional activity and interaction networks in over 200 areas of human cellular biology, each including information from approximately 30,000 genome-scale experiments pertaining to approximately 25,000 human genes. Key to these analyses is the ability to efficiently summarize this large data collection from a variety of biologically informative perspectives: prediction of protein function and functional modules, cross-talk among biological processes, and association of novel genes and pathways with known genetic disorders. In addition to providing maps of each of these areas, we also identify biological processes active in each data set. Experimental investigation of five specific genes, AP3B1, ATP6AP1, BLOC1S1, LAMP2, and RAB11A, has confirmed novel roles for these proteins in the proper initiation of macroautophagy in amino acid-starved human fibroblasts. Our functional maps can be explored using HEFalMp (Human Experimental/Functional Mapper), a web interface allowing interactive visualization and investigation of this large body of information.


Subjects
Chromosome Mapping/methods , Fibroblasts/metabolism , Genome, Human/genetics , Adaptor Protein Complex 3/genetics , Adaptor Protein Complex 3/metabolism , Adaptor Protein Complex beta Subunits/genetics , Adaptor Protein Complex beta Subunits/metabolism , Algorithms , Amino Acids/metabolism , Autophagy/genetics , Bayes Theorem , Computational Biology/methods , Fibroblasts/cytology , Gene Expression Profiling/statistics & numerical data , Gene Regulatory Networks , Humans , Immunoblotting , Lysosomal-Associated Membrane Protein 2 , Lysosomal Membrane Proteins/genetics , Lysosomal Membrane Proteins/metabolism , Microscopy, Fluorescence , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Signal Transduction/genetics , Vacuolar Proton-Translocating ATPases/genetics , Vacuolar Proton-Translocating ATPases/metabolism , rab GTP-Binding Proteins/genetics , rab GTP-Binding Proteins/metabolism
19.
Bioinformatics ; 23(20): 2692-9, 2007 Oct 15.
Article in English | MEDLINE | ID: mdl-17724061

ABSTRACT

MOTIVATION: The increasing availability of gene expression microarray technology has resulted in the publication of thousands of microarray gene expression datasets investigating various biological conditions. This vast repository is still underutilized due to the lack of methods for fast, accurate exploration of the entire compendium. RESULTS: We have collected Saccharomyces cerevisiae gene expression microarray data containing roughly 2400 experimental conditions. We analyzed the functional coverage of this collection and we designed a context-sensitive search algorithm for rapid exploration of the compendium. A researcher using our system provides a small set of query genes to establish a biological search context; based on this query, we weight each dataset's relevance to the context, and within these weighted datasets we identify additional genes that are co-expressed with the query set. Our method exhibits an average increase in accuracy of 273% compared to previous mega-clustering approaches when recapitulating known biology. Further, we find that our search paradigm identifies novel biological predictions that can be verified through further experimentation. Our methodology provides the ability for biological researchers to explore the totality of existing microarray data in a manner useful for drawing conclusions and formulating hypotheses, which we believe is invaluable for the research community. AVAILABILITY: Our query-driven search engine, called SPELL, is available at http://function.princeton.edu/SPELL. SUPPLEMENTARY INFORMATION: Several additional data files, figures and discussions are available at http://function.princeton.edu/SPELL/supplement.
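
The query-driven search outlined in this abstract (weight each dataset by its relevance to the query, then rank other genes by co-expression within the weighted datasets) can be sketched roughly as follows. This is a simplified illustration, not the SPELL implementation: the use of Pearson correlation, the clipping of negative weights to zero, the function names, and the toy expression values are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def dataset_weight(dataset, query):
    """Assumed relevance: mean pairwise correlation among query genes, clipped at 0."""
    present = [g for g in query if g in dataset]
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    mean_r = sum(pearson(dataset[a], dataset[b]) for a, b in pairs) / len(pairs)
    return max(mean_r, 0.0)


def rank_genes(datasets, query):
    """Rank non-query genes by weighted mean correlation to the query genes."""
    totals, norms = {}, {}
    for dataset in datasets:
        weight = dataset_weight(dataset, query)
        if weight == 0.0:
            continue  # dataset judged irrelevant to this query context
        present = [g for g in query if g in dataset]
        for gene, profile in dataset.items():
            if gene in query:
                continue
            r = sum(pearson(profile, dataset[q]) for q in present) / len(present)
            totals[gene] = totals.get(gene, 0.0) + weight * r
            norms[gene] = norms.get(gene, 0.0) + weight
    scores = {gene: totals[gene] / norms[gene] for gene in totals}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy compendium: each dataset maps gene name -> expression profile.
toy_datasets = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 2, 3, 5], "D": [4, 3, 2, 1]},
    {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [4, 1, 3, 2], "D": [1, 2, 3, 4]},
]
ranking = rank_genes(toy_datasets, ["A", "B"])
```

In the toy compendium the query genes A and B are co-expressed only in the first dataset, so the second dataset receives zero weight and gene C, which tracks the query in the first dataset, ranks above gene D.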


Subjects
Databases, Protein , Gene Expression Profiling/methods , Gene Expression/physiology , Information Storage and Retrieval/methods , Oligonucleotide Array Sequence Analysis/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Algorithms , Database Management Systems , Reproducibility of Results , Sensitivity and Specificity
20.
BMC Bioinformatics ; 8: 250, 2007 Jul 12.
Article in English | MEDLINE | ID: mdl-17626636

ABSTRACT

BACKGROUND: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). RESULTS: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. CONCLUSION: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
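
The idea of clustering on mutual nearest neighbors and overlapping cliques can be sketched in a few lines. This is a loose illustration under stated assumptions, not the published NNN algorithm: Pearson correlation as the similarity used to rank neighbors, the parameters `k` and `size`, brute-force clique enumeration, the rule of merging any cliques that share a gene, and the toy profiles are all hypothetical choices.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mutual_knn_graph(profiles, k):
    """Keep an edge g-h only if h is among g's k nearest genes and vice versa."""
    genes = list(profiles)
    nearest = {}
    for g in genes:
        ranked = sorted(
            (h for h in genes if h != g),
            key=lambda h: pearson(profiles[g], profiles[h]),
            reverse=True,
        )
        nearest[g] = set(ranked[:k])
    return {g: {h for h in nearest[g] if g in nearest[h]} for g in genes}


def clique_clusters(graph, size=3):
    """Find all cliques of the given size and merge those that share a gene."""
    cliques = [
        set(group)
        for group in combinations(sorted(graph), size)
        if all(b in graph[a] for a, b in combinations(group, 2))
    ]
    clusters = []
    for clique in cliques:
        for existing in [c for c in clusters if c & clique]:
            clusters.remove(existing)
            clique = clique | existing
        clusters.append(clique)
    return clusters


# Hypothetical toy profiles: three rising genes, three falling genes, one outlier.
toy_profiles = {
    "A": [1, 2, 3, 4, 5], "B": [1, 2, 3, 4, 6], "C": [2, 3, 4, 5, 5],
    "D": [5, 4, 3, 2, 1], "E": [6, 4, 3, 2, 1], "F": [5, 5, 4, 3, 2],
    "G": [1, 5, 1, 5, 1],
}
graph = mutual_knn_graph(toy_profiles, k=2)
clusters = clique_clusters(graph, size=3)
```

With these toy profiles the rising genes and the falling genes each form a clique, while gene G has nearest neighbors that do not reciprocate, so it has no edges and stays unclustered, which mirrors the behavior the abstract describes for genes without sufficiently similar partners.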


Subjects
Algorithms , Cluster Analysis , Gene Expression , Genes, Fungal , Saccharomyces cerevisiae/genetics , Databases, Genetic , Gene Expression Profiling/methods , Gene Expression Regulation, Fungal , ROC Curve , Software