Results 1 - 19 of 19
1.
Science ; 380(6643): eabn2937, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104612

ABSTRACT

Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.


Subjects
Disease, Genetic Variation, Animals, Humans, Biological Evolution, Human Genome, Genome-Wide Association Study, Genomics, Molecular Sequence Annotation, Single Nucleotide Polymorphism, Disease/genetics
2.
bioRxiv ; 2023 Mar 10.
Article in English | MEDLINE | ID: mdl-36945512

ABSTRACT

Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single base phyloP scores from the whole genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained, and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability (more than any other functional annotation). Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.

3.
bioRxiv ; 2023 Jan 07.
Article in English | MEDLINE | ID: mdl-36711864

ABSTRACT

Chronic inflammation and tissue fibrosis are common stress responses that worsen organ function, yet the molecular mechanisms governing their crosstalk are poorly understood. In diseased organs, stress-induced changes in gene expression fuel maladaptive cell state transitions and pathological interaction between diverse cellular compartments. Although chronic fibroblast activation worsens dysfunction of lung, liver, kidney, and heart, and exacerbates many cancers, the stress-sensing mechanisms initiating the transcriptional activation of fibroblasts are not well understood. Here, we show that conditional deletion of the transcription co-activator Brd4 in Cx3cr1-positive myeloid cells ameliorates heart failure and is associated with a dramatic reduction in fibroblast activation. Analysis of single-cell chromatin accessibility and BRD4 occupancy in vivo in Cx3cr1-positive cells identified a large enhancer proximal to Interleukin-1 beta (Il1b), and a series of CRISPR deletions revealed the precise stress-dependent regulatory element that controlled expression of Il1b in disease. Secreted IL1B functioned non-cell autonomously to activate a p65/RELA-dependent enhancer near the transcription factor MEOX1, resulting in a profibrotic response in human cardiac fibroblasts. In vivo, antibody-mediated IL1B neutralization prevented stress-induced expression of MEOX1, inhibited fibroblast activation, and improved cardiac function in heart failure. The elucidation of BRD4-dependent crosstalk between a specific immune cell subset and fibroblasts through IL1B provides new therapeutic strategies for heart disease and other disorders of chronic inflammation and maladaptive tissue remodeling.

4.
Nat Microbiol ; 7(10): 1605-1620, 2022 10.
Article in English | MEDLINE | ID: mdl-36138165

ABSTRACT

Pharmaceuticals have extensive reciprocal interactions with the microbiome, but whether bacterial drug sensitivity and metabolism is driven by pathways conserved in host cells remains unclear. Here we show that anti-cancer fluoropyrimidine drugs inhibit the growth of gut bacterial strains from 6 phyla. In both Escherichia coli and mammalian cells, fluoropyrimidines disrupt pyrimidine metabolism. Proteobacteria and Firmicutes metabolized 5-fluorouracil to its inactive metabolite dihydrofluorouracil, mimicking the major host mechanism for drug clearance. The preTA operon was necessary and sufficient for 5-fluorouracil inactivation by E. coli, exhibited high catalytic efficiency for the reductive reaction, decreased the bioavailability and efficacy of oral fluoropyrimidine treatment in mice and was prevalent in the gut microbiomes of colorectal cancer patients. The conservation of both the targets and enzymes for metabolism of therapeutics across domains highlights the need to distinguish the relative contributions of human and microbial cells to drug efficacy and side-effect profiles.


Subjects
Antineoplastic Agents, Escherichia coli, Animals, Antineoplastic Agents/metabolism, Antineoplastic Agents/pharmacology, Bacteria/genetics, Escherichia coli/genetics, Escherichia coli/metabolism, Fluorouracil/metabolism, Fluorouracil/pharmacology, Humans, Mammals, Metabolic Networks and Pathways, Mice
5.
Genome Biol ; 20(1): 167, 2019 08 15.
Article in English | MEDLINE | ID: mdl-31416467

ABSTRACT

The CRISPR/Cas system is a highly specific genome editing tool capable of distinguishing alleles differing by even a single base pair. Target sites might carry genetic variations that are not distinguishable by sgRNA design tools based on one reference genome. AlleleAnalyzer is open-source software that incorporates single-nucleotide variants and short insertions and deletions to design sgRNAs for precisely editing 1 or multiple haplotypes of a sequenced genome, currently supporting 11 Cas proteins. It also leverages patterns of shared genetic variation to optimize sgRNA design for different human populations. AlleleAnalyzer is available at https://github.com/keoughkath/AlleleAnalyzer.
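The idea of allele-specific sgRNA design can be illustrated with a minimal sketch. It is not AlleleAnalyzer's code or API: the function names are hypothetical, and it assumes an SpCas9-style NGG PAM, a 20-nt protospacer, forward-strand sites only, and a single-nucleotide variant (so both alleles have the same length). A site is reported when the PAM exists on only one allele or when the protospacer differs between alleles.

```python
def pam_sites(seq, guide_len=20):
    """Map PAM start index -> protospacer for every NGG PAM on the forward strand.

    Assumption: SpCas9-style NGG PAM directly 3' of a guide_len-nt protospacer.
    """
    sites = {}
    for i in range(guide_len, len(seq) - 2):
        if seq[i + 1:i + 3] == "GG":          # seq[i] is the free 'N' of NGG
            sites[i] = seq[i - guide_len:i]   # protospacer immediately upstream
    return sites


def allele_specific_sites(ref, alt, guide_len=20):
    """List (pam_index, allele, protospacer) for sites that differ between alleles.

    Hypothetical helper: handles substitutions only, so ref and alt must have
    equal length and share coordinates.
    """
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles single-nucleotide variants")
    ref_sites = pam_sites(ref, guide_len)
    alt_sites = pam_sites(alt, guide_len)
    result = []
    for pos in sorted(set(ref_sites) | set(alt_sites)):
        r, a = ref_sites.get(pos), alt_sites.get(pos)
        if r != a:  # PAM gained/lost, or protospacer sequence differs
            if r is not None:
                result.append((pos, "ref", r))
            if a is not None:
                result.append((pos, "alt", a))
    return result


# Toy alleles: the variant at index 22 destroys the only PAM on the alt allele.
ref = "ACGT" * 5 + "AGGTTTT"
alt = ref[:22] + "A" + ref[23:]
hits = allele_specific_sites(ref, alt)
```

A real design tool would also scan the reverse strand, handle insertions and deletions, support other PAMs, and score off-target risk; a single protospacer mismatch does not by itself guarantee allele specificity.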


Subjects
Alleles, Kinetoplastid Guide RNA/genetics, Software, Base Sequence, CRISPR-Associated Proteins/metabolism, Humans, Genetic Polymorphism
6.
Science ; 364(6446): 1156-1162, 2019 06 21.
Article in English | MEDLINE | ID: mdl-31221853

ABSTRACT

Glycosylation alterations are indicative of tissue inflammation and neoplasia, but whether these alterations contribute to disease pathogenesis is largely unknown. To study the role of glycan changes in pancreatic disease, we inducibly expressed human fucosyltransferase 3 and β1,3-galactosyltransferase 5 in mice, reconstituting the glycan sialyl-Lewis a, also known as carbohydrate antigen 19-9 (CA19-9). Notably, CA19-9 expression in mice resulted in rapid and severe pancreatitis with hyperactivation of epidermal growth factor receptor (EGFR) signaling. Mechanistically, CA19-9 modification of the matricellular protein fibulin-3 increased its interaction with EGFR, and blockade of fibulin-3, EGFR ligands, or CA19-9 prevented EGFR hyperactivation in organoids. CA19-9-mediated pancreatitis was reversible and could be suppressed with CA19-9 antibodies. CA19-9 also cooperated with the Kras G12D oncogene to produce aggressive pancreatic cancer. These findings implicate CA19-9 in the etiology of pancreatitis and pancreatic cancer and nominate CA19-9 as a therapeutic target.


Subjects
CA-19-9 Antigen/metabolism, Pancreatic Ductal Carcinoma/metabolism, ErbB Receptors/metabolism, Pancreatic Neoplasms/metabolism, Pancreatitis/metabolism, Acute Disease, Animals, CA-19-9 Antigen/immunology, Carcinogenesis/metabolism, Pancreatic Ductal Carcinoma/pathology, Tumor Cell Line, Chronic Disease, Extracellular Matrix Proteins/metabolism, Fucosyltransferases/genetics, Fucosyltransferases/metabolism, Galactosyltransferases/genetics, Galactosyltransferases/metabolism, Glycosylation, Humans, Mice, Molecular Targeted Therapy/methods, Pancreatic Neoplasms/pathology, Pancreatitis/pathology
7.
mSystems ; 4(4)2019.
Article in English | MEDLINE | ID: mdl-31098399

ABSTRACT

While recent research indicates that human health is affected by the gut microbiome, the functional mechanisms that underlie host-microbiome interactions remain poorly resolved. Metagenomic clinical studies can address this problem by revealing specific microbial functions that stratify healthy and diseased individuals. To improve our understanding of the relationship between the gut microbiome and health, we conducted the first integrative functional analysis of nearly 2,000 publicly available fecal metagenomic samples obtained from eight clinical studies. We identified characteristics of the gut microbiome that associate generally with disease, including functional alpha-diversity, beta-diversity, and beta-dispersion. Using regression modeling, we identified specific microbial functions that robustly stratify diseased individuals from healthy controls. Many of these functions overlapped multiple diseases, suggesting a general role in host health, while others were specific to a single disease and may indicate disease-specific etiologies. Our results clarify potential microbiome-mediated mechanisms of disease and reveal features of the microbiome that may be useful for the development of microbiome-based diagnostics. IMPORTANCE: The composition of the gut microbiome associates with a wide range of human diseases, but the mechanisms underpinning these associations are not well understood. To shift toward a mechanistic understanding, we integrated distinct metagenomic data sets to identify functions encoded in the gut microbiome that associate with multiple diseases, which may be important to human health. Additionally, we identified functions that associate with specific diseases, which may elucidate disease-specific etiologies. We demonstrated that the functions encoded in the microbiome can be used to classify disease status, but the inclusion of additional patient covariates may be necessary to obtain sufficient accuracy. Ultimately, this analysis advances our understanding of the gut microbiome functions that constitute a healthy microbiome and identifies potential targets for microbiome-based diagnostics and therapeutics.
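For readers unfamiliar with the diversity terms used in this abstract, the sketch below computes one common alpha-diversity index (Shannon entropy of a single sample's function abundances) and one common beta-diversity measure (Bray-Curtis dissimilarity between two samples). The abstract does not say which indices the study used, so these choices, the function names, and the toy abundance vectors are assumptions for illustration only.

```python
from math import log


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


# Hypothetical abundances of four gene families in two samples.
sample_a = [5, 5, 5, 5]
sample_b = [17, 1, 1, 1]
alpha_a = shannon_diversity(sample_a)   # evenly spread functions
alpha_b = shannon_diversity(sample_b)   # dominated by one function
beta_ab = bray_curtis(sample_a, sample_b)
```

Here the evenly distributed sample has the higher alpha-diversity, and the Bray-Curtis value quantifies how different the two functional profiles are on a scale from 0 (identical) to 1 (no shared abundance).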

8.
Proc Natl Acad Sci U S A ; 116(6): 2175-2180, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30659153

ABSTRACT

The potential impact of structural variants includes not only the duplication or deletion of coding sequences, but also the perturbation of noncoding DNA regulatory elements and structural chromatin features, including topological domains (TADs). Structural variants disrupting TAD boundaries have been implicated both in cancer and developmental disease; this likely occurs via "enhancer hijacking," whereby removal of the TAD boundary exposes enhancers to new target transcription start sites (TSSs). With this functional role, we hypothesized that boundaries would display evidence for negative selection. Here we demonstrate that the chromatin landscape constrains structural variation both within healthy humans and across primate evolution. In contrast, in patients with developmental delay, variants occur remarkably uniformly across genomic features, suggesting a potentially broad role for enhancer hijacking in human disease.


Subjects
Chromatin/chemistry, Chromatin/genetics, Molecular Evolution, Genetic Variation, Animals, Autistic Disorder/genetics, Chromatin/metabolism, Developmental Disabilities/genetics, Gene Duplication, Genome, Genomics/methods, Hominidae/genetics, Humans, Sequence Deletion, Transcription Initiation Site
9.
Cell ; 175(7): 1931-1945.e18, 2018 12 13.
Article in English | MEDLINE | ID: mdl-30550790

ABSTRACT

Mosquito-borne flaviviruses, including dengue virus (DENV) and Zika virus (ZIKV), are a growing public health concern. Systems-level analysis of how flaviviruses hijack cellular processes through virus-host protein-protein interactions (PPIs) provides information about their replication and pathogenic mechanisms. We used affinity purification-mass spectrometry (AP-MS) to compare flavivirus-host interactions for two viruses (DENV and ZIKV) in two hosts (human and mosquito). Conserved virus-host PPIs revealed that the flavivirus NS5 protein suppresses interferon stimulated genes by inhibiting recruitment of the transcription complex PAF1C and that chemical modulation of SEC61 inhibits DENV and ZIKV replication in human and mosquito cells. Finally, we identified a ZIKV-specific interaction between NS4A and ANKLE2, a gene linked to hereditary microcephaly, and showed that ZIKV NS4A causes microcephaly in Drosophila in an ANKLE2-dependent manner. Thus, comparative flavivirus-host PPI mapping provides biological insights and, when coupled with in vivo models, can be used to unravel pathogenic mechanisms.


Subjects
Dengue Virus, Dengue, Membrane Proteins, Nuclear Proteins, Viral Nonstructural Proteins, Zika Virus Infection, Zika Virus, Animals, Tumor Cell Line, Culicidae, Dengue/genetics, Dengue/metabolism, Dengue/pathology, Dengue Virus/genetics, Dengue Virus/metabolism, Dengue Virus/pathogenicity, HEK293 Cells, Humans, Membrane Proteins/genetics, Membrane Proteins/metabolism, Nuclear Proteins/genetics, Nuclear Proteins/metabolism, Protein Interaction Mapping, Viral Nonstructural Proteins/genetics, Viral Nonstructural Proteins/metabolism, Zika Virus/genetics, Zika Virus/metabolism, Zika Virus/pathogenicity, Zika Virus Infection/genetics, Zika Virus Infection/metabolism, Zika Virus Infection/pathology
10.
mSystems ; 3(3)2018.
Article in English | MEDLINE | ID: mdl-29795809

ABSTRACT

Although much work has linked the human microbiome to specific phenotypes and lifestyle variables, data from different projects have been challenging to integrate and the extent of microbial and molecular diversity in human stool remains unknown. Using standardized protocols from the Earth Microbiome Project and sample contributions from over 10,000 citizen-scientists, together with an open research network, we compare human microbiome specimens primarily from the United States, United Kingdom, and Australia to one another and to environmental samples. Our results show an unexpected range of beta-diversity in human stool microbiomes compared to environmental samples; demonstrate the utility of procedures for removing the effects of overgrowth during room-temperature shipping for revealing phenotype correlations; uncover new molecules and kinds of molecular communities in the human stool metabolome; and examine emergent associations among the microbiome, metabolome, and the diversity of plants that are consumed (rather than relying on reductive categorical variables such as veganism, which have little or no explanatory power). We also demonstrate the utility of the living data resource and cross-cohort comparison to confirm existing associations between the microbiome and psychiatric illness and to reveal the extent of microbiome change within one individual during surgery, providing a paradigm for open microbiome research and education. IMPORTANCE: We show that a citizen science, self-selected cohort shipping samples through the mail at room temperature recaptures many known microbiome results from clinically collected cohorts and reveals new ones. Of particular interest is integrating n = 1 study data with the population data, showing that the extent of microbiome change after events such as surgery can exceed differences between distinct environmental biomes, and the effect of diverse plants in the diet, which we confirm with untargeted metabolomics on hundreds of samples.

11.
J Virol ; 92(3)2018 02 01.
Article in English | MEDLINE | ID: mdl-29142137

ABSTRACT

The human genome is structurally organized in three-dimensional space to facilitate functional partitioning of transcription. We learned that the latent episome of the human Epstein-Barr virus (EBV) preferentially associates with gene-poor chromosomes and avoids gene-rich chromosomes. Kaposi's sarcoma-associated herpesvirus behaves similarly, but human papillomavirus does not. Contacts on the EBV side localize to OriP, the latent origin of replication. This genetic element and the EBNA1 protein that binds there are sufficient to reconstitute chromosome association preferences of the entire episome. Contacts on the human side localize to gene-poor and AT-rich regions of chromatin distant from transcription start sites. Upon reactivation from latency, however, the episome moves away from repressive heterochromatin and toward active euchromatin. Our work adds three-dimensional relocalization to the molecular events that occur during reactivation. Involvement of myriad interchromosomal associations also suggests a role for this type of long-range association in gene regulation. IMPORTANCE: The human genome is structurally organized in three-dimensional space, and this structure functionally affects transcriptional activity. We set out to investigate whether a double-stranded DNA virus, Epstein-Barr virus (EBV), uses mechanisms similar to those of the human genome to regulate transcription. We found that the EBV genome associates with repressive compartments of the nucleus during latency and with active compartments during reactivation. This study advances our knowledge of the EBV life cycle, adding three-dimensional relocalization as a novel component to the molecular events that occur during reactivation. Furthermore, the data add to our understanding of nuclear compartments, showing that disperse interchromosomal interactions may be important for regulating transcription.


Subjects
Chromatin/genetics, Epstein-Barr Virus Nuclear Antigens/metabolism, Human Herpesvirus 4/physiology, Plasmids/genetics, Cell Line, Cell Nucleus/genetics, Cell Nucleus/virology, Chromatin/virology, Human Chromosomes/genetics, Human Chromosomes/virology, Humans, K562 Cells, Replication Origin
12.
BMC Bioinformatics ; 18(Suppl 2): 63, 2017 Feb 15.
Article in English | MEDLINE | ID: mdl-28251868

ABSTRACT

BACKGROUND: Cluster heatmaps are commonly used in biology and related fields to reveal hierarchical clusters in data matrices. This visualization technique has high data density and reveals clusters better than unordered heatmaps alone. However, cluster heatmaps have known issues making them both time-consuming to use and prone to error. We hypothesize that visualization techniques without the rigid grid constraint of cluster heatmaps will perform better at clustering-related tasks. RESULTS: We developed an approach to "unbox" the heatmap values and embed them directly in the hierarchical clustering results, allowing us to use standard hierarchical visualization techniques as alternatives to cluster heatmaps. We then tested our hypothesis by conducting a survey of 45 practitioners to determine how cluster heatmaps are used, prototyping alternatives to cluster heatmaps using pair analytics with a computational biologist, and evaluating those alternatives with hour-long interviews of 5 practitioners and an Amazon Mechanical Turk user study with approximately 200 participants. We found statistically significant performance differences for most clustering-related tasks, and in the number of perceived visual clusters. Visit git.io/vw0t3 for our results. CONCLUSIONS: The optimal technique varied by task. However, gapmaps were preferred by the interviewed practitioners and outperformed or performed as well as cluster heatmaps for clustering-related tasks. Gapmaps are similar to cluster heatmaps, but relax the heatmap grid constraints by introducing gaps between rows and/or columns that are not closely clustered. Based on these results, we recommend users adopt gapmaps as an alternative to cluster heatmaps.


Subjects
Cluster Analysis, Computational Biology, Tumor Cell Line, Humans, K562 Cells
13.
PLoS Comput Biol ; 11(11): e1004573, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26565399

ABSTRACT

Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn's disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.


Subjects
Chromosome Mapping/methods, Metagenome/genetics, Metagenomics/methods, Microbiota/genetics, Computer Simulation, Crohn Disease/microbiology, Genetic Markers/genetics, Humans, Genetic Models
14.
Nucleic Acids Res ; 43(11): 5307-17, 2015 Jun 23.
Article in English | MEDLINE | ID: mdl-25934800

ABSTRACT

Cancer-associated somatic mutations outside protein-coding regions remain largely unexplored. Analyses of the TERT locus have indicated that non-coding regulatory mutations can be more frequent than previously suspected and play important roles in oncogenesis. Using a computational method called SASE-hunter, developed here, we identified a novel signature of accelerated somatic evolution (SASE) marked by a significant excess of somatic mutations localized in a genomic locus, and prioritized those loci that carried the signature in multiple cancer patients. Interestingly, even when an affected locus carried the signature in multiple individuals, the mutations contributing to SASE themselves were rarely recurrent at the base-pair resolution. In a pan-cancer analysis of 906 samples from 12 tumor types, we detected SASE in the promoters of several genes, including known cancer genes such as MYC, BCL2, RBM5 and WWOX. Nucleotide substitution patterns consistent with oxidative DNA damage and local somatic hypermutation appeared to contribute to this signature in selected gene promoters (e.g. MYC). SASEs in selected cancer gene promoters were associated with over-expression, and also correlated with the age of onset of cancer, aggressiveness of the disease and survival. Taken together, our work detects a hitherto under-appreciated and clinically important class of regulatory changes in cancer genomes.
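The notion of "a significant excess of somatic mutations localized in a genomic locus" can be made concrete with a simple one-sided binomial test. This is only a sketch under stated assumptions and is not SASE-hunter's actual statistic, which the abstract does not describe: it treats every base in a locus as an independent trial with a fixed background mutation probability, and the function names, locus labels, and numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def loci_with_excess(observed, lengths, background_rate, alpha=0.01):
    """Return loci whose mutation count is unexpectedly high.

    observed: locus -> number of somatic mutations seen in the locus
    lengths:  locus -> locus length in base pairs (number of trials)
    Assumption: a single uniform background_rate per base; no multiple-testing
    correction is applied in this sketch.
    """
    flagged = []
    for locus, count in observed.items():
        p_value = binomial_upper_tail(count, lengths[locus], background_rate)
        if p_value < alpha:
            flagged.append(locus)
    return flagged


# Hypothetical example: two 1 kb promoters, background of 1 mutation per kb.
observed = {"locus_A": 8, "locus_B": 1}
lengths = {"locus_A": 1000, "locus_B": 1000}
flagged = loci_with_excess(observed, lengths, background_rate=0.001)
```

Direct summation is adequate for short loci like these; for long regions a numerically stable implementation from a statistics library would be the safer choice, and a real analysis would need a locally calibrated background rate.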


Subjects
Mutation, Neoplasms/genetics, Genetic Promoter Regions, Adult, Gene Expression, Genomics, Humans, Middle Aged, Neoplasms/diagnosis, Software
15.
Mol Cell ; 52(3): 314-24, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-24207025

ABSTRACT

Lysine acetylation regulates transcription by targeting histones and nonhistone proteins. Here we report that the central regulator of transcription, RNA polymerase II, is subject to acetylation in mammalian cells. Acetylation occurs at eight lysines within the C-terminal domain (CTD) of the largest polymerase subunit and is mediated by p300/KAT3B. CTD acetylation is specifically enriched downstream of the transcription start sites of polymerase-occupied genes genome-wide, indicating a role in early stages of transcription initiation or elongation. Mutation of lysines or p300 inhibitor treatment causes the loss of epidermal growth-factor-induced expression of c-Fos and Egr2, immediate-early genes with promoter-proximally paused polymerases, but does not affect expression or polymerase occupancy at housekeeping genes. Our studies identify acetylation as a new modification of the mammalian RNA polymerase II required for the induction of growth factor response genes.


Subjects
Histones/genetics, Lysine/genetics, RNA Polymerase II/metabolism, Genetic Transcription, Acetylation, Animals, Early Growth Response Protein 2/biosynthesis, Embryonic Stem Cells/cytology, Gene Expression Regulation, fos Genes/genetics, Histones/metabolism, Humans, Genetic Promoter Regions, RNA Polymerase II/genetics, p300-CBP Transcription Factors/genetics, p300-CBP Transcription Factors/metabolism
16.
PLoS Comput Biol ; 8(6): e1002567, 2012.
Article in English | MEDLINE | ID: mdl-22761559

ABSTRACT

The evolutionary history of a protein reflects the functional history of its ancestors. Recent phylogenetic studies identified distinct evolutionary signatures that characterize proteins involved in cancer, Mendelian disease, and different ontogenic stages. Despite the potential to yield insight into the cellular functions and interactions of proteins, such comparative phylogenetic analyses are rarely performed, because they require custom algorithms. We developed ProteinHistorian to make tools for performing analyses of protein origins widely available. Given a list of proteins of interest, ProteinHistorian estimates the phylogenetic age of each protein, quantifies enrichment for proteins of specific ages, and compares variation in protein age with other protein attributes. ProteinHistorian allows flexibility in the definition of protein age by including several algorithms for estimating ages from different databases of evolutionary relationships. We illustrate the use of ProteinHistorian with three example analyses. First, we demonstrate that proteins with high expression in human, compared to chimpanzee and rhesus macaque, are significantly younger than those with human-specific low expression. Next, we show that human proteins with annotated regulatory functions are significantly younger than proteins with catalytic functions. Finally, we compare protein length and age in many eukaryotic species and, as expected from previous studies, find a positive, though often weak, correlation between protein age and length. ProteinHistorian is available through a web server with an intuitive interface and as a set of command line tools; this allows biologists and bioinformaticians alike to integrate these approaches into their analysis pipelines. ProteinHistorian's modular, extensible design facilitates the integration of new datasets and algorithms. The ProteinHistorian web server, source code, and pre-computed ages for 32 eukaryotic genomes are freely available under the GNU public license at http://lighthouse.ucsf.edu/ProteinHistorian/.


Subjects
Molecular Evolution, Genetic Models, Proteins/genetics, Software, Algorithms, Animals, Computational Biology, Computer Simulation, Protein Databases, Gene Expression, Humans, Phylogeny, Proteins/chemistry, Proteins/physiology, Species Specificity, Time Factors
17.
Stat Appl Genet Mol Biol ; 7(1): Article 33, 2008.
Article in English | MEDLINE | ID: mdl-19049489

ABSTRACT

We introduce a novel statistical concept, called a supervised distance matrix, which quantifies pairwise similarity between variables in terms of their association with an outcome. Supervised distance matrices are derived in two stages. First, the observed data is transformed based on particular working models for association. Examples of transformations include residuals or influence curves from regression models. In the second stage, a choice of distance measure is used to compute all pairwise distances between variables in the transformed data. We present consistent estimators of the resulting distance matrix, including an inverse probability of censoring weighted estimator for use with right-censored outcomes. Supervised distance matrices can be used with standard (unsupervised) clustering algorithms to identify groups of similarly predictive variables and to discover subpopulations of related samples. This approach is illustrated using simulations and an analysis of gene expression data with a censored survival outcome. The proposed methods are widely applicable in genomics and other fields where high-dimensional data is collected on each subject.
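The two-stage construction described in this abstract can be sketched as follows. This is one possible reading, not the authors' implementation: stage one uses residuals from a simple least-squares regression of the outcome on each variable as the working-model transformation, and stage two uses Euclidean distance between the transformed vectors. Both choices, the function names, and the toy data are assumptions; the sketch does not cover the censored-outcome estimator mentioned in the abstract.

```python
from math import sqrt


def ols_residuals(x, y):
    """Residuals of y after a least-squares fit of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def supervised_distance_matrix(variables, outcome):
    """Stage 1: transform each variable via its working model for the outcome.
    Stage 2: compute all pairwise distances between the transformed variables."""
    transformed = [ols_residuals(x, outcome) for x in variables]
    p = len(transformed)
    return [[euclidean(transformed[i], transformed[j]) for j in range(p)]
            for i in range(p)]


# Hypothetical data: three variables measured on four subjects.
variables = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
outcome = [1.1, 1.9, 3.2, 3.9]
D = supervised_distance_matrix(variables, outcome)
```

Variables 0 and 1 are rescalings of each other, so they carry the same information about the outcome and end up at distance zero, whereas variable 2 predicts the outcome differently and lies farther away. The resulting matrix could then be passed to any standard clustering routine that accepts precomputed distances.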


Subjects
Biometry/methods, Gene Expression Profiling/statistics & numerical data, Algorithms, Cluster Analysis, Statistical Data Interpretation, Genomics/statistics & numerical data, Humans, Linear Models, Mantle-Cell Lymphoma/genetics, Multigene Family
18.
BMC Syst Biol ; 1: 56, 2007 Nov 27.
Article in English | MEDLINE | ID: mdl-18039394

ABSTRACT

BACKGROUND: The molecular events underlying mammary development during pregnancy, lactation, and involution are incompletely understood. RESULTS: Mammary gland microarray data, cellular localization data, protein-protein interactions, and literature-mined genes were integrated and analyzed using statistics, principal component analysis, gene ontology analysis, pathway analysis, and network analysis to identify global biological principles that govern molecular events during pregnancy, lactation, and involution. CONCLUSION: Several key principles were derived: (1) nearly a third of the transcriptome fluctuates to build, run, and disassemble the lactation apparatus; (2) genes encoding the secretory machinery are transcribed prior to lactation; (3) the diversity of the endogenous portion of the milk proteome is derived from fewer than 100 transcripts; (4) while some genes are differentially transcribed near the onset of lactation, the lactation switch is primarily post-transcriptionally mediated; (5) the secretion of materials during lactation occurs not by up-regulation of novel genomic functions, but by widespread transcriptional suppression of functions such as protein degradation and cell-environment communication; (6) the involution switch is primarily transcriptionally mediated; and (7) during early involution, the transcriptional state is partially reverted to the pre-lactation state. A new hypothesis for secretory diminution is suggested - milk production gradually declines because the secretory machinery is not transcriptionally replenished. A comprehensive network of protein interactions during lactation is assembled and new regulatory gene targets are identified. Less than one fifth of the transcriptionally regulated nodes in this lactation network have been previously explored in the context of lactation. Implications for future research in mammary and cancer biology are discussed.


Subjects
Breast/metabolism, Gene Expression Regulation/physiology, Lactation/metabolism, Biological Models, Proteome/metabolism, Signal Transduction/physiology, Computational Biology/methods, Computer Simulation, Female, Humans
19.
Math Biosci ; 176(1): 99-121, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11867086

ABSTRACT

Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function theta=Phi(P) of the true data generating distribution P, and an estimate is obtained by applying this function to the empirical distribution P(n). We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distribution of Phi(P(n)). The method is illustrated on a publicly available data set.
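The plug-in idea in this abstract, where a clustering result is viewed as a function Phi applied to the empirical distribution and its variability is studied by bootstrap, can be sketched in a simplified form. The sketch below is an illustration under assumptions, not the paper's procedure: Phi is taken to be single-linkage clustering of genes only (the paper composes mappings for both patients and genes), the bootstrap is the ordinary nonparametric one that resamples patients with replacement, and all names and data are hypothetical.

```python
import random
from math import sqrt


def euclid(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage(points, k):
    """Agglomerative single-linkage clustering cut at k clusters; returns labels."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return min(euclid(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        _, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def bootstrap_coclustering(data, k, n_boot, seed=0):
    """Fraction of bootstrap replicates in which each pair of genes co-clusters.

    data: rows are patients, columns are genes.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    counts = [[0] * p for _ in range(p)]
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]  # draw from P_n
        genes = [list(column) for column in zip(*resample)]
        labels = single_linkage(genes, k)                      # Phi(resample)
        for i in range(p):
            for j in range(p):
                counts[i][j] += labels[i] == labels[j]
    return [[c / n_boot for c in row] for row in counts]


# Hypothetical expression matrix: 4 patients x 4 genes, two clear gene groups.
data = [[1.0, 1.1, 10.0, 10.2],
        [0.9, 1.2, 9.8, 10.1],
        [1.1, 1.0, 10.3, 9.9],
        [1.0, 0.9, 10.1, 10.0]]
freq = bootstrap_coclustering(data, k=2, n_boot=50, seed=1)
```

Co-clustering frequencies near 0 or 1 indicate gene pairings that are stable under resampling, while intermediate values flag assignments that depend on which patients happened to be sampled.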


Subjects
Cluster Analysis, Statistical Models, Oligonucleotide Array Sequence Analysis/methods, Computer Simulation, Humans, BCR-ABL Positive Chronic Myelogenous Leukemia/genetics, Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics