Abstract
BACKGROUND: There is a growing appreciation of the role of proteolytic processes in human health and disease, but tools for analysis of such processes on a proteome-wide scale are limited. Furin is a ubiquitous proprotein convertase that cleaves after basic residues and transforms secretory proproteins into biologically active proteins. Despite this important role, many furin substrates remain unknown in the human proteome. METHODOLOGY/PRINCIPAL FINDINGS: We devised an approach for proteinase target identification that combines an in silico discovery pipeline with highly multiplexed proteinase activity assays. We performed in silico analysis of the human proteome and identified over 1,050 secretory proteins as potential furin substrates. We then used a multiplexed protease assay to validate these tentative targets. The assay was carried out on over 3,260 overlapping peptides designed to represent P7-P1' and P4-P4' positions of furin cleavage sites in the candidate proteins. The obtained results greatly increased our knowledge of the unique cleavage preferences of furin, revealed the importance of both short-range (P4-P1) and long-range (P7-P6) interactions in defining furin cleavage specificity, demonstrated that the R-X-R/K/X-R ↓ motif alone is insufficient for predicting furin proteolysis of the substrate, and identified ≈ 490 potential protein substrates of furin in the human proteome. CONCLUSIONS/SIGNIFICANCE: The assignment of these substrates to cellular pathways suggests an important role of furin in development, including axonal guidance, cardiogenesis, and maintenance of stem cell pluripotency. The novel approach proposed in this study can be readily applied to other proteinases.
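The motif scan and peptide windows described in this abstract can be illustrated with a minimal sketch. It assumes the R-X-R/K/X-R ↓ motif reduces to "Arg at P4, Arg at P1, any residue at P3 and P2"; the function name and the example sequence are hypothetical and are not taken from the study. As the abstract notes, a motif match alone does not predict cleavage, so this only enumerates candidates.

```python
import re

# P4 = R, P3 = any, P2 = any (R/K/X), P1 = R; cleavage would occur after P1.
# The lookahead lets overlapping matches be reported.
FURIN_MOTIF = re.compile(r"(?=(R..R))")

def candidate_sites(seq):
    """Return (p1_index, P7-P1' window, P4-P4' window) for each motif match."""
    sites = []
    for match in FURIN_MOTIF.finditer(seq):
        p1 = match.start() + 3  # 0-based index of the P1 arginine
        # Skip matches too close to either end to build both 8-residue windows.
        if p1 - 6 < 0 or p1 + 4 >= len(seq):
            continue
        p7_to_p1prime = seq[p1 - 6:p1 + 2]   # P7 .. P1'
        p4_to_p4prime = seq[p1 - 3:p1 + 5]   # P4 .. P4'
        sites.append((p1, p7_to_p1prime, p4_to_p4prime))
    return sites

# Hypothetical sequence used only for illustration.
print(candidate_sites("MKLSAARVKRSLDEAG"))
```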
Subjects
Furin/chemistry , Furin/metabolism , Proteome/metabolism , Amino Acid Motifs , Amino Acid Sequence , Protein Databases , Humans , Molecular Models , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Protein Interaction Maps , Secondary Protein Structure , Proteolysis , Reproducibility of Results , Substrate Specificity
Abstract
We report a scalable and cost-effective technology for generating and screening high-complexity customizable peptide sets. The peptides are made as peptide-cDNA fusions by in vitro transcription/translation from pools of DNA templates generated by microarray-based synthesis. This approach enables large custom sets of peptides to be designed in silico, manufactured cost-effectively in parallel, and assayed efficiently in a multiplexed fashion. The utility of our peptide-cDNA fusion pools was demonstrated in two activity-based assays designed to discover protease and kinase substrates. In the protease assay, cleaved peptide substrates were separated from uncleaved and identified by digital sequencing of their cognate cDNAs. We screened the 3,011 amino acid HCV proteome for susceptibility to cleavage by the HCV NS3/4A protease and identified all 3 known trans cleavage sites with high specificity. In the kinase assay, peptide substrates phosphorylated by tyrosine kinases were captured and identified by sequencing of their cDNAs. We screened a pool of 3,243 peptides against Abl kinase and showed that phosphorylation events detected were specific and consistent with the known substrate preferences of Abl kinase. Our approach is scalable and adaptable to other protein-based assays.
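Designing a peptide set in silico, as described above, usually starts by tiling a protein sequence into overlapping windows. The sketch below shows that step only; the peptide length, step size, and function name are assumptions for illustration, since the abstract does not state the design parameters.

```python
def tile_peptides(sequence, length=8, step=1):
    """Split a protein sequence into overlapping peptides.

    Returns (start_index, peptide) pairs. `length` and `step` are
    illustrative defaults, not values reported in the abstract.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(sequence) - length
    return [(i, sequence[i:i + length]) for i in range(0, last_start + 1, step)]

# Toy sequence; a real design would tile every protein of interest.
for start, peptide in tile_peptides("ABCDEFGHIJ", length=8, step=1):
    print(start, peptide)
```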
Subjects
Complementary DNA/genetics , Hepacivirus/genetics , Peptide Hydrolases/metabolism , Peptides/genetics , Phosphotransferases/metabolism , Proteomics/methods , Complementary DNA/metabolism , Microarray Analysis/methods , Peptides/metabolism , Phosphorylation , Substrate Specificity , Viral Nonstructural Proteins/metabolism
Abstract
BACKGROUND: The hepatitis C virus (HCV) genome encodes a long polyprotein, which is processed by host cell and viral proteases to the individual structural and non-structural (NS) proteins. HCV NS3/4A serine proteinase (NS3/4A) is a non-covalent heterodimer of the N-terminal, ~180-residue portion of the 631-residue NS3 protein with the NS4A co-factor. NS3/4A cleaves the polyprotein sequence at four specific regions. NS3/4A is essential for viral replication and has been considered an attractive drug target. METHODOLOGY/PRINCIPAL FINDINGS: Using a novel multiplex cleavage assay and over 2,660 peptide sequences derived from the polyprotein and from introducing mutations into the known NS3/4A cleavage sites, we obtained the first detailed fingerprint of NS3/4A cleavage preferences. Our data identified structural requirements illuminating the importance of both the short-range (P1-P1') and long-range (P6-P5) interactions in defining the NS3/4A substrate cleavage specificity. A newly observed feature of NS3/4A was a high frequency of either Asp or Glu at both P5 and P6 positions in a subset of the most efficient NS3/4A substrates. In turn, aberrations of this negatively charged sequence, such as an insertion of a positively charged or hydrophobic residue between the negatively charged residues, resulted in inefficient substrates. Because NS5B misincorporates bases at a high rate, HCV constantly mutates as it replicates. Our analysis revealed that mutations do not interfere with polyprotein processing in over 5,000 HCV isolates, indicating a pivotal role of NS3/4A proteolysis in the virus life cycle. CONCLUSIONS/SIGNIFICANCE: Our multiplex assay technology, in light of the growing appreciation of the role of proteolytic processes in human health and disease, will likely have widespread applications in the proteolysis research field and provide new therapeutic opportunities.
Subjects
Serine Endopeptidases/chemistry , Viral Nonstructural Proteins/chemistry , Amino Acid Sequence , High-Throughput Screening Assays , Humans , Molecular Models , Molecular Sequence Data , Mutation , Peptides/analysis , Peptides/chemical synthesis , Polyproteins/chemistry , Post-Translational Protein Processing , Proteolysis , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism , Substrate Specificity , Viral Nonstructural Proteins/genetics , Viral Nonstructural Proteins/metabolism
Abstract
A central problem in crystallography is crystal structure determination directly from diffraction intensities. For structures of small molecules, this problem has been addressed by probabilistic direct methods that allow one to obtain the structure coordinates with a high degree of certainty given a sufficiently large set of intensities. In contrast, deterministic algebraic methods that could guarantee a solution and may be applicable to macromolecules have not yet emerged. In this study a basic algebraic question is posed: how many crystal structures can be obtained from a given set of intensities? Recently, by using a new origin definition and the method of elementary symmetrical polynomials, all small (N ≤ 4 atoms) one-dimensional crystal structures that could be obtained from the minimum set of N - 1 lowest-resolution intensities were enumerated. Here, by using methods of modern algebraic geometry, the maximum number of one-dimensional crystal structures that can be determined from the minimum set of intensities for N > 4 is obtained. It is demonstrated that this ambiguity increases exponentially with the increasing number of atoms in the structure N (~4^N/N^(3/2) for N >> 1) and includes non-homometric structures. Therefore, a minimum set of intensities, even in principle, is insufficient for structure determination for all but very small structures.
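To give a feel for how quickly the quoted scaling grows, the sketch below evaluates 4^N/N^(3/2) for a few values of N. It reproduces only the leading-order form stated in the abstract; any constant prefactor is unknown here and is omitted, so the numbers indicate growth rate rather than actual structure counts. The function name is hypothetical.

```python
def ambiguity_scale(n_atoms):
    """Leading-order scaling 4^N / N^(3/2) quoted for large N (prefactor omitted)."""
    return 4 ** n_atoms / n_atoms ** 1.5

for n in (5, 10, 20, 40):
    print(f"N = {n:2d}: ~{ambiguity_scale(n):.3g}")
```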
Subjects
X-Ray Crystallography , Macromolecular Substances/chemistry , Proteins/chemistry , Molecular Models , X-Ray Diffraction
Abstract
BACKGROUND: We sought to identify candidate serum biomarkers for the detection and surveillance of epithelial ovarian cancer (EOC). Based on RNA-Seq transcriptome analysis of patient-derived tumors, highly expressed secreted proteins were identified using a bioinformatic approach. METHODS: RNA-Seq was used to quantify papillary serous ovarian cancer transcriptomes. Paired-end sequencing of 22 flash-frozen tumors was performed. Sequence alignments were processed with the program ELAND, expression levels with ERANGE, and the results were then bioinformatically screened for secreted protein signatures. Serum samples from women with benign and malignant pelvic masses and serial samples from women during chemotherapy regimens were measured for IGFBP-4 by ELISA. Student's t-test, ANOVA, and ROC curves were used for statistical analysis. RESULTS: Insulin-like growth factor binding protein 4 (IGFBP-4) was consistently present in the top 7.5% of all expressed genes in all tumor samples. We then screened serum samples to determine if increased tumor expression correlated with serum expression. In an initial discovery set of 21 samples, IGFBP-4 levels were found to be elevated in patients, including those with early stage disease and normal CA125 levels. In a larger and independent validation set (82 controls, 78 cases), IGFBP-4 levels were significantly increased (p < 5 × 10^-5). IGFBP-4 levels were ~3× greater in women with malignant pelvic masses compared to women with benign masses. ROC sensitivity was 73% at 93% specificity (AUC 0.816). In women receiving chemotherapy, average IGFBP-4 levels were below the ROC-determined threshold and lower in NED patients compared to AWD patients. CONCLUSIONS: This study, the first to our knowledge to use RNA-Seq for biomarker discovery, identified IGFBP-4 as overexpressed in ovarian cancer patients. Beyond this, these studies identified two additional intriguing findings. First, IGFBP-4 can be elevated in early stage disease without elevated CA125. Second, IGFBP-4 levels are significantly elevated with malignant versus benign disease. These findings provide the rationale for future validation studies.
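The ROC quantities reported above (AUC, and sensitivity at a given specificity) can be computed from two lists of marker levels. The following is a minimal sketch with made-up values; it is not the study's data or analysis code, and the function names are hypothetical.

```python
def auc(cases, controls):
    """Area under the ROC curve as the probability that a random case
    has a higher marker level than a random control (ties count half)."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def sensitivity_specificity(cases, controls, threshold):
    """Call a sample positive when its level is at or above the threshold."""
    sensitivity = sum(c >= threshold for c in cases) / len(cases)
    specificity = sum(k < threshold for k in controls) / len(controls)
    return sensitivity, specificity

# Hypothetical serum levels, for illustration only.
case_levels = [5, 6, 7, 8]
control_levels = [1, 2, 3, 6]
print(auc(case_levels, control_levels))
print(sensitivity_specificity(case_levels, control_levels, threshold=6))
```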
Abstract
BACKGROUND: The prognosis of hepatocellular carcinoma (HCC) varies following surgical resection and the large variation remains largely unexplained. Studies have revealed the ability of clinicopathologic parameters and gene expression to predict HCC prognosis. However, there has been little systematic effort to compare the performance of these two types of predictors or combine them in a comprehensive model. METHODS: Tumor and adjacent non-tumor liver tissues were collected from 272 ethnic Chinese HCC patients who received curative surgery. We combined clinicopathologic parameters and gene expression data (from both tissue types) in predicting HCC prognosis. Cross-validation and independent studies were employed to assess prediction. RESULTS: HCC prognosis was significantly associated with six clinicopathologic parameters, which can partition the patients into good- and poor-prognosis groups. Within each group, gene expression data further divide patients into distinct prognostic subgroups. Our predictive genes significantly overlap with previously published gene sets predictive of prognosis. Moreover, the predictive genes were enriched for genes that underwent normal-to-tumor gene network transformation. Previously documented liver eSNPs underlying the HCC predictive gene signatures were enriched for SNPs that associated with HCC prognosis, providing support that these genes are involved in key processes of tumorigenesis. CONCLUSION: When applied individually, clinicopathologic parameters and gene expression offered similar predictive power for HCC prognosis. In contrast, a combination of the two types of data dramatically improved the power to predict HCC prognosis. Our results also provided a framework for understanding the impact of gene expression on the processes of tumorigenesis and clinical outcome.
Subjects
Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/pathology , Liver Neoplasms/genetics , Liver Neoplasms/pathology , Hepatocellular Carcinoma/metabolism , Hepatocellular Carcinoma/surgery , Neoplastic Cell Transformation/genetics , Disease-Free Survival , Female , Gene Expression , Gene Expression Profiling , Humans , Liver Neoplasms/metabolism , Liver Neoplasms/surgery , Male , Middle Aged , Predictive Value of Tests , Prognosis
Abstract
BACKGROUND: In hepatocellular carcinoma (HCC), genes predictive of survival have been found in both adjacent normal (AN) and tumor (TU) tissues. The relationships between these two sets of predictive genes and the general process of tumorigenesis and disease progression remain unclear. METHODOLOGY/PRINCIPAL FINDINGS: Here we have investigated HCC tumorigenesis by comparing gene expression, DNA copy number variation and survival using ~250 AN and TU samples representing, respectively, the pre-cancer state and the result of tumorigenesis. Genes that participate in tumorigenesis were defined using a gene-gene correlation meta-analysis procedure that compared AN versus TU tissues. Genes predictive of survival in AN (AN-survival genes) were found to be enriched in the differential gene-gene correlation gene set, indicating that they directly participate in the process of tumorigenesis. Additionally, the AN-survival genes were mostly not predictive after tumorigenesis in TU tissue, and this transition was associated with, and could largely be explained by, the effect of somatic DNA copy number variation (sCNV) in cis and in trans. The data were consistent with the variance of AN-survival genes being rate-limiting steps in tumorigenesis, and this was confirmed using a treatment that promotes HCC tumorigenesis and that selectively altered AN-survival genes and genes differentially correlated between AN and TU. CONCLUSIONS/SIGNIFICANCE: This suggests that the process of tumor evolution involves rate-limiting steps related to the background from which the tumor evolved, where these were frequently predictive of clinical outcome. Additionally, treatments that alter the likelihood of tumorigenesis occurring may act by altering AN-survival genes, suggesting that the process can be manipulated. Further, sCNV explains a substantial fraction of tumor-specific expression and may therefore be a causal driver of tumor evolution in HCC and perhaps many solid tumor types.
Subjects
Hepatocellular Carcinoma/genetics , DNA Copy Number Variations , Gene Expression Profiling , Liver Neoplasms/genetics , Liver/metabolism , Adult , Aged , Animals , Tumor Cell Line , Human Chromosome Pair 1/genetics , Female , Gene Regulatory Networks , Humans , Liver/pathology , Male , Mice , Transgenic Mice , Middle Aged , Genetic Models , Oligonucleotide Array Sequence Analysis , Proto-Oncogene Proteins c-met/genetics , Regression Analysis
Abstract
To map the genetics of gene expression in metabolically relevant tissues and investigate the diversity of expression SNPs (eSNPs) in multiple tissues from the same individual, we collected four tissues from approximately 1000 patients undergoing Roux-en-Y gastric bypass (RYGB), together with clinical traits associated with their weight loss and co-morbidities. We then performed high-throughput genotyping and gene expression profiling and carried out genome-wide association analyses for more than 100,000 gene expression traits representing four metabolically relevant tissues: liver, omental adipose, subcutaneous adipose, and stomach. We successfully identified 24,531 eSNPs corresponding to about 10,000 distinct genes. To our knowledge, this represents the greatest number of eSNPs identified by any study to date and the first study to identify eSNPs from stomach tissue. We then demonstrate how these eSNPs provide a high-quality disease map for each tissue in morbidly obese patients that informs not only genetic associations identified in this cohort, but also those in previously published genome-wide association studies. These data can aid in elucidating the key networks associated with morbid obesity, response to RYGB, and disease as a whole.
Subjects
Gastric Mucosa/metabolism , Liver/metabolism , Morbid Obesity/epidemiology , Morbid Obesity/genetics , Adiposity/genetics , Adult , Cohort Studies , Comorbidity , Genetic Databases , Female , Gastric Bypass , Gene Expression Profiling , Genome-Wide Association Study/methods , Genotype , Humans , Male , Middle Aged , Morbid Obesity/surgery , Single Nucleotide Polymorphism , Weight Loss
Abstract
Liver cytochrome P450s (P450s) play critical roles in drug metabolism, toxicology, and metabolic processes. Despite rapid progress in the understanding of these enzymes, a systematic investigation of the full spectrum of functionality of individual P450s, the interrelationship or networks connecting them, and the genetic control of each gene/enzyme is lacking. To this end, we genotyped, expression-profiled, and measured P450 activities of 466 human liver samples and applied a systems biology approach via the integration of genetics, gene expression, and enzyme activity measurements. We found that most P450s were positively correlated among themselves and were highly correlated with known regulators as well as thousands of other genes enriched for pathways relevant to the metabolism of drugs, fatty acids, amino acids, and steroids. Genome-wide association analyses between genetic polymorphisms and P450 expression or enzyme activities revealed sets of SNPs associated with P450 traits, and suggested the existence of both cis-regulation of P450 expression (especially for CYP2D6) and more complex trans-regulation of P450 activity. Several novel SNPs associated with CYP2D6 expression and enzyme activity were validated in an independent human cohort. By constructing a weighted coexpression network and a Bayesian regulatory network, we defined the human liver transcriptional network structure, uncovered subnetworks representative of the P450 regulatory system, and identified novel candidate regulatory genes, namely, EHHADH, SLC10A1, and AKR1D1. The P450 subnetworks were then validated using gene signatures responsive to ligands of known P450 regulators in mouse and rat. This systematic survey provides a comprehensive view of the functionality, genetic control, and interactions of P450s.
Subjects
Cytochrome P-450 Enzyme System/genetics , Cytochrome P-450 Enzyme System/metabolism , Enzymologic Gene Expression Regulation , Genomics , Liver/enzymology , Adolescent , Adult , Aged , Aged 80 and over , Animals , Child , Preschool Child , Female , Gene Expression , Genome-Wide Association Study , Humans , Infant , Newborn Infant , Male , Mice , Middle Aged , Pharmaceutical Preparations/metabolism , Single Nucleotide Polymorphism , Rats , Systems Biology , Genetic Transcription , Young Adult
Abstract
Genome-wide association studies (GWAS) may be biased by population stratification (PS). We conducted empirical quantification of the magnitude of PS among human populations and its impact on GWAS. Liver tissues were collected from 979, 59 and 49 Caucasian Americans (CA), African Americans (AA) and Hispanic Americans (HA), respectively, and genotyped using Illumina 650Y (Ilmn650Y) arrays. RNA was also isolated and hybridized to Agilent whole-genome gene expression arrays. We propose a new method (i.e., hgdp-eigen) for detecting PS by projecting genotype vectors for each sample to the eigenvector space defined by the Human Genetic Diversity Panel (HGDP). Further, we conducted GWAS to map expression quantitative trait loci (eQTL) for the approximately 40,000 liver gene expression traits monitored by the Agilent arrays. HGDP-eigen performed similarly to the conventional self-eigen methods in capturing PS. However, leveraging the HGDP offered a significant advantage in revealing the origins, directions and magnitude of PS. Adjusting for eigenvectors had minor impacts on eQTL detection rates in CA. In contrast, for AA and HA, adjustment dramatically reduced association findings. At an FDR of 10%, we identified 65 eQTLs in AA with the unadjusted analysis, but only 18 eQTLs after the eigenvector adjustment. Strikingly, 55 out of the 65 unadjusted AA eQTLs were validated in CA, indicating that the adjustment procedure significantly reduced GWAS power. A number of the 55 AA eQTLs validated in CA overlapped with published disease-associated SNPs. For example, rs646776 and rs10903129 have previously been associated with lipid levels and coronary heart disease risk; however, the rs10903129 eQTL was missed in the eigenvector-adjusted analysis.
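The core idea of projecting a study sample onto an eigenvector space defined by a reference panel can be sketched in a few lines. This is an illustration only and not the hgdp-eigen implementation: it computes a single leading axis by power iteration (a stand-in for a full eigendecomposition), uses a tiny hypothetical genotype matrix coded 0/1/2, and all function names are assumptions.

```python
def column_means(rows):
    """Per-marker mean genotype across reference samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def leading_axis(rows, means, iterations=100):
    """Leading principal axis of the centered reference panel (power iteration)."""
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Arbitrary non-uniform start; it must not be orthogonal to the leading axis.
    axis = [1.0 / (j + 1) for j in range(len(means))]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, axis)) for row in centered]
        axis = [sum(s * row[j] for s, row in zip(scores, centered))
                for j in range(len(means))]
        norm = sum(w * w for w in axis) ** 0.5
        axis = [w / norm for w in axis]
    return axis

def project(sample, means, axis):
    """Coordinate of a sample on the axis, centered with the reference means."""
    return sum((x - m) * w for x, m, w in zip(sample, means, axis))

# Hypothetical reference panel: two samples from each of two populations.
reference = [[0, 0, 2], [0, 1, 2], [2, 2, 0], [2, 1, 0]]
means = column_means(reference)
axis = leading_axis(reference, means)

# A study sample is placed using the reference means and axis only.
study_sample = [2, 2, 1]
print(project(study_sample, means, axis))
```

The study sample never contributes to the axis itself, which is the property that lets a fixed reference panel reveal the direction of stratification in a new cohort.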
Subjects
Population Genetics , Genome-Wide Association Study , Human Genome , Humans , Liver/metabolism , Single Nucleotide Polymorphism , Quantitative Trait Loci
Abstract
BACKGROUND: Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of SNPs in the human genome are actually surveyed in such studies. In addition, various SNP arrays assay different sets of SNPs, which leads to challenges in comparing results and merging data for meta-analyses. Genome-wide imputation of untyped markers allows us to address these issues in a direct fashion. METHODS: A total of 384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes from the Ilmn317K array. On these data, we compared two imputation methods: MACH and BEAGLE. We imputed 2.5 million HapMap Release 22 SNPs, and conducted GWAS on approximately 40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500K array plus a custom 164K fill-in chip. We then imputed the HapMap SNPs and quantified the accuracy by randomly masking observed SNPs. RESULTS: MACH and BEAGLE perform similarly with respect to imputation accuracy. The Ilmn650Y results in excellent imputation performance, and it outperforms the Affx500K and Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was not as successful. It was more challenging to impute genotypes in the African American population, given (1) shorter LD blocks and (2) admixture with Caucasian populations. To address issue (2), we pooled HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximately 40,000 phenotypes scored in these populations provide a path to determine empirically how the power to detect associations is affected by the imputation procedures. That is, at a fixed false discovery rate, the number of cis-eQTL discoveries detected by various methods can be interpreted as their relative statistical power in the GWAS. In this study, we find that imputation offers modest additional power (4%) on top of either Ilmn317K or Ilmn650Y, much less than the power gain from Ilmn317K to Ilmn650Y (13%). CONCLUSION: Current algorithms can accurately impute genotypes for untyped markers, which enables researchers to pool data between studies conducted using different SNP sets. While genotyping itself results in a small error rate (e.g. 0.5%), imputing genotypes is surprisingly accurate. We found that dense marker sets (e.g. Ilmn650Y) outperform sparser ones (e.g. Ilmn317K) in terms of imputation yield and accuracy. We also noticed it was harder to impute genotypes for African American samples, partially due to population admixture, although using a pooled reference boosts performance. Interestingly, GWAS carried out using imputed genotypes only slightly increased power on top of assayed SNPs. The likely reason is that adding more markers via imputation yields only a modest gain in genetic coverage but worsens the multiple-testing penalty. Furthermore, cis-eQTL mapping using a dense SNP set derived from imputation achieves high resolution and locates association peaks closer to causal variants than the conventional approach.
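Counting discoveries at a fixed false discovery rate, as used above to compare marker sets, can be sketched as follows. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up rule here is an assumption, and the p-values are invented for illustration.

```python
def bh_discoveries(pvalues, fdr=0.10):
    """Number of rejections under the Benjamini-Hochberg step-up procedure.

    Assumed procedure for illustration; not confirmed by the abstract.
    """
    m = len(pvalues)
    rejected = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        # The largest rank whose p-value is under its threshold sets the cutoff.
        if p <= rank * fdr / m:
            rejected = rank
    return rejected

# Hypothetical association p-values from one marker set.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.059, 0.074, 0.205, 0.212, 0.216]
print(bh_discoveries(pvals, fdr=0.05), bh_discoveries(pvals, fdr=0.10))
```

Running the same count on typed-only and typed-plus-imputed results at the same FDR gives the kind of relative power comparison the abstract describes.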
Subjects
Human Genome , Genome-Wide Association Study/methods , Genotype , Single Nucleotide Polymorphism , Black or African American/genetics , Algorithms , Chromosome Mapping/methods , Genetic Markers , Humans , Liver/metabolism , Statistical Models , Messenger RNA/metabolism , Sensitivity and Specificity , White People/genetics
Abstract
To investigate the genetic architecture of severe obesity, we performed a genome-wide association study of 775 cases and 3197 unascertained controls at approximately 550,000 markers across the autosomal genome. We found convincing association to the previously described locus including the FTO gene. We also found evidence of association at a further six of 12 other loci previously reported to influence body mass index (BMI) in the general population and one of three associations to severe childhood and adult obesity and that cases have a higher proportion of risk-conferring alleles than controls. We found no evidence of homozygosity at any locus due to identity-by-descent associating with phenotype which would be indicative of rare, penetrant alleles, nor was there excess genome-wide homozygosity in cases relative to controls. Our results suggest that variants influencing BMI also contribute to severe obesity, a condition at the extreme of the phenotypic spectrum rather than a distinct condition.
Subjects
Body Mass Index , Obesity/genetics , Single Nucleotide Polymorphism , Adolescent , Adult , Aged , Cohort Studies , Female , Genetic Markers , Humans , Male , Middle Aged , Obesity/physiopathology , Phenotype , Risk Factors
Abstract
Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and we genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression resulted in the detection of more than 6,000 associations between SNP genotypes and liver gene expression traits, where many of the corresponding genes identified have already been implicated in a number of human diseases. The utility of these data for elucidating the causes of common human diseases is demonstrated by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for the candidate susceptibility genes being identified at a growing number of genetic loci that have been identified as key drivers of disease from genome-wide association studies of disease. 
By using an integrative genomics approach, we highlight how the gene RPS26 and not ERBB3 is supported by our data as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. We also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels in the process.
Subjects
Gene Expression Profiling , Genetic Predisposition to Disease/genetics , Liver/metabolism , Single Nucleotide Polymorphism/genetics , Genetic Transcription/genetics , Adolescent , Adult , Aged , Aged 80 and over , Animals , Child , Preschool Child , LDL Cholesterol/blood , LDL Cholesterol/genetics , Coronary Artery Disease/genetics , Type 1 Diabetes Mellitus/genetics , Female , MHC Class II Genes/genetics , Human Genome , Genotype , Humans , Infant , Male , Mice , Middle Aged , Oligonucleotide Array Sequence Analysis , Quantitative Trait Loci/genetics , Messenger RNA/analysis , Messenger RNA/genetics
Abstract
The recent development of whole genome association studies has led to the robust identification of several loci involved in different common human diseases. Interestingly, some of the strongest signals of association observed in these studies arise from non-coding regions located in very large introns or far away from any annotated genes, raising the possibility that these regions are involved in the etiology of the disease through some unidentified regulatory mechanisms. These findings highlight the importance of better understanding the mechanisms leading to inter-individual differences in gene expression in humans. Most of the existing approaches developed to identify common regulatory polymorphisms are based on linkage/association mapping of gene expression to genotypes. However, these methods have some limitations, notably their cost and the requirement of extensive genotyping information from all the individuals studied, which limits their applications to a specific cohort or tissue. Here we describe a robust and high-throughput method to directly measure differences in allelic expression for a large number of genes using the Illumina Allele-Specific Expression BeadArray platform and quantitative sequencing of RT-PCR products. We show that this approach allows reliable identification of differences in the relative expression of the two alleles larger than 1.5-fold (i.e., deviations of the allelic ratio larger than 60:40) and offers several advantages over the mapping of total gene expression, particularly for studying humans or outbred populations. Our analysis of more than 80 individuals for 2,968 SNPs located in 1,380 genes confirms that differential allelic expression is a widespread phenomenon affecting the expression of 20% of human genes and shows that our method successfully captures expression differences resulting from both genetic and epigenetic cis-acting mechanisms.
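The equivalence stated above between a 1.5-fold allelic difference and a 60:40 allelic ratio follows from simple arithmetic: if one allele is expressed at f times the other, its share of the total is f / (1 + f). A small sketch, with hypothetical function names:

```python
def fold_to_allelic_fraction(fold):
    """Share of total expression contributed by the higher-expressed allele."""
    return fold / (1.0 + fold)

def allelic_fraction_to_fold(fraction):
    """Fold difference implied by one allele's share of total expression."""
    return fraction / (1.0 - fraction)

share = fold_to_allelic_fraction(1.5)
print(f"{share:.0%} : {1 - share:.0%}")
```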
Subjects
Genetic Epigenesis , Gene Expression Regulation , Human Genome , Alleles , Allelic Imbalance , Genetic Complementation Test , Humans , Introns , Oligonucleotide Array Sequence Analysis , Single Nucleotide Polymorphism , Messenger RNA/genetics , Messenger RNA/metabolism , Reverse Transcriptase Polymerase Chain Reaction
Abstract
Predicting prognosis in prostate carcinoma remains a challenge when using clinical and pathologic criteria only. We used an array-based DASL assay to identify molecular signatures for predicting prostate cancer relapse in formalin-fixed, paraffin-embedded (FFPE) prostate cancers, through gene expression profiling of 512 prioritized genes. Of the 71 patients that we analyzed, all but 3 had no evidence of residual tumor (defined as negative surgical margins) following radical prostatectomy and no patient received adjuvant therapy following surgery. All of the 71 patients had an undetectable serum PSA following radical prostatectomy. Follow-up period was 44 ± 15 months. Highly reproducible gene expression patterns were obtained with these samples (average R^2 = 0.99). We identified a panel of 11 genes that correlated positively and 5 genes that correlated negatively with Gleason grade. A gene expression score (GEX) was derived from the expression levels of the 16 genes. We assessed the prognostic value of these genes and found the GEX significantly correlated with disease relapse (p = 0.007). These results suggest that the approach we used is effective for expression profiling in heterogeneous FFPE tissues for cancer diagnosis/prognosis biomarker discovery and validation.
Subjects
Prostatic Neoplasms/genetics , Aged , Aged, 80 and over , Formaldehyde , Gene Expression Profiling , Humans , Male , Middle Aged , Oligonucleotide Array Sequence Analysis , Paraffin Embedding , Prognosis , Prostatectomy , Prostatic Neoplasms/pathology , Prostatic Neoplasms/surgery , Recurrence , Risk Factors , Tissue Fixation
ABSTRACT
We have assessed the utility of RNA titration samples for evaluating microarray platform performance and the impact of different normalization methods on the results obtained. As part of the MicroArray Quality Control project, we investigated the performance of five commercial microarray platforms using two independent RNA samples and two titration mixtures of these samples. Focusing on 12,091 genes common across all platforms, we determined the ability of each platform to detect the correct titration response across the samples. Global deviations from the response predicted by the titration ratios were observed. These differences could be explained by variations in relative amounts of messenger RNA as a fraction of total RNA between the two independent samples. Overall, both the qualitative and quantitative correspondence across platforms was high. In summary, titration samples may be regarded as a valuable tool, not only for assessing microarray platform performance and different analysis methods, but also for determining some underlying biological features of the samples.
Subjects
Equipment Failure Analysis/methods , Gene Expression Profiling/instrumentation , Gene Expression Profiling/standards , Oligonucleotide Array Sequence Analysis/instrumentation , Oligonucleotide Array Sequence Analysis/standards , RNA/analysis , RNA/genetics , Algorithms , Reference Values , Reproducibility of Results , Sensitivity and Specificity , United States
ABSTRACT
Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.
Subjects
Gene Expression Profiling/instrumentation , Oligonucleotide Array Sequence Analysis/instrumentation , Quality Assurance, Health Care/methods , Equipment Design , Equipment Failure Analysis , Gene Expression Profiling/methods , Quality Control , Reproducibility of Results , Sensitivity and Specificity , United States
ABSTRACT
Human embryonic stem (hES) cells originate during an embryonic period of active epigenetic remodeling. DNA methylation patterns are likely to be critical for their self-renewal and pluripotence. We compared the DNA methylation status of 1536 CpG sites (from 371 genes) in 14 independently isolated hES cell lines with five other cell types: 24 cancer cell lines, four adult stem cell populations, four lymphoblastoid cell lines, five normal human tissues, and an embryonal carcinoma cell line. We found that the DNA methylation profile clearly distinguished the hES cells from all of the other cell types. A subset of 49 CpG sites from 40 genes contributed most to the differences among cell types. Another set of 25 sites from 23 genes distinguished hES cells from normal differentiated cells and can be used as biomarkers to monitor differentiation. Our results indicate that hES cells have a unique epigenetic signature that may contribute to their developmental potential.
Subjects
DNA Methylation , Embryo, Mammalian/cytology , Epigenesis, Genetic , Stem Cells/metabolism , Cell Differentiation , Cell Line , Cell Line, Tumor , Cell Lineage , Cluster Analysis , Female , Humans , Male , Pluripotent Stem Cells/cytology , Pluripotent Stem Cells/metabolism , Stem Cells/cytology
ABSTRACT
BACKGROUND: In order to compare the gene expression profiles of human embryonic stem cell (hESC) lines and their differentiated progeny and to monitor feeder contamination, we have examined gene expression in seven hESC lines and human fibroblast feeder cells using Illumina bead arrays that contain probes for 24,131 transcripts. RESULTS: A total of 48 different samples (including duplicates) grown in multiple laboratories under different conditions were analyzed and pairwise comparisons were performed in all groups. Hierarchical clustering showed that blinded duplicates were correctly identified as the closest related samples. hESC lines clustered together irrespective of the laboratory in which they were maintained. hESCs could be readily distinguished from embryoid bodies (EB) differentiated from them and from the karyotypically abnormal hESC line BG01V. The embryonal carcinoma (EC) line NTera2 is a useful model for evaluating characteristics of hESCs. Expression of subsets of individual genes was validated by comparing with published databases, MPSS (Massively Parallel Signature Sequencing) libraries, and parallel analysis by microarray and RT-PCR. CONCLUSION: We show that Illumina's bead array platform is a reliable, reproducible and robust method for developing base global profiles of cells and identifying similarities and differences in a large number of samples.
Subjects
Carcinoma, Embryonal/pathology , Cell Line , Genome, Human , Stem Cells , Embryo Research/legislation & jurisprudence , Embryo, Mammalian/cytology , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Government Regulation , Humans , Oligonucleotide Array Sequence Analysis , United States
ABSTRACT
We have developed a high-throughput method for analyzing the methylation status of hundreds of preselected genes simultaneously and have applied it to the discovery of methylation signatures that distinguish normal from cancer tissue samples. Through an adaptation of the GoldenGate genotyping assay implemented on a BeadArray platform, the methylation state of 1536 specific CpG sites in 371 genes (one to nine CpG sites per gene) was measured in a single reaction by multiplexed genotyping of 200 ng of bisulfite-treated genomic DNA. The assay was used to obtain a quantitative measure of the methylation level at each CpG site. After validating the assay in cell lines and normal tissues, we analyzed a panel of lung cancer biopsy samples (N = 22) and identified a panel of methylation markers that distinguished lung adenocarcinomas from normal lung tissues with high specificity. These markers were validated in a second sample set (N = 24). These results demonstrate the effectiveness of the method for reliably profiling many CpG sites in parallel for the discovery of informative methylation markers. The technology should prove useful for DNA methylation analyses in large populations, with potential application to the classification and diagnosis of a broad range of cancers and other diseases.