ABSTRACT
Gross Chromosomal Rearrangements (GCRs) play an important role in human diseases, including cancer. Although most of the nonessential Genome Instability Suppressing (GIS) genes in Saccharomyces cerevisiae are known, the essential genes in which mutations can cause increased GCR rates are not well understood. Here 2 S. cerevisiae GCR assays were used to screen a targeted collection of temperature-sensitive mutants to identify mutations that caused increased GCR rates. This identified 94 essential GIS (eGIS) genes in which mutations cause increased GCR rates and 38 candidate eGIS genes that encode eGIS1 protein-interacting or family member proteins. Analysis of TCGA data using the human genes predicted to encode the proteins and protein complexes implicated by the S. cerevisiae eGIS genes revealed a significant enrichment of mutations affecting predicted human eGIS genes in 10 of the 16 cancers analyzed.
Subject(s)
Suppressor Genes , Fungal Genome , Genomic Instability , Neoplasms/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Tumor Suppressor Proteins/genetics , DNA Damage , Humans , Mutation , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Tumor Suppressor Proteins/metabolism
ABSTRACT
The era of whole-genome sequencing has revealed that gene copy-number changes caused by duplication and deletion events have important evolutionary, functional, and phenotypic consequences. Recent studies have therefore focused on revealing the extent of variation in copy-number within natural populations of humans and other species. These studies have found a large number of copy-number variants (CNVs) in humans, many of which have been shown to have clinical or evolutionary importance. For the most part, these studies have failed to detect an important class of gene copy-number polymorphism: gene duplications caused by retrotransposition, which result in a new intron-less copy of the parental gene being inserted into a random location in the genome. Here we describe a computational approach leveraging next-generation sequence data to detect gene copy-number variants caused by retrotransposition (retroCNVs), and we report the first genome-wide analysis of these variants in humans. We find that retroCNVs account for a substantial fraction of gene copy-number differences between any two individuals. Moreover, we show that these variants may often result in expressed chimeric transcripts, underscoring their potential for the evolution of novel gene functions. By locating the insertion sites of these duplicates, we are able to show that retroCNVs have had an important role in recent human adaptation, and we also uncover evidence that positive selection may currently be driving multiple retroCNVs toward fixation. Together these findings imply that retroCNVs are an especially important class of polymorphism, and that future studies of copy-number variation should search for these variants in order to illuminate their potential evolutionary and functional relevance.
Subject(s)
Computational Biology/methods , DNA Copy Number Variations/genetics , Gene Duplication , Retroelements/genetics , Base Sequence , Biological Evolution , Chromosome Mapping , Humans , Introns , Phenotype , DNA Sequence Analysis , Sequence Deletion
ABSTRACT
BACKGROUND: Differences in gene expression have a significant role in the diversity of phenotypes in humans. Here we integrated human public data from ENCODE, 1000 Genomes and Geuvadis to explore the population-level landscape of INDELs affecting transcription factor-binding sites (TFBS). A significant fraction of TFBS close to the transcription start site of known genes is affected by INDELs, with a consequent effect on the expression of the associated gene. RESULTS: Hundreds of TFBS-affecting INDELs (TFBS-ID) show a differential frequency between human populations, suggesting a role of natural selection in the spread of such INDELs. A comparison with a dataset of known human genomic regions under natural selection allowed us to identify several cases of TFBS-ID likely involved in population-level adaptations. Ontology analyses on the differential TFBS-ID further indicated several biological processes under natural selection in different populations. CONCLUSION: Together, our results strongly suggest that INDELs have an important role in modulating gene expression patterns in humans. The dataset we make available, together with other data reporting variability at both regulatory and coding regions of genes, represents a powerful tool for studies aiming to better understand the evolution of gene regulatory networks in humans.
Subject(s)
Binding Sites/genetics , Human Genome , INDEL Mutation/genetics , Transcription Factors/genetics , Chromosome Mapping , Humans , Genetic Promoter Regions , Protein Binding , Transcription Initiation Site
ABSTRACT
Domains can spread among proteins in a process called domain shuffling and this has been identified as one of the major mechanisms leading to the formation of new proteins throughout evolution. This process has an impact on the topology of protein-protein interaction networks as it may create new hubs and also increase interconnectivity.
Subject(s)
Molecular Evolution , Protein Interaction Mapping , Proteins/chemistry , Animals , Humans , Introns , Protein Biosynthesis , Protein Interaction Maps , Species Specificity , Systems Biology
ABSTRACT
Despite evidence that at the interspecific scale, exonic splicing silencers (ESSs) are under negative selection in constitutive exons, little is known about the effects of slightly deleterious polymorphisms on these splicing regulators. Through the application of a modified version of the McDonald-Kreitman test, we compared the normalized proportions of human polymorphisms and human/rhesus substitutions affecting exonic splicing regulators (ESRs) on sequences of constitutive and alternative exons. Our results show a depletion of substitutions and an enrichment of SNPs associated with ESS gain in constitutive exons. Moreover, we show that this evolutionary pattern is also present in a set of ESRs previously involved in the transition from constitutive to skipped exons in the mammalian lineage. The similarity between these two sets of ESRs suggests that the transition from constitutive to skipped exons in mammals is more frequently associated with the inhibition than with the promotion of splicing signals. This is in accordance with the hypothesis of a constitutive origin of exon skipping and corroborates previous findings about the antagonistic role of certain exonic splicing enhancers.
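The comparison of polymorphisms against substitutions described above follows the general logic of a McDonald-Kreitman test. The sketch below shows only the standard, unmodified form of that logic as a 2x2 contingency test; the normalization used in the study's modified version is not reproduced. The function names and the example counts are hypothetical placeholders, not values from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def polymorphism_to_substitution_odds(poly_a, sub_a, poly_b, sub_b):
    """Odds ratio of polymorphisms to substitutions, class A versus class B."""
    return (poly_a / sub_a) / (poly_b / sub_b)


if __name__ == "__main__":
    # Hypothetical counts: class A = changes that create an ESS,
    # class B = all other changes, both within constitutive exons.
    poly_a, sub_a, poly_b, sub_b = 30, 10, 20, 40
    print(polymorphism_to_substitution_odds(poly_a, sub_a, poly_b, sub_b))
    print(fisher_exact_two_sided(poly_a, sub_a, poly_b, sub_b))
```

An odds ratio above 1 with a small p-value would correspond to the pattern reported in the abstract: an excess of polymorphisms relative to fixed substitutions, as expected for slightly deleterious changes.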
Subject(s)
Biological Evolution , Exons , Single Nucleotide Polymorphism , RNA Splicing , Nucleic Acid Regulatory Sequences , Genetic Selection , Animals , Genetic Enhancer Elements , Humans , Mammals/genetics , Genetic Models
ABSTRACT
With the availability of a large amount of genomic data, it is expected that the influence of single nucleotide variations (SNVs) on many biological phenomena will be elucidated. Here, we approached the problem of how SNVs affect alternative splicing. First, we observed that SNVs and exonic splicing regulators (ESRs) independently show a biased distribution in alternative exons. More importantly, SNVs map more frequently in ESRs located in alternative exons than in ESRs located in constitutive exons. By looking at SNVs associated with alternative exon/intron borders (by their common presence in the same cDNA molecule), we observed that a specific type of ESR, the exonic splicing silencer (ESS), is more frequently modified by SNVs. Our results establish a clear association between genetic diversity and alternative splicing involving ESSs.
Subject(s)
Alternative Splicing , Exons , Single Nucleotide Polymorphism , Ribonucleic Acid Regulatory Sequences , Humans , Introns
ABSTRACT
Although patterns of somatic alterations have been reported for tumor genomes, little is known about how they compare with alterations present in non-tumor genomes. A comparison of the two would be crucial to better characterize the genetic alterations driving tumorigenesis. We sequenced the genomes of a lymphoblastoid (HCC1954BL) and a breast tumor (HCC1954) cell line derived from the same patient and compared the somatic alterations present in both. The lymphoblastoid genome presents a comparable number and similar spectrum of nucleotide substitutions to that found in the tumor genome. However, a significant difference in the ratio of non-synonymous to synonymous substitutions was observed between the two genomes (P = 0.031). Protein-protein interaction analysis revealed that mutations in the tumor genome preferentially affect hub-genes (P = 0.0017) and are co-selected to present synergistic functions (P < 0.0001). KEGG analysis showed that in the tumor genome most mutated genes were organized into signaling pathways related to tumorigenesis. No such organization or synergy was observed in the lymphoblastoid genome. Our results indicate that endogenous mutagens and replication errors can generate the overall number of mutations required to drive tumorigenesis and that it is the combination rather than the frequency of mutations that is crucial to complete tumorigenic transformation.
Subject(s)
Breast Neoplasms/genetics , Genetic Variation , Human Genome , Transformed Cell Line , Tumor Cell Line , Chromosome Aberrations , Female , Humans , Lymphocytes , Middle Aged , Mutation , Point Mutation , Protein Interaction Mapping , DNA Sequence Analysis
ABSTRACT
Exon shuffling has been characterized as one of the major evolutionary forces shaping both the genome and the proteome of eukaryotes. This mechanism was particularly important in the creation of multidomain proteins during animal evolution, bringing a number of functional genetic novelties. Here, genome information from a variety of eukaryotic species was used to address several issues related to the evolutionary history of exon shuffling. By comparing all protein sequences within each species, we were able to characterize exon shuffling signatures throughout metazoans. Intron phase (the position of the intron relative to the codon) and exon symmetry (the pattern of flanking introns for a given exon or block of adjacent exons) were features used to evaluate exon shuffling. We confirmed previous observations that exon shuffling mediated by phase 1 introns (1-1 exon shuffling) is the predominant kind in multicellular animals. Evidence is provided that this pattern was established in the early steps of animal evolution, supported by a detectable presence of 1-1 shuffling units in Trichoplax adhaerens and a considerable prevalence of them in Nematostella vectensis. In contrast, Monosiga brevicollis, one of the closest relatives of metazoans, and Arabidopsis thaliana showed no evidence of 1-1 exon or domain shuffling above what would be expected by chance. Instead, exon shuffling events are less abundant and predominantly mediated by phase 0 introns (0-0 exon shuffling) in those non-metazoan species. Moreover, an intermediate pattern of 1-1 and 0-0 exon shuffling was observed for the placozoan T. adhaerens, a primitive animal. Finally, characterization of flanking intron phases around domain borders allowed us to identify a common set of symmetric 1-1 domains that have been shuffled throughout the metazoan lineage.
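The two features named above, intron phase and exon symmetry, can be derived from coding-exon lengths alone. The following is a minimal sketch under the usual definitions (phase 0: the intron falls between two codons; phase 1: after the first base of a codon; phase 2: after the second base). The function names and the example lengths are hypothetical and are not taken from the study's pipeline.

```python
def intron_phases(cds_exon_lengths):
    """Phase of each intron, given coding-exon lengths in transcript order."""
    phases = []
    coding_bases = 0
    for length in cds_exon_lengths[:-1]:  # no intron follows the last exon
        coding_bases += length
        # Number of codon bases already read when the intron interrupts.
        phases.append(coding_bases % 3)
    return phases


def exon_classes(cds_exon_lengths):
    """(5' intron phase, 3' intron phase) for every internal exon."""
    phases = intron_phases(cds_exon_lengths)
    return list(zip(phases[:-1], phases[1:]))


def is_symmetric(exon_class):
    """An exon is symmetric when both flanking introns share the same phase."""
    return exon_class[0] == exon_class[1]


if __name__ == "__main__":
    lengths = [100, 99, 50, 30]  # hypothetical coding-exon lengths in bp
    for exon_class in exon_classes(lengths):
        print(exon_class, "symmetric" if is_symmetric(exon_class) else "asymmetric")
```

A symmetric exon such as a 1-1 exon has a length divisible by three once codon boundaries are accounted for, so it can be inserted into, or removed from, an intron of the same phase without shifting the reading frame, which is why symmetry is used as a signature of shuffling.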
Subject(s)
Molecular Evolution , Exons , Genetic Recombination , Animals , Cluster Analysis , Computational Biology/methods , Humans , Introns , Open Reading Frames/genetics , Plants/genetics , Protein Interaction Domains and Motifs/genetics
ABSTRACT
Understanding alternative splicing is crucial to elucidate the mechanisms behind several biological phenomena, including diseases. The huge amount of expressed sequences available nowadays represents an opportunity and a challenge to catalog and display alternative splicing events (ASEs). Although several groups have faced this challenge with relative success, we still lack a computational tool that uses a simple and straightforward method to retrieve, name and present ASEs. Here we present SPLOOCE, a portal for the analysis of human splicing variants. SPLOOCE uses a method based on regular expressions for retrieval of ASEs. We propose a simple syntax that is able to capture the complexity of ASEs.
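The abstract does not spell out SPLOOCE's syntax, so the sketch below is not that syntax. It only illustrates the general idea of retrieving splicing events with regular expressions, under an invented encoding: each transcript is written as one character per reference exon, `E` if the exon is present and `-` if it is absent. The encoding, the pattern, and the function name are all assumptions made for illustration.

```python
import re

# A run of absent exons flanked on both sides by present exons.
# Lookbehind/lookahead keep the flanking exons out of the match itself,
# so adjacent events that share a flanking exon are both reported.
SKIPPED_RUN = re.compile(r"(?<=E)-+(?=E)")


def skipped_exons(encoded):
    """1-based positions of reference exons skipped in an encoded transcript."""
    skipped = []
    for match in SKIPPED_RUN.finditer(encoded):
        skipped.extend(range(match.start() + 1, match.end() + 1))
    return skipped


if __name__ == "__main__":
    for transcript in ("EEEE", "EE-EE", "E--E-E", "--EE"):
        print(transcript, skipped_exons(transcript))
```

Leading or trailing runs of `-` are deliberately not reported here, since in this toy encoding they would more plausibly reflect an alternative first or last exon, or an incomplete sequence, than exon skipping.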
Subject(s)
Alternative Splicing , Computational Biology , Nucleic Acid Databases , RNA Splice Sites , Humans , Internet , Oligonucleotide Array Sequence Analysis
ABSTRACT
We have identified new genomic alterations in the breast cancer cell line HCC1954, using high-throughput transcriptome sequencing. With 120 Mb of cDNA sequences, we were able to identify genomic rearrangement events leading to fusions or truncations of genes including MRE11 and NSD1, genes already implicated in oncogenesis, and 7 rearrangements involving additional genes. This approach demonstrates that high-throughput transcriptome sequencing is an effective strategy for the characterization of genomic rearrangements in cancers.
Subject(s)
Breast Neoplasms/genetics , Gene Expression Profiling/methods , Gene Rearrangement , Human Genome/genetics , Base Sequence , Carrier Proteins/genetics , Tumor Cell Line , Complementary DNA , DNA-Binding Proteins/genetics , Female , Histone-Lysine N-Methyltransferase , Humans , MRE11 Homologue Protein , Neoplasm Proteins/genetics , Nuclear Proteins/genetics
ABSTRACT
Genetic predisposition accounts for nearly 10% of all melanoma cases and has been associated with a dozen moderate- to high-penetrance genes, including CDKN2A, CDK4, POT1 and BAP1. However, in most melanoma-prone families, the genetic etiology of cancer predisposition remains undetermined. The goal of this study was to identify rare genomic variants associated with cutaneous melanoma susceptibility in melanoma-prone families. Whole-exome sequencing was performed in 2 affected individuals of 5 melanoma-prone families negative for mutations in CDKN2A and CDK4, the major cutaneous melanoma risk genes. A total of 288 rare coding variants shared by the affected relatives of each family were identified, including 7 loss-of-function variants. By performing in silico analyses of gene function, biological pathways, and variant pathogenicity prediction, we underscored the putative role of several genes in melanoma risk, including previously described genes such as MYO7A and WRN, as well as new putative candidates, such as SERPINB4, HRNR, and NOP10. In conclusion, our data revealed rare germline variants in melanoma-prone families, contributing a novel set of potential candidate genes to be further investigated in future studies.
Subject(s)
Genetic Predisposition to Disease/genetics , Melanoma/genetics , Mutation/genetics , Skin Neoplasms/genetics , Adolescent , Adult , Aged , Brazil , Female , Genotype , Humans , Male , Middle Aged , Pedigree , Penetrance , Exome Sequencing/methods , Cutaneous Malignant Melanoma
ABSTRACT
Extended phenotypes are manifestations of genes that occur outside of the organism that possesses those genes. In spite of their widespread occurrence, the role of extended phenotypes in evolutionary biology is still a matter of debate. Here, we explore the indirect effects of extended phenotypes, especially their shared use, on the fitness of simulated individuals and populations. A computer simulation platform was developed in which different populations were compared regarding their ability to produce, use, and share extended phenotypes. Our results show that populations that produce and share extended phenotypes outcompete populations that only produce them. A specific parameter in the simulations, a bonus for sharing extended phenotypes among conspecifics, has a more significant impact in defining which population will prevail. All these findings strongly support the view, postulated by the extended fitness hypothesis (EFH), that extended phenotypes play a significant role at the population level and their shared use increases population fitness. Our simulation platform is available at https://github.com/guilherme-araujo/gsop-dist.
ABSTRACT
BACKGROUND: Physical protein-protein interaction (PPI) is a critical phenomenon for the function of most proteins in living organisms and a significant fraction of PPIs are the result of domain-domain interactions. Exon shuffling, intron-mediated recombination of exons from existing genes, is known to have been a major mechanism of domain shuffling in metazoans. Thus, we hypothesized that exon shuffling could have a significant influence in shaping the topology of PPI networks. RESULTS: We tested our hypothesis by compiling exon shuffling and PPI data from six eukaryotic species: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Cryptococcus neoformans and Arabidopsis thaliana. For all four metazoan species, genes enriched in exon shuffling events presented on average higher vertex degree (number of interacting partners) in PPI networks. Furthermore, we verified that a set of protein domains that are simultaneously promiscuous (known to interact with multiple types of other domains), self-interacting (able to interact with another copy of themselves) and abundant in the genomes presents a stronger signal for exon shuffling. CONCLUSIONS: Exon shuffling appears to have been a recurrent mechanism for the emergence of new PPIs along metazoan evolution. In metazoan genomes, exon shuffling also promoted the expansion of some protein domains. We speculate that their promiscuous and self-interacting properties may have been decisive for that expansion.
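Vertex degree, as used above, is simply the number of distinct interaction partners of a protein in the network. The sketch below computes it from an undirected edge list and averages it over a gene set. The function names, the choice to ignore self-interactions when counting partners, and the example edges are assumptions for illustration, not details taken from the study.

```python
from collections import defaultdict


def vertex_degrees(interactions):
    """Number of distinct partners per protein in an undirected PPI edge list."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # assumption: self-interactions do not count as partners
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}


def mean_degree(degrees, proteins):
    """Average degree over a gene set; proteins absent from the network count as 0."""
    values = [degrees.get(p, 0) for p in proteins]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Hypothetical edges; duplicates and reversed pairs are collapsed by the sets.
    edges = [("A", "B"), ("A", "C"), ("B", "A"), ("D", "D")]
    degrees = vertex_degrees(edges)
    print(degrees)
    print(mean_degree(degrees, ["A", "D"]))
```

Comparing `mean_degree` between genes enriched in exon shuffling events and the remaining genes mirrors the comparison described in the abstract; a statistical test on the two degree distributions would still be needed to assess significance.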
Subject(s)
Molecular Evolution , Exons/genetics , Protein Binding/genetics , Tertiary Protein Structure/genetics , Proteins/metabolism , Genetic Recombination/genetics , Amyloid beta-Protein Precursor/genetics , Animals , Humans , Protein Interaction Mapping , Protein Isoforms/genetics
ABSTRACT
BACKGROUND: Alternative splicing (AS) is a central mechanism in the generation of genomic complexity and is a major contributor to transcriptome and proteome diversity. Alterations of the splicing process can lead to deregulation of crucial cellular processes and have been associated with a large spectrum of human diseases. Cancer-associated transcripts are potential molecular markers and may contribute to the development of more accurate diagnostic and prognostic methods and also serve as therapeutic targets. Alternative splicing-enriched cDNA libraries have been used to explore the variability generated by alternative splicing. In this study, by combining the use of trapping heteroduplexes and RNA amplification, we developed a powerful approach that enables transcriptome-wide exploration of the AS repertoire for identifying AS variants associated with breast tumor cells modulated by ERBB2 (HER-2/neu) oncogene expression. RESULTS: The human breast cell line (C5.2) and a pool of 5 ERBB2 over-expressing breast tumor samples were used independently for the construction of two AS-enriched libraries. In total, 2,048 partial cDNA sequences were obtained, revealing 214 alternative splicing sequence-enriched tags (ASSETs). A subset of 79 multiple-exon ASSETs was compared to public databases and revealed 138 different AS events. A high success rate of RT-PCR validation (94.5%) was obtained, and 2 novel AS events were identified. The influence of ERBB2-mediated expression on AS regulation was evaluated by capillary electrophoresis and probe-ligation approaches in two mammary cell lines (Hb4a and C5.2) expressing different levels of ERBB2. The relative expression balance between AS variants from 3 genes was differentially modulated by ERBB2 in this model system. CONCLUSIONS: In this study, we presented a method for exploring AS from any RNA source in a transcriptome-wide format, which can be easily adapted to next-generation sequencers. We identified AS transcripts that were differently modulated by ERBB2-mediated expression and that can be tested as molecular markers for breast cancer. Such a methodology will be useful for completely deciphering the cancer cell transcriptome diversity resulting from AS and for finding more precise molecular markers.
Subject(s)
Alternative Splicing/genetics , Breast Neoplasms/genetics , Gene Expression Profiling , Gene Library , Genetic Variation , ErbB-2 Receptor/metabolism , Tumor Cell Line , Molecular Cloning , Computational Biology , Female , Humans , Oligonucleotides/genetics , ErbB-2 Receptor/genetics , Reverse Transcriptase Polymerase Chain Reaction , DNA Sequence Analysis
ABSTRACT
Cancer/testis Antigens (CTAs) are immunogenic proteins with a restricted expression pattern in normal tissues and aberrant expression in different types of tumors, and are considered promising candidates for immunotherapy. We used the alignment between EST sequences and the human genome sequence to identify novel CT genes. By examining the EST tissue composition of known CT clusters, we defined parameters for the selection of 1184 EST clusters corresponding to putative CT genes. The expression pattern of 70 CT gene candidates was evaluated by RT-PCR in 21 normal tissues, 17 tumor cell lines and 160 primary tumors. We were able to identify 4 CT genes expressed in different types of tumors. The presence of antibodies against the protein encoded by 1 of these 4 CT genes (FAM46D) was exclusively detected in plasma samples from cancer patients. Due to its restricted expression pattern and immunogenicity, FAM46D represents a novel target for cancer immunotherapy.
Subject(s)
Neoplasm Antigens/immunology , Expressed Sequence Tags , Neoplasm Proteins/immunology , Neoplasms/blood , Neoplasm Antigens/genetics , Case-Control Studies , Nucleic Acid Databases , Human Genome , Humans , Male , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Neoplasms/pathology , Nucleotidyltransferases , Recombinant Proteins/genetics , Recombinant Proteins/immunology , Testis/immunology , Cultured Tumor Cells
ABSTRACT
Studies on the peopling of South America have been limited by the paucity of sequence data from Native Americans, especially from the eastern part of the Amazon region. Here, we investigate the whole-exome variation from 58 Native American individuals (eight different populations) from the Amazon region and draw insights into the peopling of South America. By using the sequence data generated here together with data from the public domain, we confirmed a strong genetic distinction between Andean and Amazonian populations. By testing distinct demographic models, our analysis supports a scenario of South American occupation that involves migrations along the Pacific and Atlantic coasts. Occupation of the southeastern part of South America would involve migrations from the north, rather than from the west of the continent.
ABSTRACT
BACKGROUND: Cancer neoantigens have attracted great interest in immunotherapy due to their capacity to elicit antitumoral responses. These molecules arise from somatic mutations in cancer cells, resulting in alterations to the original protein. Neoantigen identification remains a challenging task, due largely to a high rate of false positives. RESULTS: We have developed an efficient and automated pipeline for the identification of potential neoantigens. neoANT-HILL integrates several immunogenomic analyses to improve neoantigen detection from next-generation sequencing (NGS) data. The pipeline has been compiled in a pre-built Docker image such that minimal computational background is required for download and setup. neoANT-HILL was applied to The Cancer Genome Atlas (TCGA) melanoma dataset and found several putative neoantigens, including ones derived from the recurrent RAC1:P29S and SERPINB3:E250K mutations. neoANT-HILL was also used to identify potential neoantigens in RNA-Seq data with high sensitivity and specificity. CONCLUSION: neoANT-HILL is a user-friendly tool with a graphical interface that performs neoantigen prediction efficiently. neoANT-HILL is able to process multiple samples, provides several binding predictors, enables quantification of tumor-infiltrating immune cells and considers RNA-Seq data for identifying potential neoantigens. The software is available through GitHub at https://github.com/neoanthill/neoANT-HILL.
Subject(s)
Neoplasm Antigens , Genetic Databases , Melanoma , RNA-Seq , Software , Neoplasm Antigens/genetics , Neoplasm Antigens/immunology , Humans , Melanoma/genetics , Melanoma/immunology
ABSTRACT
Tumor DNA has been detected in body fluids of cancer patients. Somatic tumor mutations are being used as biomarkers in body fluids to monitor chemotherapy response as a minimally invasive tool. In this study, we evaluated the potential of tracking somatic mutations in free DNA of plasma and urine collected from Wilms tumor (WT) patients for monitoring treatment response. Wilms tumor is a pediatric renal tumor resulting from cell differentiation errors during nephrogenesis. Its mutational repertoire is not completely defined. Thus, to identify somatic mutations from tumor tissue DNA, we screened matched tumor/leukocyte DNAs using either a panel containing 16 WT-associated genes or whole-exome sequencing (WES). The identified somatic tumor mutations were tracked in urine and plasma DNA collected before, during and after treatment. At least one somatic mutation was identified in five out of six WT tissue samples analyzed. Somatic mutations were detected in body fluids before treatment in all five patients (three patients in urine, three in plasma, and one in both body fluids). In all patients, a decrease of the variant allele fraction of somatic mutations was observed in body fluids during neoadjuvant chemotherapy. Interestingly, the persistence of somatic mutations in body fluids was in accordance with clinical parameters. For one patient who progressed to death, the mutation persisted at high levels in serial body fluid samples during treatment. For three patients without disease progression, somatic mutations were not consistently detected in samples throughout monitoring. For one patient with bilateral disease, a somatic mutation was detected at low levels without any supporting clinical manifestation. Our results demonstrated the potential of tracking somatic mutations in urine and plasma DNA as a minimally invasive tool for monitoring WT patients. Additional investigation is needed to assess the clinical value of persistent somatic mutations in body fluids.
Subject(s)
Neoplasm DNA/genetics , Kidney Neoplasms/genetics , Mutation , Wilms Tumor/genetics , Alleles , Adjuvant Chemotherapy , Preschool Child , Neoplasm DNA/blood , Neoplasm DNA/urine , Female , Humans , Infant , Kidney Neoplasms/blood , Kidney Neoplasms/drug therapy , Kidney Neoplasms/urine , Lung Neoplasms/genetics , Lung Neoplasms/secondary , Neoadjuvant Therapy , Exome Sequencing , Wilms Tumor/blood , Wilms Tumor/drug therapy , Wilms Tumor/urine
ABSTRACT
Methods based on statistics and linear algebra have been increasingly used to address emerging questions in the microarray literature. Microarray technology is a long-used tool in the global analysis of gene expression, allowing for the simultaneous investigation of hundreds or thousands of genes in a sample. Microarray data are characterized by a small sample size and a large number of features, which produces a non-square, rank-deficient matrix that admits countless solutions in classifiers. To avoid the 'curse of dimensionality', many authors have performed feature selection or reduced the size of the data matrix. In this work, we introduce a new logistic regression-based model to classify breast cancer tumor samples based on microarray expression data, including all features of gene expression and without reducing the microarray data matrix. If the user still deems it necessary to perform feature reduction, it can be done after the application of the methodology, still maintaining a good classification. This methodology allowed the correct classification of breast cancer sample data sets from Gene Expression Omnibus (GEO) data series GSE65194, GSE20711, and GSE25055, which contain the microarray data of said breast cancer samples. Classification had a minimum performance of 80% (sensitivity and specificity), and explored all possible data combinations, including breast cancer subtypes. This methodology highlighted genes not yet studied in breast cancer, some of which have been observed in Gene Regulatory Networks (GRNs). In this work we examine the patterns and features of a GRN composed of transcription factors (TFs) in MCF-7 breast cancer cell lines, providing valuable information regarding breast cancer. In particular, some genes have associated αi* parameter values that are extremely positive or extremely negative and, as such, can be identified as breast cancer prediction genes. We indicate that the PKN2, MKL1, MED23, CUL5 and GLI genes demonstrate a tumor suppressor profile, and that the MTR, ITGA2B, TELO2, MRPL9, MTTL1, WIPI1, KLHL20, PI4KB, FOLR1 and SHC1 genes demonstrate an oncogenic profile. We propose that these may serve as potential breast cancer prediction genes, and should be prioritized for further clinical studies on breast cancer. This new model allows for the assignment of values to the αi* parameters associated with gene expression. It was noted that some αi* parameters are associated with genes previously described as breast cancer biomarkers, as well as other genes not yet studied in relation to this disease.
Subject(s)
Breast Neoplasms/genetics , Neoplastic Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Tumor Biomarkers/genetics , Tumor Cell Line , Disease Progression , Female , Gene Expression Profiling/methods , Humans , Logistic Models , MCF-7 Cells , Oligonucleotide Array Sequence Analysis/methods , Transcription Factors/genetics
ABSTRACT
BACKGROUND: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), are powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, termed sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification and selection of the tags considered more reliable for further gene expression analysis. RESULTS: This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed on samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. CONCLUSION: These results substantiate the proposed application to generate more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. S3T source code and datasets can also be downloaded from the aforementioned website.