Results 1 - 20 of 40
1.
Front Genet ; 13: 769919, 2022.
Article in English | MEDLINE | ID: mdl-35571023

ABSTRACT

Genomics policy development involves assessing a wide range of issues extending from specimen collection and data sharing to whether and how to utilize advanced technologies in clinical practice and public health initiatives. A survey was conducted among African scientists and stakeholders with an interest in genomic medicine, seeking to evaluate: 1) Their knowledge and understanding of the field. 2) The institutional environment and infrastructure available to them. 3) The state and awareness of the field in their country. 4) Their perception of potential barriers to implementation of precision medicine. We discuss how the information gathered in the survey could instruct the policies of African institutions seeking to implement precision, and more specifically, genomic medicine approaches in their health care systems in the following areas: 1) Prioritization of infrastructures. 2) Need for translational research. 3) Information dissemination to potential users. 4) Training programs for specialized personnel. 5) Engaging political stakeholders and the public. A checklist with key requirements to assess readiness for implementation of genomic medicine programs is provided to guide the process from scientific discovery to clinical application.

3.
PLoS Biol ; 18(1): e3000583, 2020 01.
Article in English | MEDLINE | ID: mdl-31971940

ABSTRACT

We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network." KnowEnG adheres to "FAIR" principles (findable, accessible, interoperable, and reusable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are made available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the KnowEnG system's potential value in democratization of advanced tools for the modern genomics era through several case studies that use its tools to recreate and expand upon the published analysis of cancer data sets.


Subject(s)
Algorithms , Cloud Computing , Data Mining/methods , Genomics/methods , Software , Cluster Analysis , Computational Biology/methods , Data Analysis , Datasets as Topic , High-Throughput Nucleotide Sequencing/methods , Humans , Knowledge , Machine Learning , Metabolomics/methods
4.
BMC Bioinformatics ; 19(1): 457, 2018 Nov 29.
Article in English | MEDLINE | ID: mdl-30486782

ABSTRACT

BACKGROUND: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. RESULTS: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community.
CONCLUSION: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.


Subject(s)
Computational Biology/methods , Genomics/methods , Africa , Humans , Reproducibility of Results
5.
AAS Open Res ; 1: 9, 2018.
Article in English | MEDLINE | ID: mdl-32382696

ABSTRACT

The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Heredity and Health in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous compute environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. To address this need, in 2016 H3ABioNet arranged its first Cloud Computing and Reproducible Workflows Hackathon, with the purpose of building key genomics analysis pipelines able to run on heterogeneous computing environments and meeting the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects upon the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on quay.io.

6.
PLoS Comput Biol ; 13(6): e1005419, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28570565

ABSTRACT

The H3ABioNet pan-African bioinformatics network, which is funded to support the Human Heredity and Health in Africa (H3Africa) program, has developed node-assessment exercises to gauge the ability of its participating research and service groups to analyze typical genome-wide datasets being generated by H3Africa research groups. We describe a framework for the assessment of computational genomics analysis skills, which includes standard operating procedures, training and test datasets, and a process for administering the exercise. We present the experiences of 3 research groups that have taken the exercise and the impact on their ability to manage complex projects. Finally, we discuss the reasons why many H3ABioNet nodes have declined so far to participate and potential strategies to encourage them to do so.


Subject(s)
Black People/genetics , Databases, Genetic , Genomics/methods , Database Management Systems , Developing Countries , Humans , Nigeria , South Africa
7.
Genome Res ; 26(2): 271-7, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26627985

ABSTRACT

The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet.


Subject(s)
Black People/genetics , Health Promotion , Africa , Computational Biology , Computer Systems , Genetic Variation , Genetics, Medical , Genomics , Humans
8.
BMC Genomics ; 13: 241, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22702538

ABSTRACT

BACKGROUND: Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS: Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION: Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
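
The abstract above names a Gaussian Mixture Model as the basis of its CNV detection method but gives no details. As a rough illustration of the general idea only, the following sketch fits a plain one-dimensional mixture by expectation-maximization to probe intensity log-ratios. It is a generic textbook procedure, not the authors' algorithm; the function name `fit_gmm_1d`, the initial variance, the variance floor, and the toy values are all assumptions made for this example.

```python
import math


def fit_gmm_1d(values, means, init_var=0.01, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given initial means."""
    k = len(means)
    means = list(means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-4)  # floor keeps a component from collapsing
    return weights, means, variances


# Hypothetical log-ratios for one probe across 17 individuals (not real data):
# a cluster near -0.5 (loss), one near 0.0 (normal), one near 0.4 (gain).
log_ratios = [
    -0.55, -0.50, -0.45, -0.52, -0.48,
    -0.05, 0.00, 0.05, 0.02, -0.02, 0.03, -0.03,
    0.35, 0.40, 0.45, 0.38, 0.42,
]
initial_means = [-0.5, 0.0, 0.4]  # assumed starting guesses, one per copy-number state
weights, means, variances = fit_gmm_1d(log_ratios, initial_means)
```

Each fitted component can then be read as one copy-number state, and an individual is assigned to the component with the highest responsibility for its value.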


Subject(s)
DNA Copy Number Variations/genetics , Genome, Human , Polymorphism, Single Nucleotide , Algorithms , Cohort Studies , Genome-Wide Association Study , Genotype , Humans , Normal Distribution , Oligonucleotide Array Sequence Analysis , Principal Component Analysis
9.
Nat Genet ; 44(2): 133-9, 2011 Dec 25.
Article in English | MEDLINE | ID: mdl-22197931

ABSTRACT

We performed exome sequencing to detect somatic mutations in protein-coding regions in seven melanoma cell lines and donor-matched germline cells. All melanoma samples had high numbers of somatic mutations, which showed the hallmark of UV-induced DNA repair. Such a hallmark was absent in tumor sample-specific mutations in two metastases derived from the same individual. Two melanomas with non-canonical BRAF mutations harbored gain-of-function MAP2K1 and MAP2K2 (MEK1 and MEK2, respectively) mutations, resulting in constitutive ERK phosphorylation and higher resistance to MEK inhibitors. Screening a larger cohort of individuals with melanoma revealed the presence of recurring somatic MAP2K1 and MAP2K2 mutations, which occurred at an overall frequency of 8%. Furthermore, missense and nonsense somatic mutations were frequently found in three candidate melanoma genes, FAT4, LRP1B and DSC1.


Subject(s)
Exome/genetics , MAP Kinase Kinase 1/genetics , MAP Kinase Kinase 2/genetics , Melanoma/genetics , Mitogen-Activated Protein Kinase 1/genetics , Mutation , Skin Neoplasms/genetics , Base Sequence , Cadherins/genetics , Cell Line, Tumor , Cohort Studies , DNA Repair/genetics , Desmocollins , Humans , MAP Kinase Kinase 1/antagonists & inhibitors , MAP Kinase Kinase 2/antagonists & inhibitors , Molecular Sequence Data , Proto-Oncogene Proteins B-raf/genetics , Receptors, LDL/genetics , Tumor Suppressor Proteins/genetics , Ultraviolet Rays/adverse effects
10.
PLoS One ; 6(4): e18369, 2011 Apr 08.
Article in English | MEDLINE | ID: mdl-21494657

ABSTRACT

Cancer genomes frequently contain somatic copy number alterations (SCNA) that can significantly perturb the expression level of affected genes and thus disrupt pathways controlling normal growth. In melanoma, many studies have focussed on the copy number and gene expression levels of the BRAF, PTEN and MITF genes, but little has been done to identify new genes using these parameters at the genome-wide scale. Using karyotyping, SNP and CGH arrays, and RNA-seq, we have identified SCNA affecting gene expression ('SCNA-genes') in seven human metastatic melanoma cell lines. We showed that the combination of these techniques is useful to identify candidate genes potentially involved in tumorigenesis. Since few of these alterations were recurrent across our samples, we used a protein network-guided approach to determine whether any pathways were enriched in SCNA-genes in one or more samples. From this unbiased genome-wide analysis, we identified 28 significantly enriched pathway modules. Comparison with two large, independent melanoma SCNA datasets showed less than 10% overlap at the individual gene level, but network-guided analysis revealed 66% shared pathways, including all but three of the pathways identified in our data. Frequently altered pathways included WNT, cadherin signalling, angiogenesis and melanogenesis. Additionally, our results emphasize the potential of the EPHA3 and FRS2 gene products, involved in angiogenesis and migration, as possible therapeutic targets in melanoma. Our study demonstrates the utility of network-guided approaches, for both large and small datasets, to identify pathways recurrently perturbed in cancer.
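
The abstract above reports pathway modules "significantly enriched" in SCNA-genes using a protein network-guided approach, without describing the statistics. As a simpler stand-in, and explicitly not the authors' network-guided method, the sketch below computes a one-sided hypergeometric over-representation p-value, a common baseline for testing whether a gene list overlaps a pathway more than expected by chance. The function name and the toy counts are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn from `universe` genes,
    of which `pathway` genes belong to the pathway of interest."""
    upper = min(pathway, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes in total, 5 in the pathway,
# 5 altered genes selected, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(universe=20, pathway=5, selected=5, overlap=3)
```

In practice such p-values are computed for every pathway and then corrected for multiple testing.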


Subject(s)
DNA Copy Number Variations/genetics , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks/genetics , Genes, Neoplasm/genetics , Melanoma/genetics , Melanoma/pathology , Signal Transduction/genetics , Cell Line, Tumor , Comparative Genomic Hybridization , Databases, Genetic , Humans , In Situ Hybridization, Fluorescence , Karyotyping , Neoplasm Metastasis , Polymorphism, Single Nucleotide/genetics , Proto-Oncogene Proteins c-mdm2/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism
12.
Database (Oxford) ; 2010: baq024, 2010 Oct 12.
Article in English | MEDLINE | ID: mdl-20940178

ABSTRACT

Type 2 diabetes mellitus (T2DM) is a major disease affecting nearly 280 million people worldwide. Whilst the pathophysiological mechanisms leading to disease are poorly understood, dysfunction of the insulin-producing pancreatic beta-cells is a key event for disease development. Monitoring the gene expression profiles of pancreatic beta-cells under several genetic or chemical perturbations has shed light on genes and pathways involved in T2DM. The EuroDia database has been established to build a unique collection of gene expression measurements performed on beta-cells of three organisms, namely human, mouse and rat. The Gene Expression Data Analysis Interface (GEDAI) has been developed to support this database. The quality of each dataset is assessed by a series of quality control procedures to detect putative hybridization outliers. The system integrates a web interface to several standard analysis functions from R/Bioconductor to identify differentially expressed genes and pathways. It also allows the combination of multiple experiments performed on different array platforms of the same technology. The design of this system enables each user to rapidly design a custom analysis pipeline and thus produce their own list of genes and pathways. Raw and normalized data can be downloaded for each experiment. The flexible engine of this database (GEDAI) is currently used to handle gene expression data from several laboratory-run projects dealing with different organisms and platforms. Database URL: http://eurodia.vital-it.ch.


Subject(s)
Databases, Genetic , Diabetes Mellitus, Type 2/genetics , Insulin-Secreting Cells , User-Computer Interface , Animals , Data Mining , Diabetes Mellitus, Type 2/metabolism , Gene Expression Profiling/statistics & numerical data , Humans , Information Storage and Retrieval , Insulin-Secreting Cells/metabolism , Internet , Mice , Rats , Software
13.
Proc Natl Acad Sci U S A ; 105(51): 20422-7, 2008 Dec 23.
Article in English | MEDLINE | ID: mdl-19088187

ABSTRACT

Cancer/Testis (CT) genes, normally expressed in germ line cells but also activated in a wide range of cancer types, often encode antigens that are immunogenic in cancer patients, and present potential for use as biomarkers and targets for immunotherapy. Using multiple in silico gene expression analysis technologies, including twice the number of expressed sequence tags used in previous studies, we have performed a comprehensive genome-wide survey of expression for a set of 153 previously described CT genes in normal and cancer expression libraries. We find that although they are generally highly expressed in testis, these genes exhibit heterogeneous gene expression profiles, allowing their classification into testis-restricted (39), testis/brain-restricted (14), and a testis-selective (85) group of genes that show additional expression in somatic tissues. The chromosomal distribution of these genes confirmed the previously observed dominance of X chromosome location, with CT-X genes being significantly more testis-restricted than non-X CT. Applying this core classification in a genome-wide survey we identified >30 CT candidate genes; 3 of them, PEPP-2, OTOA, and AKAP4, were confirmed as testis-restricted or testis-selective using RT-PCR, with variable expression frequencies observed in a panel of cancer cell lines. Our classification provides an objective ranking for potential CT genes, which is useful in guiding further identification and characterization of these potentially important diagnostic and therapeutic targets.


Subject(s)
Gene Expression Profiling/methods , Genome, Human , Testicular Neoplasms/genetics , Testis , A Kinase Anchor Proteins , Cell Line, Tumor , Chromosomes, Human , Chromosomes, Human, X , Computational Biology , GPI-Linked Proteins , Genomics/methods , Homeodomain Proteins/genetics , Humans , Male , Membrane Proteins/genetics
14.
Cancer Immun ; 8: 11, 2008 Jun 27.
Article in English | MEDLINE | ID: mdl-18581998

ABSTRACT

Despite the high prevalence of colon cancer in the world and the great interest in targeted anti-cancer therapy, only a few tumor-specific gene products have been identified that could serve as targets for the immunological treatment of colorectal cancers. The aim of our study was therefore to identify frequently expressed colon cancer-specific antigens. We performed a large-scale analysis of genes expressed in normal colon and colon cancer tissues isolated from colorectal cancer patients using massively parallel signature sequencing (MPSS). Candidates were additionally subjected to experimental evaluation by semi-quantitative RT-PCR on a cohort of colorectal cancer patients. From a pool of more than 6000 genes identified unambiguously in the analysis, we found 2124 genes that were selectively expressed in colon cancer tissue and 147 genes that were differentially expressed to a significant degree between normal and cancer cells. Differential expression of many genes was confirmed by RT-PCR on a cohort of patients. Despite the fact that deregulated genes were involved in many different cellular pathways, we found that genes expressed in the extracellular space were significantly over-represented in colorectal cancer. Strikingly, we identified a transcript from a chromosome X-linked member of the human endogenous retrovirus (HERV) H family that was frequently and selectively expressed in colon cancer but not in normal tissues. Our data suggest that this sequence should be considered as a target of immunological interventions against colorectal cancer.


Subject(s)
Antigens, Neoplasm/genetics , Biomarkers, Tumor/genetics , Colorectal Neoplasms/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Antigens, Neoplasm/analysis , Biomarkers, Tumor/analysis , Colorectal Neoplasms/immunology , Colorectal Neoplasms/metabolism , Down-Regulation , Endogenous Retroviruses/genetics , Humans
15.
BMC Genomics ; 8: 398, 2007 Oct 31.
Article in English | MEDLINE | ID: mdl-17973996

ABSTRACT

BACKGROUND: The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. However, the biological function of CNC remains elusive. CNC differ in two aspects from conserved protein-coding regions. They are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. Here we characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages. RESULTS: The persistence length is the length of a genome region over which a certain level of sequence identity is consistently maintained. The persistence time is the evolutionary period during which a conserved region evolves under the same selective constraints. Our main findings are: (i) Insect genomes contain 1.60 times less conserved information than vertebrates; (ii) Vertebrate CNC have a higher persistence length than conserved coding regions or insect CNC; (iii) CNC have shorter persistence times as compared to conserved coding regions in both lineages. CONCLUSION: Higher persistence length of vertebrate CNC indicates that the conserved information in vertebrates and insects is organized in functional elements of different lengths. These findings might be related to the higher morphological complexity of vertebrates and give clues about the structure of active CNC elements. Shorter persistence time might explain the previously puzzling observations of highly conserved CNC within each phylum, and of a lack of conservation between phyla. It suggests that CNC divergence might be a key factor in vertebrate evolution. Further evolutionary studies will help to relate individual CNC to specific developmental processes.


Subject(s)
DNA, Intergenic/genetics , Evolution, Molecular , Genome/genetics , Vertebrates/genetics , Animals , Conserved Sequence , Drosophila/genetics , Genome, Insect/genetics , Humans , Time Factors
16.
PLoS One ; 2(6): e579, 2007 Jun 27.
Article in English | MEDLINE | ID: mdl-17593978

ABSTRACT

Searching for matches between large collections of short (14-30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
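
The database-indexing strategy described in the abstract above can be illustrated with a small hash-based word index. This is a minimal sketch of the general technique under stated assumptions, not the fetchGWI or tagger implementation: the function names, the in-memory dictionary, and the toy sequences are all hypothetical, and a real genome-scale index would need a compact on-disk layout.

```python
from collections import defaultdict


def build_index(sequences, k):
    """Map every k-length word in the database to its (sequence name, offset) hits."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        seq = seq.upper()
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def search(index, queries):
    """Return exact-match hits for each query word; words must have length k."""
    return {q: index.get(q.upper(), []) for q in queries}


# Toy database: hypothetical sequences, not real genomic data.
database = {
    "chrA": "ACGTACGTTAGCACGTACGTTAGG",
    "chrB": "TTAGCACGTACGAAAC",
}
index = build_index(database, k=14)
hits = search(index, ["ACGTACGTTAGCAC", "TAGCACGTACGAAA", "GGGGGGGGGGGGGG"])
```

Building the index costs one pass over the database, after which each query is a single dictionary lookup; this is why indexing pays off once the number of probes is large.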


Subject(s)
Databases, Nucleic Acid , Genome , Sequence Analysis, DNA , Animals , Computational Biology , Humans , Mice , Software
17.
Nucleic Acids Res ; 35(Web Server issue): W433-7, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17545200

ABSTRACT

The MyHits web site (http://myhits.isb-sib.ch) is an integrated service dedicated to the analysis of protein sequences. Since its first description in 2004, both the user interface and the back end of the server were improved. A number of tools (e.g. MAFFT, Jacop, Dotlet, Jalview, ESTScan) were added or updated to improve the usability of the service. The MySQL schema and its associated API were revamped and the database engine (HitKeeper) was separated from the web interface. This paper summarizes the current status of the server, with an emphasis on the new services.


Subject(s)
Computational Biology/methods , Protein Structure, Tertiary , Sequence Analysis, Protein , Software , Computer Graphics , Databases, Protein , Internet , Programming Languages , Sequence Alignment , Systems Integration , User-Computer Interface
18.
BMC Genomics ; 8: 129, 2007 May 23.
Article in English | MEDLINE | ID: mdl-17521433

ABSTRACT

BACKGROUND: Cancer/testis (CT) genes are normally expressed only in germ cells, but can be activated in the cancer state. This unusual property, together with the finding that many CT proteins elicit an antigenic response in cancer patients, has established a role for this class of genes as targets in immunotherapy regimes. Many families of CT genes have been identified in the human genome, but their biological function for the most part remains unclear. While it has been shown that some CT genes are under diversifying selection, this question has not been addressed before for the class as a whole. RESULTS: To shed more light on this interesting group of genes, we exploited the generation of a draft chimpanzee (Pan troglodytes) genomic sequence to examine CT genes in an organism that is closely related to human, and generated a high-quality, manually curated set of human:chimpanzee CT gene alignments. We find that the chimpanzee genome contains homologues to most of the human CT families, and that the genes are located on the same chromosome and at a similar copy number to those in human. Comparison of putative human:chimpanzee orthologues indicates that CT genes located on chromosome X are diverging faster and are undergoing stronger diversifying selection than those on the autosomes or than a set of control genes on either chromosome X or autosomes. CONCLUSION: Given their high level of diversifying selection, we suggest that CT genes are primarily responsible for the observed rapid evolution of protein-coding genes on the X chromosome.


Subject(s)
Chromosomes, Human, X/genetics , Genes, Neoplasm/genetics , Animals , Evolution, Molecular , Expressed Sequence Tags , Female , Gene Expression Regulation, Neoplastic , Genome, Human , Humans , Immunotherapy , Male , Pan troglodytes , Polymerase Chain Reaction , Sequence Alignment , Testis
19.
Breast Cancer Res ; 8(5): R56, 2006.
Article in English | MEDLINE | ID: mdl-17014703

ABSTRACT

INTRODUCTION: Diverse microarray and sequencing technologies have been widely used to characterise the molecular changes in malignant epithelial cells in breast cancers. Such gene expression studies to identify markers and targets in tumour cells are, however, compromised by the cellular heterogeneity of solid breast tumours and by the lack of appropriate counterparts representing normal breast epithelial cells. METHODS: Malignant neoplastic epithelial cells from primary breast cancers and luminal and myoepithelial cells from normal human breast tissue were isolated by immunomagnetic separation methods. Pools of RNA from highly enriched preparations of these cell types were subjected to expression profiling using massively parallel signature sequencing (MPSS) and four different genome-wide microarray platforms. Functionally related transcripts of the differential tumour epithelial transcriptome were used for gene set enrichment analysis to identify enrichment of luminal and myoepithelial type genes. Clinical pathological validation of a small number of genes was performed on tissue microarrays. RESULTS: MPSS identified 6,553 differentially expressed genes between the pool of normal luminal cells and that of primary tumours substantially enriched for epithelial cells, of which 98% were represented and 60% were confirmed by microarray profiling. Significant expression level changes between these two samples detected only by microarray technology were shown by 4,149 transcripts, resulting in a combined differential tumour epithelial transcriptome of 8,051 genes. Microarray gene signatures identified a comprehensive list of 907 and 955 transcripts whose expression differed between luminal epithelial cells and myoepithelial cells, respectively. Functional annotation and gene set enrichment analysis highlighted a group of genes related to skeletal development that were associated with the myoepithelial/basal cells and upregulated in the tumour sample.
One of the most highly overexpressed genes in this category, that encoding periostin, was analysed immunohistochemically on breast cancer tissue microarrays and its expression in neoplastic cells correlated with poor outcome in a cohort of poor prognosis estrogen receptor-positive tumours. CONCLUSION: Using highly enriched cell populations in combination with multiplatform gene expression profiling studies, a comprehensive analysis of molecular changes between the normal and malignant breast tissue was established. This study provides a basis for the identification of novel and potentially important targets for diagnosis, prognosis and therapy in breast cancer.


Subject(s)
Breast Neoplasms/genetics , Cell Adhesion Molecules/genetics , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Biomarkers, Tumor/analysis , Breast , Cells, Cultured , Epithelial Cells , Female , Humans , Prognosis , Transcription, Genetic , Tumor Cells, Cultured
20.
BMC Genomics ; 7: 176, 2006 Jul 12.
Article in English | MEDLINE | ID: mdl-16836751

ABSTRACT

BACKGROUND: Cleavage of messenger RNA (mRNA) precursors is an essential step in mRNA maturation. The signal recognized by the cleavage enzyme complex has been characterized as an A rich region upstream of the cleavage site containing a motif with consensus AAUAAA, followed by a U or UG rich region downstream of the cleavage site. RESULTS: We studied these signals using exhaustive databases of cleavage sites obtained from aligning raw expressed sequence tags (EST) sequences to genomic sequences in Homo sapiens and Drosophila melanogaster. These data show that the polyadenylation signal is highly conserved in human and fly. In addition, de novo motif searches generated a refined description of the U-rich downstream sequence (DSE) element, which shows more divergence between the two species. These refined motifs are applied, within a Hidden Markov Model (HMM) framework, to predict mRNA cleavage sites. CONCLUSION: We demonstrate that the DSE is a specific motif in both human and Drosophila. These findings shed light on the sequence correlates of a highly conserved biological process, and improve in silico prediction of 3' mRNA cleavage and polyadenylation sites.
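
The abstract above describes the canonical AAUAAA motif upstream of the cleavage site. As a much simpler illustration than the Hidden Markov Model the authors use, the sketch below only scans a fixed window upstream of a known cleavage position for exact copies of the hexamer. The function name, the 40-nucleotide window, and the toy sequence are assumptions for this example and are not taken from the paper.

```python
def find_polya_signals(seq, cleavage_site, window=40, motif="AATAAA"):
    """Return 0-based start positions of `motif` found within `window`
    nucleotides upstream of `cleavage_site` (RNA input is read as DNA)."""
    seq = seq.upper().replace("U", "T")
    start = max(0, cleavage_site - window)
    region = seq[start:cleavage_site]
    hits = []
    pos = region.find(motif)
    while pos != -1:
        hits.append(start + pos)
        pos = region.find(motif, pos + 1)
    return hits


# Hypothetical transcript: 10 nt filler, the AAUAAA signal, 20 nt filler,
# then a U/UG-rich stretch downstream of the cleavage site at position 36.
transcript = "GC" * 5 + "AAUAAA" + "GC" * 10 + "UGUGUUUGUG"
signals = find_polya_signals(transcript, cleavage_site=36)
near_only = find_polya_signals(transcript, cleavage_site=36, window=10)
```

An exact-match scan misses variant hexamers and ignores the downstream element, which is the kind of limitation a probabilistic model is meant to address.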


Subject(s)
Drosophila melanogaster/genetics , Poly A/genetics , Polyadenylation/genetics , 3' Untranslated Regions/genetics , Animals , Base Composition/genetics , Base Sequence , Expressed Sequence Tags , Humans , Models, Genetic , RNA Processing, Post-Transcriptional , RNA, Messenger/genetics