ABSTRACT
BACKGROUND: Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantage of the combination of proteomics datasets and bioinformatics tools, to identify novel protein-coding genes and splice isoforms, assign correct start sites, and validate predicted exons and genes. RESULTS: Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and sub-cellular fractions using a shotgun approach. First, we directly generated peptide sequence tags (PSTs) from the proteomics data. Second, we mapped PSTs onto the translated genomic sequence. Closely located hits (i.e., PST locations on the genome) were then clustered to detect potential coding regions based on parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated cluster locations into GFF files for use in a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase, and corrected the gene structure of a dihydrolipoamide acetyltransferase. We experimentally validated the results by RT-PCR and using transcriptomics data. CONCLUSIONS: Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed through a Docker image available on two public bioinformatics Docker repositories: Docker Hub and BioShaDock. This workflow is also accessible through the Galaxy framework at https://galaxy.protim.eu for use by non-computer scientists.
Data are available via ProteomeXchange under identifier PXD010618.
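The clustering and GFF export steps described in the abstract can be illustrated with a minimal sketch. This is not Peptimapper's actual code: the `Hit` record, the `max_gap` and `min_hits` parameters, their default values, and the `pst_cluster_sketch` source label are all hypothetical stand-ins for the organism-specific parameters the abstract mentions. The sketch groups PST hits on the same contig and strand whenever consecutive hits lie within `max_gap` bases, then writes each cluster as one nine-column GFF line.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One PST location on the genome (hypothetical record layout)."""
    contig: str
    strand: str
    start: int
    end: int


def cluster_hits(hits, max_gap=1000, min_hits=2):
    """Group hits on the same contig and strand that lie within max_gap bases.

    max_gap and min_hits are illustrative assumptions, not published values.
    """
    by_location = defaultdict(list)
    for hit in hits:
        by_location[(hit.contig, hit.strand)].append(hit)

    clusters = []
    for key in sorted(by_location):
        group = sorted(by_location[key], key=lambda h: h.start)
        current = [group[0]]
        current_end = group[0].end
        for hit in group[1:]:
            if hit.start - current_end <= max_gap:
                # Close enough to the running cluster: extend it.
                current.append(hit)
                current_end = max(current_end, hit.end)
            else:
                # Gap too large: close the running cluster and start a new one.
                if len(current) >= min_hits:
                    clusters.append(current)
                current = [hit]
                current_end = hit.end
        if len(current) >= min_hits:
            clusters.append(current)
    return clusters


def to_gff_line(cluster, source="pst_cluster_sketch"):
    """Format one cluster as a tab-separated, nine-column GFF line."""
    first = cluster[0]
    start = min(h.start for h in cluster)
    end = max(h.end for h in cluster)
    columns = [
        first.contig, source, "region", str(start), str(end),
        str(len(cluster)),  # score column reused here for the hit count
        first.strand, ".", f"ID={first.contig}_{start}_{end}",
    ]
    return "\t".join(columns)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example_hits = [
        Hit("contig_1", "+", 100, 130),
        Hit("contig_1", "+", 500, 530),
        Hit("contig_1", "+", 5000, 5030),
        Hit("contig_2", "+", 10, 40),
        Hit("contig_2", "+", 900, 930),
    ]
    for cluster in cluster_hits(example_hits):
        print(to_gff_line(cluster))
```

In this sketch an isolated hit (such as the one at 5000 on `contig_1`) is discarded because it does not reach `min_hits`; a real workflow would also score each cluster against existing gene predictions, which is not attempted here.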
Subject(s)
Eukaryota/genetics , Genome , Molecular Sequence Annotation , Proteogenomics/methods , Software , Workflow , Amino Acid Sequence , Codon/genetics , Mass Spectrometry , Peptides/chemistry , Peptides/metabolism , Reproducibility of Results
ABSTRACT
For the C-HPP consortium, dark proteins include not only uPE1, but also missing proteins (MPs, PE2-4), smORFs, proteins from lncRNAs, and products from uncharacterized transcripts. Here, we investigated the expression of dark proteins in the human testis by combining public mRNA and protein expression data for several tissues and performing LC-MS/MS analysis of testis protein extracts. Most uncharacterized proteins are highly expressed in the testis. Thirty could be identified in our data set, of which two were selected for further analyses: (1) A0AOU1RQG5, a putative cancer/testis antigen specifically expressed in the testis, where it accumulates in the cytoplasm of elongated spermatids; and (2) PNMA6E, which is enriched in the testis, where it is found in the germ cell nuclei during most stages of spermatogenesis. Both proteins are encoded on chromosome X. Finally, we studied the expression of other dark proteins, uPE1 and MPs, in a series of human tissues. Most were highly expressed in the testis at both the mRNA and protein levels. The testis therefore appears to be a relevant organ in which to study the dark proteome, which may have a function related to spermatogenesis and germ cell differentiation. The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD009598.
Subject(s)
Proteome/chemistry , Testis/chemistry , Chromatography, Liquid , Data Mining , Humans , Immunohistochemistry , Male , Proteins/analysis , Proteomics/methods , RNA, Messenger/analysis , Tandem Mass Spectrometry
ABSTRACT
The Chromosome-Centric Human Proteome Project (C-HPP) aims to catalog the proteins encoded by the human genome in a chromosome-centric manner. The existence of products of about 82% of the genes has been confirmed at the protein level. However, the number of so-called "missing proteins" remains significant. It was recently suggested that the expression of proteins that have been systematically missed might be restricted to particular organs or cell types, for example, the testis. Testicular function, and spermatogenesis in particular, is conditioned by the successive activation or repression of thousands of genes and proteins, including numerous germ cell- and testis-specific products. Both the testis and postmeiotic germ cells are thus promising sites at which to search for missing proteins, and ejaculated spermatozoa are a potential source of proteins whose expression is restricted to the germ cell lineage. A trans-chromosome-based data analysis was performed to catalog missing proteins in total protein extracts from isolated human spermatozoa. We identified and manually validated peptide matches to 89 missing proteins in human spermatozoa. In addition, we carefully validated three proteins that were scored as uncertain in the latest neXtProt release (09.19.2014). We then focused on the 12 missing proteins encoded on chromosomes 2 and 14, some of which may play roles in ciliation and flagellar mechanics. The expression pattern of C2orf57 and TEX37 was confirmed in the adult testis by immunohistochemistry. On the basis of transcript expression during human spermatogenesis, we further consider the potential for discovering additional missing proteins in the testicular postmeiotic germ cell lineage and in ejaculated spermatozoa. This project was conducted as part of the C-HPP initiatives on chromosomes 14 (France) and 2 (Switzerland).
The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD002367.
Subject(s)
Chromosome Mapping , Models, Biological , Proteins/genetics , Proteome , Spermatozoa/chemistry , Chromatography, Liquid , Humans , Male , Proteins/chemistry , Tandem Mass Spectrometry
ABSTRACT
EXOSC10 is a catalytic subunit of the nuclear RNA exosome and possesses 3'-5' exoribonuclease activity. The enzyme processes and degrades different classes of RNAs. To delineate the role of EXOSC10 during oocyte growth, specific Exosc10 inactivation was performed in oocytes from the primordial follicle stage onward using the Gdf9-iCre; Exosc10 f/- mouse model (Exosc10 cKO(Gdf9)). Exosc10 cKO(Gdf9) female mice are infertile. The onset of puberty and the estrous cycle in mutants are initially normal, and ovaries contain all follicle classes. By the age of eight weeks, vaginal smears reveal irregular estrous cycles and mutant ovaries are completely depleted of follicles. Mutant oocytes retrieved from the oviduct are degenerated and occasionally show an enlarged polar body, which may reflect a defective first meiotic division. Under fertilization conditions, the mutant oocytes do not initiate embryonic development. Furthermore, we conducted a comparative proteome analysis of wild-type and Exosc10 knockout mouse ovaries and identified EXOSC10-dependent proteins involved in many biological processes, such as meiotic cell cycle progression and oocyte maturation. Our results unambiguously demonstrate an essential role for EXOSC10 in oogenesis and may serve as a model for primary ovarian insufficiency in humans. Data are available via ProteomeXchange with identifier PXD039417.
Subject(s)
Biological Phenomena , Ovarian Reserve , Animals , Female , Humans , Infant , Mice , Exoribonucleases/metabolism , Exosome Multienzyme Ribonuclease Complex/metabolism , Oocytes/metabolism , Oogenesis/genetics
ABSTRACT
Endometriosis is a common chronic gynaecological disease causing various symptoms, such as infertility and chronic pain. The gold standard for its diagnosis is still laparoscopy and the biopsy of endometriotic lesions. Here, we aimed to compare the eutopic endometrium from women with or without endometriosis to identify proteins that may be considered as biomarker candidates. Eutopic endometrium was collected from patients with endometriosis (n = 4) and women without endometriosis (n = 5) during laparoscopic surgery performed in the mid-secretory phase of their menstrual cycle. Total proteins from tissues were extracted and digested before LC-MS/MS analysis. Among the 5301 proteins identified, 543 were differentially expressed and enriched in two specific KEGG pathways: focal adhesion and PI3K/AKT signaling. Integration of our data with a large-scale proteomics dataset allowed us to highlight 11 proteins that share the same trend of dysregulation in eutopic endometrium, regardless of the phase of the menstrual cycle. Our results constitute a first step towards the identification of promising endometrial diagnostic biomarkers. They provide new insights into the mechanisms underlying endometriosis and its etiology. Our results await further confirmation in a larger cohort.
ABSTRACT
OBJECTIVES: To assess and understand adverse drug reactions (ADRs), a systematic review of reference databases such as PubMed is a mandatory step in pharmacovigilance. In order to assist pharmacovigilance teams with a computerized tool, we performed a comparative study of four different approaches to querying PubMed with ADR-drug terms. The aim of this study is to assess how an ontology of adverse effects, used to normalize and extend queries, could improve this search. MATERIAL AND METHOD: The ontological resource OntoEIM contains 58,000 classes and integrates the MedDRA terminology. The entry point is an ADR-drug term and the four methods are (i) a direct search on PubMed, (ii) a search with a normalized query enhanced with domain-specific MeSH heading criteria, (iii) a search with the same elaborated query extended to the MeSH sub-hierarchy of the adverse effect entry and (iv) a search with a set of MedDRA terms grouped by subsumption in the OntoEIM ontology. For each of the 16 queries performed and analysed, relevant publications were selected manually by two pharmacovigilance experts. RESULTS: For methods (i) to (iv) respectively, recall was 63%, 50%, 67% and 74%, and precision was 13%, 26%, 29% and 4%. The best recall is provided by the ontology-based method; in 4 cases out of 16, this method returns relevant publications when the others return no results. CONCLUSION: Results show that an ontology-based search tool improves recall, but other tools and methods are needed to raise precision.
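As a reminder of how the two reported metrics are defined, the following minimal sketch computes precision and recall for a single query. The publication identifiers are invented for illustration, and the abstract does not state how the 16 queries were aggregated, so this shows only the per-query definition. Returning 0.0 when nothing is retrieved is a convention assumed here, not something taken from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers returned by the search method
    relevant:  identifiers judged relevant by the expert reviewers
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical identifiers: 4 relevant publications, 10 retrieved, 3 in common.
    relevant_ids = {1, 2, 3, 4}
    retrieved_ids = {1, 2, 3, 10, 11, 12, 13, 14, 15, 16}
    precision, recall = precision_recall(retrieved_ids, relevant_ids)
    print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```

The trade-off reported for the ontology-based method (highest recall, lowest precision) is what these definitions predict when a query is broadened: more relevant publications are found, but the retrieved set grows faster than the number of true positives.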
Subject(s)
Data Mining/methods , Database Management Systems , Documentation/methods , Drug-Related Side Effects and Adverse Reactions/epidemiology , Natural Language Processing , PubMed , Terminology as Topic , Humans , Vocabulary, Controlled
ABSTRACT
Spermatozoa acquire their fertilizing capacity during a complex maturation process that occurs in the epididymis. This process involves substantial molecular remodeling at the surface of the gamete. The epididymis is divided into three regions (the caput, corpus, and cauda) or into 19 intraregional segments based on histology. Most studies of epididymal maturation have been performed on sperm samples or tissue extracts. Here, we used MALDI imaging mass spectrometry in the positive and negative ion modes, combined with spatial segmentation and automated metabolite annotation, to study the precise localization of metabolites directly in the rat epididymis. The spatial segmentation revealed that the rat epididymis could be divided into several molecular clusters different from the 19 intraregional segments. The discriminative m/z values that contributed the most to each molecular cluster were then annotated and corresponded mainly to phosphatidylcholines, sphingolipids, glycerophosphates, triacylglycerols, plasmalogens, phosphatidylethanolamines, and lysophosphatidylcholines. A substantial remodeling of lipid composition during epididymal maturation was observed. It was characterized in particular by an increase in the number of sphingolipids and plasmalogens and a decrease in the proportion of triacylglycerols annotated from caput to cauda. Ion images reveal that molecules belonging to the same family can have very different localizations along the epididymis. For some of them, annotation was confirmed by on-tissue MS/MS experiments. A 3D model of the epididymis head was reconstructed from 61 sections analyzed with a lateral resolution of 50 µm and can be used to obtain information on the localization of a given analyte in the whole volume of the tissue.
Subject(s)
Epididymis/diagnostic imaging , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Sperm Maturation/physiology , Animals , Imaging, Three-Dimensional , Male , Molecular Imaging , Rats , Rats, Sprague-Dawley
ABSTRACT
Most countries have developed information systems to report adverse drug effects. However, as in other domains where systematic reviews are needed, there is little guidance on how systematic documentation of adverse drug effects should be performed. The objective of the VigiTermes project is to develop a platform to improve documentation of pharmacovigilance case reports for the pharmaceutical industry and regulatory authorities. In order to improve systematic reviews of adverse drug reactions, we developed a prototype that first reproduces and standardizes search strategies, then extracts information from the retrieved Medline abstracts and annotates them. The platform aims to provide transparent access and analysis tools to pharmacovigilance experts investigating the relevance of safety signals related to drugs. The platform's architecture integrates two vendor tools (ITM and Luxid) and one academic web service for knowledge extraction from medical literature. Whereas a manual search performed by a pharmacovigilance expert retrieved 578 publications, the system proposed a list of 229 publications, thus decreasing the time required for review by 60%. Recall was 70%, and additional developments are required in order to improve exhaustiveness.
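The 60% figure is consistent with the two publication counts if review time is assumed to scale linearly with the number of publications to read. That proportionality is an assumption made here for illustration; the abstract does not say how review time was measured. A short check:

```python
def relative_reduction(before, after):
    """Fraction by which a quantity shrinks when going from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    manual_search = 578  # publications retrieved by the expert's manual search
    system_list = 229    # publications proposed by the platform
    reduction = relative_reduction(manual_search, system_list)
    print(f"{reduction:.1%} fewer publications to review")
```

Under that assumption, (578 - 229) / 578 is about 0.604, which rounds to the reported 60%.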
Subject(s)
Databases as Topic , Documentation , Drug-Related Side Effects and Adverse Reactions , Pharmaceutical Preparations , Systems Integration
ABSTRACT
BACKGROUND: In environmental sequencing studies, fungi can be identified based on nucleic acid sequences, using either highly variable sequences as species barcodes or conserved sequences containing a high-quality phylogenetic signal. For the latter, identification relies on phylogenetic analyses and the adoption of the phylogenetic species concept. Such analysis requires that the reference sequences are well identified and deposited in public-access databases. However, many entries in the public sequence databases are problematic in terms of quality and reliability, and these data require screening to ensure correct phylogenetic interpretation. METHODS AND PRINCIPAL FINDINGS: To facilitate phylogenetic inferences and phylogenetic assignment, we introduce a fungal sequence database. The database PHYMYCO-DB comprises fungal sequences from GenBank that have been filtered to satisfy stringent sequence quality criteria. For the first release, two widely used molecular taxonomic markers were chosen: the nuclear SSU rRNA and EF1-α gene sequences. Following automatic extraction and filtering, manual curation is performed to remove problematic sequences while preserving relevant sequences useful for phylogenetic studies. As a result of curation, ~20% of the automatically filtered sequences have been removed from the database. To demonstrate how PHYMYCO-DB can be employed, we test a set of environmental Chytridiomycota sequences obtained from deep-sea samples. CONCLUSION: PHYMYCO-DB offers the tools necessary to: (i) extract high-quality fungal sequences for each of the 5 fungal phyla, at all taxonomic levels, (ii) extract previously computed alignments to act as 'reference alignments', (iii) launch alignments of personal sequences along with stored data. A total of 9120 SSU rRNA and 672 EF1-α high-quality fungal sequences are now available. PHYMYCO-DB is accessible through the URL http://phymycodb.genouest.org/.