ABSTRACT
The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR-bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability and Technology (TRUST). The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR's design philosophy ensures easy implementation while retaining the benefits gained from recording both machine- and human-readable provenance.
Subjects
Software , Humans , Genome , Genomics , Information Dissemination
ABSTRACT
microPublication Biology (micropublication.org) is a non-profit, community-focused, peer-reviewed journal dedicated to publishing small (single-figure) reports of data, methods and software related to a variety of model organisms. A workshop on microPublications at the Faculty for Undergraduate Neuroscience (FUN) conference in Summer 2023 focused on 1) publishing data, especially data from student research experiences and data gathered through course-based research, and 2) using the microPublication platform and article template in teaching and learning. In this article, we further describe the microPublication platform and workflow and how PIs can use this venue to publish student work. We also provide examples of how the microPublication format can be adapted and adopted as a tool for undergraduate teaching and learning.
ABSTRACT
Background and Objectives: Specific Learning Disorder (SLD) is a complex neurobiological disorder characterized by a persistent difficulty in reading (dyslexia), written expression (dysgraphia), and mathematics (dyscalculia). The hereditary and genetic component is one of the underlying causes of SLD, but the relationship between genes and the environment should be considered. Several genetic studies have been performed in different populations to identify causative genes. Materials and Methods: Here, we show the analysis of 9 multiplex families with at least 2 individuals diagnosed with SLD per family, with a total of 37 persons, 21 of whom are young subjects with SLD, by means of Next-Generation Sequencing (NGS) to identify possible causative mutations in a panel of 15 candidate genes: CCPG1, CYP19A1, DCDC2, DGKI, DIP2A, DYM, GCFC2, KIAA0319, MC5R, MRPL19, NEDD4L, PCNT, PRMT2, ROBO1, and S100B. Results: We detected, in eight out of nine families, SNP variants in the DGKI, DIP2A, KIAA0319, and PCNT genes, although in silico analysis did not show any causative effect on this behavioral condition. In all cases, the mutation was transmitted by one of the two parents, thus excluding the case of de novo mutation. Moreover, in six out of seven families, the parent carrying the allelic variant transmitted to the children reports language difficulties. Conclusions: Although the present results cannot be considered conclusive due to the limited sample size, the identification of genetic variants in the above genes can provide input for further research on these and other genes/mutations, to better understand the genetic basis of this disorder and, from this perspective, the neuropsychological and social aspects connected to this disorder, which affects an increasing number of young people.
Subjects
Specific Learning Disorder , Child , Humans , Adolescent , Nerve Tissue Proteins , Receptors, Immunologic , Alleles , High-Throughput Nucleotide Sequencing , Microtubule-Associated Proteins
ABSTRACT
MOTIVATION: Biomedical research findings are typically disseminated through publications. To simplify access to domain-specific knowledge while supporting the research community, several biomedical databases devote significant effort to manual curation of the literature, a labor-intensive process. The first step toward biocuration requires identifying articles relevant to the specific area on which the database focuses. Thus, automatically identifying publications relevant to a specific topic within a large volume of publications is an important task toward expediting the biocuration process and, in turn, biomedical research. Current methods focus on textual contents, typically extracted from the title-and-abstract. Notably, images and captions are often used in publications to convey pivotal evidence about processes, experiments and results. RESULTS: We present a new document classification scheme, using both image and caption information, in addition to titles-and-abstracts. To use the image information, we introduce a new image representation, namely Figure-word, based on class labels of subfigures. We use word embeddings for representing captions and titles-and-abstracts. To utilize all three types of information, we introduce two information integration methods. The first combines Figure-words and textual features obtained from captions and titles-and-abstracts into a single larger vector for document representation; the second employs a meta-classification scheme. Our experiments and results demonstrate the usefulness of the newly proposed Figure-words for representing images. Moreover, the results showcase the value of Figure-words, captions and titles-and-abstracts in providing complementary information for document classification; these three sources of information, when combined, lead to an overall improved classification performance.
AVAILABILITY AND IMPLEMENTATION: Source code and the list of PMIDs of the publications in our datasets are available upon request.
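The two integration strategies named in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the nearest-centroid base scorer, and the score-averaging meta-step are hypothetical stand-ins chosen to keep the example free of dependencies. A real meta-classification scheme would typically train a second-level classifier on the base outputs rather than average them.

```python
from math import dist


def concatenate(figure_words, caption_vec, abstract_vec):
    """Integration method 1: join the per-source vectors into one larger vector."""
    return list(figure_words) + list(caption_vec) + list(abstract_vec)


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def base_score(vec, relevant_centroid, irrelevant_centroid):
    """Hypothetical base classifier for one information source.

    Returns a score in [0, 1]; higher means closer to the relevant centroid.
    """
    d_rel = dist(vec, relevant_centroid)
    d_irr = dist(vec, irrelevant_centroid)
    total = d_rel + d_irr
    return 0.5 if total == 0 else d_irr / total


def meta_classify(base_scores, threshold=0.5):
    """Integration method 2 (simplified): combine per-source scores.

    Averaging is an assumption made for brevity, not the published method.
    """
    return sum(base_scores) / len(base_scores) >= threshold


# Toy document with three per-source feature vectors (made-up values).
doc_vector = concatenate([1, 0, 2], [0.2, 0.8], [0.5, 0.1])
is_relevant = meta_classify([1.0, 0.4, 0.3])
```

In the first approach a single classifier sees all features at once; in the second, each source is scored separately and only the scores are combined, which keeps the per-source models independent.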
Subjects
Biomedical Research , Databases, Factual
ABSTRACT
WormBase (https://wormbase.org/) is a mature Model Organism Information Resource supporting researchers using the nematode Caenorhabditis elegans as a model system for studies across a broad range of basic biological processes. Toward this mission, WormBase efforts are arranged in three primary facets: curation, user interface and architecture. In this update, we describe progress in each of these three areas. In particular, we discuss the status of literature curation and recently added data, detail new features of the web interface and options for users wishing to conduct data mining workflows, and discuss our efforts to build a robust and scalable architecture by leveraging commercial cloud offerings. We conclude with a description of WormBase's role as a founding member of the nascent Alliance of Genome Resources.
Subjects
Caenorhabditis elegans/genetics , Databases, Genetic , Genes, Helminth , Animals , Data Mining , Genomics , Internet , User-Computer Interface
ABSTRACT
WormBase (http://www.wormbase.org) is an important knowledge resource for biomedical researchers worldwide. To accommodate the ever-increasing amount and complexity of research data, WormBase continues to advance its practices on data acquisition, curation and retrieval to most effectively deliver comprehensive knowledge about Caenorhabditis elegans, and genomic information about other nematodes and parasitic flatworms. Recent notable enhancements include user-directed submission of data, such as micropublication; genomic data curation and presentation, including additional genomes and JBrowse, respectively; new query tools, such as SimpleMine and Gene Enrichment Analysis; and new data displays, such as the Person Lineage browser and the Summary of Ontology-based Annotations. Anticipating more rapid data growth ahead, WormBase continues the process of migrating to a cutting-edge database technology to achieve better stability, scalability, reproducibility and a faster response time. To better serve the broader research community, WormBase, with five other Model Organism Databases and The Gene Ontology project, has begun to collaborate formally as the Alliance of Genome Resources.
Subjects
Databases, Genetic , Genome , Nematoda/genetics , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Data Curation , Data Mining , Datasets as Topic , Disease Models, Animal , Forecasting , Gene Ontology , Humans , Information Storage and Retrieval , Platyhelminths/genetics , Publishing , RNA Interference , Sequence Alignment , User-Computer Interface , Web Browser
ABSTRACT
WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research.
Subjects
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Genomics , Nematoda/genetics , Animals , Genes, Helminth , Molecular Sequence Annotation , Platyhelminths/genetics , Software
ABSTRACT
WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.
Subjects
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Animals , Internet , Molecular Sequence Annotation , Nematoda/genetics
ABSTRACT
Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community.
Subjects
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Nematoda/genetics , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/anatomy & histology , Computer Graphics , Gene Expression Profiling , Genomics , Internet , Molecular Sequence Annotation , Phenotype
ABSTRACT
WormBase has been the major repository and knowledgebase of information about the genome and genetics of Caenorhabditis elegans and other nematodes of experimental interest for over 2 decades. We have 3 goals: to keep current with the fast-paced C. elegans research, to provide better integration with other resources, and to be sustainable. Here, we discuss the current state of WormBase as well as progress and plans for moving core WormBase infrastructure to the Alliance of Genome Resources (the Alliance). As an Alliance member, WormBase will continue to interact with the C. elegans community, develop new features as needed, and curate key information from the literature and large-scale projects.
Subjects
Caenorhabditis elegans , Caenorhabditis elegans/genetics , Animals , Databases, Genetic , Genome, Helminth , Genomics/methods
ABSTRACT
The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR-bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability, and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability, and Technology (TRUST). The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility, and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR's design philosophy ensures easy implementation while retaining the benefits gained from recording both machine and human-readable provenance.
ABSTRACT
WormBase (www.wormbase.org) is the central repository for the genetics and genomics of the nematode Caenorhabditis elegans. We provide the research community with data and tools to facilitate the use of C. elegans and related nematodes as model organisms for studying human health, development, and many aspects of fundamental biology. Throughout our 22-year history, we have continued to evolve to reflect progress and innovation in the science and technologies involved in the study of C. elegans. We strive to incorporate new data types and richer data sets, and to provide integrated displays and services that avail the knowledge generated by the published nematode genetics literature. Here, we provide a broad overview of the current state of WormBase in terms of data type, curation workflows, analysis, and tools, including exciting new advances for analysis of single-cell data, text mining and visualization, and the new community collaboration forum. Concurrently, we continue the integration and harmonization of infrastructure, processes, and tools with the Alliance of Genome Resources, of which WormBase is a founding member.
Subjects
Caenorhabditis , Nematoda , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Databases, Genetic , Genome , Genomics , Humans , Nematoda/genetics
ABSTRACT
Biological knowledgebases rely on expert biocuration of the research literature to maintain up-to-date collections of data organized in machine-readable form. To enter information into knowledgebases, curators need to follow three steps: (i) identify papers containing relevant data, a process called triaging; (ii) recognize named entities; and (iii) extract and curate data in accordance with the underlying data models. WormBase (WB), the authoritative repository for research data on Caenorhabditis elegans and other nematodes, uses text mining (TM) to semi-automate its curation pipeline. In addition, WB engages its community, via an Author First Pass (AFP) system, to help recognize entities and classify data types in their recently published papers. In this paper, we present a new WB AFP system that combines TM and AFP into a single application to enhance community curation. The system employs string-searching algorithms and statistical methods (e.g. support vector machines (SVMs)) to extract biological entities and classify data types, and it presents the results to authors in a web form where they validate the extracted information, rather than enter it de novo as the previous form required. With this new system, we lessen the burden for authors, while at the same time receiving valuable feedback on the performance of our TM tools. The new user interface also links out to specific structured data submission forms, e.g. for phenotype or expression pattern data, giving the authors the opportunity to contribute a more detailed curation that can be incorporated into WB with minimal curator review. Our approach is generalizable and could be applied to additional knowledgebases that would like to engage their user community in assisting with the curation. In the five months following the launch of the new system, the response rate has been comparable with that of the previous AFP version, but the quality and quantity of the data received have greatly improved.
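As a rough illustration of the string-searching step mentioned in the abstract above, the sketch below matches a small lexicon of entity names against a sentence. It is only an assumption about how such matching might look, not the WormBase AFP code: the function name `find_entities`, the lexicon contents, and the example sentence are hypothetical, and the statistical (SVM) classification step is not shown.

```python
import re


def find_entities(text, lexicon):
    """Return {entity_type: sorted list of lexicon terms found in text}."""
    found = {}
    for entity_type, terms in lexicon.items():
        hits = set()
        for term in terms:
            # Require token boundaries on both sides so that a short name
            # such as "unc-5" is not reported inside the longer "unc-54".
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            if re.search(pattern, text):
                hits.add(term)
        if hits:
            found[entity_type] = sorted(hits)
    return found


# Illustrative lexicon and sentence (made up for this example).
lexicon = {"gene": ["unc-5", "unc-54", "daf-16"], "strain": ["N2"]}
sentence = "We crossed unc-54 mutants into the N2 background and scored daf-16 expression."
entities = find_entities(sentence, lexicon)
```

Results of this kind are what an author-facing form could present for validation, so that authors confirm or reject extracted entities instead of typing them in from scratch.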
Subjects
Caenorhabditis elegans/genetics , Data Curation/methods , Data Mining/methods , Databases, Factual , Knowledge Bases , Animals , Caenorhabditis elegans/metabolism , Internet , Support Vector Machine , User-Computer Interface
ABSTRACT
Large volumes of data generated by research laboratories coupled with the required effort and cost of curation present a significant barrier to inclusion of these data in authoritative community databases. Further, many publicly funded experimental observations remain invisible to curation simply because they are never published: results often do not fit within the scope of a standard publication; trainee-generated data are forgotten when the experimenter (e.g. student, post-doc) leaves the lab; results are omitted from science narratives due to publication bias where certain results are considered irrelevant for the publication. While authors are in the best position to curate their own data, they face a steep learning curve to ensure that appropriate referential tags, metadata, and ontologies are applied correctly to their observations, a task sometimes considered beyond the scope of their research and other numerous responsibilities. Getting researchers to adopt a new system of data reporting and curation requires a fundamental change in behavior among all members of the research community. To solve these challenges, we have created a novel scholarly communication platform that captures data from researchers and directly delivers them to information resources via Micropublication. This platform incentivizes authors to publish their unpublished observations along with associated metadata by providing a deliberately fast and lightweight but still peer-reviewed process that results in a citable publication. Our long-term goal is to develop a data ecosystem that improves reproducibility and accountability of publicly funded research and in turn accelerates both basic and translational discovery. Database URL: www.micropublication.org.
Subjects
Biomedical Research , Data Curation/methods , Databases, Factual , Humans , Periodicals as Topic
ABSTRACT
WormBase (www.wormbase.org) provides the nematode research community with a centralized database for information pertaining to nematode genes and genomes. As more nematode genome sequences are becoming available and as richer data sets are published, WormBase strives to maintain updated information, displays, and services to facilitate efficient access to and understanding of the knowledge generated by the published nematode genetics literature. This chapter aims to provide an explanation of how to use basic features of WormBase, new features, and some commonly used tools and data queries. Explanations of the curated data and step-by-step instructions of how to access the data via the WormBase website and available data mining tools are provided.
Subjects
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Genomics , Animals , Computational Biology/methods , Data Mining/methods , Epistasis, Genetic , Gene Ontology , Genes, Helminth , Genomics/methods , Humans , Phenotype , Proteome , Search Engine , Software , Transcriptome , User-Computer Interface , Web Browser
ABSTRACT
Endocytic receptors in the proximal tubule of the mammalian kidney are responsible for the reuptake of numerous ligands, including lipoproteins, sterols, vitamin-binding proteins, and hormones, and they can mediate drug-induced nephrotoxicity. In this paper, we report the first evidence indicating that the pronephric kidneys of Xenopus tadpoles are capable of endocytic transport. We establish that the Xenopus genome harbors genes for the three known endocytic receptors megalin/LRP2, cubilin, and amnionless. The Xenopus endocytic receptor genes share extensive synteny with their mammalian counterparts. In situ hybridizations demonstrated that endocytic receptor expression is highly tissue specific, primarily in the pronephric kidney, and did not occur prior to neurulation. Expression was strictly confined to proximal tubules of the pronephric kidney, which closely resembles the situation reported in mammalian kidneys. By immunohistochemistry, we demonstrated that Xenopus pronephric tubule epithelia express high amounts of the endocytic receptors megalin/lrp2 and cubilin in the apical plasma membrane. Furthermore, functional aspects of the endocytic receptors were revealed by the vesicular localization of retinol-binding protein in the proximal tubules, probably representing endocytosed protein. In summary, we provide here the first comprehensive report of endocytic receptor expression, including amnionless, in a nonmammalian species. Remarkably, renal endocytic receptor expression and function in the Xenopus pronephric kidney closely mirror the situation in the mammalian kidney. The Xenopus pronephric kidney therefore represents a novel, simple model for physiological studies on the molecular mechanisms underlying renal tubular endocytosis.
Subjects
Endocytosis/physiology , Kidney Tubules, Proximal/metabolism , Kidney/metabolism , Animals , Chromosome Mapping , DNA, Complementary/biosynthesis , DNA, Complementary/genetics , Gene Expression Profiling , Immunohistochemistry , In Situ Hybridization , Kidney/cytology , Kidney/embryology , Kidney Tubules, Proximal/cytology , Kidney Tubules, Proximal/embryology , Low Density Lipoprotein Receptor-Related Protein-2/biosynthesis , Low Density Lipoprotein Receptor-Related Protein-2/genetics , Membrane Proteins , Microscopy, Electron , Phylogeny , Proteins/genetics , Receptors, Cell Surface/biosynthesis , Receptors, Cell Surface/genetics , Systematized Nomenclature of Medicine , Xenopus
ABSTRACT
BACKGROUND: The pronephros, the simplest form of a vertebrate excretory organ, has recently become an important model of vertebrate kidney organogenesis. Here, we elucidated the nephron organization of the Xenopus pronephros and determined the similarities in segmentation with the metanephros, the adult kidney of mammals. RESULTS: We performed large-scale gene expression mapping of terminal differentiation markers to identify gene expression patterns that define distinct domains of the pronephric kidney. We analyzed the expression of over 240 genes, which included members of the solute carrier, claudin, and aquaporin gene families, as well as selected ion channels. The obtained expression patterns were deposited in the searchable European Renal Genome Project Xenopus Gene Expression Database. We found that 112 genes exhibited highly regionalized expression patterns that were adequate to define the segmental organization of the pronephric nephron. Eight functionally distinct domains were discovered that shared significant analogies in gene expression with the mammalian metanephric nephron. We therefore propose a new nomenclature, which is in line with the mammalian one. The Xenopus pronephric nephron is composed of four basic domains: proximal tubule, intermediate tubule, distal tubule, and connecting tubule. Each tubule may be further subdivided into distinct segments. Finally, we also provide compelling evidence that the expression of key genes underlying inherited renal diseases in humans has been evolutionarily conserved down to the level of the pronephric kidney. CONCLUSION: The present study validates the Xenopus pronephros as a genuine model that may be used to elucidate the molecular basis of nephron segmentation and human renal disease.
Subjects
Gene Expression Regulation, Developmental , Kidney/embryology , Adult , Animals , Biomarkers , Cell Differentiation , Chloride-Bicarbonate Antiporters/genetics , Humans , Kidney/anatomy & histology , Kidney/metabolism , Kidney Diseases/genetics , Male , Mice , Mice, Inbred C57BL , Xenopus Proteins/genetics , Xenopus laevis/genetics
ABSTRACT
The nephron, the basic structural and functional unit of the vertebrate kidney, is organized into discrete segments, which are composed of distinct renal epithelial cell types. Each cell type carries out highly specific physiological functions to regulate fluid balance, osmolarity, and metabolic waste excretion. To date, the genetic basis of regionalization of the nephron has remained largely unknown. Here we show that Irx3, a member of the Iroquois (Irx) gene family, acts as a master regulator of intermediate tubule fate. Comparative studies in Xenopus and mouse have identified Irx1, Irx2, and Irx3 as an evolutionarily conserved subset of Irx genes, whose expression represents the earliest manifestation of intermediate compartment patterning in the developing vertebrate nephron discovered to date. Intermediate tubule progenitors will give rise to epithelia of Henle's loop in mammals. Loss-of-function studies indicate that irx1 and irx2 are dispensable, whereas irx3 is necessary for intermediate tubule formation in Xenopus. Furthermore, we demonstrate that misexpression of irx3 is sufficient to direct ectopic development of intermediate tubules in the Xenopus mesoderm. Taken together, irx3 is the first gene known to be necessary and sufficient to specify nephron segment fate in vivo.