ABSTRACT
Manually curated metabolic databases residing at the Sol Genomics Network comprise two taxon-specific databases, SolanaCyc for the Solanaceae family and NicotianaCyc for the genus Nicotiana, as well as six species-specific databases for Nicotiana tabacum TN90, N. tabacum K326, Nicotiana benthamiana, N. sylvestris, N. tomentosiformis and N. attenuata. New pathways were created by extracting, examining and verifying related data from the literature, with the aid of external databases, in an expert-led curation process. Here we describe the curation progress achieved in these databases since the first release (version 1.0) in 2016, the curation flow, and the curation process, using the metabolic pathway for cholesterol in plants as an example. The databases currently contain 266 pathways and 36 superpathways in SolanaCyc and 143 pathways plus 21 superpathways in NicotianaCyc, manually curated and validated specifically for the Solanaceae family and the Nicotiana genus, respectively. The curated data have been propagated to the respective Nicotiana-specific databases, enriching their metabolic networks and making their presentation more accurate. The quality and coverage of those databases are compared with related external databases and discussed in terms of literature support and metabolic content.
Subject(s)
Cholesterol/metabolism , Databases, Factual , Metabolic Networks and Pathways , Nicotiana , Nicotiana/classification , Nicotiana/metabolism
ABSTRACT
BACKGROUND: Important regulation occurs at the level of transcription in Plasmodium falciparum, and growing evidence suggests that these apicomplexan parasites have complex regulatory networks. Recent studies implicate long noncoding RNAs (lncRNAs) as transcriptional regulators in P. falciparum. However, due to limited research and the lack of necessary experimental tools, their role in the malaria-causing parasite remains largely unelucidated. In this work, we address one of these limitations, the lack of an updated and improved lncRNA annotation in P. falciparum. RESULTS: We generated long-read RNA sequencing data and integrated information extracted and curated from multiple sources to manually annotate lncRNAs. We identified 1119 novel lncRNAs and validated and refined 1250 existing annotations. Utilising the collated datasets, we generated evidence-based ranking scores for each annotation and characterised the distinct genomic contexts and features of P. falciparum lncRNAs. Certain features indicated subsets with potential biological significance, such as 25 lncRNAs containing multiple introns, 335 lncRNAs lacking mutations in piggyBac mutagenesis studies, and lncRNAs associated with specific biological processes, including two new types of lncRNAs found proximal to var genes. CONCLUSIONS: The insights and the annotation presented in this study will serve as valuable tools for researchers seeking to understand the role of lncRNAs in parasite biology through both bioinformatics and experimental approaches.
Subject(s)
Malaria, Falciparum , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , Genomics , Malaria, Falciparum/genetics , Plasmodium falciparum/genetics , Computational Biology
ABSTRACT
We developed the Genome Atlas of Breast Cancer (GABC), a global map of noncoding events in the human genome associated with breast cancer that provides a valuable reference resource for users to investigate the underlying genome abnormalities in breast cancer patients. Although significant progress has been made in breast cancer treatment, its morbidity and recurrence rates in women are still high worldwide. Curation and integration of breast cancer-related dysregulations from multiple aspects is essential for disease prevention and diagnosis. In this study, we developed the GABC, which contains 10 172 aberrant noncoding events occurring at multiomics levels, including the genome (single nucleotide polymorphism and somatic mutation), transcriptome (long noncoding RNA and microRNA) and epigenome (DNA methylation, enhancer and superenhancer). Each event entry provides descriptions of detailed biological mechanisms specific to the region or element. Users can also check the genome locations and relationships of functional regulators. The GABC provides a flexible and user-friendly interface for users to search, browse and download data. In addition, the GABC provides an interface to submit newly discovered noncoding events that can be included in the database. Therefore, the GABC aims to constantly enhance our understanding of noncoding genomic events in breast cancer.
Subject(s)
Breast Neoplasms/genetics , Computational Biology/methods , Databases, Genetic , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics/methods , Epigenomics/methods , Female , Humans , Internet , Polymorphism, Single Nucleotide , RNA, Long Noncoding/genetics , Reproducibility of Results , Transcriptome/genetics
ABSTRACT
BACKGROUND: The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative. RESULTS: Our results show that the subset of genes chosen for manual annotation by a research community (3.5-7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species' gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from the automatically predicted and the manually annotated gene models alike. Conversely, some previously reported trends did not appear in either the automatically predicted or the manually annotated gene sets, pointing towards insect-specific gene structural peculiarities. CONCLUSIONS: In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. 
We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.
Subject(s)
Genes, Insect/genetics , Genome, Insect/genetics , Molecular Sequence Annotation , Amino Acid Sequence , Base Composition , Base Sequence , Exons , Introns
ABSTRACT
BACKGROUND: The exponential growth of genomic data from next-generation technologies renders traditional manual expert curation efforts unsustainable. Many genomic systems have included community annotation tools to address the problem. Most of these systems adopted a "Wiki-based" approach to take advantage of existing wiki technologies, but encountered obstacles such as usability, authorship recognition, information reliability and incentives for community participation. RESULTS: Here, we present a different approach, relying on a tightly integrated method rather than a "Wiki-based" one, to support community annotation and user collaboration in the Integrated Microbial Genomes (IMG) system. The IMG approach allows users to use the existing IMG data warehouse and analysis tools to add gene, pathway and biosynthetic cluster annotations, to analyze and reorganize contigs, genes and functions using workspace datasets, and to share private user annotations and workspace datasets with collaborators. We show that the annotation effort using IMG can be part of the research process, which overcomes the user incentive and authorship recognition problems and thus fosters collaboration among domain experts. The usability and reliability issues are addressed by the integration of curated information and analysis tools in IMG, together with DOE Joint Genome Institute (JGI) expert review. CONCLUSION: By incorporating annotation operations into IMG, we provide an integrated environment for users to perform deeper and extended data analysis and annotation in a single system that can lead to publications and community knowledge sharing, as shown in the case studies.
Subject(s)
Computational Biology/methods , Genome, Microbial , Genomics/methods , Molecular Sequence Annotation/methods , Software , Cooperative Behavior , Data Accuracy , Information Dissemination , Internet , User-Computer Interface
ABSTRACT
Microorganisms produce a wide range of natural products (NPs) with clinically and agriculturally relevant biological activities. In bacteria and fungi, genes encoding successive steps in a biosynthetic pathway tend to be clustered on the chromosome as biosynthetic gene clusters (BGCs). Historically, "activity-guided" approaches to NP discovery have focused on bioactivity screening of NPs produced by culturable microbes. In contrast, recent "genome mining" approaches first identify candidate BGCs, express these biosynthetic genes using synthetic biology methods, and finally test for the production of NPs. Fungal genome mining efforts and the exploration of novel sequence and NP space are limited, however, by the lack of a comprehensive catalog of BGCs encoding experimentally-validated products. In this study, we generated a comprehensive reference set of fungal NPs whose biosynthetic gene clusters are described in the published literature. To generate this dataset, we first identified NCBI records that included both a peer-reviewed article and an associated nucleotide record. We filtered these records by text and homology criteria to identify putative NP-related articles and BGCs. Next, we manually curated the resulting articles, chemical structures, and protein sequences. The resulting catalog contains 197 unique NP compounds covering several major classes of fungal NPs, including polyketides, non-ribosomal peptides, terpenoids, and alkaloids. The distribution of articles published per compound shows a bias toward the study of certain popular compounds, such as the aflatoxins. Phylogenetic analysis of biosynthetic genes suggests that much chemical and enzymatic diversity remains to be discovered in fungi. 
Our catalog was incorporated into the recently launched Minimum Information about Biosynthetic Gene cluster (MIBiG) repository to create the largest known set of fungal BGCs and associated NPs, a resource that we anticipate will guide future genome mining and synthetic biology efforts toward discovering novel fungal enzymes and metabolites.
Subject(s)
Biological Products , Biosynthetic Pathways/genetics , Genes, Fungal , Genome, Fungal , Multigene Family , Alkaloids , Amino Acid Sequence , Computational Biology , Data Curation , Fungi/genetics , Phylogeny , Polyketides , Terpenes
ABSTRACT
Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drawn GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources.
Subject(s)
Databases, Genetic , Gene Regulatory Networks , Genome, Plant/genetics , Genomics , Information Storage and Retrieval , Plants/genetics , Data Curation , Gene Expression Regulation, Plant , Internet , Molecular Sequence Annotation , Natural Language Processing , Transcriptome
ABSTRACT
During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.
Subject(s)
Databases, Protein/statistics & numerical data , Genetic Association Studies , Genetics, Medical , Knowledge Bases , Proteome , Software , Amino Acid Sequence , Genetic Variation , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Internet , Molecular Sequence Annotation , Molecular Sequence Data , Terminology as Topic
ABSTRACT
BACKGROUND: The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences. RESULTS: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries. CONCLUSIONS: The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.
ABSTRACT
Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrows (Passer domesticus). To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of structural variants is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analyzing short-read-discovered structural variation data sets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the expected population structure from SNPs, whereas variants passing curation did. Combining heuristic-based quality filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence structural variation callsets.
Subject(s)
Genome , Genomics , Metagenomics , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Variant curation refers to the application of evidence-based methods for the interpretation of genetic variants. Significant variability in this process among laboratories affects clinical practice. For admixed Hispanic/Latino populations, which are underrepresented in genomic databases, the interpretation of genetic variants for cancer risk is challenging. METHODS: We retrospectively evaluated 601 sequence variants detected in patients participating in the largest Institutional Hereditary Cancer Program in Colombia. VarSome and PathoMAN were used for automated curation, and ACMG/AMP and Sherloc criteria were applied for manual curation. RESULTS: With automated curation, 11% of the variants (64/601) were reclassified, 59% (354/601) had no change in their interpretation, and the remaining 30% (183/601) presented conflicting interpretations. With manual curation of the 183 variants with conflicting interpretations, 17% (N = 31) were reclassified, 66% (N = 120) had no change in their initial interpretation, and 17% (N = 32) remained with conflicting interpretation status. Overall, 91% of the VUS were downgraded and 9% were upgraded. CONCLUSIONS: Most VUS were reclassified as benign/likely benign. Since automated tools can yield false-positive and false-negative results, manual curation should also be used as a complement. Our results contribute to improving cancer risk assessment and management for a broad range of hereditary cancer syndromes in Hispanic/Latino populations.
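The percentages reported in the RESULTS of this abstract can be re-derived from the stated counts. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the variable and function names are hypothetical, and the only inputs are the counts quoted in the abstract.

```python
# Counts quoted in the abstract; all names below are illustrative assumptions.
TOTAL_VARIANTS = 601

# Outcome of automated curation over all 601 variants.
automated = {"reclassified": 64, "unchanged": 354, "conflicting": 183}

# Outcome of manual curation over the 183 conflicting variants only.
manual = {"reclassified": 31, "unchanged": 120, "still_conflicting": 32}


def shares(counts, denominator):
    """Return each count as a whole-number percentage of the denominator."""
    return {label: round(100 * n / denominator) for label, n in counts.items()}


automated_pct = shares(automated, TOTAL_VARIANTS)
manual_pct = shares(manual, automated["conflicting"])

print(automated_pct)
print(manual_pct)
```

Note that the manual-curation percentages use the 183 conflicting variants as the denominator, not the full set of 601.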
Subject(s)
Genetic Variation , Neoplastic Syndromes, Hereditary , Humans , Genetic Testing , Genetic Predisposition to Disease , Latin America , Retrospective Studies , Neoplastic Syndromes, Hereditary/genetics
ABSTRACT
OBJECTIVES: High-quality species-specific transposable element (TE) libraries are required for studies to elucidate the evolutionary dynamics of TEs and gain an understanding of their impacts on host genomes. Such high-quality TE resources are severely lacking for species in the fungal kingdom. To facilitate future studies on the putative role of TEs in rapid adaptation observed in the fungal wheat pathogen Zymoseptoria tritici, we produced a manually curated TE library. This was generated by detecting TEs in 19 reference genome assemblies representing the global diversity of the species supplemented by multiple sister species genomes. Improvements over previous TE libraries have been made on TE boundary resolution, detection of ORFs, TE domains, terminal inverted repeats, and class-specific motifs. DATA DESCRIPTION: A TE consensus library for Z. tritici formatted for use with RepeatMasker. This data is relevant to other researchers investigating TE-host evolutionary dynamics in Z. tritici or who are interested in comparative studies of the fungal kingdom. Further, this TE library can be used to improve gene annotation. Finally, this TE library increases the number of manually curated TE datasets, providing resources to further our understanding of TE diversity.
Subject(s)
Ascomycota , DNA Transposable Elements , DNA Transposable Elements/genetics , Ascomycota/genetics , Molecular Sequence Annotation , Gene Library
ABSTRACT
Transposable elements (TEs) exert an increasingly diverse spectrum of influences on eukaryotic genome structure, function, and evolution. A deluge of genomic, transcriptomic, and proteomic data provides the foundation for turning essentially any non-model eukaryotic species into an emerging model to study any and all aspects of organismal biology, ultimately shaping future directions for biomedical, environmental, and biodiversity research. However, identification and annotation of the mobile genome component still lags behind the standards accepted for host gene annotation. To achieve the objective of providing every genome project with a comprehensive description of its mobilome component in addition to the standard genic and transcriptomic datasets, each step of TE identification, classification, and annotation should be focused on improving TE boundary designation, reducing identification error rates, and providing accurate information on the type and integrity of TE insertions. Here, we offer practical advice for generating TE models in de novo assemblies for non-model organisms, provide step-by-step instructions to guide inexperienced TE annotators through some of the commonly utilized TE analysis pipelines, and entertain suggestions for tool improvement which could be implemented by interested developers.
Subject(s)
DNA Transposable Elements , Eukaryota , Eukaryota/genetics , DNA Transposable Elements/genetics , Proteomics , Eukaryotic Cells , Molecular Sequence Annotation
ABSTRACT
There is widespread awareness that the wealth of preclinical toxicity data that the pharmaceutical industry has generated in recent decades is not exploited as efficiently as it could be. Enhanced data availability for compound comparison ("read-across"), or for data mining to build predictive tools, should lead to a more efficient drug development process and contribute to the reduction of animal use (3Rs principle). To achieve these goals, a consortium approach that brings together a number of relevant partners is required. The eTOX ("electronic toxicity") consortium represents such a project and is a public-private partnership within the framework of the European Innovative Medicines Initiative (IMI). The project aims at the development of in silico prediction systems for organ and in vivo toxicity. The backbone of the project will be a database consisting of preclinical toxicity data for drug compounds or candidates extracted from previously unpublished legacy reports from thirteen European and European operation-based pharmaceutical companies. The database will be enhanced by the incorporation of publicly available, high-quality toxicology data. Seven academic institutes and five small-to-medium size enterprises (SMEs) contribute their expertise in data gathering, database curation, data mining, chemoinformatics and predictive systems development. The outcome of the project will be a predictive system contributing to early potential hazard identification and risk assessment during the drug development process. The concept and strategy of the eTOX project are described here, together with current achievements and future deliverables.
Subject(s)
Databases, Factual , Drug-Related Side Effects and Adverse Reactions , Expert Systems , Knowledge Bases , Animals , Data Mining , Drug Evaluation, Preclinical , Humans , Information Dissemination , Risk Assessment
ABSTRACT
Rapid advancement in high-throughput sequencing and analytical approaches has seen a steady increase in the generation of genomic resources for helminth parasites. Now, helminth genomes and their annotations are a cornerstone of numerous efforts to compare genetic and transcriptomic variation, from single cells to populations of globally distributed parasites, to genome modifications to understand gene function. Our understanding of helminths is increasingly reliant on these genomic resources, which are primarily static once published and vary widely in quality and completeness between species. This article seeks to highlight the cause and effect of this variation and argues for the continued improvement of these genomic resources - even after their publication - which is necessary to provide a more accurate and complete understanding of the biology of these important pathogens.
Subject(s)
Helminths , Parasites , Animals , Genome , Genome, Helminth/genetics , Genomics , Helminths/genetics , Parasites/genetics
ABSTRACT
BACKGROUND: Intensive research has been carried out in the area of biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. For the available data sets, these models show excellent results, partly exceeding the inter-annotator agreements. However, biomedical named entity recognition applied to COVID-19 preprints shows a performance drop compared to the results on test data. This raises the question of how well trained models can predict on completely new data, i.e. generalize. RESULTS: Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, among them transfer learning, and show that current state-of-the-art methods work well for a given training set and the corresponding test set but generalize markedly worse when applied to new data. CONCLUSIONS: We argue that there is a need for larger annotated data sets for training and testing. Therefore, we foresee the curation of further data sets and, moreover, the investigation of continual learning processes for machine learning-based models.
Subject(s)
COVID-19 , Data Mining , Humans , Data Mining/methods , Natural Language Processing , Machine Learning
ABSTRACT
Molecular interaction databases aim to systematically capture and organize the experimental interaction information described in the scientific literature. These data can then be used to perform network analysis, to assign putative roles to uncharacterized proteins and to investigate their involvement in cellular pathways. This chapter gives a brief overview of publicly available molecular interaction databases and focuses on the members of the IMEx Consortium, their curation policies and their standard data formats. The goals achieved by IMEx databases over the last 15 years, the data types provided and the many different ways in which such data can be utilized by the research community are described in detail. The IMEx databases curate molecular interaction data to the highest caliber, following a detailed curation model and supplying rich metadata by employing common curation rules and harmonized standards. The IMEx Consortium provides comprehensively annotated molecular interaction data integrated into a single, non-redundant, open access dataset.
Subject(s)
Protein Interaction Mapping , Proteins , Data Management , Databases, Chemical , Databases, Protein , Proteins/metabolism
ABSTRACT
BACKGROUND: Automation has been introduced into variant interpretation, but it is not known how automated variant interpretation performs on a stand-alone basis. The purpose of this study was to evaluate a fully automated computerized approach. METHOD: We reviewed all variants encountered in a set of carrier screening panels over a 1-year interval. Observed variants with high-confidence ClinVar interpretations were included in the analysis; those without high-confidence ClinVar entries were excluded. RESULTS: Discrepancy rates between automated interpretations and high-confidence ClinVar entries were analyzed. Of the variants interpreted as positive (likely pathogenic or pathogenic) based on ClinVar information, 22.6% were classified as negative (variants of uncertain significance, likely benign or benign) variants by the automated method. Of the ClinVar negative variants, 1.7% were classified as positive by the automated software. On a per-case basis, which accounts for variant frequency, 63.4% of cases with a ClinVar high-confidence positive variant were classified as negative by the automated method. CONCLUSION: While automation in genetic variant interpretation holds promise, there is still a need for manual review of the output. Additional validation of automated variant interpretation methods should be conducted.
Subject(s)
Databases, Genetic , Genetic Variation , Humans , Software
ABSTRACT
Meiosis, an essential step in gametogenesis, is the key event in sexually reproducing organisms. Thousands of genes have been reported to be involved in meiosis. A specialist database is therefore much needed so that scientists can quickly look up the function of these genes and search for genes with potential roles in meiosis. Here, we developed "MeiosisOnline," a publicly accessible, comprehensive database of known functional genes and potential candidates in meiosis (https://mcg.ustc.edu.cn/bsc/meiosis/index.html). A total of 2,052 meiotic genes were manually curated from the literature and classified into different categories. In addition, 165 mouse genes were predicted as potential candidates in meiosis using the "Greed AUC Stepwise" algorithm. Annotation information is provided for both meiotic genes and predicted candidates, including basic information, function, protein-protein interaction (PPI), and expression data. Thus, MeiosisOnline provides up-to-date and detailed information on experimentally verified and predicted genes in meiosis. Furthermore, the search tools and friendly interface of MeiosisOnline will help researchers study meiosis easily and efficiently.
ABSTRACT
BACKGROUND: Osteoporosis is a common, complex disease of bone with a strong heritable component, characterized by low bone mineral density, microarchitectural deterioration of bone tissue and an increased risk of fracture. Due to the limited selection of drugs for osteoporosis and the increasing morbidity and mortality of osteoporotic fractures, osteoporosis has become a major health burden in aging societies. Current research to identify specific loci or genes involved in osteoporosis contributes to a greater understanding of its pathogenesis and to the development of better diagnosis, prevention and treatment strategies. However, little is known about how most causal genes work and interact to influence osteoporosis. It is therefore important to collect and analyze the studies of osteoporosis-related genes. Unfortunately, the information about these genes is scattered across an extensive literature. Currently, there is no specialized database for easily accessing relevant information about osteoporosis-related genes and miRNAs. METHODS: We extracted data from literature abstracts in PubMed by text mining and manual curation. A local MySQL database containing all the data was then developed with PHP on a Windows server. RESULTS: OsteoporosAtlas (http://biokb.ncpsb.org/osteoporosis/), the first specialized database for easily accessing relevant information such as osteoporosis-related genes and miRNAs, was constructed and made available to researchers. OsteoporosAtlas enables users to retrieve, browse and download osteoporosis-related genes and miRNAs. Gene ontology and pathway analyses were integrated into OsteoporosAtlas. It currently includes 617 human protein-coding genes, 131 human non-coding miRNAs, and 128 functional roles. 
We think that OsteoporosAtlas will be an important bioinformatics resource to facilitate a better understanding of the pathogenesis of osteoporosis and the development of better diagnosis, prevention and treatment strategies.