1.

The Canadian VirusSeq Data Portal & Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology.

Gill, Erin E; Jia, Baofeng; Murall, Carmen Lia; Poujol, Raphaël; Anwar, Muhammad Zohaib; John, Nithu Sara; Richardsson, Justin; Hobb, Ashley; Olabode, Abayomi S; Lepsa, Alexandru; Duggan, Ana T; Tyler, Andrea D; N'Guessan, Arnaud; Kachru, Atul; Chan, Brandon; Yoshida, Catherine; Yung, Christina K; Bujold, David; Andric, Dusan; Su, Edmund; Griffiths, Emma J; Van Domselaar, Gary; Jolly, Gordon W; Ward, Heather K E; Feher, Henrich; Baker, Jared; Simpson, Jared T; Uddin, Jaser; Ragoussis, Jiannis; Eubank, Jon; Fritz, Jörg H; Gálvez, José Héctor; Fang, Karen; Cullion, Kim; Rivera, Leonardo; Xiang, Linda; Croxen, Matthew A; Shiell, Mitchell; Prystajecky, Natalie; Quirion, Pierre-Olivier; Bajari, Rosita; Rich, Samantha; Mubareka, Samira; Moreira, Sandrine; Cain, Scott; Sutcliffe, Steven G; Kraemer, Susanne A; Joly, Yann; Alturmessov, Yelizar; CPHLN Consortium.

ArXiv ; 2024 May 08.

Article En | MEDLINE | ID: mdl-38764594

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). The Portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. Here we also highlight Duotang, a web platform that presents genomic epidemiology and modeling analyses on circulating and emerging SARS-CoV-2 variants in Canada. Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the Portal (COVID-MVP, CoVizu), are all open-source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.

2.

WormBase 2024: status and transitioning to Alliance infrastructure.

Sternberg, Paul W; Van Auken, Kimberly; Wang, Qinghua; Wright, Adam; Yook, Karen; Zarowiecki, Magdalena; Arnaboldi, Valerio; Becerra, Andrés; Brown, Stephanie; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Cho, Jaehyoung; Davis, Paul; Diamantakis, Stavros; Dyer, Sarah; Grigoriadis, Dionysis; Grove, Christian A; Harris, Todd; Howe, Kevin; Kishore, Ranjana; Lee, Raymond; Longden, Ian; Luypaert, Manuel; Müller, Hans-Michael; Nuin, Paulo; Quinton-Tulloch, Mark; Raciti, Daniela; Schedl, Tim; Schindelman, Gary; Stein, Lincoln.

Genetics ; 227(1)2024 May 07.

Article En | MEDLINE | ID: mdl-38573366

WormBase has been the major repository and knowledgebase of information about the genome and genetics of Caenorhabditis elegans and other nematodes of experimental interest for over 2 decades. We have 3 goals: to keep current with the fast-paced C. elegans research, to provide better integration with other resources, and to be sustainable. Here, we discuss the current state of WormBase as well as progress and plans for moving core WormBase infrastructure to the Alliance of Genome Resources (the Alliance). As an Alliance member, WormBase will continue to interact with the C. elegans community, develop new features as needed, and curate key information from the literature and large-scale projects.

Caenorhabditis elegans , Caenorhabditis elegans/genetics , Animals , Databases, Genetic , Genome, Helminth , Genomics/methods

3.

FAIR Header Reference genome: a TRUSTworthy standard.

Wright, Adam; Wilkinson, Mark D; Mungall, Christopher; Cain, Scott; Richards, Stephen; Sternberg, Paul; Provin, Ellen; Jacobs, Jonathan L; Geib, Scott; Raciti, Daniela; Yook, Karen; Stein, Lincoln; Molik, David C.

Brief Bioinform ; 25(3)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38555475

The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability and Technology (TRUST). The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR's design philosophy ensures easy implementation while retaining the benefits gained from recording both machine and human-readable provenance.

Software , Humans , Genome , Genomics , Information Dissemination

4.

FAIR Header Reference genome: A TRUSTworthy standard.

Wright, Adam; Wilkinson, Mark D; Mungall, Chris; Cain, Scott; Richards, Stephen; Sternberg, Paul; Provin, Ellen; Jacobs, Jonathan L; Geib, Scott; Raciti, Daniela; Yook, Karen; Stein, Lincoln; Molik, David C.

bioRxiv ; 2023 Dec 20.

Article En | MEDLINE | ID: mdl-38076838

The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR-bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability, and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability, and Technology (TRUST). The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility, and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR's design philosophy ensures easy implementation while retaining the benefits gained from recording both machine and human-readable provenance.

5.

JBrowse 2: a modular genome browser with views of synteny and structural variation.

Diesh, Colin; Stevens, Garrett J; Xie, Peter; De Jesus Martinez, Teresa; Hershberg, Elliot A; Leung, Angel; Guo, Emma; Dider, Shihab; Zhang, Junjun; Bridge, Caroline; Hogue, Gregory; Duncan, Andrew; Morgan, Matthew; Flores, Tia; Bimber, Benjamin N; Haw, Robin; Cain, Scott; Buels, Robert M; Stein, Lincoln D; Holmes, Ian H.

Genome Biol ; 24(1): 74, 2023 Apr 17.

Article En | MEDLINE | ID: mdl-37069644

We present JBrowse 2, a general-purpose genome annotation browser offering enhanced visualization of complex structural variation and evolutionary relationships. It retains core features of JBrowse while adding new views for synteny, dotplots, breakpoints, gene fusions, and whole-genome overviews. It allows users to share sessions, open multiple genomes, and navigate between views. It can be embedded in a web page, used as a standalone application, or run from Jupyter notebooks or R sessions. These improvements are enabled by a ground-up redesign using modern web technology. We describe application functionality, use cases, performance benchmarks, and implementation notes for web administrators and developers.

Genomics , Software , Synteny , Genome , Biological Evolution , Web Browser , Internet

6.

JBrowse Jupyter: a Python interface to JBrowse 2.

De Jesus Martinez, Teresa; Hershberg, Elliot A; Guo, Emma; Stevens, Garrett J; Diesh, Colin; Xie, Peter; Bridge, Caroline; Cain, Scott; Haw, Robin; Buels, Robert M; Stein, Lincoln D; Holmes, Ian H.

Bioinformatics ; 39(1)2023 Jan 01.

Article En | MEDLINE | ID: mdl-36648320

MOTIVATION: JBrowse Jupyter is a package that aims to close the gap between Python programming and genomic visualization. Web-based genome browsers are routinely used for publishing and inspecting genome annotations. Historically they have been deployed at the end of bioinformatics pipelines, typically decoupled from the analysis itself. However, emerging technologies such as Jupyter notebooks enable a more rapid iterative cycle of development, analysis and visualization. RESULTS: We have developed a package that provides a Python interface to JBrowse 2's suite of embeddable components, including the primary Linear Genome View. The package enables users to quickly set up, launch and customize JBrowse views from Jupyter notebooks. In addition, users can share their data via Google's Colab notebooks, providing reproducible interactive views. AVAILABILITY AND IMPLEMENTATION: JBrowse Jupyter is released under the Apache License and is available for download on PyPI. Source code and demos are available on GitHub at https://github.com/GMOD/jbrowse-jupyter.

Computational Biology , Genomics , Software , Genome , Web Browser

7.

WormBase in 2022-data, processes, and tools for analyzing Caenorhabditis elegans.

Davis, Paul; Zarowiecki, Magdalena; Arnaboldi, Valerio; Becerra, Andrés; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Cho, Jaehyoung; da Veiga Beltrame, Eduardo; Diamantakis, Stavros; Gao, Sibyl; Grigoriadis, Dionysis; Grove, Christian A; Harris, Todd W; Kishore, Ranjana; Le, Tuan; Lee, Raymond Y N; Luypaert, Manuel; Müller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Quinton-Tulloch, Mark; Raciti, Daniela; Rodgers, Faye H; Russell, Matthew; Schindelman, Gary; Singh, Archana; Stickland, Tim; Van Auken, Kimberly; Wang, Qinghua; Williams, Gary; Wright, Adam J; Yook, Karen; Berriman, Matt; Howe, Kevin L; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

Genetics ; 220(4)2022 Apr 04.

Article En | MEDLINE | ID: mdl-35134929

WormBase (www.wormbase.org) is the central repository for the genetics and genomics of the nematode Caenorhabditis elegans. We provide the research community with data and tools to facilitate the use of C. elegans and related nematodes as model organisms for studying human health, development, and many aspects of fundamental biology. Throughout our 22-year history, we have continued to evolve to reflect progress and innovation in the science and technologies involved in the study of C. elegans. We strive to incorporate new data types and richer data sets, and to provide integrated displays and services that avail the knowledge generated by the published nematode genetics literature. Here, we provide a broad overview of the current state of WormBase in terms of data type, curation workflows, analysis, and tools, including exciting new advances for analysis of single-cell data, text mining and visualization, and the new community collaboration forum. Concurrently, we continue the integration and harmonization of infrastructure, processes, and tools with the Alliance of Genome Resources, of which WormBase is a founding member.

Caenorhabditis , Nematoda , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Databases, Genetic , Genome , Genomics , Humans , Nematoda/genetics

8.

WormBase: a modern Model Organism Information Resource.

Harris, Todd W; Arnaboldi, Valerio; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Cho, Jaehyoung; Davis, Paul; Gao, Sibyl; Grove, Christian A; Kishore, Ranjana; Lee, Raymond Y N; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Rodgers, Faye H; Russell, Matthew; Schindelman, Gary; Van Auken, Kimberly; Wang, Qinghua; Williams, Gary; Wright, Adam J; Yook, Karen; Howe, Kevin L; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

Nucleic Acids Res ; 48(D1): D762-D767, 2020 Jan 08.

Article En | MEDLINE | ID: mdl-31642470

WormBase (https://wormbase.org/) is a mature Model Organism Information Resource supporting researchers using the nematode Caenorhabditis elegans as a model system for studies across a broad range of basic biological processes. Toward this mission, WormBase efforts are arranged in three primary facets: curation, user interface and architecture. In this update, we describe progress in each of these three areas. In particular, we discuss the status of literature curation and recently added data, detail new features of the web interface and options for users wishing to conduct data mining workflows, and discuss our efforts to build a robust and scalable architecture by leveraging commercial cloud offerings. We conclude with a description of WormBase's role as a founding member of the nascent Alliance of Genome Resources.

Caenorhabditis elegans/genetics , Databases, Genetic , Genes, Helminth , Animals , Data Mining , Genomics , Internet , User-Computer Interface

9.

Using WormBase: A Genome Biology Resource for Caenorhabditis elegans and Related Nematodes.

Grove, Christian; Cain, Scott; Chen, Wen J; Davis, Paul; Harris, Todd; Howe, Kevin L; Kishore, Ranjana; Lee, Raymond; Paulini, Michael; Raciti, Daniela; Tuli, Mary Ann; Van Auken, Kimberly; Williams, Gary.

Methods Mol Biol ; 1757: 399-470, 2018.

Article En | MEDLINE | ID: mdl-29761466

WormBase (www.wormbase.org) provides the nematode research community with a centralized database for information pertaining to nematode genes and genomes. As more nematode genome sequences are becoming available and as richer data sets are published, WormBase strives to maintain updated information, displays, and services to facilitate efficient access to and understanding of the knowledge generated by the published nematode genetics literature. This chapter aims to provide an explanation of how to use basic features of WormBase, new features, and some commonly used tools and data queries. Explanations of the curated data and step-by-step instructions of how to access the data via the WormBase website and available data mining tools are provided.

Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Genomics , Animals , Computational Biology/methods , Data Mining/methods , Epistasis, Genetic , Gene Ontology , Genes, Helminth , Genomics/methods , Humans , Phenotype , Proteome , Search Engine , Software , Transcriptome , User-Computer Interface , Web Browser

10.

WormBase 2017: molting into a new stage.

Lee, Raymond Y N; Howe, Kevin L; Harris, Todd W; Arnaboldi, Valerio; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Davis, Paul; Gao, Sibyl; Grove, Christian; Kishore, Ranjana; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Rodgers, Faye; Russell, Matt; Schindelman, Gary; Tuli, Mary Ann; Van Auken, Kimberly; Wang, Qinghua; Williams, Gary; Wright, Adam; Yook, Karen; Berriman, Matthew; Kersey, Paul; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

Nucleic Acids Res ; 46(D1): D869-D874, 2018 Jan 04.

Article En | MEDLINE | ID: mdl-29069413

WormBase (http://www.wormbase.org) is an important knowledge resource for biomedical researchers worldwide. To accommodate the ever-increasing amount and complexity of research data, WormBase continues to advance its practices on data acquisition, curation and retrieval to most effectively deliver comprehensive knowledge about Caenorhabditis elegans, and genomic information about other nematodes and parasitic flatworms. Recent notable enhancements include user-directed submission of data, such as micropublication; genomic data curation and presentation, including additional genomes and JBrowse, respectively; new query tools, such as SimpleMine and Gene Enrichment Analysis; and new data displays, such as the Person Lineage browser and the Summary of Ontology-based Annotations. Anticipating more rapid data growth ahead, WormBase continues the process of migrating to a cutting-edge database technology to achieve better stability, scalability, reproducibility and a faster response time. To better serve the broader research community, WormBase, with five other Model Organism Databases and The Gene Ontology project, has begun to collaborate formally as the Alliance of Genome Resources.

Databases, Genetic , Genome , Nematoda/genetics , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Data Curation , Data Mining , Datasets as Topic , Disease Models, Animal , Forecasting , Gene Ontology , Humans , Information Storage and Retrieval , Platyhelminths/genetics , Publishing , RNA Interference , Sequence Alignment , User-Computer Interface , Web Browser

11.

WormBase 2016: expanding to enable helminth genomic research.

Howe, Kevin L; Bolt, Bruce J; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Davis, Paul; Done, James; Down, Thomas; Gao, Sibyl; Grove, Christian; Harris, Todd W; Kishore, Ranjana; Lee, Raymond; Lomax, Jane; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Stanley, Eleanor; Tuli, Mary Ann; Van Auken, Kimberly; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wright, Adam; Yook, Karen; Berriman, Matthew; Kersey, Paul; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

Nucleic Acids Res ; 44(D1): D774-80, 2016 Jan 04.

Article En | MEDLINE | ID: mdl-26578572

WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research.

Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Genomics , Nematoda/genetics , Animals , Genes, Helminth , Molecular Sequence Annotation , Platyhelminths/genetics , Software

12.

The Chado Natural Diversity module: a new generic database schema for large-scale phenotyping and genotyping data.

Jung, Sook; Menda, Naama; Redmond, Seth; Buels, Robert M; Friesen, Maren; Bendana, Yuri; Sanderson, Lacey-Anne; Lapp, Hilmar; Lee, Taein; MacCallum, Bob; Bett, Kirstin E; Cain, Scott; Clements, Dave; Mueller, Lukas A; Main, Dorrie.

Database (Oxford) ; 2011: bar051, 2011.

Article En | MEDLINE | ID: mdl-22120662

Linking phenotypic with genotypic diversity has become a major requirement for basic and applied genome-centric biological research. To meet this need, a comprehensive database backend for efficiently storing, querying and analyzing large experimental data sets is necessary. Chado, a generic, modular, community-based database schema is widely used in the biological community to store information associated with genome sequence data. To meet the need to also accommodate large-scale phenotyping and genotyping projects, a new Chado module called Natural Diversity has been developed. The module strictly adheres to the Chado remit of being generic and ontology driven. The flexibility of the new module is demonstrated in its capacity to store any type of experiment that either uses or generates specimens or stock organisms. Experiments may be grouped or structured hierarchically, whereas any kind of biological entity can be stored as the observed unit, from a specimen to be used in genotyping or phenotyping experiments, to a group of species collected in the field that will undergo further lab analysis. We describe details of the Natural Diversity module, including the design approach, the relational schema and use cases implemented in several databases.

Biodiversity , Computational Biology/methods , Database Management Systems , Databases, Factual , Animals , Genotype , Internet , Phenotype , Plants

13.

GMODWeb: a web framework for the Generic Model Organism Database.

O'Connor, Brian D; Day, Allen; Cain, Scott; Arnaiz, Olivier; Sperling, Linda; Stein, Lincoln D.

Genome Biol ; 9(6): R102, 2008.

Article En | MEDLINE | ID: mdl-18570664

The Generic Model Organism Database (GMOD) initiative provides species-agnostic data models and software tools for representing curated model organism data. Here we describe GMODWeb, a GMOD project designed to speed the development of model organism database (MOD) websites. Sites created with GMODWeb provide integration with other GMOD tools and allow users to browse and search through a variety of data types. GMODWeb was built using the open source Turnkey web framework and is available from http://turnkey.sourceforge.net.

Databases, Genetic , Models, Biological , Software , Animals , Humans

14.

ParameciumDB: a community resource that integrates the Paramecium tetraurelia genome sequence with genetic data.

Arnaiz, Olivier; Cain, Scott; Cohen, Jean; Sperling, Linda.

Nucleic Acids Res ; 35(Database issue): D439-44, 2007 Jan.

Article En | MEDLINE | ID: mdl-17142227

ParameciumDB (http://paramecium.cgm.cnrs-gif.fr) is a new model organism database associated with the genome sequencing project of the unicellular eukaryote Paramecium tetraurelia. Built with the core components of the Generic Model Organism Database (GMOD) project, ParameciumDB currently contains the genome sequence and annotations, linked to available genetic data including the Gif Paramecium stock collection. It is thus possible to navigate between sequences and stocks via the genes and alleles. Phenotypes of mutant strains and of knockdowns obtained by RNA interference are captured using controlled vocabularies according to the Entity-Attribute-Value model. ParameciumDB currently supports browsing of phenotypes, alleles and stocks as well as querying of sequence features (genes, UniProt matches, InterPro domains, Gene Ontology terms) and of genetic data (phenotypes, stocks, RNA interference experiments). Forms allow submission of RNA interference data and some bioinformatics services are available. Future ParameciumDB development plans include coordination of human curation of the nearly 40 000 gene models by members of the research community.

Databases, Nucleic Acid , Genome, Protozoan , Paramecium tetraurelia/genetics , Alleles , Animals , Genes, Protozoan , Genomics , Internet , Models, Genetic , Mutation , Phenotype , RNA Interference , Systems Integration , User-Computer Interface