1.
Stand Genomic Sci ; 11(1): 69, 2016.
Article in English | MEDLINE | ID: mdl-27617059

ABSTRACT

BACKGROUND: Efforts to harmonize genomic data standards used by the biodiversity and metagenomic research communities have shown that prokaryotic data cannot be understood or represented in a traditional, classical biological context for conceptual reasons, not technical ones. RESULTS: Biology, like physics, has a fundamental duality (the classical macroscale eukaryotic realm vs. the quantum microscale microbial realm), with the two realms differing profoundly, and counter-intuitively, from one another. Just as classical physics is emergent from and cannot explain the microscale realm of quantum physics, so classical biology is emergent from and cannot explain the microscale realm of prokaryotic life. Classical biology describes the familiar, macroscale realm of multi-cellular eukaryotic organisms, which constitute a highly derived and constrained evolutionary subset of the biosphere, unrepresentative of the vast, mostly unseen, microbial world of prokaryotic life that comprises at least half of the planet's biomass and most of its genetic diversity. The two realms occupy fundamentally different mega-niches: eukaryotes interact primarily mechanically with the environment, prokaryotes primarily physiologically. Further, many foundational tenets of classical biology simply do not apply to prokaryotic biology. CONCLUSIONS: Classical genetics once held that genes, arranged on chromosomes like beads on a string, were the fundamental units of mutation, recombination, and heredity. Then, molecular analysis showed that there were no fundamental units, no beads, no string. Similarly, classical biology asserts that individual organisms and species are fundamental units of ecology, evolution, and biodiversity, composing an evolutionary history of objectively real, lineage-defined groups in a single-rooted tree of life. Now, metagenomic tools are forcing a recognition that there are no completely objective individuals, no unique lineages, and no one true tree.
The newly revealed biosphere of microbial dark matter cannot be understood merely by extending the concepts and methods of eukaryotic macrobiology. The unveiling of biological dark matter is allowing us to see, for the first time, the diversity of the entire biosphere and, to paraphrase Darwin, is providing a new view of life. Advancing and understanding that view will require major revisions to some of the most fundamental concepts and theories in biology.

2.
Stand Genomic Sci ; 9(3): 599-601, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-25197446

ABSTRACT

The Genomic Standards Consortium (GSC) is an open-membership community that was founded in 2005 to work towards the development, implementation and harmonization of standards in the field of genomics. Starting with the defined task of establishing a minimal set of descriptions, the GSC has evolved into an active standards-setting body that currently has 18 ongoing projects, with additional projects regularly proposed from within and outside the GSC. Here we describe our recently enacted policy for proposing new activities that are intended to be taken on by the GSC, along with the template for proposing such new activities.

4.
PLoS One ; 9(3): e89606, 2014.
Article in English | MEDLINE | ID: mdl-24595056

ABSTRACT

The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.


Subject(s)
Biodiversity , Knowledge , Semantics
6.
Brief Bioinform ; 13(6): 656-68, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22772836

ABSTRACT

The rapid advances of high-throughput sequencing technologies have dramatically prompted metagenomic studies of microbial communities that exist in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters.


Subject(s)
Algorithms , Metagenome , Cluster Analysis , Metagenomics , Sequence Analysis, DNA
7.
Stand Genomic Sci ; 7(1): 153-8, 2012 Oct 10.
Article in English | MEDLINE | ID: mdl-23451293

ABSTRACT

At the GSC11 meeting (4-6 April 2011, Hinxton, England), the GSC's genomic biodiversity working group (GBWG) developed an initial model for a data management testbed at the interface of biodiversity with genomics and metagenomics. With representatives of the Global Biodiversity Information Facility (GBIF) participating, it was agreed that the most useful course of action would be for GBIF to collaborate with the GSC in its ongoing GBWG workshops to achieve common goals around interoperability/data integration across (meta)-genomic and species level data. It was determined that a quick comparison should be made of the contents of the Darwin Core (DwC) and the GSC data checklists, with a goal of determining their degree of overlap and compatibility. An ad-hoc task group led by Renzo Kottman and Peter Dawyndt undertook an initial comparison between the Darwin Core (DwC) standard used by the Global Biodiversity Information Facility (GBIF) and the MIxS checklists put forward by the Genomic Standards Consortium (GSC). A term-by-term comparison showed that DwC and GSC concepts complement each other far more than they compete with each other. Because the preliminary analysis done at this meeting was based on expertise with GSC standards, but not with DwC standards, the group recommended that a joint meeting of DwC and GSC experts be convened as soon as possible to continue this joint assessment and to propose additional work going forward.

8.
Stand Genomic Sci ; 7(1): 159-65, 2012 Oct 10.
Article in English | MEDLINE | ID: mdl-23451294

ABSTRACT

Building on the planning efforts of the RCN4GSC project, a workshop was convened in San Diego to bring together experts from genomics and metagenomics, biodiversity, ecology, and bioinformatics with the charge to identify potential for positive interactions and progress, especially building on successes at establishing data standards by the GSC and by the biodiversity and ecological communities. Until recently, the contribution of microbial life to the biomass and biodiversity of the biosphere was largely overlooked (because it was resistant to systematic study). Now, emerging genomic and metagenomic tools are making investigation possible. Initial research findings suggest that major advances are in the offing. Although different research communities share some overlapping concepts and traditions, they differ significantly in sampling approaches, vocabularies and workflows. Likewise, their definitions of 'fitness for use' for data differ significantly, as this concept stems from the specific research questions of most importance in the different fields. Nevertheless, there is little doubt that there is much to be gained from greater coordination and integration. As a first step toward interoperability of the information systems used by the different communities, participants agreed to conduct a case study on two of the leading data standards from the two formerly disparate fields: (a) GSC's standard checklists for genomics and metagenomics and (b) TDWG's Darwin Core standard, used primarily in taxonomy and systematic biology.

9.
Stand Genomic Sci ; 7(1): 171-4, 2012 Oct 10.
Article in English | MEDLINE | ID: mdl-23409219

ABSTRACT

Following up on efforts from two earlier workshops, a meeting was convened in San Diego to (a) establish working connections between experts in the use of the Darwin Core and the GSC MIxS standards, (b) conduct mutual briefings to promote knowledge exchange and to increase the understanding of the two communities' approaches, constraints, community goals, subtleties, etc., (c) perform an element-by-element comparison of the two standards, assessing the compatibility and complementarity of the two approaches, (d) propose and consider possible use cases and test beds in which a joint annotation approach might be tried, to useful scientific effect, and (e) propose additional action items necessary to continue the development of this joint effort. Several focused working teams were identified to continue the work after the meeting ended.

10.
PLoS One ; 6(11): e27396, 2011.
Article in English | MEDLINE | ID: mdl-22087307

ABSTRACT

The "Function to Find Domain" (FIIND)-containing proteins CARD8 (Cardinal; Tucan) and NLRP1 (NALP1; NAC) are well known components of inflammasomes, multiprotein complexes responsible for activation of caspase-1, a regulator of inflammation and innate immunity. Although identified many years ago, the role of the FIIND is unknown. Here, we report that CARD8 and NLRP1 undergo autoproteolytic cleavage at a conserved SF/S motif within the FIIND. Using bioinformatics and computational modeling approaches, we detected striking structural similarity between the FIIND and the ZU5-UPA domain present in the autoproteolytic protein PIDD. This allowed us to generate a three-dimensional model and to gain insights into the molecular mechanism of the cleavage. Site-directed mutagenesis experiments revealed that the second serine of the SF/S motif is required for CARD8 and NLRP1 autoproteolysis. Furthermore, we discovered an important function for conserved glutamic acid and histidine residues, located in proximity to the cleavage site, in regulating the autoprocessing efficiency. Altogether, these results identify a function for the FIIND and show that CARD8 and NLRP1 are ZU5-UPA domain-containing autoproteolytic proteins, thus suggesting a novel mechanism for regulating innate immune responses.


Subject(s)
Adaptor Proteins, Signal Transducing/metabolism , Apoptosis Regulatory Proteins/metabolism , CARD Signaling Adaptor Proteins/metabolism , Neoplasm Proteins/metabolism , Peptide Hydrolases/metabolism , Binding Sites , Cell Line , Humans , Immunity, Innate , Inflammasomes , Multiprotein Complexes/immunology , NLR Proteins , Protein Structure, Tertiary
11.
PLoS Biol ; 9(6): e1001088, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21713030

ABSTRACT

A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.


Subject(s)
Databases, Genetic , Genomics/standards , International Cooperation , Metagenome
12.
Proteins ; 79(8): 2389-402, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21671455

ABSTRACT

The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogenous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.


Subject(s)
Proteins/chemistry , Cluster Analysis , Databases, Protein
13.
Nucleic Acids Res ; 39(Database issue): D494-6, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20961957

ABSTRACT

The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.


Subject(s)
Databases, Protein , Protein Conformation , Genomics , Proteins/chemistry , Proteins/genetics , User-Computer Interface
14.
Nucleic Acids Res ; 39(Database issue): D546-51, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21045053

ABSTRACT

The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data.


Subject(s)
Databases, Genetic , Metagenome , Environment , Metagenomics , Software
15.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1137-42, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944202

ABSTRACT

The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years. The JCSG has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe, as well as making substantial inroads into structural coverage of an entire organism. Targets are processed through an extensive combination of bioinformatics and biophysical analyses to efficiently characterize and optimize each target prior to selection for structure determination. The pipeline uses parallel processing methods at almost every step in the process and can adapt to a wide range of protein targets from bacterial to human. The construction, expansion and optimization of the JCSG gene-to-structure pipeline over the years have resulted in many technological and methodological advances and developments. The vast number of targets and the enormous amounts of associated data processed through the multiple stages of the experimental pipeline required the development of a variety of valuable resources that, wherever feasible, have been converted to free-access web-based tools and applications.


Subject(s)
Databases, Genetic , Genomics , Humans , Protein Conformation
16.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1143-7, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944203

ABSTRACT

The NIH Protein Structure Initiative centers, such as the Joint Center for Structural Genomics (JCSG), have developed highly efficient technological platforms that are capable of experimentally determining the three-dimensional structures of hundreds of proteins per year. However, the overwhelming majority of the almost 5000 protein structures determined by these centers have yet to be described in the peer-reviewed literature. In a high-throughput structural genomics environment, the process of structure determination occurs independently of any associated experimental characterization of function, which creates a challenge for the annotation and analysis of structures and the publication of these results. This challenge has been addressed by developing TOPSAN ('The Open Protein Structure Annotation Network'), which enables the generation of knowledge via collaborations among globally distributed contributors supported by automated amalgamation of available information. TOPSAN currently provides annotations for all protein structures determined by the JCSG in addition to preliminary annotations on a large number of structures from the other PSI production centers. TOPSAN-enabled collaborations have resulted in insightful structure-function analysis for many proteins and have led to numerous peer-reviewed publications, as exemplified by the articles included in this issue of Acta Crystallographica Section F.


Subject(s)
Databases, Genetic , Genomics , Humans , Internet , Protein Conformation
17.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1153-9, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944205

ABSTRACT

The first structural representative of the domain of unknown function DUF2006 family, also known as Pfam family PF09410, comprises a lipocalin-like fold with domain duplication. The finding of the calycin signature in the N-terminal domain, combined with remote sequence similarity to two other protein families (PF07143 and PF08622) implicated in isoprenoid metabolism and the oxidative stress response, supports an involvement in lipid metabolism. Clusters of conserved residues that interact with ligand mimetics suggest that the binding and regulation sites map to the N-terminal domain and to the interdomain interface, respectively.


Subject(s)
Bacterial Proteins/chemistry , Databases, Genetic , Lipid Metabolism , Nitrosomonas europaea/chemistry , Amino Acid Sequence , Crystallography, X-Ray , Models, Molecular , Molecular Sequence Data , Nitrosomonas europaea/metabolism , Oxidative Stress , Protein Structure, Tertiary , Sequence Alignment , Sequence Homology, Amino Acid
18.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1160-6, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944206

ABSTRACT

SSO2064 is the first structural representative of PF01796 (DUF35), a large prokaryotic family with a wide phylogenetic distribution. The structure reveals a novel two-domain architecture comprising an N-terminal, rubredoxin-like, zinc ribbon and a C-terminal, oligonucleotide/oligosaccharide-binding (OB) fold domain. Additional N-terminal helical segments may be involved in protein-protein interactions. Domain architectures, genomic context analysis and functional evidence from certain bacterial representatives of this family suggest that these proteins form a novel fatty-acid-binding component that is involved in the biosynthesis of lipids and polyketide antibiotics and that they possibly function as acyl-CoA-binding proteins. This structure has led to a re-evaluation of the DUF35 family, which has now been split into two entries in the latest Pfam release (v.24.0).


Subject(s)
Acyl Coenzyme A/chemistry , Archaeal Proteins/chemistry , Protein Folding , Sulfolobus solfataricus/chemistry , Zinc/chemistry , Amino Acid Sequence , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Crystallography, X-Ray , Genome, Archaeal , Models, Molecular , Molecular Sequence Data , Protein Binding , Protein Structure, Tertiary , Sulfolobus solfataricus/genetics , Sulfolobus solfataricus/metabolism
19.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1167-73, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944207

ABSTRACT

The crystal structure of Dhaf4260 from Desulfitobacterium hafniense DCB-2 was determined by single-wavelength anomalous diffraction (SAD) to a resolution of 2.01 Å using the semi-automated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG) as part of the NIGMS Protein Structure Initiative (PSI). This protein structure is the first representative of the PF04016 (DUF364) Pfam family and reveals a novel combination of two well known domains (an enolase N-terminal-like fold followed by a Rossmann-like domain). Structural and bioinformatic analyses reveal partial similarities to Rossmann-like methyltransferases, with residues from the enolase-like fold combining to form a unique active site that is likely to be involved in the condensation or hydrolysis of molecules implicated in the synthesis of flavins, pterins or other siderophores. The genome context of Dhaf4260 and homologs additionally supports a role in heavy-metal chelation.


Subject(s)
Bacterial Proteins/chemistry , Desulfitobacterium/chemistry , Metals, Heavy/chemistry , Phosphopyruvate Hydratase/chemistry , Protein Folding , Amino Acid Sequence , Bacterial Proteins/metabolism , Catalytic Domain , Crystallography, X-Ray , Desulfitobacterium/metabolism , Metals, Heavy/metabolism , Models, Molecular , Molecular Sequence Data , Protein Binding , Protein Structure, Tertiary
20.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1198-204, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944211

ABSTRACT

The crystal structure of Jann_2411 from Jannaschia sp. strain CCS1, a member of the Pfam PF07336 family classified as a domain of unknown function (DUF1470), was solved to a resolution of 1.45 Å by multiple-wavelength anomalous dispersion (MAD). This protein is the first structural representative of the DUF1470 Pfam family. Structural analysis revealed a two-domain organization, with the N-terminal domain presenting a new fold called the ABATE domain that may bind an as yet unknown ligand. The C-terminal domain forms a treble-clef zinc finger that is likely to be involved in DNA binding. Analysis of the Jann_2411 protein and the broader ABATE-domain family suggests a role as stress-induced transcriptional regulators.


Subject(s)
Bacterial Proteins/chemistry , Rhodobacteraceae/chemistry , Amino Acid Sequence , Crystallography, X-Ray , Models, Molecular , Molecular Sequence Data , Protein Structure, Quaternary , Protein Structure, Tertiary , Sequence Alignment , Zinc Fingers