Results 1 - 16 of 16
1.
Nucleic Acids Res ; 48(D1): D570-D578, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31696235

ABSTRACT

MGnify (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.


Subject(s)
Metagenome, Microbiota, Phylogeny, Software, Archaea/classification, Archaea/genetics, Bacteria/classification, Bacteria/genetics, Ribosomal Spacer DNA/genetics, Genetic Databases, Metagenomics/methods
2.
Nucleic Acids Res ; 47(W1): W636-W641, 2019 07 02.
Article in English | MEDLINE | ID: mdl-30976793

ABSTRACT

The EMBL-EBI provides free access to popular bioinformatics sequence analysis applications as well as to a full-featured text search engine with powerful cross-referencing and data retrieval capabilities. Access to these services is provided via user-friendly web interfaces and via established RESTful and SOAP Web Services APIs (https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/EMBL-EBI+Web+Services+APIs+-+Data+Retrieval). Both systems have been developed with the same core principles that allow them to integrate an ever-increasing volume of biological data, making them an integral part of many popular data resources provided at the EMBL-EBI. Here, we describe the latest improvements made to the frameworks which enhance the interconnectivity between public EMBL-EBI resources and ultimately enhance biological data discoverability, accessibility, interoperability and reusability.


Subject(s)
Sequence Analysis, Software, Nucleic Acid Databases, Protein Databases, Sequence Alignment, Protein Sequence Analysis
3.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Protein Databases, Molecular Sequence Annotation, Animals, Genetic Databases, Gene Ontology, Humans, Internet, Multigene Family, Protein Domains/genetics, Amino Acid Sequence Homology, Software, User-Computer Interface
4.
Nucleic Acids Res ; 47(D1): D427-D432, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357350

ABSTRACT

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.


Subject(s)
Protein Databases, Proteins/classification, Molecular Sequence Annotation, Protein Domains, Proteins/chemistry, Amino Acid Repetitive Sequences
5.
Nucleic Acids Res ; 46(W1): W200-W204, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29905871

ABSTRACT

The HMMER webserver [http://www.ebi.ac.uk/Tools/hmmer] is a free-to-use service which provides fast searches against widely used sequence databases and profile hidden Markov model (HMM) libraries using the HMMER software suite (http://hmmer.org). The results of a sequence search may be summarized in a number of ways, allowing users to view and filter the significant hits by domain architecture or taxonomy. For large scale usage, we provide an application programmatic interface (API) which has been expanded in scope, such that all result presentations are available via both HTML and API. Furthermore, we have refactored our JavaScript visualization library to provide standalone components for different result representations. These consume the aforementioned API and can be integrated into third-party websites. The range of databases that can be searched against has been expanded, adding four sequence datasets (12 in total) and one profile HMM library (6 in total). To help users explore the biological context of their results, and to discover new data resources, search results are now supplemented with cross references to other EMBL-EBI databases.


Subject(s)
Sequence Analysis, Software, Catalytic Domain, Genetic Databases, Internet, Markov Chains, Protein Sequence Analysis, User-Computer Interface
6.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods, Protein Databases, Protein Interaction Domains and Motifs, Software, Humans, Molecular Sequence Annotation, Phylogeny
7.
Nucleic Acids Res ; 44(D1): D279-85, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26673716

ABSTRACT

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16,295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.


Subject(s)
Protein Databases, Proteins/classification, Proteome/chemistry, Sequence Alignment, Protein Sequence Analysis, Molecular Sequence Annotation
8.
Nat Commun ; 5: 4204, 2014 Jul 08.
Article in English | MEDLINE | ID: mdl-25003214

ABSTRACT

Dissecting how genetic and environmental influences impact on learning is helpful for maximizing numeracy and literacy. Here we show, using twin and genome-wide analysis, that there is a substantial genetic component to children's ability in reading and mathematics, and estimate that around one half of the observed correlation in these traits is due to shared genetic effects (so-called Generalist Genes). Thus, our results highlight the potential role of the learning environment in contributing to differences in a child's cognitive abilities at age twelve.


Subject(s)
Dyslexia/genetics, Population Genetics, Mathematics, Heritable Quantitative Trait, Reading, Twins/genetics, Child, Dyslexia/psychology, Female, Genome-Wide Association Study, Humans, Learning, Male, Single Nucleotide Polymorphism, Twins/psychology, United Kingdom
9.
Biol Psychiatry ; 75(5): 386-97, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-23871474

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have identified several loci associated with schizophrenia and/or bipolar disorder. We performed a GWAS of psychosis as a broad syndrome rather than within specific diagnostic categories. METHODS: 1239 cases with schizophrenia, schizoaffective disorder, or psychotic bipolar disorder; 857 of their unaffected relatives, and 2739 healthy controls were genotyped with the Affymetrix 6.0 single nucleotide polymorphism (SNP) array. Analyses of 695,193 SNPs were conducted using UNPHASED, which combines information across families and unrelated individuals. We attempted to replicate signals found in 23 genomic regions using existing data on nonoverlapping samples from the Psychiatric GWAS Consortium and Schizophrenia-GENE-plus cohorts (10,352 schizophrenia patients and 24,474 controls). RESULTS: No individual SNP showed compelling evidence for association with psychosis in our data. However, we observed a trend for association with the same risk alleles at loci previously associated with schizophrenia (one-sided p = .003). A polygenic score analysis found that the Psychiatric GWAS Consortium's panel of SNPs associated with schizophrenia significantly predicted disease status in our sample (p = 5 × 10⁻¹⁴) and explained approximately 2% of the phenotypic variance. CONCLUSIONS: Although narrowly defined phenotypes have their advantages, we believe new loci may also be discovered through meta-analysis across broad phenotypes. The novel statistical methodology we introduced to model effect size heterogeneity between studies should help future GWAS that combine association evidence from related phenotypes. Applying these approaches, we highlight three loci that warrant further investigation. We found that SNPs conveying risk for schizophrenia are also predictive of disease status in our data.
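
The polygenic score mentioned in the abstract is, in its usual form, a weighted sum of risk-allele counts. The sketch below illustrates that general idea only; it is not the study's pipeline. The function name `polygenic_score`, the three weights and the two example genotypes are all hypothetical and chosen purely for illustration.

```python
def polygenic_score(dosages, weights):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) weighted by per-SNP effects."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is needed per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-SNP effect sizes (for example log odds ratios taken from a
# separate discovery sample). These are illustrative values, not study data.
weights = [0.10, 0.05, 0.20]

# Hypothetical risk-allele counts for two individuals at the same three SNPs.
person_a = [2, 0, 1]
person_b = [0, 1, 0]

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
```

In an analysis of the kind the abstract describes, scores like these would then be tested as a predictor of case or control status in an independent sample; that regression step is omitted here.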


Subject(s)
Single Nucleotide Polymorphism/genetics, Psychotic Disorders/genetics, Schizophrenia/genetics, Female, Genetic Association Studies, Genotype, Humans, Male, Phenotype, Principal Component Analysis
10.
Lancet ; 380(9844): 815-23, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22763110

ABSTRACT

BACKGROUND: Osteoarthritis is the most common form of arthritis worldwide and is a major cause of pain and disability in elderly people. The health economic burden of osteoarthritis is increasing commensurate with obesity prevalence and longevity. Osteoarthritis has a strong genetic component but the success of previous genetic studies has been restricted due to insufficient sample sizes and phenotype heterogeneity. METHODS: We undertook a large genome-wide association study (GWAS) in 7410 unrelated and retrospectively and prospectively selected patients with severe osteoarthritis in the arcOGEN study, 80% of whom had undergone total joint replacement, and 11,009 unrelated controls from the UK. We replicated the most promising signals in an independent set of up to 7473 cases and 42,938 controls, from studies in Iceland, Estonia, the Netherlands, and the UK. All patients and controls were of European descent. FINDINGS: We identified five genome-wide significant loci (binomial test p≤5·0×10⁻⁸) for association with osteoarthritis and three loci just below this threshold. The strongest association was on chromosome 3 with rs6976 (odds ratio 1·12 [95% CI 1·08-1·16]; p=7·24×10⁻¹¹), which is in perfect linkage disequilibrium with rs11177. This SNP encodes a missense polymorphism within the nucleostemin-encoding gene GNL3. Levels of nucleostemin were raised in chondrocytes from patients with osteoarthritis in functional studies. Other significant loci were on chromosome 9 close to ASTN2, chromosome 6 between FILIP1 and SENP6, chromosome 12 close to KLHDC5 and PTHLH, and in another region of chromosome 12 close to CHST11. One of the signals close to genome-wide significance was within the FTO gene, which is involved in regulation of bodyweight, a strong risk factor for osteoarthritis. All risk variants were common in frequency and exerted small effects. INTERPRETATION: Our findings provide insight into the genetics of arthritis and identify new pathways that might be amenable to future therapeutic intervention. FUNDING: arcOGEN was funded by a special purpose grant from Arthritis Research UK.


Subject(s)
Osteoarthritis/genetics, Replacement Arthroplasty, Case-Control Studies, Female, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Linkage Disequilibrium, Male, Osteoarthritis/surgery, Hip Osteoarthritis/genetics, Hip Osteoarthritis/surgery, Knee Osteoarthritis/genetics, Knee Osteoarthritis/surgery, Single Nucleotide Polymorphism
11.
Nature ; 476(7359): 214-9, 2011 Aug 10.
Article in English | MEDLINE | ID: mdl-21833088

ABSTRACT

Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.


Subject(s)
Genetic Predisposition to Disease/genetics, Cellular Immunity/immunology, Multiple Sclerosis/genetics, Multiple Sclerosis/immunology, Alleles, Cell Differentiation/immunology, Europe/ethnology, Human Genome/genetics, Genome-Wide Association Study, HLA-A Antigens/genetics, HLA-DR Antigens/genetics, HLA-DRB1 Chains, Humans, Cellular Immunity/genetics, Major Histocompatibility Complex/genetics, Single Nucleotide Polymorphism/genetics, Sample Size, Helper-Inducer T-Lymphocytes/cytology, Helper-Inducer T-Lymphocytes/immunology
12.
Nat Genet ; 41(11): 1182-90, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19820697

ABSTRACT

The number and volume of cells in the blood affect a wide range of disorders including cancer and cardiovascular, metabolic, infectious and immune conditions. We consider here the genetic variation in eight clinically relevant hematological parameters, including hemoglobin levels, red and white blood cell counts and platelet counts and volume. We describe common variants within 22 genetic loci reproducibly associated with these hematological parameters in 13,943 samples from six European population-based studies, including 6 associated with red blood cell parameters, 15 associated with platelet parameters and 1 associated with total white blood cell count. We further identified a long-range haplotype at 12q24 associated with coronary artery disease and myocardial infarction in 9,479 cases and 10,527 controls. We show that this haplotype demonstrates extensive disease pleiotropy, as it contains known risk loci for type 1 diabetes, hypertension and celiac disease and has been spread by a selective sweep specific to European and geographically nearby populations.


Subject(s)
Blood Cells, Human Genome, Genome-Wide Association Study, Blood Cell Count, Blood Cells/cytology, Human Chromosome Pair 12, Coronary Artery Disease/genetics, Genetic Markers, Humans, Single Nucleotide Polymorphism, Genetic Selection
13.
Blood ; 113(16): 3831-7, 2009 Apr 16.
Article in English | MEDLINE | ID: mdl-19221038

ABSTRACT

Mean platelet volume (MPV) and platelet count (PLT) are highly heritable and tightly regulated traits. We performed a genome-wide association study for MPV and identified one SNP, rs342293, as having highly significant and reproducible association with MPV (per-G allele effect 0.016 ± 0.001 log fL; P < 1.08 × 10⁻²⁴) and PLT (per-G effect -4.55 ± 0.80 × 10⁹/L; P < 7.19 × 10⁻⁸) in 8586 healthy subjects. Whole-genome expression analysis in the 1-MB region showed a significant association with platelet transcript levels for PIK3CG (n = 35; P = .047). The G allele at rs342293 was also associated with decreased binding of annexin V to platelets activated with collagen-related peptide (n = 84; P = .003). The region 7q22.3 identifies the first QTL influencing platelet volume, counts, and function in healthy subjects. Notably, the association signal maps to a chromosome region implicated in myeloid malignancies, indicating this site as an important regulatory site for hematopoiesis. The identification of loci regulating MPV by this and other studies will increase our insight into the processes of megakaryopoiesis and proplatelet formation, and it may aid the identification of genes that are somatically mutated in essential thrombocytosis.


Subject(s)
Blood Platelets, Human Chromosome Pair 7/genetics, Human Genome/genetics, Single Nucleotide Polymorphism, Quantitative Trait Loci/genetics, Thrombopoiesis/genetics, Adult, Aged, Chromosome Mapping, Cohort Studies, Female, Gene Expression Regulation/genetics, Hematologic Neoplasms/genetics, Humans, Male, Middle Aged, Platelet Count, Essential Thrombocythemia/genetics
14.
Nat Genet ; 41(1): 77-81, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19060907

ABSTRACT

To identify previously unknown genetic loci associated with fasting glucose concentrations, we examined the leading association signals in ten genome-wide association scans involving a total of 36,610 individuals of European descent. Variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all ten studies. The strongest signal was observed at rs10830963, where each G allele (frequency 0.30 in HapMap CEU) was associated with an increase of 0.07 (95% CI = 0.06-0.08) mmol/l in fasting glucose levels (P = 3.2 × 10⁻⁵⁰) and reduced beta-cell function as measured by homeostasis model assessment (HOMA-B, P = 1.1 × 10⁻¹⁵). The same allele was associated with an increased risk of type 2 diabetes (odds ratio = 1.09 (1.05-1.12), per G allele P = 3.3 × 10⁻⁷) in a meta-analysis of 13 case-control studies totaling 18,236 cases and 64,453 controls. Our analyses also confirm previous associations of fasting glucose with variants at the G6PC2 (rs560887, P = 1.1 × 10⁻⁵⁷) and GCK (rs4607517, P = 1.0 × 10⁻²⁵) loci.
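
To make the reported per-allele effect concrete, the following sketch works through what it would imply under two simplifying assumptions that are not stated in the abstract: Hardy-Weinberg genotype proportions and a strictly additive effect. The allele frequency (0.30) and the effect size (0.07 mmol/l per G allele) come from the abstract; the function names are hypothetical.

```python
def genotype_frequencies(allele_freq):
    """Hardy-Weinberg genotype frequencies keyed by number of G alleles (0, 1, 2)."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    other = 1.0 - allele_freq
    return {
        0: other * other,
        1: 2.0 * allele_freq * other,
        2: allele_freq * allele_freq,
    }


def additive_shift(copies, per_allele_effect):
    """Expected trait change for a carrier of `copies` effect alleles (additive model)."""
    return copies * per_allele_effect


G_FREQUENCY = 0.30      # frequency in HapMap CEU, as reported in the abstract
PER_ALLELE_MMOL = 0.07  # mmol/l per G allele, as reported in the abstract

frequencies = genotype_frequencies(G_FREQUENCY)
shifts = {copies: additive_shift(copies, PER_ALLELE_MMOL) for copies in (0, 1, 2)}

# Average shift across the population relative to non-carriers.
mean_shift = sum(frequencies[c] * shifts[c] for c in (0, 1, 2))
```

Under these assumptions roughly 9% of individuals would carry two G alleles, with an expected fasting glucose about 0.14 mmol/l higher than non-carriers.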


Subject(s)
Blood Glucose/genetics, Fasting/blood, Single Nucleotide Polymorphism/genetics, Melatonin MT2 Receptor/genetics, Melatonin Receptors/genetics, Case-Control Studies, Type 2 Diabetes Mellitus/blood, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/physiopathology, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Meta-Analysis as Topic, Quantitative Trait Loci/genetics
16.
Genome Res ; 14(5): 934-41, 2004 May.
Article in English | MEDLINE | ID: mdl-15123589

ABSTRACT

The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") which are 'wrappers' for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the "RuleManager") which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.
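
The wrapper pattern described in the abstract (fetch input from a relational database, run an analysis, write the result back, all behind a common interface) can be sketched as follows. This is an illustration in Python rather than the Perl of the actual pipeline, and it does not reproduce the Ensembl API: the table layout, the method names and the toy `GCContent` analysis are all assumptions made for the example.

```python
import sqlite3
from abc import ABC, abstractmethod


class Runnable(ABC):
    """Hypothetical common interface: fetch input, run the analysis, store the result."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def fetch_sequence(self, seq_id: int) -> str:
        row = self.conn.execute(
            "SELECT residues FROM sequence WHERE id = ?", (seq_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no sequence with id {seq_id}")
        return row[0]

    @abstractmethod
    def analyse(self, residues: str) -> float:
        """Wrap one analysis tool; a new wrapper implements only this step."""

    def run(self, seq_id: int) -> float:
        result = self.analyse(self.fetch_sequence(seq_id))
        self.conn.execute(
            "INSERT INTO result (seq_id, analysis, value) VALUES (?, ?, ?)",
            (seq_id, type(self).__name__, result),
        )
        self.conn.commit()
        return result


class GCContent(Runnable):
    """Toy stand-in for an external analysis tool."""

    def analyse(self, residues: str) -> float:
        if not residues:
            return 0.0
        gc = sum(base in "GC" for base in residues.upper())
        return gc / len(residues)


def demo() -> list:
    """Build an in-memory database, run one wrapper, and return the stored rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequence (id INTEGER PRIMARY KEY, residues TEXT)")
    conn.execute("CREATE TABLE result (seq_id INTEGER, analysis TEXT, value REAL)")
    conn.execute("INSERT INTO sequence (id, residues) VALUES (1, 'ATGCGC')")
    GCContent(conn).run(1)
    return conn.execute("SELECT seq_id, analysis, value FROM result").fetchall()
```

Because database access lives in the base class, adding another analysis in this sketch only requires a new subclass with its own `analyse` method, which is the benefit the abstract attributes to the shared interface. Job scheduling across a compute farm, the role of the "RuleManager", is outside the scope of this example.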


Subject(s)
Computational Biology/methods, Base Sequence/genetics, DNA/genetics, Genetic Databases/standards, Programming Languages, Proteins/classification, Software, Software Design