Results 1 - 20 of 28
1.
Nucleic Acids Res ; 52(D1): D808-D816, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953350

ABSTRACT

The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) is a Bioinformatics Resource Center funded by the National Institutes of Health with additional funding from the Wellcome Trust. VEuPathDB supports >600 organisms that comprise invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Since 2004, VEuPathDB has analyzed omics data from the public domain using contemporary bioinformatic workflows, including orthology predictions via OrthoMCL, and integrated the analysis results with analysis tools, visualizations, and advanced search capabilities. The unique data mining platform coupled with >3000 pre-analyzed data sets facilitates the exploration of pertinent omics data in support of hypothesis driven research. Comparisons are easily made across data sets, data types and organisms. A Galaxy workspace offers the opportunity for the analysis of private large-scale datasets and for porting to VEuPathDB for comparisons with integrated data. The MapVEu tool provides a platform for exploration of spatially resolved data such as vector surveillance and insecticide resistance monitoring. To address the growing body of omics data and advances in laboratory techniques, VEuPathDB has added several new data types, searches and features, improved the Galaxy workspace environment, redesigned the MapVEu interface and updated the infrastructure to accommodate these changes.


Subject(s)
Computational Biology , Eukaryota , Animals , Computational Biology/methods , Invertebrates , Databases, Factual
2.
Nucleic Acids Res ; 50(D1): D898-D911, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34718728

ABSTRACT

The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and Galaxy workspace where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, and improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.


Subject(s)
Databases, Factual , Disease Vectors/classification , Host-Pathogen Interactions/genetics , Phenotype , User-Computer Interface , Animals , Apicomplexa/classification , Apicomplexa/genetics , Apicomplexa/pathogenicity , Bacteria/classification , Bacteria/genetics , Bacteria/pathogenicity , Communicable Diseases/microbiology , Communicable Diseases/parasitology , Communicable Diseases/pathology , Communicable Diseases/transmission , Computational Biology/methods , Data Mining/methods , Diplomonadida/classification , Diplomonadida/genetics , Diplomonadida/pathogenicity , Fungi/classification , Fungi/genetics , Fungi/pathogenicity , Humans , Insecta/classification , Insecta/genetics , Insecta/pathogenicity , Internet , Nematoda/classification , Nematoda/genetics , Nematoda/pathogenicity , Phylogeny , Virulence , Workflow
3.
BMC Genomics ; 22(1): 422, 2021 Jun 08.
Article in English | MEDLINE | ID: mdl-34103015

ABSTRACT

BACKGROUND: Whole genome re-sequencing provides powerful data for population genomic studies, allowing robust inferences of population structure, gene flow and evolutionary history. For the major malaria vector in Africa, Anopheles gambiae, other genetic aspects such as selection and adaptation are also important. In the present study, we explore population genetic variation from genome-wide sequencing of 765 An. gambiae and An. coluzzii specimens collected from across Africa. We used t-SNE, a recently popularized dimensionality reduction method, to create a 2D-map of An. gambiae and An. coluzzii genes that reflect their population structure similarities. RESULTS: The map allows intuitive navigation among genes distributed throughout the so-called "mainland" and numerous surrounding "island-like" gene clusters. These gene clusters of various sizes correspond predominantly to low recombination genomic regions such as inversions and centromeres, and also to recent selective sweeps. Because this mosquito species complex has been studied extensively, we were able to support our interpretations with previously published findings. Several novel observations and hypotheses are also made, including selective sweeps and a multi-locus selection event in Guinea-Bissau, a known intense hybridization zone between An. gambiae and An. coluzzii. CONCLUSIONS: Our results present a rich dataset that could be utilized in functional investigations aiming to shed light onto An. gambiae s.l genome evolution and eventual speciation. In addition, the methodology presented here can be used to further characterize other species not so well studied as An. gambiae, shortening the time required to progress from field sampling to the identification of genes and genomic regions under unique evolutionary processes.


Subject(s)
Anopheles , Malaria , Africa , Animals , Anopheles/genetics , Guinea-Bissau , Islands , Malaria/genetics , Mosquito Vectors/genetics
4.
Nucleic Acids Res ; 43(Database issue): D707-13, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25510499

ABSTRACT

VectorBase is a National Institute of Allergy and Infectious Diseases supported Bioinformatics Resource Center (BRC) for invertebrate vectors of human pathogens. Now in its 11th year, VectorBase currently hosts the genomes of 35 organisms including a number of non-vectors for comparative analysis. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes. Here we describe improvements to our resource and the set of tools available for interrogating and accessing BRC data including the integration of Web Apollo to facilitate community annotation and providing Galaxy to support user-based workflows. VectorBase also actively supports our community through hands-on workshops and online tutorials. All information and data are freely available from our website at https://www.vectorbase.org/.


Subject(s)
Databases, Genetic , Disease Vectors , Genomics , Animals , Biological Ontologies , Gene Expression Profiling , Genetic Variation , Genome , Humans , Insecticide Resistance , Internet , Invertebrates/genetics , Metabolic Networks and Pathways/genetics
5.
Proc Natl Acad Sci U S A ; 109(30): 12081-6, 2012 Jul 24.
Article in English | MEDLINE | ID: mdl-22711832

ABSTRACT

Music evolves as composers, performers, and consumers favor some musical variants over others. To investigate the role of consumer selection, we constructed a Darwinian music engine consisting of a population of short audio loops that sexually reproduce and mutate. This population evolved for 2,513 generations under the selective influence of 6,931 consumers who rated the loops' aesthetic qualities. We found that the loops quickly evolved into music attributable, in part, to the evolution of aesthetically pleasing chords and rhythms. Later, however, evolution slowed. Applying the Price equation, a general description of evolutionary processes, we found that this stasis was mostly attributable to a decrease in the fidelity of transmission. Our experiment shows how cultural dynamics can be explained in terms of competing evolutionary forces.


Subject(s)
Consumer Behavior/statistics & numerical data , Cultural Evolution , Models, Theoretical , Music , Acoustic Stimulation , Algorithms , Computer Simulation , Esthetics , Humans
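
Note on the Price equation named in the abstract above: it is commonly written in the form below. The notation is the usual textbook one and is an assumption here, not taken from the paper itself.

```latex
\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}\!\left(w_i\,\Delta z_i\right)
```

Here $z_i$ is the trait value of entity $i$ (for example, a loop's rating), $w_i$ is its fitness (number of offspring), $\bar{w}$ is the mean fitness, and $\Delta z_i$ is the difference between the mean trait of its offspring and its own trait. The covariance term captures selection and the expectation term captures transmission; the abstract attributes the observed stasis mostly to the latter.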
6.
Genome Res ; 21(11): 1872-81, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21795387

ABSTRACT

Anopheles gambiae is a major mosquito vector responsible for malaria transmission, whose genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have still not been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations including 80 novel gene structures and about 500 translational start sites. An additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search-specific peptides. Numerous cases of encoded proteins were documented in regions annotated as intergenic, introns, or untranslated regions. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is a first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.


Subject(s)
Anopheles/genetics , Anopheles/metabolism , Alternative Splicing , Animals , Chromosome Mapping , Codon, Initiator , Exons , Genes, Insect , Genomics , Introns , Mass Spectrometry , Molecular Sequence Annotation , Molecular Sequence Data , Open Reading Frames , Peptides/genetics , Proteomics , RNA Splice Sites , Reproducibility of Results , Untranslated Regions/genetics
7.
Nucleic Acids Res ; 40(Database issue): D729-34, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22135296

ABSTRACT

VectorBase (http://www.vectorbase.org) is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community.


Subject(s)
Databases, Genetic , Genome, Insect , Insect Vectors/genetics , Animals , Culicidae/genetics , Genetic Variation , Genomics , Insecticide Resistance , Ixodes/genetics , Pediculus/genetics , Rhodnius/genetics , Tsetse Flies/genetics
8.
BMC Genomics ; 13: 207, 2012 May 30.
Article in English | MEDLINE | ID: mdl-22646700

ABSTRACT

BACKGROUND: Human Malaria is transmitted by mosquitoes of the genus Anopheles. Transmission is a complex phenomenon involving biological and environmental factors of humans, parasites and mosquitoes. Among more than 500 anopheline species, only a few species from different branches of the mosquito evolutionary tree transmit malaria, suggesting that their vectorial capacity has evolved independently. Anopheles albimanus (subgenus Nyssorhynchus) is an important malaria vector in the Americas. The divergence time between Anopheles gambiae, the main malaria vector in Africa, and the Neotropical vectors has been estimated to be 100 My. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to explore the mosquito biology beyond the An. gambiae complex. RESULTS: We sequenced the transcriptome of the An. albimanus adult female. By combining Sanger, 454 and Illumina sequences from cDNA libraries derived from the midgut, cuticular fat body, dorsal vessel, salivary gland and whole body, we generated a single, high-quality assembly containing 16,669 transcripts, 92% of which mapped to the An. darlingi genome and covered 90% of the core eukaryotic genome. Bidirectional comparisons between the An. gambiae, An. darlingi and An. albimanus predicted proteomes allowed the identification of 3,772 putative orthologs. More than half of the transcripts had a match to proteins in other insect vectors and had an InterPro annotation. We identified several protein families that may be relevant to the study of Plasmodium-mosquito interaction. An open source transcript annotation browser called GDAV (Genome-Delinked Annotation Viewer) was developed to facilitate public access to the data generated by this and future transcriptome projects. CONCLUSIONS: We have explored the adult female transcriptome of one important New World malaria vector, An. albimanus. 
We identified protein-coding transcripts involved in biological processes that may be relevant to the Plasmodium lifecycle and can serve as the starting point for searching targets for novel control strategies. Our data increase the available genomic information regarding An. albimanus several hundred-fold, and will facilitate molecular research in medical entomology, evolutionary biology, genomics and proteomics of anopheline mosquito vectors. The data reported in this manuscript is accessible to the community via the VectorBase website (http://www.vectorbase.org/Other/AdditionalOrganisms/).


Subject(s)
Anopheles/genetics , Insect Vectors/genetics , Transcriptome/genetics , Animals , Chromosome Mapping , Databases, Genetic , Expressed Sequence Tags , Female , Gene Library , Genome , Host-Parasite Interactions , Plasmodium/physiology , Proteome/metabolism , Sequence Analysis, DNA
9.
BMC Genomics ; 12: 620, 2011 Dec 20.
Article in English | MEDLINE | ID: mdl-22185628

ABSTRACT

BACKGROUND: Quantitative transcriptome data for the malaria-transmitting mosquito Anopheles gambiae covers a broad range of biological and experimental conditions, including development, blood feeding and infection. Web-based summaries of differential expression for individual genes with respect to these conditions are a useful tool for the biologist, but they lack the context that a visualisation of all genes with respect to all conditions would give. For most organisms, including A. gambiae, such a systems-level view of gene expression is not yet available. RESULTS: We have clustered microarray-based gene-averaged expression values, available from VectorBase, for 10194 genes over 93 experimental conditions using a self-organizing map. Map regions corresponding to known biological events, such as egg production, are revealed. Many individual gene clusters (nodes) on the map are highly enriched in biological and molecular functions, such as protein synthesis, protein degradation and DNA replication. Gene families, such as odorant binding proteins, can be classified into distinct functional groups based on their expression and evolutionary history. Immunity-related genes are non-randomly distributed in several distinct regions on the map, and are generally distant from genes with house-keeping roles. Each immunity-rich region appears to represent a distinct biological context for pathogen recognition and clearance (e.g. the humoral and gut epithelial responses). Several immunity gene families, such as peptidoglycan recognition proteins (PGRPs) and defensins, appear to be specialised for these distinct roles, while three genes with physically interacting protein products (LRIM1/APL1C/TEP1) are found in close proximity. CONCLUSIONS: The map provides the first genome-scale, multi-experiment overview of gene expression in A. gambiae and should also be useful at the gene-level for investigating potential interactions. 
A web interface is available through the VectorBase website http://www.vectorbase.org/. It is regularly updated as new experimental data becomes available.


Subject(s)
Anopheles/genetics , Chromosome Mapping/methods , Transcriptome , Animals , Anopheles/metabolism , Carrier Proteins/genetics , Defensins/genetics , Gene Expression Profiling , Genome , Insect Proteins/genetics , Receptors, Odorant/genetics
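
Note on the self-organizing map used in the abstract above: the following is a minimal sketch of generic SOM training on a rectangular grid, written with the Python standard library only. The function names, the linear decay schedules and all parameter defaults are assumptions chosen for illustration; the paper's actual software and settings are not described in this record.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x
    (squared Euclidean distance)."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rows x cols SOM on a list of equal-length vectors.

    Hypothetical helper: the schedules below (linear decay of learning
    rate and neighbourhood radius) are one common choice, not the paper's.
    """
    rng = random.Random(seed)
    # One weight vector per grid node, initialised from random samples.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
        for x in rng.sample(data, len(data)):    # visit samples in random order
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull each node toward x, weighted by its grid distance to the BMU.
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights
```

In a setting like the one the abstract describes, each input vector would hold one gene's expression values across the experimental conditions, and genes sharing a best-matching unit would form one cluster (node) on the map.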
10.
BMC Med ; 9: 21, 2011 Mar 02.
Article in English | MEDLINE | ID: mdl-21366904

ABSTRACT

BACKGROUND: Cerebral microdialysis (MD) is used to monitor local brain chemistry of patients with traumatic brain injury (TBI). Despite an extensive literature on cerebral MD in the clinical setting, it remains unclear how individual levels of real-time MD data are to be interpreted. Intracranial pressure (ICP) and cerebral perfusion pressure (CPP) are important continuous brain monitors in neurointensive care. They are used as surrogate monitors of cerebral blood flow and have an established relation to outcome. The purpose of this study was to investigate the relations between MD parameters and ICP and/or CPP in patients with TBI. METHODS: Cerebral MD, ICP and CPP were monitored in 90 patients with TBI. Data were extensively analyzed, using over 7,350 samples of complete (hourly) MD data sets (glucose, lactate, pyruvate and glycerol) to seek representations of ICP, CPP and MD that were best correlated. MD catheter positions were located on computed tomography scans as pericontusional or nonpericontusional. MD markers were analyzed for correlations to ICP and CPP using time series regression analysis, mixed effects models and nonlinear (artificial neural networks) computer-based pattern recognition methods. RESULTS: Despite much data indicating highly perturbed metabolism, MD shows weak correlations to ICP and CPP. In contrast, the autocorrelation of MD is high for all markers, even at up to 30 future hours. Consequently, subject identity alone explains 52% to 75% of MD marker variance. This indicates that the dominant metabolic processes monitored with MD are long-term, spanning days or longer. In comparison, short-term (differenced or Δ) changes of MD vs. CPP are significantly correlated in pericontusional locations, but with less than 1% explained variance. Moreover, CPP and ICP were significantly related to outcome based on Glasgow Outcome Scale scores, while no significant relations were found between outcome and MD. 
CONCLUSIONS: The multitude of highly perturbed local chemistry seen with MD in patients with TBI predominately represents long-term metabolic patterns and is weakly correlated to ICP and CPP. This suggests that disturbances other than pressure and/or flow have a dominant influence on MD levels in patients with TBI.


Subject(s)
Brain Chemistry , Brain Injuries/diagnosis , Brain/physiopathology , Critical Care/methods , Microdialysis/methods , Adolescent , Adult , Aged , Catheterization/methods , Humans , Intracranial Pressure , Middle Aged , Perfusion , Young Adult
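
Note on the autocorrelation result in the abstract above: the sketch below shows the standard sample autocorrelation of a single series at a given lag, which is the quantity the abstract reports as remaining high for lags of up to 30 hours. It is a generic estimator under the assumption of evenly spaced (hourly) samples, not the study's own analysis, which also used time series regression and mixed effects models. The function name is hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of an evenly spaced series at the given lag.

    Uses the common estimator that divides the lagged covariance sum by
    the full-series sum of squared deviations.
    """
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    total = sum((v - mean) ** 2 for v in series)
    if total == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    lagged = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return lagged / total


# A slowly drifting series stays positively correlated with itself one step later.
print(autocorrelation([1, 2, 3, 4, 5], 1))  # 0.4
```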
11.
Nucleic Acids Res ; 37(Database issue): D583-7, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19028744

ABSTRACT

VectorBase (http://www.vectorbase.org) is an NIAID-funded Bioinformatic Resource Center focused on invertebrate vectors of human pathogens. VectorBase annotates and curates vector genomes providing a web accessible integrated resource for the research community. Currently, VectorBase contains genome information for three mosquito species: Aedes aegypti, Anopheles gambiae and Culex quinquefasciatus, a body louse Pediculus humanus and a tick species Ixodes scapularis. Since our last report VectorBase has initiated a community annotation system, a microarray and gene expression repository and controlled vocabularies for anatomy and insecticide resistance. We have continued to develop both the software infrastructure and tools for interrogating the stored data.


Subject(s)
Arthropod Vectors/genetics , Culicidae/genetics , Databases, Genetic , Aedes/genetics , Animals , Anopheles/genetics , Culex/genetics , Culicidae/metabolism , Gene Expression Profiling , Genome, Insect , Genomics , Ixodes/genetics , Pediculus/genetics , Vocabulary, Controlled
12.
Sci Data ; 6(1): 40, 2019 Apr 25.
Article in English | MEDLINE | ID: mdl-31024009

ABSTRACT

Arthropods play a dominant role in natural and human-modified terrestrial ecosystem dynamics. Spatially-explicit arthropod population time-series data are crucial for statistical or mathematical models of these dynamics and assessment of their veterinary, medical, agricultural, and ecological impacts. Such data have been collected world-wide for over a century, but remain scattered and largely inaccessible. In particular, with the ever-present and growing threat of arthropod pests and vectors of infectious diseases, there are numerous historical and ongoing surveillance efforts, but the data are not reported in consistent formats and typically lack sufficient metadata to make reuse and re-analysis possible. Here, we present the first-ever minimum information standard for arthropod abundance, Minimum Information for Reusable Arthropod Abundance Data (MIReAD). Developed with broad stakeholder collaboration, it balances sufficiency for reuse with the practicality of preparing the data for submission. It is designed to optimize data (re)usability from the "FAIR," (Findable, Accessible, Interoperable, and Reusable) principles of public data archiving (PDA). This standard will facilitate data unification across research initiatives and communities dedicated to surveillance for detection and control of vector-borne diseases and pests.


Subject(s)
Arthropods , Information Storage and Retrieval/standards , Animals , Arthropods/physiology , Biodiversity , Ecosystem , Information Dissemination , Population Dynamics
13.
Bioinformatics ; 23(9): 1159-60, 2007 May 01.
Article in English | MEDLINE | ID: mdl-17332022

ABSTRACT

UNLABELLED: NucPred analyzes patterns in eukaryotic protein sequences and predicts if a protein spends at least some time in the nucleus or no time at all. Subcellular location of proteins represents functional information, which is important for understanding protein interactions, for the diagnosis of human diseases and for drug discovery. NucPred is a novel web tool based on regular expression matching and multiple program classifiers induced by genetic programming. A likelihood score is derived from the programs for each input sequence and each residue position. Different forms of visualization are provided to assist the detection of nuclear localization signals (NLSs). The NucPred server also provides access to additional sources of biological information (real and predicted) for a better validation and interpretation of results. AVAILABILITY: The web interface to the NucPred tool is provided at http://www.sbc.su.se/~maccallr/nucpred. In addition, the Perl code is made freely available under the GNU Public Licence (GPL) for simple incorporation into other tools and web servers.


Subject(s)
Algorithms , Cell Nucleus/chemistry , Cell Nucleus/metabolism , Nuclear Proteins/chemistry , Nuclear Proteins/metabolism , Sequence Analysis, Protein/methods , Software , Amino Acid Sequence , Molecular Sequence Data , Pattern Recognition, Automated/methods , Structure-Activity Relationship
14.
BMC Bioinformatics ; 7: 16, 2006 Jan 12.
Article in English | MEDLINE | ID: mdl-16409628

ABSTRACT

BACKGROUND: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterized protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. RESULTS: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. 
CONCLUSION: We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription.


Subject(s)
Computational Biology/methods , Proteomics/methods , Algorithms , Amino Acid Motifs , Artificial Intelligence , Catalysis , Databases, Protein , Evolution, Molecular , Genomics , Humans , Models, Statistical , Models, Theoretical , Molecular Sequence Data , Pattern Recognition, Automated , Sequence Alignment , Sequence Analysis, Protein/methods , Structure-Activity Relationship , Ubiquitin/chemistry
15.
BMC Bioinformatics ; 7: 357, 2006 Jul 25.
Article in English | MEDLINE | ID: mdl-16869963

ABSTRACT

BACKGROUND: Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is not unlikely that prediction of other structural features also could provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment. RESULTS: The addition of self-organizing map locations as inputs to a profile-profile scoring function improves the alignment quality of distantly related proteins slightly. The improvement is slightly smaller than that gained from the inclusion of predicted secondary structure. However, the information seems to be complementary as the two prediction schemes can be combined to improve the alignment quality by a further small but significant amount. CONCLUSION: It has been observed in many studies that predicted secondary structure significantly improves the alignments. Here we have shown that the addition of self-organizing map locations can further improve the alignments as the self-organizing map locations seem to contain some information that is not captured by the predicted secondary structure.


Subject(s)
Evolution, Molecular , Protein Structure, Secondary , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Computational Biology/methods , Computational Biology/trends , Databases, Protein , Forecasting , Neural Networks, Computer , Sequence Alignment/trends , Sequence Analysis, Protein/trends , Sequence Homology, Amino Acid
16.
Nucleic Acids Res ; 32(Database issue): D245-50, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681404

ABSTRACT

The 3D-GENOMICS database (http://www.sbg.bio.ic.ac.uk/3dgenomics/) provides structural annotations for proteins from sequenced genomes. In August 2003 the database included data for 93 proteomes. The annotations stored in the database include homologous sequences from various sequence databases, domains from SCOP and Pfam, patterns from Prosite and other predicted sequence features such as transmembrane regions and coiled coils. In addition to annotations at the sequence level, several precomputed cross-proteome comparative analyses are available based on SCOP domain superfamily composition. Annotations are available to the user via a web interface to the database. Multiple points of entry are available so that a user is able to: (i) directly access annotations for a single protein sequence via keywords or accession codes, (ii) examine a sequence of interest chosen from a summary of annotations for a particular proteome, or (iii) access precomputed frequency-based cross-proteome comparative analyses.


Subject(s)
Databases, Protein , Genomics , Proteins/chemistry , Proteins/metabolism , Proteomics , Amino Acid Sequence , Animals , Computational Biology , Genome , Humans , Information Storage and Retrieval , Internet , Molecular Sequence Data , Protein Conformation , Proteins/genetics , Proteome , Sequence Alignment , Sequence Homology, Amino Acid , User-Computer Interface
17.
Proteins ; 61 Suppl 7: 214-224, 2005.
Article in English | MEDLINE | ID: mdl-16187364

ABSTRACT

Here we present the evaluation results of the Critical Assessment of Protein Structure Prediction (CASP6) contact prediction category. Contact prediction was assessed with standard measures well known in the field and the performance of specialist groups was evaluated alongside groups that submitted models with 3D coordinates. The evaluation was mainly focused on long range contact predictions for the set of new fold targets, although we analyzed predictions for all targets. Three groups with similar levels of accuracy and coverage performed a little better than the others. Comparisons of the predictions of the three best methods with those of CASP5/CAFASP3 suggested some improvement, although there were not enough targets in the comparisons to make this statistically significant.


Subject(s)
Computational Biology/methods , Proteomics/methods , Algorithms , Computer Simulation , Computers , Data Interpretation, Statistical , Databases, Protein , Models, Molecular , Protein Conformation , Protein Folding , Protein Structure, Secondary , Protein Structure, Tertiary , Reproducibility of Results , Sequence Alignment , Software
18.
Bioinformatics ; 20 Suppl 1: i224-31, 2004 Aug 04.
Article in English | MEDLINE | ID: mdl-15262803

ABSTRACT

MOTIVATION: Current approaches to contact map prediction in proteins have focused on amino acid conservation and patterns of mutation at sequentially distant positions. This sequence information is poorly understood and very little progress has been made in this area during recent years. RESULTS: In this study, an observation of 'striped' sequence patterns across beta-sheets prompted the development of a new type of contact map predictor. Computer program code was evolved with an evolutionary algorithm (genetic programming) to select residues and residue pairs likely to make contacts based solely on local sequence patterns extracted with the help of self-organizing maps. The mean prediction accuracy is 27% on a validation set of 156 domains up to 400 residues in length, where contacts are separated by at least 8 residues and length/10 pairs are predicted. The retrospective accuracy on a set of 15 CASP5 targets is 27% and 14% for length/10 and length/2 predicted pairs, respectively (both using a minimum residue separation of 24). This compares favourably to the equivalent 21% and 13% obtained for the best automated contact prediction methods at CASP5. The results suggest that protein architectures impose regularities in local sequence environments. Other sources of information, such as correlated/compensatory mutations, may further improve accuracy. AVAILABILITY: A web-based prediction service is available at http://www.sbc.su.se/~maccallr/contactmaps
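
The accuracy figures above (length/10 or length/2 predicted pairs, subject to a minimum residue separation) can be illustrated with a minimal Python sketch. It is not the authors' code: the score dictionary, the function names `top_pairs` and `accuracy`, and the toy data are all hypothetical, and the abstract does not say how predicted pairs are ranked, so ranking by a single numeric score is an assumption.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j), assumed to satisfy i < j


def top_pairs(scores: Dict[Pair, float], length: int,
              divisor: int, min_sep: int) -> List[Pair]:
    """Keep pairs at least min_sep residues apart, then return the
    length // divisor highest-scoring ones (at least one pair)."""
    eligible = [(pair, s) for pair, s in scores.items()
                if abs(pair[1] - pair[0]) >= min_sep]
    # Highest score first; the scoring scheme itself is an assumption.
    eligible.sort(key=lambda item: item[1], reverse=True)
    n_selected = max(1, length // divisor)
    return [pair for pair, _ in eligible[:n_selected]]


def accuracy(predicted: List[Pair], true_contacts: Set[Pair]) -> float:
    """Fraction of predicted pairs that are observed contacts."""
    if not predicted:
        return 0.0
    hits = sum(1 for pair in predicted if pair in true_contacts)
    return hits / len(predicted)


# Toy data for a hypothetical 40-residue domain.
toy_scores = {
    (2, 6): 0.95,   # excluded below: separation 4 is under the minimum of 8
    (1, 12): 0.90,
    (3, 20): 0.80,
    (5, 30): 0.70,
    (10, 25): 0.60,
    (7, 38): 0.50,
}
toy_contacts = {(1, 12), (5, 30), (7, 38)}

selected = top_pairs(toy_scores, length=40, divisor=10, min_sep=8)
print(selected, accuracy(selected, toy_contacts))
```

Changing `divisor` to 2 or `min_sep` to 24 reproduces the other evaluation settings mentioned in the abstract; how a true contact is defined (for example, a distance cutoff) is left to whatever supplies `true_contacts`.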


Subject(s)
Models, Chemical , Models, Molecular , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/ultrastructure , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Binding Sites , Computer Simulation , Molecular Sequence Data , Protein Binding , Protein Conformation
19.
R Soc Open Sci ; 2(5): 150081, 2015 May.
Article in English | MEDLINE | ID: mdl-26064663

ABSTRACT

In modern societies, cultural change seems ceaseless. The flux of fashion is especially obvious for popular music. While much has been written about the origin and evolution of pop, most claims about its history are anecdotal rather than scientific in nature. To rectify this, we investigate the US Billboard Hot 100 between 1960 and 2010. Using music information retrieval and text-mining tools, we analyse the musical properties of approximately 17 000 recordings that appeared in the charts and demonstrate quantitative trends in their harmonic and timbral properties. We then use these properties to produce an audio-based classification of musical styles and study the evolution of musical diversity and disparity, testing, and rejecting, several classical theories of cultural change. Finally, we investigate whether pop musical evolution has been gradual or punctuated. We show that, although pop music has evolved continuously, it did so with particular rapidity during three stylistic 'revolutions' around 1964, 1983 and 1991. We conclude by discussing how our study points the way to a quantitative science of cultural change.

20.
Science ; 347(6217): 1258522, 2015 Jan 02.
Article in English | MEDLINE | ID: mdl-25554792

ABSTRACT

Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.


Subject(s)
Anopheles/genetics , Evolution, Molecular , Genome, Insect , Insect Vectors/genetics , Malaria/transmission , Animals , Anopheles/classification , Base Sequence , Chromosomes, Insect/genetics , Drosophila/genetics , Humans , Insect Vectors/classification , Molecular Sequence Data , Phylogeny , Sequence Alignment