1.
Front Cell Infect Microbiol ; 13: 1182567, 2023.
Article in English | MEDLINE | ID: mdl-37600946

ABSTRACT

Introduction: Various sequencing-based approaches are used to identify and characterize the activities of cis-regulatory elements genome-wide. Some of these techniques rely on indirect markers such as histone modifications (ChIP-seq with histone antibodies) or chromatin accessibility (ATAC-seq, DNase-seq, FAIRE-seq), while others use direct measures, such as episomal assays of the enhancer properties of DNA sequences (STARR-seq) and direct measurement of transcription factor binding (ChIP-seq with transcription factor-specific antibodies). The activities of cis-regulatory elements such as enhancers, promoters, and repressors are determined both by their sequence and by secondary processes such as chromatin accessibility, DNA methylation, and histone marks. Methods: Here, machine learning models are used to evaluate how accurately cis-regulatory elements identified by various commonly used sequencing techniques can be predicted from their underlying sequence alone, in order to distinguish cis-regulatory activity that reflects sequence content from activity that reflects secondary processes. Results and discussion: Models trained and evaluated on D. melanogaster sequences identified through DNase-seq and STARR-seq are significantly more accurate than models trained on sequences identified by H3K4me1, H3K4me3, and H3K27ac ChIP-seq, FAIRE-seq, and ATAC-seq. These results suggest that the activity detected by DNase-seq and STARR-seq can be largely explained by the underlying DNA sequence, independent of secondary processes. Experimentally, a subset of DNase-seq and H3K4me1 ChIP-seq sequences were tested for enhancer activity using luciferase assays and compared with previous tests performed on STARR-seq sequences. The experimental data indicated that STARR-seq sequences are substantially enriched for enhancer-specific activity, while DNase-seq and H3K4me1 ChIP-seq sequences are not.
Taken together, these results indicate that (1) DNase-seq identifies a broad class of regulatory elements, of which enhancers are a subset, and its data are appropriate for training models that detect regulatory activity from sequence alone; (2) STARR-seq data are best for training enhancer-specific sequence models; and (3) H3K4me1 ChIP-seq data are not well suited for training and evaluating sequence-based models for cis-regulatory element prediction.
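
The abstract does not say which machine learning models were used. As a rough illustration of predicting regulatory activity from sequence alone, the sketch below assumes k-mer frequency features and a logistic-regression classifier trained by stochastic gradient descent; the function names, k = 3, and the hyperparameters are hypothetical choices, not the authors' setup.

```python
import itertools
import math
import random

ALPHABET = "ACGT"

def kmer_vector(seq, k=3):
    """Overlapping k-mer frequencies over ACGT; k-mers containing other bases are skipped."""
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1.0
    total = sum(counts) or 1.0
    # Normalize to frequencies so that sequence length does not dominate the features.
    return [c / total for c in counts]

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, epochs=200, lr=0.5, seed=0):
    """Hypothetical baseline: plain SGD on the log-loss, no regularization."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i])))
            grad = p - y[i]  # derivative of the log-loss with respect to the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, X[i])]
            b -= lr * grad
    return w, b

def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the positive (regulatory) class."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

In this kind of setup, sequences called by one assay serve as positives and matched background sequences as negatives; held-out accuracy then indicates how much of the assay's signal the sequence alone can explain.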


Subject(s)
Drosophila melanogaster, Histones, Animals, Histones/genetics, Sequence Analysis, DNA, Chromatin/genetics, Deoxyribonucleases
2.
PLoS Negl Trop Dis ; 17(4): e0010862, 2023 04.
Article in English | MEDLINE | ID: mdl-37043542

ABSTRACT

Phlebotomine sand flies are vectors of human disease of global significance, transmitting bacterial, viral, and protozoan pathogens, including the kinetoplastid parasites of the genus Leishmania, the causative agents of devastating diseases collectively termed leishmaniasis. More than 40 pathogenic Leishmania species are transmitted to humans by approximately 35 sand fly species in 98 countries, with hundreds of millions of people at risk worldwide. No approved, efficacious vaccine exists for leishmaniasis, and available therapeutic drugs are toxic, expensive, or both, or the parasites are becoming resistant to the more recently developed drugs. Therefore, control of sand flies and/or reservoir hosts is currently the most effective strategy for breaking transmission. To better understand the biology of sand flies, including the mechanisms involved in their vectorial capacity, insecticide resistance, and population structure, we sequenced the genomes of two geographically widespread and important sand fly vector species: Phlebotomus papatasi, a vector of Leishmania parasites that cause cutaneous leishmaniasis (distributed in Europe, the Middle East, and North Africa), and Lutzomyia longipalpis, a vector of Leishmania parasites that cause visceral leishmaniasis (distributed across Central and South America). We categorized and curated genes involved in processes important to their roles as disease vectors, including chemosensation, blood feeding, circadian rhythm, immunity, and detoxification, as well as mobile genetic elements. We also defined gene orthology and observed micro-synteny among the genomes. Finally, we present the genetic diversity and population structure of these species in their respective geographical areas. These genomes will provide a foundation for future efforts to prevent vector-borne transmission of Leishmania parasites.


Subject(s)
Leishmania, Leishmaniasis, Cutaneous, Phlebotomus, Psychodidae, Animals, Humans, Phlebotomus/parasitology, Psychodidae/parasitology, Leishmania/genetics, Genomics
3.
PeerJ ; 10: e12831, 2022.
Article in English | MEDLINE | ID: mdl-35116204

ABSTRACT

BACKGROUND: Large (>1 Mb), polymorphic inversions have substantial impacts on population structure and the maintenance of genotypes. These large inversions can be detected from single nucleotide polymorphism (SNP) data using unsupervised learning techniques such as PCA. Constructing and analyzing a feature matrix from millions of SNPs requires a large amount of memory, which limits the size of the data sets that can be analyzed. METHODS: We propose using feature hashing to construct a feature matrix from a VCF file of SNPs in order to reduce memory usage. The matrix is constructed in a streaming fashion, so the entire VCF file is never loaded into memory at once. RESULTS: When evaluated on Anopheles mosquito and Drosophila fly data sets, our approach reduced memory usage by 97% with minimal reductions in accuracy for inversion detection and localization tasks. CONCLUSION: With these changes, inversions in larger data sets can be analyzed easily and efficiently on common laptop and desktop computers. Our method is publicly available through our open-source inversion analysis software, Asaph.


Subject(s)
Anopheles, Polymorphism, Single Nucleotide, Animals, Polymorphism, Single Nucleotide/genetics, Chromosome Inversion/genetics, Software, Genotype, Anopheles/genetics
4.
Hereditas ; 158(1): 7, 2021 Jan 28.
Article in English | MEDLINE | ID: mdl-33509290

ABSTRACT

BACKGROUND: The Aedes aegypti mosquito is a threat to human health across the globe. The A. aegypti genome was recently re-sequenced and re-assembled. Owing to a combination of long-read PacBio and Hi-C sequencing, the AaegL5 assembly is chromosome-complete and significantly improves the assembly in key areas such as the M/m sex-determining locus. Release of the updated genome assembly has created the need to reprocess historical functional genomic data sets, including the cis-regulatory element (CRE) maps previously generated for A. aegypti. RESULTS: We re-processed and re-analyzed the A. aegypti whole-embryo FAIRE-seq data to create an updated embryonic CRE map for the AaegL5 genome. We validated that the new CRE map recapitulates key features of the original AaegL3 CRE map. Further, we built on the improved assembly of the M/m locus to analyze overlaps of open chromatin regions with genes. To support the validation, we created a new method (PeakMatcher) for matching peaks from the same experimental data set across genome assemblies. CONCLUSION: PeakMatcher, which is publicly available under an open-source license, facilitated the release of an updated and validated CRE map, which is available through NIH GEO. These findings demonstrate that PeakMatcher will be a useful resource for validating and transferring previous annotations to updated genome assemblies.
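
The abstract does not describe how PeakMatcher pairs peaks. One generic way to match two peak sets, sketched here purely as an assumption, is reciprocal overlap: both sets are first expressed in a common coordinate system (for example by lifting the older peaks over to the new assembly), and each old peak is paired with the new peak it overlaps best, provided the overlap covers at least half of the longer peak. The 0.5 cutoff and all names are hypothetical.

```python
from collections import defaultdict

def reciprocal_overlap(a, b):
    """Overlap of half-open intervals a and b divided by the longer length.

    This equals min(overlap / len(a), overlap / len(b)).
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])

def match_peaks(old_peaks, new_peaks, min_overlap=0.5):
    """Pair each old peak with its best-overlapping new peak on the same chromosome.

    Peaks are (chrom, start, end) tuples assumed to share one coordinate
    system (hypothetically, after lifting old peaks over to the new assembly).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in new_peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    matches = []
    for chrom, start, end in old_peaks:
        best, best_score = None, 0.0
        for cand in by_chrom.get(chrom, []):
            if cand[0] >= end:
                break  # candidates are sorted by start, so none further can overlap
            score = reciprocal_overlap((start, end), cand)
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= min_overlap:
            matches.append(((chrom, start, end), (chrom, best[0], best[1]), best_score))
    return matches
```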


Subject(s)
Aedes/genetics, Regulatory Elements, Transcriptional, Aedes/embryology, Animals, Genome, Insect, Molecular Sequence Annotation
5.
Front Genet ; 12: 785934, 2021.
Article in English | MEDLINE | ID: mdl-35082832

ABSTRACT

Almost all regulation of gene expression in eukaryotic genomes is mediated by the action of distant non-coding transcriptional enhancers upon proximal gene promoters. Enhancer locations cannot be accurately predicted bioinformatically because there is no defined sequence code, so functional assays are required to detect them directly. Here we used a massively parallel reporter assay, Self-Transcribing Active Regulatory Region sequencing (STARR-seq), to generate the first comprehensive genome-wide map of enhancers in Anopheles coluzzii, a major African malaria vector in the Gambiae species complex. The screen was carried out by transfecting reporter libraries created from the genomic DNA of 60 wild A. coluzzii from Burkina Faso into A. coluzzii 4a3A cells, in order to functionally query the enhancer activity of the natural population within the homologous cellular context. We report a catalog of 3,288 active genomic enhancers that were significant across three biological replicates, 74% of which are located in intergenic and intronic regions. The STARR-seq enhancer screen is chromatin-free and thus detects the inherent activity of a comprehensive catalog of enhancers, including those that may be restricted in vivo to specific cell types or developmental stages. Testing of a validation panel of enhancer candidates using manual luciferase assays confirmed enhancer function in 26 of 28 (93%) candidates, over a wide dynamic range of activity from 2-fold to at least 16-fold above baseline. The enhancers occupy only 0.7% of the genome and display distinct compositional features. The enhancer compartment is significantly enriched for 15 transcription factor binding site signatures and shows divergence in specific dinucleotide repeats compared with matched non-enhancer genomic controls. The genome-wide catalog of A. coluzzii enhancers is publicly available in a simple, searchable graphic format.
This enhancer catalog will be valuable for linking genetic and phenotypic variation, for identifying regulatory elements that could be employed in vector manipulation, and for better targeting of chromosome editing to minimize extraneous regulatory influences on the introduced sequences. Importance: Understanding the role of the non-coding regulatory genome in complex disease phenotypes is essential, but even in well-characterized model organisms, identifying regulatory regions within the vast non-coding genome remains a challenge. We used a large-scale assay to generate a genome-wide map of transcriptional enhancers. Such a catalog for the important malaria vector Anopheles coluzzii will be an important research tool as the role of non-coding regulatory variation in differential susceptibility to malaria infection is explored, and a public resource for research on this important insect vector of disease.
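
In STARR-seq, an active candidate fragment drives transcription of itself, so a region's activity is commonly summarized as its library-size-normalized reporter RNA read count divided by its input-DNA read count. The sketch below shows only that ratio plus a simple replicate-consistency filter; the abstract does not state the statistical test the authors used, and the 2-fold threshold and function names are hypothetical.

```python
def starr_enrichment(rna_counts, dna_counts, pseudocount=1.0):
    """Library-size-normalized RNA/DNA ratio per candidate region.

    rna_counts: reads from reporter transcripts per region.
    dna_counts: reads from the input plasmid library per region.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries need at least one read")
    enrichment = {}
    for region, dna in dna_counts.items():
        rna = rna_counts.get(region, 0)
        # A positive pseudocount keeps regions with zero input reads from dividing by zero.
        rna_frac = (rna + pseudocount) / rna_total
        dna_frac = (dna + pseudocount) / dna_total
        enrichment[region] = rna_frac / dna_frac
    return enrichment

def consistent_enhancers(replicates, threshold=2.0):
    """Regions whose enrichment meets a (hypothetical) threshold in every replicate."""
    passing = [
        {region for region, value in rep.items() if value >= threshold}
        for rep in replicates
    ]
    return set.intersection(*passing) if passing else set()
```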

6.
PLoS Negl Trop Dis ; 14(12): e0008967, 2020 12.
Article in English | MEDLINE | ID: mdl-33370303

ABSTRACT

Phlebotomine sand flies employ an elaborate system of pheromone communication in which males produce pheromones that attract other males to leks (acting as an aggregation pheromone) and females to the lekking males (acting as a sex pheromone). In addition, the type of pheromone produced varies among populations. Despite numerous studies on sand fly chemical communication, little is known about their chemosensory genome. Chemoreceptors interact with chemicals in an organism's environment to elicit essential behaviors such as the identification of suitable mates and food sources; thus, they play important roles in adaptation and speciation. The major chemoreceptor gene families, odorant receptors (ORs), gustatory receptors (GRs), and ionotropic receptors (IRs), together detect and discriminate the chemical landscape. Here, we annotated the chemoreceptor repertoire in the genomes of Lutzomyia longipalpis and Phlebotomus papatasi, major phlebotomine vectors in the New World and Old World, respectively. Comparison with other sequenced Diptera revealed a large and unique expansion in which over 80% of the ~140 ORs belong to a single, taxonomically restricted clade. We next conducted a comprehensive analysis of the chemoreceptors in 63 L. longipalpis individuals from four locations in Brazil, representing allopatric and sympatric populations and three sex-aggregation pheromone types (chemotypes). Population structure based on single nucleotide polymorphisms (SNPs) and gene copy number in the chemoreceptors corresponded to the putative chemotypes, corroborating previous studies that identified multiple populations. Our work provides genomic insights into the behavioral evolution of sexual communication in the L. longipalpis species complex in Brazil and highlights the importance of accounting for ongoing speciation in Central and South American Lutzomyia, which could have important implications for vectorial capacity.


Subject(s)
Chemoreceptor Cells/metabolism, Insect Proteins/genetics, Leishmaniasis/prevention & control, Leishmaniasis/transmission, Phlebotomus/parasitology, Sex Attractants/chemistry, Animals, Brazil, Female, Insect Vectors/parasitology, Leishmania, Male, Phlebotomus/genetics, Phlebotomus/physiology, Polymorphism, Single Nucleotide/genetics
7.
PLoS One ; 15(10): e0240429, 2020.
Article in English | MEDLINE | ID: mdl-33119626

ABSTRACT

Chromosomal inversions can lead to reproductive isolation and adaptation in insects such as Drosophila melanogaster and the non-model malaria vector Anopheles gambiae. Inversions can be detected and characterized using principal component analysis (PCA) of single nucleotide polymorphisms (SNPs). To aid in developing such methods, we formed a new benchmark derived from three publicly available insect data sets. We then used this benchmark to perform an extended validation of our inversion analysis software, Asaph. Through that process, we identified and characterized several problematic test cases, liable to misinterpretation, that can help guide PCA-based inversion detection. Lastly, we re-analyzed the 2R chromosome arm of 150 An. gambiae and An. coluzzii samples and observed two inversions (2Rc and 2Rd) that were previously known but had not been annotated in these particular individuals. The resulting benchmark data set and methods will be useful for future inversion detection based solely on SNP data.
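
PCA-based inversion detection relies on SNPs inside an inversion sharing the karyotype signal, so that samples separate along a leading principal component by inversion genotype. The sketch below is a minimal, dependency-free illustration under stated assumptions: genotypes are coded 0/1/2 alternate alleles, and the first component is found by power iteration. It is not Asaph's implementation, and the function name is hypothetical.

```python
import math
import random

def pc1_scores(genotypes, iterations=200, seed=0):
    """First principal-component score per sample from a samples x SNPs matrix.

    Genotypes are coded 0/1/2 (count of alternate alleles). Uses power
    iteration on the sample Gram matrix of the column-centered data.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    X = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    gram = [[sum(a * b for a, b in zip(X[i], X[k])) for k in range(n)] for i in range(n)]

    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # no variation in the data
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the top eigenvalue s^2; PC1 scores are u * s.
    gv = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, gv))
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]
```

The sign of a principal component is arbitrary, so only the grouping and relative order of samples along PC1 carry meaning.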


Subject(s)
Anopheles/genetics, Chromosomes, Insect/genetics, Computational Biology/methods, Drosophila melanogaster/genetics, Animals, Chromosome Inversion, Datasets as Topic, Polymorphism, Single Nucleotide, Principal Component Analysis, Software
8.
J Bioinform Comput Biol ; 16(5): 1840020, 2018 10.
Article in English | MEDLINE | ID: mdl-30419783

ABSTRACT

Association tests performed with the Likelihood-Ratio Test (LR Test) can be an alternative to [Formula: see text], which is often used in population genetics to find variants of interest. Because the LR Test has several properties that could make it preferable to [Formula: see text], we propose a novel LR Test approach for modeling unknown genotypes in highly similar species. To show the effectiveness of this approach, we apply it to single-nucleotide polymorphisms (SNPs) associated with the recent speciation of the malaria vectors Anopheles gambiae and Anopheles coluzzii and compare it with [Formula: see text].
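
For reference, a standard likelihood-ratio (G) test of association between population and allele on a 2x2 table of allele counts is sketched below. It is the textbook test, not the authors' adjusted approach for unknown genotypes, and the function name is hypothetical. Under the null hypothesis, G is approximately chi-square distributed with 1 degree of freedom, whose survival function is `erfc(sqrt(G / 2))`.

```python
import math

def lr_test_2x2(table):
    """Likelihood-ratio (G) test of independence on a 2x2 count table.

    table[i][j] is the count of allele j in population i.
    Returns (G statistic, p-value with 1 degree of freedom).
    """
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    if total == 0:
        raise ValueError("table has no observations")
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # cells with zero counts contribute nothing
                expected = row[i] * col[j] / total
                g += observed * math.log(observed / expected)
    g *= 2.0
    # P(chi2_1 > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value
```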


Subject(s)
Anopheles/genetics, Genetics, Population/methods, Polymorphism, Single Nucleotide, Animals, Burkina Faso, Cameroon, Genotype, Likelihood Functions, Models, Genetic, Mosquito Vectors/genetics, X Chromosome
9.
Parasit Vectors ; 6: 150, 2013 May 24.
Article in English | MEDLINE | ID: mdl-23705687

ABSTRACT

BACKGROUND: The control of vector-borne diseases, such as malaria, dengue fever, and typhus fever, is often achieved with insecticides. Unfortunately, insecticide resistance is becoming common among different vector species. There are currently no chemical alternatives to these insecticides because new human-safe classes of molecules have yet to be brought to the vector-control market. The identification of novel targets offers opportunities for the rational design of new chemistries to control vector populations. One target family, G protein-coupled receptors (GPCRs), has remained relatively underexplored for insecticide development. METHODS: A novel classifier for vector GPCRs, Ensemble*, was developed. Ensemble* was validated and compared with existing classifiers using a set of all known GPCRs from Aedes aegypti, Anopheles gambiae, Apis mellifera, Drosophila melanogaster, Homo sapiens, and Pediculus humanus. Predictions for unidentified sequences from Ae. aegypti, An. gambiae, and Pe. humanus were validated. Quantitative RT-PCR expression analysis was performed on previously known and newly discovered Ae. aegypti GPCR genes. RESULTS: We present a new analysis of GPCRs in the genomes of Ae. aegypti, a vector of dengue fever; An. gambiae, a primary vector of Plasmodium falciparum, which causes malaria; and Pe. humanus, a vector of epidemic typhus fever, using Ensemble*, a novel GPCR classifier designed for insect vector species. We identified 30 additional putative GPCRs, 19 of which we validated. Expression of the newly discovered Ae. aegypti GPCR genes was confirmed via quantitative RT-PCR. CONCLUSION: A novel GPCR classifier for insect vectors, Ensemble*, was developed, and its GPCR predictions were validated. Ensemble* and the validation pipeline were applied to the genomes of three insect vectors (Ae. aegypti, An. gambiae, and Pe. humanus), resulting in the identification of 52 GPCRs not previously identified, of which 11 are predicted GPCRs and 19 are predicted and confirmed GPCRs.


Subject(s)
Arthropod Vectors/genetics, Computational Biology/methods, Entomology/methods, Molecular Biology/methods, Receptors, G-Protein-Coupled/genetics, Aedes/genetics, Animals, Anopheles/genetics, Gene Expression Profiling, Pediculus/genetics, Real-Time Polymerase Chain Reaction
10.
J Chem Theory Comput ; 9(8): 3267-3281, 2013 Aug 13.
Article in English | MEDLINE | ID: mdl-24436689

ABSTRACT

Molecular dynamics (MD) simulations now play a key role in many areas of theoretical chemistry, biology, physics, and materials science. In many cases, such calculations are significantly limited by the massive amount of computer time needed to perform the calculations of interest. Herein, we present Long Timestep Molecular Dynamics (LTMD), a method to significantly speed up MD simulations. In particular, we discuss new methods to calculate the terms needed in LTMD, as well as issues germane to a GPU implementation. The resulting code, implemented in the OpenMM MD library, can achieve a 6-fold speedup, enabling MD simulations on the order of 5 µs/day with implicit solvent models.

11.
Comput Sci Eng ; 15(1): 76-83, 2012 May 01.
Article in English | MEDLINE | ID: mdl-24634607

ABSTRACT

The problem of formatting data so that it conforms to the required input for scientific data processing tools pervades scientific computing. The CONNecticut Joint University Research Group (CONNJUR) has developed a data translation tool based on a pipeline architecture that partially solves this problem. The CONNJUR Spectrum Translator supports data format translation for experiments that use Nuclear Magnetic Resonance to determine the structure of large protein molecules.

12.
J Biomol NMR ; 50(1): 83-9, 2011 May.
Article in English | MEDLINE | ID: mdl-21409563

ABSTRACT

NMR spectroscopists are hindered by the lack of standardization of spectral data among the file formats of various NMR data processing tools. This lack of standardization is cumbersome: researchers must perform their own file conversion to switch between processing tools, and the combination of tools that can be employed is restricted when no conversion option is available. The CONNJUR Spectrum Translator introduces a new, extensible architecture for spectrum translation and two key algorithmic improvements. The first is translation of NMR spectral data (time and frequency domain) to a single in-memory data model, which allows a new file format to be added with two converter modules, a reader and a writer, instead of a separate converter for each existing format. The second is the use of layout descriptors, which allows a single FID (free induction decay) data translation engine to be used for all formats. For the end user, sophisticated metadata readers allow most files to be converted with minimal user configuration. The open-source code is freely available at http://connjur.sourceforge.net for inspection and extension.
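
The translation architecture described above, a single in-memory model with one reader and one writer per format, can be sketched in a few lines. With N formats it needs 2N modules rather than N(N-1) direct converters. The `Spectrum` class and the two text formats (`pairs` and `json`) are hypothetical stand-ins, not real NMR formats and not the CONNJUR Spectrum Translator's API.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Spectrum:
    """Hypothetical format-neutral model: complex data points plus metadata."""
    points: List[complex]
    metadata: Dict[str, str] = field(default_factory=dict)

READERS: Dict[str, Callable[[str], Spectrum]] = {}
WRITERS: Dict[str, Callable[[Spectrum], str]] = {}

def reader(fmt):
    def register(func):
        READERS[fmt] = func
        return func
    return register

def writer(fmt):
    def register(func):
        WRITERS[fmt] = func
        return func
    return register

@reader("pairs")
def read_pairs(text):
    # Toy format: one "real imag" pair per line, no metadata.
    points = []
    for line in text.splitlines():
        if line.strip():
            re_part, im_part = line.split()
            points.append(complex(float(re_part), float(im_part)))
    return Spectrum(points)

@writer("pairs")
def write_pairs(spec):
    return "\n".join(f"{p.real!r} {p.imag!r}" for p in spec.points)

@reader("json")
def read_json(text):
    doc = json.loads(text)
    points = [complex(r, i) for r, i in zip(doc["re"], doc["im"])]
    return Spectrum(points, dict(doc.get("meta", {})))

@writer("json")
def write_json(spec):
    return json.dumps(
        {"re": [p.real for p in spec.points],
         "im": [p.imag for p in spec.points],
         "meta": spec.metadata},
        sort_keys=True,
    )

def translate(text, src, dst):
    """Any-to-any conversion routed through the shared in-memory model."""
    return WRITERS[dst](READERS[src](text))
```

Adding a third format in this sketch means registering one more reader and one more writer; every existing format can then convert to and from it.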


Subject(s)
Magnetic Resonance Spectroscopy/methods, Software, Algorithms, User-Computer Interface
13.
Proc Int Conf Inf Technol New Gener ; : 1014-1020, 2011 Apr 11.
Article in English | MEDLINE | ID: mdl-22214925

ABSTRACT

The CONNecticut Joint University Research (CONNJUR) team is a group of biochemical and software engineering researchers at multiple institutions. The team's vision is to develop a comprehensive application that integrates a variety of existing analysis tools with workflow and data management to support protein structure determination by Nuclear Magnetic Resonance (NMR). The use of multiple disparate tools and the lack of data management, currently the norm in NMR data processing, provide strong motivation for such an integrated environment. This manuscript briefly describes the domain of NMR as used for protein structure determination and explains the formation of the CONNJUR team and how it operates in developing the CONNJUR application. It also traces the evolution of the CONNJUR application through four prototypes and describes the challenges faced during its development and how they were met.

14.
BMC Bioinformatics ; 11: 328, 2010 Jun 16.
Article in English | MEDLINE | ID: mdl-20565705

ABSTRACT

BACKGROUND: Minimotifs are short peptide sequences within one protein that are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. Many minimotifs reported in the primary literature have yet to be annotated, while entirely novel minimotifs continue to be published weekly. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that facilitates structured annotation and management of minimotif data from the biomedical literature. RESULTS: We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner (MnM) database, literature tracking, and annotation of new minimotifs. MimoSA provides visualization, organization, selection, and editing of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach based on word correlation. The paper-scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. CONCLUSIONS: MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high-performance relational model of minimotif syntax.
MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context.


Subject(s)
Algorithms, Amino Acid Motifs, Databases, Protein, Proteins/chemistry, Animals, Artificial Intelligence, Data Mining/methods, Humans, Protein Binding, Proteins/metabolism, Sequence Analysis, Protein
15.
BMC Genomics ; 10: 360, 2009 Aug 05.
Article in English | MEDLINE | ID: mdl-19656396

ABSTRACT

BACKGROUND: One of the most important developments in bioinformatics over the past few decades has been the observation that short linear peptide sequences (minimotifs) mediate many classes of cellular functions, such as protein-protein interactions, molecular trafficking, and post-translational modifications. As both the creators and curators of Minimotif Miner, a database that catalogs minimotifs, the authors have a unique perspective on the commonalities among the many functional roles of minimotifs. Standardizing functional annotations is clearly useful, both for the facile exchange of data between bioinformatics resources and for the internal clustering of sets of related data elements. With these two purposes in mind, the authors propose a syntax for minimotif semantics that is primarily useful for functional annotation. RESULTS: Herein, we present a structured syntax of minimotifs and their functional annotation. A syntax-based model of minimotif function with established minimotif sequence definitions was implemented using a relational database management system (RDBMS). To assess the usefulness of our standardized semantics, a series of database queries and stored procedures were used to classify SH3 domain-binding minimotifs into 10 groups spanning 700 unique binding sequences. CONCLUSION: Our derived minimotif syntax is currently being used to normalize minimotif covalent chemistry and functional definitions within the MnM database. Analysis of SH3-binding minimotif data spanning many different studies within our database reveals unique attributes and frequencies that can be used to classify different types of binding minimotifs. Implementing the syntax in the relational database enables many different analysis protocols to be applied to minimotif data and is an important tool for better understanding the specificity of minimotif-driven molecular interactions with proteins.
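
A minimal sketch of classifying SH3-binding minimotifs with SQL queries, using Python's built-in `sqlite3`. The table is a hypothetical, greatly simplified stand-in for the MnM schema, and the query uses only the two commonly cited SH3 ligand classes (class I `[RK]xxPxxP`, class II `PxxPx[RK]`, simplified here to fixed 7-residue patterns) rather than the 10 groups reported in the abstract.

```python
import re
import sqlite3

SCHEMA = """
CREATE TABLE minimotif (
    id INTEGER PRIMARY KEY,
    sequence TEXT NOT NULL,  -- peptide sequence of the minimotif
    activity TEXT NOT NULL,  -- e.g. 'binds'
    target TEXT NOT NULL     -- e.g. 'SH3'
)
"""

CLASSIFY_SQL = """
SELECT CASE
         WHEN sequence REGEXP '^[RK]..P..P$' THEN 'class I'
         WHEN sequence REGEXP '^.P..P.[RK]$' THEN 'class II'
         ELSE 'other'
       END AS sh3_class,
       COUNT(*)
FROM minimotif
WHERE activity = 'binds' AND target = 'SH3'
GROUP BY sh3_class
ORDER BY sh3_class
"""

def build_demo_db():
    """In-memory database populated with a few hypothetical example rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite reserves the REGEXP operator but ships no implementation:
    # "X REGEXP Y" calls the user function regexp(Y, X).
    conn.create_function(
        "regexp", 2, lambda pattern, value: re.search(pattern, value) is not None
    )
    conn.execute(SCHEMA)
    rows = [
        ("RPLPPLP", "binds", "SH3"),
        ("APPLPPR", "binds", "SH3"),
        ("PPPYPPY", "binds", "SH3"),
        ("LPPPYL", "binds", "WW"),
    ]
    conn.executemany(
        "INSERT INTO minimotif (sequence, activity, target) VALUES (?, ?, ?)", rows
    )
    return conn

def classify_sh3(conn):
    """Count SH3-binding minimotifs per consensus class."""
    return conn.execute(CLASSIFY_SQL).fetchall()
```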


Subject(s)
Computational Biology/methods, Database Management Systems, Databases, Protein, Amino Acid Motifs, Protein Interaction Domains and Motifs, Semantics