Results 1 - 10 of 10
1.
Nat Methods ; 6(1): 55-61, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19079254

ABSTRACT

Comprehensive protein-interaction mapping projects are underway for many model species and humans. A key step in these projects is estimating the time, cost and personnel required for obtaining an accurate and complete map. Here we modeled the cost of interaction-map completion for various experimental designs. We showed that current efforts may require up to 20 independent tests covering each protein pair to approach completion. We explored designs for reducing this cost substantially, including prioritization of protein pairs, probability thresholding and interaction prediction. The best experimental designs lowered cost by fourfold overall and >100-fold in early stages of mapping. We demonstrate the best strategy in an ongoing project in Drosophila melanogaster, in which we mapped 450 high-confidence interactions using 47 microtiter plates, versus thousands of plates expected using current designs. This study provides a framework for assessing the feasibility of interaction mapping projects and for future efforts to increase their efficiency.


Subject(s)
Protein Interaction Mapping/economics , Protein Interaction Mapping/methods , Animals , Computer Simulation , Drosophila Proteins/metabolism , Drosophila melanogaster , Humans , Models, Biological
2.
Gigascience ; 9(6), 2020 Jun 01.
Article in English | MEDLINE | ID: mdl-32543654

ABSTRACT

BACKGROUND: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. RESULTS: We present 2 annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, 1 for the same Duroc female (Sscrofa11.1) and 1 for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2. CONCLUSIONS: These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.


Subject(s)
Computational Biology/methods , Genome , Genomics/methods , Sequence Analysis, DNA/methods , Sus scrofa/immunology , Animals , Molecular Sequence Annotation , Reproducibility of Results , Research , Swine
3.
Bioinformatics ; 23(2): e24-9, 2007 Jan 15.
Article in English | MEDLINE | ID: mdl-17237099

ABSTRACT

MOTIVATION: We introduce a novel approach to multiple alignment that is based on an algorithm for rapidly checking whether single matches are consistent with a partial multiple alignment. This leads to a sequence annealing algorithm, which is an incremental method for building multiple sequence alignments one match at a time. Our approach improves significantly on the standard progressive alignment approach to multiple alignment. RESULTS: The sequence annealing algorithm performs well on benchmark test sets of protein sequences. It is not only sensitive, but also specific, drastically reducing the number of incorrectly aligned residues in comparison to other programs. The method allows for adjustment of the sensitivity/specificity tradeoff and can be used to reliably identify homologous regions among protein sequences. AVAILABILITY: An implementation of the sequence annealing algorithm is available at http://bio.math.berkeley.edu/amap/
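
The idea of building an alignment one match at a time can be sketched as follows. This is a simplified illustration, not the AMAP implementation: the abstract mentions a rapid consistency check, whereas this sketch naively rebuilds and re-checks the whole column graph after every trial merge. The names `anneal`, `is_consistent`, and the `(score, residue_a, residue_b)` input format are assumptions made for the example. A match is accepted only if merging the two residues' columns keeps the column order acyclic, which also rules out two residues of one sequence sharing a column.

```python
def find(parent, x):
    """Union-find lookup with path halving; returns the column of residue x."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_consistent(parent, lengths):
    """True if the columns can be ordered so every sequence reads left to right."""
    edges, indeg = {}, {}
    for seq_id, n in lengths.items():
        cols = [find(parent, (seq_id, i)) for i in range(n)]
        for c in cols:
            edges.setdefault(c, set())
            indeg.setdefault(c, 0)
        for a, b in zip(cols, cols[1:]):
            if b not in edges[a]:
                edges[a].add(b)
                indeg[b] += 1
    # Kahn's algorithm: the graph is acyclic iff every column gets visited.
    queue = [c for c, d in indeg.items() if d == 0]
    visited = 0
    while queue:
        c = queue.pop()
        visited += 1
        for nxt in edges[c]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return visited == len(indeg)


def anneal(seqs, scored_matches):
    """Add matches in decreasing score order, keeping only consistent ones."""
    lengths = {seq_id: len(s) for seq_id, s in seqs.items()}
    parent = {(sid, i): (sid, i) for sid, n in lengths.items() for i in range(n)}
    accepted = []
    for _score, a, b in sorted(scored_matches, reverse=True):
        trial = dict(parent)  # try the merge on a copy
        ra, rb = find(trial, a), find(trial, b)
        if ra != rb:
            trial[ra] = rb
        if is_consistent(trial, lengths):
            parent = trial
            accepted.append((a, b))
    return accepted


if __name__ == "__main__":
    seqs = {"x": "AB", "y": "AB"}
    matches = [(0.9, ("x", 0), ("y", 1)), (0.8, ("x", 1), ("y", 0))]
    print(anneal(seqs, matches))  # the second match crosses the first
```

Stopping the loop at a score threshold would be one way to trade sensitivity against specificity, in the spirit of the adjustable tradeoff the abstract describes.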


Subject(s)
Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Molecular Sequence Data
4.
Genome Announc ; 6(7), 2018 Feb 15.
Article in English | MEDLINE | ID: mdl-29449398

ABSTRACT

The model oleaginous alga Nannochloropsis gaditana was completely sequenced using a combination of optical mapping and next-generation sequencing technologies to generate one of the most complete eukaryotic genomes published to date. The assembled genome is 30.7 Mb long.

5.
Genome Announc ; 6(12), 2018 Mar 22.
Article in English | MEDLINE | ID: mdl-29567741

ABSTRACT

Haematococcus lacustris is an industrially relevant microalga that is used for the production of the carotenoid astaxanthin. Here, we report the use of PacBio long-read sequencing to assemble the chloroplast genome of H. lacustris strain UTEX:2505. At 1.35 Mb, this is the largest assembled chloroplast of any plant or alga known to date.

6.
Nat Biotechnol ; 35(7): 647-652, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28628130

ABSTRACT

Lipid production in the industrial microalga Nannochloropsis gaditana exceeds that of model algal species and can be maximized by nutrient starvation in batch culture. However, starvation halts growth, thereby decreasing productivity. Efforts to engineer N. gaditana strains that can accumulate biomass and overproduce lipids have previously met with little success. We identified 20 transcription factors as putative negative regulators of lipid production by using RNA-seq analysis of N. gaditana during nitrogen deprivation. Application of a CRISPR-Cas9 reverse-genetics pipeline enabled insertional mutagenesis of 18 of these 20 transcription factors. Knocking out a homolog of fungal Zn(II)₂Cys₆-encoding genes improved partitioning of total carbon to lipids from 20% (wild type) to 40-55% (mutant) in nutrient-replete conditions. Knockout mutants grew poorly, but attenuation of Zn(II)₂Cys₆ expression yielded strains producing twice as much lipid (~5.0 g m⁻² d⁻¹) as that in the wild type (~2.5 g m⁻² d⁻¹) under semicontinuous growth conditions and had little effect on growth.


Subject(s)
Genetic Enhancement/methods , Lipid Metabolism/genetics , Lipids/biosynthesis , Regulatory Elements, Transcriptional/genetics , Stramenopiles/genetics , Transcription Factors/genetics , Algal Proteins/genetics , Down-Regulation/genetics , Gene Knockout Techniques , Lipids/genetics , Stramenopiles/growth & development
7.
mBio ; 6(6): e01796-15, 2015 Nov 24.
Article in English | MEDLINE | ID: mdl-26604259

ABSTRACT

Pseudomonas aeruginosa is an antibiotic-refractory pathogen with a large genome and extensive genotypic diversity. Historically, P. aeruginosa has been a major model system for understanding the molecular mechanisms underlying type I clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated protein (CRISPR-Cas)-based bacterial immune system function. However, little information on the phylogenetic distribution and potential role of these CRISPR-Cas systems in molding the P. aeruginosa accessory genome and antibiotic resistance elements is known. Computational approaches were used to identify and characterize CRISPR-Cas systems within 672 genomes, and in the process, we identified a previously unreported and putatively mobile type I-C P. aeruginosa CRISPR-Cas system. Furthermore, genomes harboring noninhibited type I-F and I-E CRISPR-Cas systems were on average ~300 kb smaller than those without a CRISPR-Cas system. In silico analysis demonstrated that the accessory genome (n = 22,036 genes) harbored the majority of identified CRISPR-Cas targets. We also assembled a global spacer library that aided the identification of difficult-to-characterize mobile genetic elements within next-generation sequencing (NGS) data and allowed CRISPR typing of a majority of P. aeruginosa strains. In summary, our analysis demonstrated that CRISPR-Cas systems play an important role in shaping the accessory genomes of globally distributed P. aeruginosa isolates. IMPORTANCE: P. aeruginosa is both an antibiotic-refractory pathogen and an important model system for type I CRISPR-Cas bacterial immune systems. By combining the genome sequences of 672 newly and previously sequenced genomes, we were able to provide a global view of the phylogenetic distribution, conservation, and potential targets of these systems. This analysis identified a new and putatively mobile P. aeruginosa CRISPR-Cas subtype, characterized the diverse distribution of known CRISPR-inhibiting genes, and provided a potential new use for CRISPR spacer libraries in accessory genome analysis. Our data demonstrated the importance of CRISPR-Cas systems in modulating the accessory genomes of globally distributed strains while also providing substantial data for subsequent genomic and experimental studies in multiple fields. Understanding why certain genotypes of P. aeruginosa are clinically prevalent and adept at horizontally acquiring virulence and antibiotic resistance elements is of major clinical and economic importance.
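
Using a spacer library to flag candidate targets can be sketched in a few lines. This is a hedged illustration only: the abstract does not describe the matching procedure, so exact substring matching on both strands is an assumption (real analyses typically tolerate mismatches), and the function names and toy sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_spacer_targets(spacers: dict, genes: dict) -> list:
    """Return (spacer, gene, strand) for each exact spacer match in a gene.

    Exact matching is a simplifying assumption made for this sketch.
    """
    hits = []
    for spacer_name, spacer in spacers.items():
        forward = spacer.upper()
        reverse = reverse_complement(forward)
        for gene_name, gene in genes.items():
            sequence = gene.upper()
            if forward in sequence:
                hits.append((spacer_name, gene_name, "+"))
            elif reverse in sequence:
                hits.append((spacer_name, gene_name, "-"))
    return hits


if __name__ == "__main__":
    # Toy sequences invented for the example, far shorter than real spacers.
    spacers = {"sp1": "AAACCCGG"}
    genes = {
        "geneA": "TTTAAACCCGGTTT",  # contains the spacer on the forward strand
        "geneB": "GGCCGGGTTTAA",    # contains its reverse complement
        "geneC": "ACACACAC",        # no match
    }
    print(find_spacer_targets(spacers, genes))
```

Counting the hits that fall in accessory versus core genes would then give the kind of summary the abstract reports.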


Subject(s)
Anti-Bacterial Agents/pharmacology , CRISPR-Cas Systems , Drug Resistance, Bacterial , Genetic Variation , Phylogeny , Pseudomonas aeruginosa/drug effects , Pseudomonas aeruginosa/genetics , Computational Biology , Genome, Bacterial , Pseudomonas aeruginosa/classification , Sequence Analysis, DNA
8.
BMC Bioinformatics ; 5: 146, 2004 Oct 07.
Article in English | MEDLINE | ID: mdl-15471541

ABSTRACT

BACKGROUND: Researchers who use MEDLINE for text mining, information extraction, or natural language processing may benefit from having a copy of MEDLINE that they can manage locally. The National Library of Medicine (NLM) distributes MEDLINE in eXtensible Markup Language (XML)-formatted text files, but it is difficult to query MEDLINE in that format. We have developed software tools to parse the MEDLINE data files and load their contents into a relational database. Although the task is conceptually straightforward, the size and scope of MEDLINE make the task nontrivial. Given the increasing importance of text analysis in biology and medicine, we believe a local installation of MEDLINE will provide helpful computing infrastructure for researchers. RESULTS: We developed three software packages that parse and load MEDLINE, and ran each package to install separate instances of the MEDLINE database. For each installation, we collected data on loading time and disk-space utilization to provide examples of the process in different settings. Settings differed in terms of commercial database-management system (IBM DB2 or Oracle 9i), processor (Intel or Sun), programming language of installation software (Java or Perl), and methods employed in different versions of the software. The loading times for the three installations were 76 hours, 196 hours, and 132 hours, and disk-space utilization was 46.3 GB, 37.7 GB, and 31.6 GB, respectively. Loading times varied due to a variety of differences among the systems. Loading time also depended on whether data were written to intermediate files or not, and on whether input files were processed in sequence or in parallel. Disk-space utilization depended on the number of MEDLINE files processed, amount of indexing, and whether abstracts were stored as character large objects or truncated. 
CONCLUSIONS: Relational database (RDBMS) technology supports indexing and querying of very large datasets, and can accommodate a locally stored version of MEDLINE. RDBMS systems support a wide range of queries and facilitate certain tasks that are not directly supported by the application programming interface to PubMed. Because there is variation in hardware, software, and network infrastructures across sites, we cannot predict the exact time required for a user to load MEDLINE, but our results suggest that performance of the software is reasonable. Our database schemas and conversion software are publicly available at http://biotext.berkeley.edu.
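
The parse-and-load step can be sketched on a very small scale. This is not the authors' software (which the abstract says was written in Java or Perl against IBM DB2 or Oracle 9i); it is a minimal Python illustration that swaps in SQLite and a three-column table. The sample XML is a made-up, heavily simplified fragment using a few MEDLINE element names (`MedlineCitation`, `PMID`, `ArticleTitle`, `AbstractText`), and the table layout and function name are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented records for illustration; real MEDLINE files carry many more fields.
SAMPLE_XML = """<MedlineCitationSet>
  <MedlineCitation>
    <PMID>1</PMID>
    <Article>
      <ArticleTitle>Hypothetical title one</ArticleTitle>
      <Abstract><AbstractText>First abstract.</AbstractText></Abstract>
    </Article>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>2</PMID>
    <Article>
      <ArticleTitle>Hypothetical title two</ArticleTitle>
    </Article>
  </MedlineCitation>
</MedlineCitationSet>"""


def load_citations(xml_text: str, conn: sqlite3.Connection) -> int:
    """Parse citations from xml_text and insert them; returns the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation ("
        "pmid INTEGER PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = []
    for citation in root.iter("MedlineCitation"):
        pmid = int(citation.findtext("PMID"))
        title = citation.findtext("Article/ArticleTitle", default="")
        # Stays None (stored as NULL) when a record has no abstract.
        abstract = citation.findtext("Article/Abstract/AbstractText")
        rows.append((pmid, title, abstract))
    conn.executemany("INSERT OR REPLACE INTO citation VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load_citations(SAMPLE_XML, connection))
    for row in connection.execute("SELECT pmid, title FROM citation ORDER BY pmid"):
        print(row)
```

For files of realistic size, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be preferable to reading the whole document into memory, and indexing choices would affect both loading time and disk use, as the abstract notes.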


Subject(s)
MEDLINE , Software Design , Database Management Systems , Databases, Bibliographic , Software , Software Validation , User-Computer Interface
9.
Genome Res ; 17(6): 760-74, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567995

ABSTRACT

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.


Subject(s)
Evolution, Molecular , Genome, Human , Mammals/genetics , Open Reading Frames , Phylogeny , Sequence Alignment , Animals , Human Genome Project , Humans
10.
Pac Symp Biocomput ; : 451-62, 2003.
Article in English | MEDLINE | ID: mdl-12603049

ABSTRACT

The volume of biomedical text is growing at a fast rate, creating challenges for humans and computer systems alike. One of these challenges arises from the frequent use of novel abbreviations in these texts, thus requiring that biomedical lexical ontologies be continually updated. In this paper we show that the problem of identifying abbreviations' definitions can be solved with a much simpler algorithm than that proposed by other research efforts. The algorithm achieves 96% precision and 82% recall on a standard test collection, which is at least as good as existing approaches. It also achieves 95% precision and 82% recall on another, larger test set. A notable advantage of the algorithm is that, unlike other approaches, it does not require any training data.
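
A training-free matching rule of the kind described can be sketched as follows. The abstract does not spell out the algorithm, so this is an assumption-laden illustration rather than the paper's published procedure: it scans the short form and a candidate phrase from right to left, requires each short-form character to appear in order, and requires the first character to start a word. The function name and examples are hypothetical.

```python
from typing import Optional


def find_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Return the shortest suffix of candidate (cut at a word boundary)
    that matches short_form, or None if no match exists."""
    s_idx = len(short_form) - 1
    l_idx = len(candidate) - 1

    while s_idx >= 0:
        ch = short_form[s_idx].lower()
        if not ch.isalnum():
            s_idx -= 1  # ignore punctuation inside the short form
            continue
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while (l_idx >= 0 and candidate[l_idx].lower() != ch) or (
            s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum()
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1

    # Start the long form at the beginning of the word just matched.
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]


if __name__ == "__main__":
    print(find_long_form("PCR", "amplified by polymerase chain reaction"))
    print(find_long_form("XYZ", "polymerase chain reaction"))
```

In use, `candidate` would be the text just before a parenthesized short form, for example the words preceding "(PCR)" in a sentence. Because the rule relies only on character order, no labeled examples are needed.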


Subject(s)
Abbreviations as Topic , Algorithms , Abstracting and Indexing , Computational Biology , MEDLINE , Publishing