Results 1 - 6 of 6
1.
Nucleic Acids Res ; 41(10): e112, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23580546

ABSTRACT

We present an intramolecular reaction, Reflex™, to derive shorter, sequencer-ready, daughter polymerase chain reaction products from a pooled population of barcoded long-range polymerase chain reaction products, whilst still preserving the cognate DNA barcodes. Our Reflex workflow needs only a small number of primer extension steps to rapidly enable uniform sequence coverage of long contiguous sequence targets in large numbers of samples at low cost on desktop next-generation sequencers.


Subject(s)
Polymerase Chain Reaction; Sequence Analysis, DNA/methods; Cytochrome P-450 CYP2D6/genetics; DNA Primers/chemistry; Humans
2.
Nucleic Acids Res ; 39(12): e81, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21490082

ABSTRACT

Amplification by polymerase chain reaction is often used in the preparation of template DNA molecules for next-generation sequencing. Amplification increases the number of available molecules for sequencing but changes the representation of the template molecules in the amplified product and introduces random errors. Such changes in representation hinder applications requiring accurate quantification of template molecules, such as allele calling or estimation of microbial diversity. We present a simple method to count the number of template molecules using degenerate bases and show that it improves genotyping accuracy and removes noise from PCR amplification. This method can be easily added to existing DNA library preparation techniques and can improve the accuracy of variant calling.
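The counting idea can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each template molecule carries a random degenerate-base tag attached before amplification, that reads sharing a tag descend from the same template, and that two templates never receive the same tag. The function name `count_templates` and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def count_templates(reads):
    """Count template molecules per allele from (tag, allele) read pairs.

    Assumption: reads with the same degenerate tag are PCR copies of one
    template, so each tag contributes a single vote (its majority allele).
    """
    alleles_by_tag = defaultdict(Counter)
    for tag, allele in reads:
        alleles_by_tag[tag][allele] += 1

    templates = Counter()
    for allele_counts in alleles_by_tag.values():
        consensus, _ = allele_counts.most_common(1)[0]
        templates[consensus] += 1
    return templates


# Toy data: three templates, unevenly amplified, with one erroneous read.
reads = [
    ("ACGT", "A"), ("ACGT", "A"), ("ACGT", "A"), ("ACGT", "G"),
    ("TTGA", "G"),
    ("CCAT", "A"), ("CCAT", "A"),
]
read_counts = Counter(allele for _, allele in reads)  # raw read tally
template_counts = count_templates(reads)              # one vote per tag
```

In the toy data the raw read tally is 5 to 2 in favour of allele A, while the per-tag tally is 2 templates to 1, and the stray G read inside the ACGT group is outvoted by its siblings.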


Subject(s)
Polymerase Chain Reaction/methods; Sequence Analysis, DNA; Alleles; Gene Library; Genotype; Humans; Templates, Genetic
3.
Nucleic Acids Res ; 34(5): 1571-80, 2006.
Article in English | MEDLINE | ID: mdl-16547200

ABSTRACT

An important problem in genomics is automatically clustering homologous proteins when only sequence information is available. Most methods for clustering proteins are local, and are based on simply thresholding a measure related to sequence distance. We first show how locality limits the performance of such methods by analysing the distribution of distances between protein sequences. We then present a global method based on spectral clustering and provide theoretical justification for why it should substantially outperform local methods. We extensively tested our method and compared its performance with other local methods on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. We consistently observed that the number of clusters we obtain for a given set of proteins is close to the number of superfamilies in that set, that there are fewer singletons, and that the method correctly groups most remote homologs. In our experiments, the quality of the clusters, as quantified by a measure that combines sensitivity and specificity, was consistently better [on average, improvements were 84% over hierarchical clustering, 34% over Connected Component Analysis (CCA) (similar to GeneRAGE) and 72% over another global method, TribeMCL].
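To make the notion of a local method concrete, the sketch below implements plain distance thresholding followed by connected components. It illustrates the baseline the abstract contrasts against, not the authors' spectral method. The protein names and distances are invented for illustration.

```python
def connected_components(names, distance, threshold, default=1.0):
    """Cluster by linking every pair whose distance is <= threshold.

    `distance` maps frozenset({a, b}) to a distance; missing pairs are
    assumed to be far apart (`default`). Uses a small union-find.
    """
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance.get(frozenset((a, b)), default) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


names = ["p1", "p2", "p3", "q1", "q2"]
distance = {
    frozenset(("p1", "p2")): 0.10,
    frozenset(("p1", "p3")): 0.10,
    frozenset(("p2", "p3")): 0.15,
    frozenset(("q1", "q2")): 0.10,
    frozenset(("p3", "q1")): 0.30,  # a single borderline link between groups
}
tight = connected_components(names, distance, threshold=0.20)
loose = connected_components(names, distance, threshold=0.35)
```

With the tighter threshold the two groups stay apart; raising it slightly lets the single `p3`/`q1` link merge everything into one cluster. This sensitivity to individual pairwise decisions is the kind of limitation of locality that the abstract refers to.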


Subject(s)
Sequence Analysis, Protein/methods; Sequence Homology, Amino Acid; Algorithms; Cluster Analysis; Proteins/classification
4.
BMC Bioinformatics ; 7: 48, 2006 Jan 31.
Article in English | MEDLINE | ID: mdl-16448555

ABSTRACT

BACKGROUND: The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two sequences. Methods which compare sequence profiles have been shown to improve the detection of these remote sequence relationships. However, the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles each representing an individual member of the superfamily. RESULTS: Using profile-profile methods for remote homolog detection, we benchmark the performance of single structure-based superfamily models and multiple domain models. On average, over all superfamilies, using a truncated receiver operating characteristic (ROC5), we find that multiple domain models outperform single superfamily models, except at low error rates where the two models behave in a similar way. However, there is a wide range of performance depending on the superfamily. For 12% of all superfamilies the ROC5 value for superfamily models is more than 0.2 above that of the domain models, and for 10% of superfamilies the domain models show a similar improvement in performance over the superfamily models. CONCLUSION: Using a sensitive profile-profile method, we have investigated the performance of single structure-based models and multiple sequence models (domain models) in detecting remote superfamily members. We find that overall, multiple models perform better in recognition, although single structure-based models display better alignment accuracy.
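For readers unfamiliar with the truncated ROC score, the sketch below computes ROCn under the usual definition: the area under the ROC curve up to the n-th false positive, normalised so that a perfect ranking scores 1. The abstract does not spell out the formula, so this definition, the function name `roc_n`, and the handling of short hit lists are assumptions.

```python
def roc_n(ranked_labels, n, total_positives=None):
    """Truncated ROC score for a ranked hit list.

    ranked_labels: booleans ordered best score first (True = true homolog).
    n: number of false positives at which the curve is truncated.
    total_positives: number of true homologs in the database; defaults to
        the number present in the list.
    Assumption: if the list contains fewer than n false positives, the
    missing ones are treated as ranking below every hit in the list.
    """
    if total_positives is None:
        total_positives = sum(ranked_labels)
    if total_positives == 0:
        return 0.0

    true_seen = 0
    false_seen = 0
    area = 0
    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # true hits ranked above this false positive
            if false_seen == n:
                break
    area += (n - false_seen) * true_seen
    return area / (n * total_positives)


mixed = roc_n([True, True, False, True, False, False], n=2)
perfect = roc_n([True, True, True, False, False], n=2)
```

ROC5 corresponds to `n=5`. Because only the top of the ranking contributes, the score rewards methods that place true homologs ahead of the first few errors.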


Subject(s)
Gene Expression Profiling/methods; Models, Chemical; Pattern Recognition, Automated/methods; Proteins/chemistry; Proteins/classification; Sequence Alignment/methods; Sequence Analysis/methods; Amino Acid Sequence; Markov Chains; Models, Statistical; Molecular Sequence Data; Proteins/analysis; Sequence Homology, Amino Acid
5.
BMC Bioinformatics ; 7: 10, 2006 Jan 10.
Article in English | MEDLINE | ID: mdl-16403221

ABSTRACT

BACKGROUND: Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non-redundant subsets of SCOP domains on the basis of sequence similarity, such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together, these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy-to-use API written in Python to enable construction of datasets from these resources. RESULTS: We have designed a set of Python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within Python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. CONCLUSION: The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.
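The kind of dataset construction described here can be illustrated without the SCOP data files. The sketch below does not use the Biopython modules themselves. It relies only on the SCOP concise classification string (sccs), which has the form class.fold.superfamily.family, and groups domains by superfamily. The domain identifiers and the helper name `group_by_superfamily` are hypothetical.

```python
from collections import defaultdict


def group_by_superfamily(records):
    """Group domain identifiers by the superfamily part of their sccs.

    records: iterable of (domain_id, sccs) pairs, where sccs looks like
    "a.1.1.2" (class.fold.superfamily.family).
    """
    groups = defaultdict(list)
    for domain_id, sccs in records:
        cls, fold, superfamily, _family = sccs.split(".")
        groups[f"{cls}.{fold}.{superfamily}"].append(domain_id)
    return dict(groups)


# Hypothetical domains; real identifiers would come from the SCOP files.
records = [
    ("d0aaaa_", "a.1.1.1"),
    ("d0bbba_", "a.1.1.2"),  # same superfamily, different family
    ("d0ccca_", "b.1.1.1"),
]
superfamilies = group_by_superfamily(records)
```

A benchmark for remote homolog detection would then treat domains in the same superfamily but different families as the positives to be recovered.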


Subject(s)
Database Management Systems; Databases, Protein; Information Storage and Retrieval/methods; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; User-Computer Interface; Programming Languages; Sequence Homology, Amino Acid
6.
BMC Bioinformatics ; 5: 200, 2004 Dec 16.
Article in English | MEDLINE | ID: mdl-15603591

ABSTRACT

BACKGROUND: Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily-specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily. RESULTS: The performance varies greatly between superfamilies, with the truncated receiver operating characteristic, ROC10, varying from 0.95 down to 0.01. These large differences persist even when the profiles are trimmed to approximately the same level of diversity. CONCLUSIONS: Although the number of sequences in the profile (profile width) and the degree of sequence variation within positions in the profile (profile diversity) contribute to accurate detection, there are other superfamily-specific factors.
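The abstract does not say how profile diversity is measured. One common way to quantify variation within alignment positions is the Shannon entropy of each column, sketched below purely as an assumption. The function names and the toy alignment are hypothetical, and gap handling is ignored.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_profile_entropy(alignment):
    """Average column entropy over equal-length aligned sequences."""
    entropies = [column_entropy(col) for col in zip(*alignment)]
    return sum(entropies) / len(entropies)


# Toy alignment: column 1 is invariant, column 3 is split evenly.
alignment = ["ACD", "ACE", "ACD", "AGE"]
per_column = [column_entropy(col) for col in zip(*alignment)]
diversity = mean_profile_entropy(alignment)
```

Under this measure an invariant column scores 0 bits and a column split evenly between two residues scores 1 bit, so trimming profiles to a common diversity would amount to matching their mean entropy.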


Subject(s)
Computational Biology/standards; Protein Array Analysis/standards; Proteins/chemistry; Proteins/genetics; Benchmarking/methods; Computational Biology/methods; Computational Biology/statistics & numerical data; Conserved Sequence/genetics; Databases, Protein/statistics & numerical data; Genetic Variation/genetics; Peptides/chemistry; Peptides/genetics; Protein Array Analysis/methods; Protein Structure, Tertiary/genetics; Sequence Homology, Amino Acid