Results 1 - 4 of 4
1.
Nucleic Acids Res ; 43(20): e129, 2015 Nov 16.
Article in English | MEDLINE | ID: mdl-26101252

ABSTRACT

Single Molecule, Real-Time (SMRT) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution.
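
To put the ">QV50" figure in concrete terms, the sketch below converts between quality values and per-base error probabilities. It assumes QV denotes the standard Phred scale, QV = -10 log10(p); the function names and the 9,000-base genome length are illustrative choices, not taken from the paper.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(p: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(p)


# QV50 corresponds to roughly one error per 100,000 bases.
p_error = qv_to_error_probability(50)

# Expected number of errors in a hypothetical 9,000-base genome at QV50.
expected_errors = p_error * 9000
```

Under this assumption, a reconstructed genome of about 9 kb at QV50 would be expected to carry well under one erroneous base.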


Subject(s)
Genetic Variation , Genome, Viral , HIV-1/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Cluster Analysis , Humans , Sequence Alignment
2.
Biostatistics ; 10(3): 424-35, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19234308

ABSTRACT

Classification studies with high-dimensional measurements and relatively small sample sizes are increasingly common. Prospective analysis of the role of sample sizes in the performance of such studies is important for study design and interpretation of results, but the complexity of typical pattern discovery methods makes this problem challenging. The approach developed here combines Monte Carlo methods and new approximations for linear discriminant analysis, assuming multivariate normal distributions. Monte Carlo methods are used to sample the distribution of which features are selected for a classifier and the mean and variance of features given that they are selected. Given selected features, the linear discriminant problem involves different distributions of training data and generalization data, for which 2 approximations are compared: one based on Taylor series approximation of the generalization error and the other on approximating the discriminant scores as normally distributed. Combining the Monte Carlo and approximation approaches to different aspects of the problem allows efficient estimation of expected generalization error without full simulations of the entire sampling and analysis process. To evaluate the method and investigate realistic study design questions, full simulations are used to ask how validation error rate depends on the strength and number of informative features, the number of noninformative features, the sample size, and the number of features allowed into the pattern. Both approximation methods perform well for most cases but only the normal discriminant score approximation performs well for cases of very many weakly informative or uninformative dimensions. The simulated cases show that many realistic study designs will typically estimate substantially suboptimal patterns and may have low probability of statistically significant validation results.
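
The kind of full simulation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes unit-variance independent Gaussian features, selects features by the largest absolute difference in class means, and uses a diagonal linear discriminant with known unit variance. All function names, parameters, and settings are hypothetical.

```python
import math
import random


def optimal_error(delta: float) -> float:
    """Bayes error for two equal-prior Gaussian classes with identity
    covariance and Euclidean distance `delta` between means: Phi(-delta/2)."""
    return 0.5 * math.erfc(delta / (2 * math.sqrt(2)))


def simulate_error(n_per_class, n_informative, n_noise, effect, n_selected,
                   n_test=1000, seed=0):
    """Monte Carlo estimate of generalization error after feature selection."""
    rng = random.Random(seed)
    p = n_informative + n_noise
    # Class 0 is centered at the origin; class 1 is shifted on informative features.
    shift = [effect] * n_informative + [0.0] * n_noise

    def draw(label):
        return [rng.gauss(shift[j] if label == 1 else 0.0, 1.0) for j in range(p)]

    # Training data and per-class feature means.
    train0 = [draw(0) for _ in range(n_per_class)]
    train1 = [draw(1) for _ in range(n_per_class)]
    m0 = [sum(x[j] for x in train0) / n_per_class for j in range(p)]
    m1 = [sum(x[j] for x in train1) / n_per_class for j in range(p)]

    # Feature selection: keep the largest absolute mean differences.
    selected = sorted(range(p), key=lambda j: abs(m1[j] - m0[j]),
                      reverse=True)[:n_selected]

    def predict(x):
        # Linear discriminant score using only the selected features.
        score = sum((m1[j] - m0[j]) * (x[j] - (m1[j] + m0[j]) / 2)
                    for j in selected)
        return 1 if score > 0 else 0

    # Independent test samples give the generalization error estimate.
    errors = 0
    for _ in range(n_test):
        label = rng.randrange(2)
        errors += predict(draw(label)) != label
    return errors / n_test


# A few strong features with a moderate sample size...
strong = simulate_error(n_per_class=40, n_informative=5, n_noise=45,
                        effect=2.0, n_selected=5)
# ...versus weak features buried in many noninformative ones.
weak = simulate_error(n_per_class=10, n_informative=5, n_noise=200,
                      effect=0.3, n_selected=5)
```

Comparing `strong` and `weak` against `optimal_error` for the corresponding distance between class means shows how far a small study can fall from the best achievable pattern.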


Subject(s)
Biometry/methods , Classification/methods , Sample Size , Algorithms , Genomics/statistics & numerical data , Humans , Linear Models , Monte Carlo Method , Multivariate Analysis , Proteomics/statistics & numerical data
3.
Eukaryot Cell ; 6(6): 940-8, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17468393

ABSTRACT

Pre-mRNA splicing is essential to ensure accurate expression of many genes in eukaryotic organisms. In Entamoeba histolytica, a deep-branching eukaryote, approximately 30% of the annotated genes are predicted to contain introns; however, the accuracy of these predictions has not been tested. In this study, we mined an expressed sequence tag (EST) library representing 7% of amoebic genes and found evidence supporting splicing of 60% of the testable intron predictions, the majority of which contain a GUUUGU 5' splice site and a UAG 3' splice site. Additionally, we identified several splice site misannotations, evidence for the existence of 30 novel introns in previously annotated genes, and identified novel genes through uncovering their spliced ESTs. Finally, we provided molecular evidence for the E. histolytica U2, U4, and U5 snRNAs. These data lay the foundation for further dissection of the role of RNA processing in E. histolytica gene expression.
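
As a small illustration of the splice-site motifs reported above, the sketch below checks whether a candidate intron begins with GUUUGU and ends with UAG. The function name and the example sequences are hypothetical; the study's actual EST-mining pipeline is not described in the abstract.

```python
def has_consensus_splice_sites(intron: str,
                               five_prime: str = "GUUUGU",
                               three_prime: str = "UAG") -> bool:
    """Return True if the intron starts with the 5' motif and ends with the 3' motif.

    Accepts RNA or DNA letters in either case; T is treated as U.
    """
    seq = intron.upper().replace("T", "U")
    if len(seq) < len(five_prime) + len(three_prime):
        return False  # too short to contain both motifs without overlap
    return seq.startswith(five_prime) and seq.endswith(three_prime)


# Hypothetical candidate introns (not real E. histolytica sequences).
candidates = ["GUUUGUAAAUUUAG", "gtttgtaaatttag", "GUAAGUAAAUUUAG"]
matches = [has_consensus_splice_sites(c) for c in candidates]
```

The first two candidates carry both motifs (the second written as lowercase DNA), while the third has a different 5' end and is rejected.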


Subject(s)
Entamoeba histolytica , Introns , RNA, Small Nuclear/metabolism , Spliceosomes/metabolism , Animals , Base Sequence , Entamoeba histolytica/genetics , Entamoeba histolytica/metabolism , Expressed Sequence Tags , Gene Expression Regulation , Molecular Sequence Data , Nucleic Acid Conformation , RNA Splicing , RNA, Small Nuclear/chemistry , RNA, Small Nuclear/genetics , Spliceosomes/genetics
4.
Electrophoresis ; 26(7-8): 1500-12, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15765480

ABSTRACT

A capillary electrophoresis-mass spectrometry (CE-MS) method has been developed to perform routine, automated analysis of low-molecular-weight peptides in human serum. The method incorporates transient isotachophoresis for in-line preconcentration and a sheathless electrospray interface. To evaluate the performance of the method and demonstrate the utility of the approach, an experiment was designed in which peptides were added to sera from individuals at each of two different concentrations, artificially creating two groups of samples. The CE-MS data from the serum samples were divided into separate training and test sets. A pattern-recognition/feature-selection algorithm based on support vector machines was used to select the mass-to-charge (m/z) values from the training set data that distinguished the two groups of samples from each other. The added peptides were identified correctly as the distinguishing features, and pattern recognition based on these peptides was used to assign each sample in the independent test set to its respective group. A twofold difference in peptide concentration could be detected with statistical significance (p-value < 0.0001). The accuracy of the assignment was 95%, demonstrating the utility of this technique for the discovery of patterns of biomarkers in serum.
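
The train/test design with SVM-based feature selection can be sketched roughly as below. The abstract does not specify the algorithm, so this is only one possible reading: a linear SVM trained by stochastic subgradient descent on the hinge loss, with features ranked by absolute weight. The synthetic data (four hypothetical m/z channels, one of them spiked at a twofold higher level in the second group) and all names are assumptions made for illustration.

```python
import random


def train_linear_svm(X, y, lr=0.05, lam=0.01, epochs=200, seed=0):
    """Linear SVM via stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            violated = margin < 1
            for j in range(n_features):
                grad = lam * w[j] - (y[i] * X[i][j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * y[i]
    return w, b


def make_samples(rng, n_per_group, n_channels=4, spiked=2):
    """Synthetic intensities; the spiked channel is twofold higher in group +1."""
    X, y = [], []
    for label, level in ((-1, 1.0), (1, 2.0)):
        for _ in range(n_per_group):
            row = [rng.gauss(1.0, 0.1) for _ in range(n_channels)]
            row[spiked] = rng.gauss(level, 0.1)
            X.append(row)
            y.append(label)
    return X, y


def center(rows, means):
    """Subtract per-feature means (computed on the training set only)."""
    return [[v - m for v, m in zip(row, means)] for row in rows]


rng = random.Random(1)
X_train, y_train = make_samples(rng, 20)
X_test, y_test = make_samples(rng, 20)  # independent test set

means = [sum(col) / len(col) for col in zip(*X_train)]
w, b = train_linear_svm(center(X_train, means), y_train)

# Rank channels by absolute weight; the spiked channel should come first.
ranking = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)

predictions = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
               for x in center(X_test, means)]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```

Keeping the centering statistics and the weight vector fixed from the training set before scoring the test set mirrors the separation of training and test data described in the abstract.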


Subject(s)
Biomarkers/blood , Electrophoresis, Capillary/methods , Spectrometry, Mass, Electrospray Ionization/methods , Automation , Electrophoresis, Gel, Two-Dimensional , Humans