Results 1 - 7 of 7
1.
BMC Bioinformatics ; 14 Suppl 9: S6, 2013.
Article in English | MEDLINE | ID: mdl-23901840

ABSTRACT

BACKGROUND: Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incomplete open reading frames (ORFs) directly from short reads and identify the coding ORFs, bypassing other challenging tasks such as the assembly of the metagenome. RESULTS: In this paper we introduce a metagenomics gene caller (MGC), which is an improvement over the state-of-the-art prediction algorithm Orphelia. Orphelia uses a two-stage machine learning approach and computes a model that classifies extracted ORFs from fragmented sequences. We hypothesise and demonstrate evidence that sequences need separate models based on their local GC-content in order to avoid the noise introduced to a single model computed with sequences from the entire GC spectrum. We have also added two amino-acid features based on the benefit of amino-acid usage shown in our previous research. Our algorithm is able to predict genes and translation initiation sites (TIS) more accurately than Orphelia, which uses a single model. CONCLUSIONS: Learning separate models for several pre-defined GC-content regions, as opposed to a single-model approach, improves the performance of the neural network, as demonstrated by the experimental results presented in this paper. The inclusion of amino-acid usage features also helps improve the overall accuracy of our algorithm. MGC's improvement sets the ground for further investigation into the use of GC-content to separate data for training models in machine-learning-based gene finders.
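
The idea of routing each fragment to a model trained for its GC-content range can be sketched as follows. This is a minimal illustration, not the MGC implementation: the bin boundaries in `GC_BINS` and the function names are hypothetical, since the abstract does not state the actual GC-content regions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


# Hypothetical GC-content ranges; the abstract does not give the real boundaries.
GC_BINS = [(0.0, 0.35), (0.35, 0.50), (0.50, 0.65), (0.65, 1.0)]


def select_model_index(seq: str, bins=GC_BINS) -> int:
    """Return the index of the GC bin (and hence the model) that seq falls into."""
    gc = gc_content(seq)
    for i, (low, high) in enumerate(bins):
        if low <= gc < high:
            return i
    return len(bins) - 1  # gc == 1.0 belongs to the last bin


if __name__ == "__main__":
    fragment = "ATGCGCGTTAACGGC"
    print(gc_content(fragment), select_model_index(fragment))
```

In such a setup, one classifier would be trained per bin, and each ORF would be scored only by the model whose bin matches the GC-content of its source fragment.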


Subject(s)
Algorithms , Computational Biology/methods , Metagenomics/methods , Neural Networks, Computer , Open Reading Frames , Base Composition , Base Sequence , Codon , Genome, Archaeal , Genome, Bacterial
2.
Proteome Sci ; 11(Suppl 1): S4, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-24565419

ABSTRACT

Independent of the approach used, the ability to correctly interpret tandem MS data depends on the quality of the original spectra. Even in the case of the highest quality spectra, the majority of spectral peaks cannot be reliably interpreted. The accuracy of sequencing algorithms can be improved by filtering out such 'noise' peaks. Preprocessing MS/MS spectra to select informative ion peaks increases accuracy and reduces the processing time. Intuitively, the mix of informative versus non-informative peaks has a direct effect on the quality and size of the resulting candidate peptide search space. As the number of selected peaks increases, the corresponding search space increases exponentially. If we select too few peaks, then the ion-ladder interpretation of the spectrum will contain gaps that can only be explained by permutations of combinations of amino acids. This will result in a larger candidate peptide search space and poorer quality candidates. The dependency that peptide sequencing accuracy has on an initial peak selection regime makes this preprocessing step a crucial facet of any approach, whether de novo or not, to MS/MS spectra interpretation.
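
To make the peak-selection trade-off concrete, the sketch below keeps only the N most intense peaks of a spectrum. This is a generic intensity-based filter chosen for illustration; the abstract does not say which selection regime the authors use, and the function name and toy values are hypothetical.

```python
def top_n_peaks(peaks, n):
    """Keep the n most intense peaks and return them ordered by m/z.

    peaks: list of (mz, intensity) tuples.
    """
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(kept, key=lambda p: p[0])


if __name__ == "__main__":
    # Toy spectrum with made-up (m/z, intensity) values.
    spectrum = [(100.0, 5.0), (150.0, 50.0), (200.0, 1.0), (250.0, 30.0)]
    print(top_n_peaks(spectrum, 2))
```

Choosing `n` is the trade-off the abstract describes: a larger `n` admits more noise peaks, while a smaller `n` risks gaps in the ion ladder.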

3.
J Invest Surg ; 22(1): 35-45, 2009.
Article in English | MEDLINE | ID: mdl-19191156

ABSTRACT

Successfully engineering functional muscle tissue either in vitro or in vivo to treat muscle defects rather than using the host muscle transfer would be revolutionary. Tissue engineering is on the cutting edge of biomedical research, bridging a gap between the clinic and the bench top. A new focus on skeletal muscle tissue engineering has led investigators to explore the application of satellite cells (autologous muscle precursor cells) as a vehicle for engineering tissues either in vitro or in vivo. However, few skeletal muscle tissue-engineering studies have reported on successful generation of living tissue substitutes for functional skeletal muscle replacement. Our model system combines a novel aligned collagen tube and autologous skeletal muscle satellite cells to create an engineered tissue repair for a surgically created ventral hernia as previously reported [SA Fann, L Terracio, W Yan, et al., A model of tissue-engineered ventral hernia repair, J Invest Surg. 2006;19(3):193-205]. Several key features we specifically observe are the significant persistence of transplanted skeletal muscle cell mass within the engineered repair, the integration of new tissue with adjacent native muscle, and the presence of significant neovascularization. In this study, we report on our experience investigating the genetic signals important to the integration of neoskeletal muscle tissue. The knowledge gained from our model system applies to the repair of severely injured extremities, maxillofacial reconstructions, and restorative procedures following tumor excision in other areas of the body.


Subject(s)
Hernia, Ventral/surgery , Muscle, Skeletal/metabolism , Neovascularization, Physiologic , Tissue Engineering , Tissue Transplantation , Animals , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Cytokines/genetics , Cytokines/metabolism , Extracellular Matrix Proteins/genetics , Extracellular Matrix Proteins/metabolism , Gene Expression Profiling , Hernia, Ventral/metabolism , Hypoxia/metabolism , Intercellular Signaling Peptides and Proteins/genetics , Intercellular Signaling Peptides and Proteins/metabolism , Oligonucleotide Array Sequence Analysis , Rats , Transcription Factors/genetics , Transcription Factors/metabolism
4.
Syst Biol ; 54(2): 268-76, 2005 Apr.
Article in English | MEDLINE | ID: mdl-16012097

ABSTRACT

A number of recent papers have suggested that gene family content can be used to resolve phylogenies, particularly in the case of prokaryotes, in which extensive horizontal gene transfer means that individual gene phylogenies may not mirror the organismal phylogeny. However, no study has yet examined how sensitive such analyses are to the criterion of homology assessment used to assemble multigene families. Using data from 99 completely sequenced prokaryotic genomes, we examined the effect of homology criteria in phylogenetic analyses wherein presence or absence of each family in the genome was used as a cladistic character. Different criteria resulted in evidence for contradictory tree topologies, sometimes with high bootstrap support. A moderately strict criterion seemed best for assembling multigene families in a biologically meaningful way, but it was not necessarily preferable for phylogenetic analysis. Instead, a very strict criterion, which broke up gene families into smaller subfamilies, seemed to have advantages for phylogenetic purposes. The poor performance of gene family content-based phylogenetic analysis in the case of prokaryotes appears to reflect high levels of homoplasy resulting not only from horizontal gene transfer but also, more importantly, from extensive parallel loss of gene families in certain bacterial genomes.
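
Coding gene family presence or absence as cladistic characters amounts to building a binary genome-by-family matrix. The following is a minimal sketch under stated assumptions: the genome and family names are hypothetical, and family membership is taken as given (in the study it depends on the homology criterion used).

```python
def presence_absence_matrix(genome_families):
    """Build binary characters from gene family content.

    genome_families: dict mapping genome name -> set of family identifiers.
    Returns (families, matrix), where families is the sorted list of all
    family identifiers and matrix maps each genome to a 0/1 list in that order.
    """
    families = sorted(set().union(*genome_families.values()))
    matrix = {
        genome: [1 if fam in fams else 0 for fam in families]
        for genome, fams in genome_families.items()
    }
    return families, matrix


if __name__ == "__main__":
    # Hypothetical toy input.
    toy = {
        "genome_A": {"fam1", "fam2"},
        "genome_B": {"fam2", "fam3"},
    }
    families, matrix = presence_absence_matrix(toy)
    print(families)
    for genome, row in matrix.items():
        print(genome, row)
```

A stricter homology criterion would split families into smaller subfamilies, which adds columns to this matrix and changes which genomes share a character state.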


Subject(s)
Archaea/genetics , Bacteria/genetics , Classification/methods , Multigene Family/genetics , Phylogeny , Computational Biology , Data Interpretation, Statistical , Gene Transfer, Horizontal/genetics
5.
Bioinformatics ; 21(8): 1349-57, 2005 Apr 15.
Article in English | MEDLINE | ID: mdl-15572467

ABSTRACT

MOTIVATION: In the event of an outbreak of a disease caused by an initially unknown pathogen, the ability to characterize anonymous sequences prior to isolation and culturing of the pathogen will be helpful. We show that it is possible to classify viral sequences by genome type (dsDNA, ssDNA, ssRNA positive strand, ssRNA negative strand, retroid) using amino acid distribution. RESULTS: In this paper we describe the results of analysis of amino acid preference in mammalian viruses. The study was carried out at the genome level as well as two shorter sequence levels: short (300 amino acids) and medium length (660 amino acids). The analysis indicates a correlation between the viral genome types dsDNA, ssDNA, ssRNA positive strand, ssRNA negative strand and retroid and amino acid preference. We investigated three different models of amino acid preference. The simplest amino acid preference model, 1-AAP, is a normalized description of the frequency of amino acids in genomes of a viral genome type. A slightly more complex model is the ordered pair amino acid preference model (2-AAP), which characterizes genomes of different viral genome types by the frequency of ordered pairs of amino acids. The most complex and accurate model is the ordered triple amino acid preference model (3-AAP), which is based on ordered triples of amino acids. The results demonstrate that mammalian viral genome types differ in their amino acid preference. AVAILABILITY: The tools used to format and analyze data and supplementary material are available at http://www.cse.sc.edu/~rose/aminoPreference/index.html CONTACT: rose@cse.sc.edu.
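
The three preference models can be read as normalized frequencies of amino acid 1-mers, 2-mers and 3-mers. The sketch below follows that reading; it assumes that "ordered pairs" and "ordered triples" mean adjacent residues, which the abstract does not state explicitly, and the function name `aap_model` is hypothetical rather than taken from the authors' tools.

```python
from collections import Counter


def aap_model(proteins, k=1):
    """Normalized frequency of ordered amino acid k-tuples.

    proteins: iterable of protein sequences (strings).
    k = 1, 2, 3 corresponds to the 1-AAP, 2-AAP and 3-AAP models,
    assuming tuples are formed from adjacent residues.
    """
    counts = Counter()
    for protein in proteins:
        protein = protein.upper()
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    toy_proteins = ["MKKLV", "MKV"]
    print(aap_model(toy_proteins, k=1))
    print(aap_model(toy_proteins, k=2))
```

An unknown sequence could then be compared against the model of each genome type, for example by a distance between frequency vectors; the abstract does not specify which comparison the authors use.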


Subject(s)
Algorithms , Chromosome Mapping/methods , DNA, Viral/genetics , Genome, Viral , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Viral Proteins/genetics , Amino Acid Sequence , Animals , Base Sequence , Mammals , Models, Genetic , Molecular Sequence Data , Species Specificity , Statistics as Topic
6.
Mol Phylogenet Evol ; 29(3): 410-6, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14615183

ABSTRACT

The genome of Arabidopsis thaliana is known to contain numerous open reading frames apparently encoding transposases. In order to test the hypothesis that transposable elements have played a role in segmental duplication in this species, we compared the distribution of transposable elements with that of genomic windows that shared gene families to a greater extent than expected by chance. Phylogenetic analyses indicated that duplication of these segments occurred after the monocot-dicot divergence and probably after the eurosid I-eurosid II divergence. Known transposable elements were found to occur in putatively duplicated segments to a far greater extent than expected on the basis of their genome-wide distribution, suggesting that transposition may have played a role in segmental duplication in this species.


Subject(s)
Arabidopsis/genetics , DNA Transposable Elements/genetics , Evolution, Molecular , Gene Duplication , Genome, Plant , Phylogeny
7.
Bioinformatics ; 20(16): 2834-5, 2004 Nov 01.
Article in English | MEDLINE | ID: mdl-15145815

ABSTRACT

UNLABELLED: Dblox and RDblox provide a simple statistical test for duplicated genomic structure; the same programs can also be used to identify putatively duplicated regions. The method focuses on ancient duplication events involving protein-coding genes. AVAILABILITY: http://www.biol.sc.edu/~austin/


Subject(s)
Algorithms , Chromosome Mapping/methods , Evolution, Molecular , Gene Duplication , Genes, Duplicate/genetics , Proteins/genetics , Software , Gene Expression Profiling/methods , Genes/genetics , Models, Genetic , Models, Statistical , Sequence Alignment/methods , User-Computer Interface