1.
J Comput Biol ; 14(9): 1229-45, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17990973

ABSTRACT

It is a challenging task to predict with high reliability whether plant genomic sequences contain a polyadenylation (polyA) site. In this paper, we solve the task by means of a systematic machine-learning procedure applied to a dataset of 1000 Arabidopsis thaliana sequences flanking polyA sites. Our procedure consists of three steps. In the first step, we extract informative features from the sequences using the highly informative k-mer windows approach. Experiments with five classifiers show that the best performance is approximately 83%. In the second step, we improve performance to 95% by reducing the number of features using linear discriminant analysis and then applying the linear discriminant classifier. In the third step, we apply the transductive confidence machines approach and the receiver operating characteristic isometrics approach. The resulting two classifiers make it possible to preset any desired performance by leaving unclassified those sequences for which it is unclear whether they contain polyA sites. For example, in our case study, we obtain 99% performance by leaving 26% of the sequences unclassified, and 100% performance by leaving 40% of the sequences unclassified. This is clearly useful for experimental verification of putative polyA sites in the laboratory. The novel methods in our machine-learning procedure should find applications in several areas of bioinformatics.
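
The two ideas that carry most of this abstract, counting k-mers per sequence window and declining to label uncertain sequences, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the non-overlapping window layout, the function names `kmer_window_features` and `classify_with_reject`, and the fixed score threshold are all hypothetical. In particular, the simple threshold stands in for the transductive confidence machines and ROC isometrics approaches, which the abstract names but does not specify.

```python
from collections import Counter
from itertools import product


def kmer_window_features(seq, k=2, window=10):
    """Count every DNA k-mer separately inside consecutive windows.

    Returns a dict mapping (window_index, kmer) -> count. The
    non-overlapping window layout is an illustrative assumption.
    """
    seq = seq.upper()
    features = {}
    for w_idx, start in enumerate(range(0, len(seq), window)):
        chunk = seq[start:start + window]
        # Overlapping k-mers inside this window only.
        counts = Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))
        # Emit a count (possibly zero) for each of the 4**k possible k-mers.
        for kmer in map("".join, product("ACGT", repeat=k)):
            features[(w_idx, kmer)] = counts.get(kmer, 0)
    return features


def classify_with_reject(score, threshold=0.9):
    """Map a probability-like score for 'contains a polyA site' to a label.

    Returns None (unclassified) when the score is not confident either way.
    This is a plain reject-option threshold, assumed here for illustration.
    """
    if score >= threshold:
        return "polyA"
    if score <= 1.0 - threshold:
        return "non-polyA"
    return None


features = kmer_window_features("AATAAAGGGGTTTT", k=2, window=10)
labels = [classify_with_reject(s) for s in (0.95, 0.50, 0.03)]
```

Raising `threshold` leaves more sequences unclassified in exchange for fewer errors among those that are labeled, which is the kind of trade-off the abstract reports (99% performance with 26% unclassified, 100% with 40% unclassified).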


Subjects
Arabidopsis/genetics; Computational Biology/methods; Polyadenylation/genetics; Sequence Analysis, DNA/methods; Artificial Intelligence; Base Sequence; Discriminant Analysis; Principal Component Analysis; ROC Curve
2.
Neuroinformatics ; 6(4): 257-77, 2008.
Article in English | MEDLINE | ID: mdl-18797828

ABSTRACT

Generation algorithms allow for the generation of Virtual Neurons (VNs) from a small set of morphological properties. The set describes the morphological properties of real neurons in terms of statistical descriptors such as the number of branches and segment lengths (among others). The majority of reconstruction algorithms use the observed properties to estimate the parameters of a priori fixed probability distributions in order to construct statistical descriptors that fit well with the observed data. In this article, we present a non-parametric generation algorithm based on kernel density estimators (KDEs). The new algorithm is called KDE-NEURON and has three advantages over parametric reconstruction algorithms: (1) no a priori specifications about the distributions underlying the real data, (2) peculiarities in the biological data will be reflected in the VNs, and (3) ability to reconstruct different cell types. We experimentally generated motor neurons and granule cells, and statistically validated the obtained results. Moreover, we assessed the quality of the prototype data set and observed that our generated neurons are as good as the prototype data in terms of the statistical descriptors used. The opportunities and limitations of data-driven algorithmic reconstruction of neurons are discussed.
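
The core non-parametric idea, sampling a morphological property from a kernel density estimate instead of from a fitted parametric distribution, can be sketched roughly as follows. This is a minimal one-dimensional illustration and not the KDE-NEURON implementation: the Gaussian kernel, the normal-reference bandwidth rule, the function names, and the toy segment lengths are all assumptions introduced here.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth: 1.06 * sample std * n^(-1/5)."""
    n = len(data)
    return 1.06 * statistics.stdev(data) * n ** (-1 / 5)


def kde_density(x, data, h):
    """Gaussian kernel density estimate evaluated at x."""
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


def kde_sample(data, h, rng):
    """Draw one value from the Gaussian KDE.

    Picking an observation uniformly and adding N(0, h^2) noise is
    equivalent to sampling from the equal-weight mixture of kernels.
    """
    return rng.gauss(rng.choice(data), h)


# Hypothetical segment lengths (arbitrary units), for illustration only.
observed_lengths = [12.0, 15.5, 14.2, 30.1, 13.8, 29.4, 16.0, 31.2]
h = rule_of_thumb_bandwidth(observed_lengths)
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Segment lengths must be positive, so redraw any non-positive sample.
virtual_lengths = []
while len(virtual_lengths) < 5:
    value = kde_sample(observed_lengths, h, rng)
    if value > 0:
        virtual_lengths.append(value)
```

Because the samples are drawn around the observed values themselves, a bimodal data set like the toy one above stays bimodal in the generated values, which mirrors advantage (2) in the abstract: peculiarities in the data carry over without choosing a distribution family in advance.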


Subjects
Algorithms; Cell Shape/physiology; Computational Biology/methods; Neuroanatomy/methods; Neurons/cytology; Software; Animals; Cell Polarity/physiology; Computer Simulation; Data Interpretation, Statistical; Dendrites/physiology; Dendrites/ultrastructure; Hippocampus/cytology; Hippocampus/physiology; Interneurons/cytology; Interneurons/physiology; Models, Statistical; Motor Neurons/cytology; Motor Neurons/physiology; Neurons/physiology; Rats; Reproducibility of Results; Spinal Cord/cytology; Spinal Cord/physiology