1.
Neural Comput ; 29(8): 2164-2176, 2017 08.
Article in English | MEDLINE | ID: mdl-28562212

ABSTRACT

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and another that holds the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace determines how many features will be extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size by using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data and accurately predicts the known size of synthetic data. We provide a MATLAB implementation of our method.
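
The factorization described in this abstract can be illustrated with a minimal sketch. It uses the classical multiplicative update rules for the Frobenius-norm objective, which is an assumption: the abstract does not say which NMF solver the authors use, and the minimum description length criterion for choosing the subspace size is not reproduced here. The function names (`nmf`, `frobenius_error`) and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m).

    Assumed solver: multiplicative updates; `rank` is the subspace size.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy data: every row is a nonnegative combination of two basis rows.
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
    for r in (1, 2):
        W, H = nmf(V, rank=r)
        print(r, frobenius_error(V, W, H))
```

Comparing the reconstruction error across candidate ranks, as in the loop above, shows why a selection criterion is needed: the error alone keeps falling as the rank grows, so it cannot by itself separate key features from noise.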

2.
Evol Comput ; 24(2): 347-84, 2016.
Article in English | MEDLINE | ID: mdl-26066806

ABSTRACT

The fitness landscape of the travelling salesman problem is investigated for 11 different types of the problem. The types differ in how the distances between cities are generated. Many different properties of the landscape are studied. The properties chosen are all potentially relevant to choosing an appropriate search algorithm. The analysis includes a scaling study of the time to reach a local optimum, the number of local optima, the expected probability of reaching a local optimum as a function of its fitness, the expected fitness found by local search and the best fitness, the probability of reaching a global optimum, the distance between the local optima and the global optimum, the expected fitness as a function of the distance from an optimum, their basins of attraction and a principal component analysis of the local optima. The principal component analysis shows the correlation of the local optima in the component space. We show how the properties of the principal components of the local optima change from one problem type to another.


Subjects
Theoretical Models , Algorithms , Principal Component Analysis
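
To make "time to reach a local optimum" concrete, here is a minimal sketch of a local search on a small travelling salesman instance. It assumes a 2-opt neighbourhood and uniformly random Euclidean cities; the abstract names neither the neighbourhood nor the 11 problem types, so both choices, and the names `two_opt` and `tour_length`, are illustrative only.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def two_opt(tour, cities):
    """Apply improving 2-opt moves until none remains.

    Returns the locally optimal tour and the number of moves taken,
    a simple proxy for the time to reach a local optimum.
    """
    tour = list(tour)
    n = len(tour)
    moves = 0
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = cities[tour[i]], cities[tour[i + 1]]
                c, d = cities[tour[j]], cities[tour[(j + 1) % n]]
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    # Reverse the segment between the two removed edges.
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    moves += 1
                    improved = True
    return tour, moves


if __name__ == "__main__":
    rng = random.Random(0)
    cities = [(rng.random(), rng.random()) for _ in range(30)]
    start = list(range(len(cities)))
    rng.shuffle(start)
    best, moves = two_opt(start, cities)
    print(moves, tour_length(start, cities), tour_length(best, cities))
```

Restarting from many random tours and recording the resulting local optima would give the kind of sample on which landscape statistics, such as the number of distinct optima or their fitness distribution, can be estimated.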
3.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 593-607, 2023 Jan.
Article in English | MEDLINE | ID: mdl-34982674

ABSTRACT

We describe a novel semi-supervised learning method that reduces the labelling effort needed to train convolutional neural networks (CNNs) when processing georeferenced imagery. This allows deep learning CNNs to be trained on a per-dataset basis, which is useful in domains where there is limited learning transferability across datasets. The method identifies representative subsets of images from an unlabelled dataset based on the latent representation of a location-guided autoencoder. We assess the method's sensitivities to design options using four different ground-truthed datasets of georeferenced environmental monitoring images, which include various scenes in aerial and seafloor imagery. Efficiency gains are achieved for all the aerial and seafloor image datasets analysed in our experiments, demonstrating the benefit of the method across application domains. Compared to CNNs of the same architecture trained using conventional transfer and active learning, the method achieves equivalent accuracy with an order of magnitude fewer annotations and, using just 40 prioritised annotations, reaches 85% of the accuracy of CNNs trained conventionally with approximately 10,000 human annotations. The biggest gains in efficiency are seen in datasets with unbalanced class distributions and rare classes that have a relatively small number of observations.

4.
J Theor Biol ; 266(3): 343-57, 2010 Oct 07.
Article in English | MEDLINE | ID: mdl-20619272

ABSTRACT

The dynamics of transcriptional control involve small numbers of molecules and result in significant fluctuations in protein and mRNA concentrations. The correlations between these intrinsic fluctuations then offer, via the fluctuation dissipation relation, the possibility of capturing the system's response to external perturbations, and hence the nature of the regulatory activity itself. We show that for simple regulatory networks of activators and repressors, the correlated fluctuations between molecular species show distinct characteristics for changes in regulatory mechanism and for changes to the topology of causal influence. Here, we do a stochastic analysis and derive time-dependent correlation functions between molecular species of regulatory networks and present analytical and numerical results on peaks and delays in correlations between proteins within networks. Upon using these values of peaks and delays as a two-dimensional feature space, we find that different regulatory mechanisms separate into distinct clusters. This indicates that experimentally observable pairwise correlations can distinguish between gene regulatory networks.


Subjects
Algorithms , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genetic Models , Cluster Analysis , Gene Expression Profiling , Proteins/genetics , Proteins/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism
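
The two features used in this abstract, the peak of a pairwise correlation and the delay at which it occurs, can be estimated from time series as in the rough sketch below. This is an empirical estimator applied to synthetic data, not the analytical correlation functions the authors derive; the function names and the delayed-copy test signal are assumptions made for illustration.

```python
import random


def cross_correlation(x, y, max_lag):
    """Normalised cross-correlation between x[t] and y[t + lag].

    Returns a dict mapping each lag in [-max_lag, max_lag] to its value.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        total = sum((x[t] - mx) * (y[t + lag] - my)
                    for t in range(n) if 0 <= t + lag < n)
        corr[lag] = total / (sx * sy)
    return corr


def peak_and_delay(corr):
    """Return (peak value, lag) at the largest absolute correlation."""
    delay = max(corr, key=lambda lag: abs(corr[lag]))
    return corr[delay], delay


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: y is a copy of x delayed by 5 time steps.
    x = [rng.gauss(0.0, 1.0) for _ in range(500)]
    y = [0.0] * 5 + x[:-5]
    print(peak_and_delay(cross_correlation(x, y, max_lag=20)))
```

Each pair of molecular species then contributes one (peak, delay) point, and it is in this two-dimensional space that the abstract reports different regulatory mechanisms forming distinct clusters.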
5.
BMC Bioinformatics ; 8: 357, 2007 Sep 21.
Article in English | MEDLINE | ID: mdl-17888163

ABSTRACT

BACKGROUND: The prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, as the complexity of the task makes manual design of HMMs difficult. Therefore, we have developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs). RESULTS: In the GA procedure, populations of HMMs are assembled from biologically meaningful building blocks. Mutation and crossover operators were designed to explore the space of such Block-HMMs. After each step of the GA, the standard HMM estimation algorithm (the Baum-Welch algorithm) was used to update model parameters. The final HMM captures several features of protein sequence and structure, with its own HMM grammar. In contrast to neural network based predictors, the evolved HMM also calculates the probabilities associated with the predictions. We carefully examined the performance of the HMM based predictor, both under the multiple- and single-sequence condition. CONCLUSION: We have shown that the proposed evolutionary method can automatically design the topology of HMMs. The method reads the grammar of protein sequences and converts it into the grammar of an HMM. It improves on previously suggested evolutionary methods and increases the prediction quality. In particular, it shows good performance under the single-sequence condition and provides probabilistic information on the prediction result. The protein secondary structure predictor using HMMs (P.S.HMM) is available online at http://www.binf.ku.dk/~won/pshmm.htm. It runs under the single-sequence condition.


Subjects
Algorithms , Chemical Models , Molecular Models , Protein Secondary Structure , Proteins/chemistry , Proteins/ultrastructure , Protein Sequence Analysis/methods , Amino Acid Sequence , Computer Simulation , Markov Chains , Molecular Sequence Data
6.
Nucleic Acids Res ; 33(19): e171, 2005 Nov 07.
Article in English | MEDLINE | ID: mdl-16275781

ABSTRACT

Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20-30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1.


Subjects
Genomics/methods , DNA Sequence Analysis/methods , Human Chromosome Pair 1 , Feasibility Studies , Bacterial Genome , Human Genome , Viral Genome , Humans
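
The dependence on read length can be illustrated by asking what fraction of positions in a sequence yield a read that occurs nowhere else. The sketch below is a simplified stand-in for the analysis in the abstract: it assumes error-free reads, considers only the forward strand, and uses a random toy sequence with an inserted repeat rather than a real genome. The function name `unique_fraction` is hypothetical.

```python
import random
from collections import Counter


def unique_fraction(genome, read_length):
    """Fraction of start positions whose read occurs exactly once in the genome."""
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / len(reads)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sequence: random bases with one 40-base repeat inserted twice.
    repeat = "".join(rng.choice("ACGT") for _ in range(40))
    genome = ("".join(rng.choice("ACGT") for _ in range(2000)) + repeat
              + "".join(rng.choice("ACGT") for _ in range(2000)) + repeat)
    for length in (4, 8, 12, 20, 50):
        print(length, round(unique_fraction(genome, length), 3))
```

Reads shorter than a repeat cannot be placed unambiguously inside it, which is why the achievable level of re-sequencing and assembly rises with read length.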
7.
IEEE Trans Pattern Anal Mach Intell ; 28(11): 1738-52, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17063680

ABSTRACT

Extracting full-body motion of walking people from monocular video sequences in complex, real-world environments is an important and difficult problem, going beyond simple tracking, whose satisfactory solution demands an appropriate balance between use of prior knowledge and learning from data. We propose a consistent Bayesian framework for introducing strong prior knowledge into a system for extracting human gait. In this work, the strong prior is built from a simple articulated model having both time-invariant (static) and time-variant (dynamic) parameters. The model is easily modified to cater to situations such as walkers wearing clothing that obscures the limbs. The statistics of the parameters are learned from high-quality (indoor laboratory) data and the Bayesian framework then allows us to "bootstrap" to accurate gait extraction on the noisy images typical of cluttered, outdoor scenes. To achieve automatic fitting, we use a hidden Markov model to detect the phases of images in a walking cycle. We demonstrate our approach on silhouettes extracted from fronto-parallel ("sideways on") sequences of walkers under both high-quality indoor and noisy outdoor conditions. As well as high-quality data with synthetic noise and occlusions added, we also test walkers with rucksacks, skirts, and trench coats. Results are quantified in terms of chamfer distance and average pixel error between automatically extracted body points and corresponding hand-labeled points. No one part of the system is novel in itself, but the overall framework makes it feasible to extract gait from very much poorer quality image sequences than hitherto. This is confirmed by comparing person identification by gait using our method and a well-established baseline recognition algorithm.


Subjects
Algorithms , Artificial Intelligence , Biomechanical Phenomena/methods , Gait/physiology , Computer-Assisted Image Interpretation/methods , Joints/physiology , Automated Pattern Recognition/methods , Bayes Theorem , Cluster Analysis , Computer Simulation , Computer-Assisted Diagnosis/methods , Humans , Image Enhancement/methods , Three-Dimensional Imaging/methods , Information Storage and Retrieval/methods , Biological Models , Statistical Models , Reproducibility of Results , Sensitivity and Specificity
8.
Article in English | MEDLINE | ID: mdl-27045827

ABSTRACT

The behaviour of a high-dimensional stochastic system described by a chemical master equation (CME) depends on many parameters, rendering explicit simulation an inefficient method for exploring the properties of such models. Capturing their behaviour by low-dimensional models makes analysis of system behaviour tractable. In this paper, we present low-dimensional models for the noise-induced excitable dynamics in Bacillus subtilis, whereby a key protein ComK, which drives a complex chain of reactions leading to bacterial competence, gets expressed rapidly in large quantities (competent state) before subsiding to low levels of expression (vegetative state). These rapid reactions suggest the application of an adiabatic approximation of the dynamics of the regulatory model that, however, leads to competence durations that are incorrect by a factor of 2. We apply a modified version of an iterative functional procedure that faithfully approximates the time-course of the trajectories in terms of a two-dimensional model involving proteins ComK and ComS. Furthermore, in order to describe the bimodal bivariate marginal probability distribution obtained from the Gillespie simulations of the CME, we introduce a tunable multiplicative noise term in a two-dimensional Langevin model whose stationary state is described by the time-independent solution of the corresponding Fokker-Planck equation.


Subjects
Bacillus subtilis/genetics , Computational Biology/methods , DNA Transformation Competence/genetics , Gene Regulatory Networks/genetics , Genetic Models , Algorithms , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Stochastic Processes , Transcription Factors/genetics , Transcription Factors/metabolism
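
For readers unfamiliar with the Gillespie simulations mentioned in this abstract, the following minimal sketch shows the direct method on the simplest possible system: one species produced at a constant rate and degraded in proportion to its copy number. It is not the ComK/ComS competence network, whose reactions and rate constants are not given in the abstract; the rates `k_prod` and `k_deg` and the function name are hypothetical.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Simulate production (rate k_prod) and degradation (rate k_deg * x).

    Returns the list of event times and the copy number after each event.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod          # propensity of the production reaction
        a_deg = k_deg * x        # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0:
            break                # no reaction can fire
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=20.0, k_deg=1.0, x0=0, t_end=50.0)
    print(len(times), counts[-1])
```

Every reaction event is simulated individually, so the cost grows with the number of species and reactions; this is the inefficiency that motivates replacing the full model with a low-dimensional approximation.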
9.
PLoS One ; 3(6): e2500, 2008 Jun 18.
Article in English | MEDLINE | ID: mdl-18563203

ABSTRACT

BACKGROUND: Sequencing by hybridisation (SBH) is an effective method for obtaining large amounts of DNA sequence information at low cost. The efficiency of SBH depends on the design of the probe library to provide the maximum information for minimum cost. Long probes provide a higher probability of non-repeated sequences but lead to an increase in the number of probes required, whereas short probes may not provide unique sequence information due to repeated sequences. We have investigated the effect of probe length, use of reference sequences, and thermal filtering on the design of probe libraries for several highly variable target DNA sequences. RESULTS: We designed overlapping probe libraries for a range of highly variable drug target genes based on known sequence information and developed a formal terminology to describe probe library design. We find that for some targets these libraries can provide good coverage of a previously unseen target whereas for others the coverage is less than 30%. The optimal probe length varies from as short as 12 nt to as long as 19 nt and depends on the sequence, its variability, and the stringency of thermal filtering. It cannot be determined from inspection of an example gene sequence. CONCLUSIONS: Optimal probe length and the optimal number of reference sequences used to design a probe library are highly target specific for highly variable sequencing targets. The optimum design cannot be determined simply by inspection of input sequences or of alignments but only by detailed analysis of each specific target. For highly variable sequences, shorter probes can in some cases provide better information than longer probes. Probe library design would benefit from a general purpose tool for analysing these issues. The formal terminology developed here and the analysis approaches it is used to describe will contribute to the development of such tools.


Subjects
Molecular Probes , HIV/genetics , Hepacivirus/genetics , Orthomyxoviridae/genetics , Single Nucleotide Polymorphism , DNA Sequence Analysis
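
The trade-off between probe length and coverage of an unseen target can be sketched as follows. The example builds an overlapping probe library from reference sequences and measures what fraction of a new target's bases fall under at least one exactly matching probe. It assumes perfect-match hybridisation and omits thermal filtering, and it does not use the formal terminology developed by the authors; the function names and toy sequences are hypothetical.

```python
def build_probe_library(reference_sequences, probe_length):
    """Collect every overlapping probe of the given length from the references."""
    return {seq[i:i + probe_length]
            for seq in reference_sequences
            for i in range(len(seq) - probe_length + 1)}


def coverage(target, library, probe_length):
    """Fraction of target bases covered by at least one matching probe."""
    covered = [False] * len(target)
    for i in range(len(target) - probe_length + 1):
        if target[i:i + probe_length] in library:
            for j in range(i, i + probe_length):
                covered[j] = True
    return sum(covered) / len(target)


if __name__ == "__main__":
    references = ["ACGTACGTAA"]
    target = "ACGTTCGTAA"  # differs from the reference at a single base
    for k in (4, 6):
        library = build_probe_library(references, k)
        print(k, len(library), coverage(target, library, k))
```

In this toy case the shorter probes cover more of the variant target because fewer of them span the substituted base, in line with the abstract's observation that shorter probes can sometimes be more informative for highly variable sequences.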
10.
Bioinformatics ; 20(18): 3613-9, 2004 Dec 12.
Article in English | MEDLINE | ID: mdl-15297297

ABSTRACT

SUMMARY: Hidden Markov models (HMMs) are widely used for biological sequence analysis because of their ability to incorporate biological information in their structure. An automatic means of optimizing the structure of HMMs would be highly desirable. However, this raises two important issues: first, the new HMMs should be biologically interpretable, and second, we need to control the complexity of the HMM so that it has good generalization performance on unseen sequences. In this paper, we explore the possibility of using a genetic algorithm (GA) for optimizing the HMM structure. GAs are sufficiently flexible to allow incorporation of other techniques such as Baum-Welch training within their evolutionary cycle. Furthermore, operators that alter the structure of HMMs can be designed to favour interpretable and simple structures. In this paper, a training strategy using GAs is proposed, and it is tested on finding HMM structures for the promoter and coding region of the bacterium Campylobacter jejuni. The proposed GA for hidden Markov models (GA-HMM) allows HMMs with different numbers of states to evolve. To prevent over-fitting, a separate dataset is used for comparing the performance of the HMMs to that used for the Baum-Welch training. The GA-HMM was capable of finding an HMM comparable to a hand-coded HMM designed for the same task, which has been published previously.


Subjects
Algorithms , Artificial Intelligence , Chromosome Mapping/methods , Genetic Models , Automated Pattern Recognition/methods , Sequence Alignment/methods , DNA Sequence Analysis/methods , Campylobacter jejuni/genetics , Markov Chains , Statistical Models