1.
Micromachines (Basel) ; 13(8)2022 Aug 22.
Article in English | MEDLINE | ID: mdl-36014286

ABSTRACT

Medical imaging is an essential data source that has been leveraged worldwide in healthcare systems. In pathology, histopathology images are used for cancer diagnosis, but these images are very complex and their analysis by pathologists requires large amounts of time and effort. Meanwhile, although convolutional neural networks (CNNs) have produced near-human results in image processing tasks, their processing time is becoming longer and they need higher computational power. In this paper, we implement a quantized ResNet model on two histopathology image datasets to optimize the inference power consumption. We analyze classification accuracy, energy estimation, and hardware utilization metrics to evaluate our method. First, the original RGB-colored images are utilized for the training phase, and then compression methods such as channel reduction and sparsity are applied. Our results show an accuracy increase of 6% from RGB on 32-bit (baseline) to the optimized representation of sparsity on RGB with a lower bit-width, i.e., <8:8>. For energy estimation on the CNN model used, we found that the energy used in RGB color mode with 32-bit is considerably higher than in the other lower bit-width and compressed color modes. Moreover, we show that lower bit-width implementations yield higher resource utilization and a lower memory bottleneck ratio. This work is suitable for inference on energy-limited devices, which are increasingly being used in the Internet of Things (IoT) systems that facilitate healthcare systems.
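
The abstract does not define the <8:8> notation or say which quantization scheme was used. As a rough illustration of what reducing bit-width involves, the sketch below applies generic uniform symmetric quantization to a handful of made-up weights. The function names and the weight values are assumptions for illustration, not the paper's method.

```python
def quantize_symmetric(values, bits=8):
    """Map floats to signed integers of the given bit-width (uniform, symmetric)."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer levels."""
    return [qi * scale for qi in q]


# Hypothetical weights, not taken from the paper.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_symmetric(weights, bits=8)
restored = dequantize(q, scale)
print(q, restored)
```

Each restored value differs from the original by at most half a quantization step, which is the precision given up in exchange for narrower storage and arithmetic.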

2.
Life (Basel) ; 12(5)2022 Apr 30.
Article in English | MEDLINE | ID: mdl-35629339

ABSTRACT

Epidemiological surveillance of bacterial pathogens requires real-time data analysis with a fast turnaround, while aiming at generating two main outcomes: (1) species-level identification and (2) variant mapping at different levels of genotypic resolution for population-based tracking and surveillance, in addition to predicting traits such as antimicrobial resistance (AMR). Multi-locus sequence typing (MLST) aids this process by identifying sequence types (ST) based on seven ubiquitous genome-scattered loci. In this paper, we selected one assembly-dependent and one assembly-free method for ST mapping and applied them with the default settings and ST schemes they are distributed with, and systematically assessed their accuracy and scalability across a wide array of phylogenetically divergent Public Health-relevant bacterial pathogens with available MLST databases. Our data show that the optimal k-mer length for stringMLST is species-specific and that genome-intrinsic and -extrinsic features can affect the performance and accuracy of the program. Although suitable parameters could be identified for most organisms, there were instances where this program may not be directly deployable in its current format. Next, we integrated stringMLST into our freely available and scalable hierarchical-based population genomics platform, ProkEvo, and further demonstrated how the implementation facilitates automated, reproducible bacterial population analysis.
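
To make the role of k-mer length in assembly-free typing concrete, here is a toy sketch that assigns a read to the allele sharing the most k-mers with it. It is a deliberate simplification and not stringMLST's actual algorithm. The allele names, sequences, and the `best_allele` helper are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_allele(read, alleles, k):
    """Score each allele by the number of k-mers it shares with the read."""
    read_kmers = kmers(read, k)
    votes = {name: len(read_kmers & kmers(allele_seq, k))
             for name, allele_seq in alleles.items()}
    winner = max(votes, key=votes.get)
    return winner, votes


# Hypothetical alleles that differ at a single position.
alleles = {"allele_1": "ACGTACGTGA", "allele_2": "ACGTTCGTGA"}
read = "GTACGTG"
best, votes = best_allele(read, alleles, k=4)
print(best, votes)
```

Shorter k-mers are more likely to occur in several alleles by chance, while longer ones are more easily broken by sequencing errors. That trade-off is one plausible reason an optimal k could vary between species.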

3.
BMC Bioinformatics ; 22(1): 513, 2021 Oct 21.
Article in English | MEDLINE | ID: mdl-34674629

ABSTRACT

BACKGROUND: Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models and there is no easy way to determine which are more accurate. Furthermore, alternative-splicing events exacerbate such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes. RESULTS: In this study, we first provide a pipeline to generate a set of simulated benchmark transcriptomes and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches including both de novo and genome-guided methods. The results showed that the assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or for genome-guided methods when the reference is not available from the same genome. To improve the transcriptome assembly performance, leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble. CONCLUSIONS: Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any de novo assembler we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy both for de novo and genome-guided assemblers can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/ .
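
The consensus idea can be sketched as a simple vote: keep a contig only if enough independent assemblers report it. The abstract does not state the support threshold or how contigs are compared, so the `min_support` parameter, the exact string matching, and the assembler names below are assumptions for illustration, not the ConSemble script.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Keep contigs reported by at least min_support assemblers."""
    counts = Counter()
    for contigs in assemblies.values():
        counts.update(set(contigs))     # each assembler votes once per contig
    return {seq for seq, n in counts.items() if n >= min_support}


# Hypothetical output of four assemblers.
assemblies = {
    "assembler_a": ["ATGGCC", "TTGACA", "GGGTTT"],
    "assembler_b": ["ATGGCC", "TTGACA"],
    "assembler_c": ["ATGGCC", "CCCAAA"],
    "assembler_d": ["ATGGCC", "TTGACA", "CCCAAA"],
}
kept = consensus_contigs(assemblies, min_support=3)
print(sorted(kept))
```

Contigs seen by only one or two assemblers are dropped, which matches the abstract's observation that consensus removes many incorrectly assembled contigs while retaining most correct ones.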


Subjects
Genome; Transcriptome; Consensus
4.
Front Robot AI ; 8: 600410, 2021.
Article in English | MEDLINE | ID: mdl-34179104

ABSTRACT

The timing of flowering plays a critical role in determining the productivity of agricultural crops. If crops flower too early, they mature before the end of the growing season, losing the opportunity to capture and use large amounts of light energy. If they flower too late, they may be killed by the change of seasons before they are ready to harvest. Maize flowering is one of the most important periods where even small amounts of stress can significantly alter yield. In this work, we developed and compared two methods for automatic tassel detection based on the imagery collected from an unmanned aerial vehicle, using deep learning models. The first approach was a customized framework for tassel detection based on a convolutional neural network (TD-CNN). The other method was a state-of-the-art object detection technique, the faster region-based CNN (Faster R-CNN), serving as a baseline for detection accuracy. The evaluation criteria for tassel detection were customized to correctly reflect the needs of tassel detection in an agricultural setting. Although detecting thin tassels in the aerial imagery is challenging, our results showed promising accuracy: the TD-CNN had an F1 score of 95.9% and the Faster R-CNN had an F1 score of 97.9%. More CNN-based model structures can be investigated in the future for improved accuracy, speed, and generalizability on aerial-based tassel detection.
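
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below uses the standard definition only. The abstract says the evaluation criteria were customized, so how detections were counted as true or false positives is not captured here, and the counts are hypothetical.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical detection counts, not taken from the paper.
score = f1_score(true_positives=90, false_positives=10, false_negatives=10)
print(f"F1 = {score:.3f}")
```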

5.
PeerJ ; 9: e11376, 2021.
Article in English | MEDLINE | ID: mdl-34055480

ABSTRACT

Whole Genome Sequence (WGS) data from bacterial species is used for a variety of applications ranging from basic microbiological research to diagnostics and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species poses a tremendous opportunity for discovery and hypothesis-generating research into ecology and evolution of these microorganisms. The limited flexibility, scalability, and user-friendliness of existing pipelines for population-scale inquiry, however, limit applications of systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, reproducible, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: (1) Automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; (2) Use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault-tolerance, and robust file management throughout the process; (3) Use of high-performance and high-throughput computational platforms; (4) Generation of hierarchical-based population structure analysis based on combinations of multi-locus and Bayesian statistical approaches for classification for ecological and epidemiological inquiries; (5) Association of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases with the hierarchically-related genotypic classifications; and (6) Production of pan-genome annotations and data compilation that can be utilized for downstream analysis such as identification of population-specific genomic signatures. The scalability of ProkEvo was measured with two datasets comprising significantly different numbers of input genomes (one with ~2,400 genomes, and the second with ~23,000 genomes). Depending on the dataset and the computational platform used, the running time of ProkEvo varied from ~3 to 26 days. ProkEvo can be used with virtually any bacterial species, and the Pegasus WMS uniquely facilitates addition or removal of programs from the workflow or modification of options within them. To demonstrate versatility of the ProkEvo platform, we performed hierarchical-based population structure analyses from available genomes of three distinct pathogenic bacterial species as individual case studies. The specific case studies illustrate how hierarchical analyses of population structures, genotype frequencies, and distribution of specific gene functions can be integrated into an analysis. Collectively, our study shows that ProkEvo presents a practical, viable option for scalable, automated analyses of bacterial populations with direct applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.

6.
Mol Plant ; 10(7): 990-999, 2017 Jul 05.
Article in English | MEDLINE | ID: mdl-28602693

ABSTRACT

One method for identifying noncoding regulatory regions of a genome is to quantify rates of divergence between related species, as functional sequence will generally diverge more slowly. Most alignment-based approaches to identifying these conserved noncoding sequences (CNSs) have had relatively large minimum sequence lengths (≥15 bp) compared with the average length of known transcription factor binding sites. To circumvent this constraint, STAG-CNS, which can simultaneously integrate data from the promoters of conserved orthologous genes in three or more species, was developed. Using the data from up to six grass species made it possible to identify conserved sequences as short as 9 bp with false discovery rate ≤0.05. These CNSs exhibit greater overlap with open chromatin regions identified using DNase I hypersensitivity assays, and are enriched in the promoters of genes involved in transcriptional regulation. STAG-CNS was further employed to characterize loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. Genes with fewer retained CNSs show lower overall expression, although this bias is more apparent in samples of complex organ systems containing many cell types, suggesting that CNS loss may correspond to a reduced number of expression contexts rather than lower expression levels across the entire ancestral expression domain.


Subjects
Conserved Sequence/genetics; DNA, Intergenic/genetics; Genome, Plant/genetics; Algorithms; Genomics/methods
7.
BMC Bioinformatics ; 15 Suppl 7: S13, 2014.
Article in English | MEDLINE | ID: mdl-25080237

ABSTRACT

BACKGROUND: Our environment is composed of biological components of varying magnitude. The relationships between the different biological elements can be represented as a biological network. The process of mating in S. cerevisiae is initiated by secretion of pheromone by one of the cells. Our interest lies in one particular question: how does a cell dynamically adapt the pathway to continue mating under severe environmental changes or under mutation (which might result in the loss of functionality of some proteins known to participate in the pheromone pathway)? Our work attempts to answer this question. To achieve this, we first propose a model to simulate the pheromone pathway using Petri nets. Petri nets are directed graphs that can be used for describing and modeling systems characterized as concurrent, asynchronous, distributed, parallel, non-deterministic, and/or stochastic. We then analyze our Petri net-based model of the pathway to investigate the following: 1) Given the model of the pheromone response pathway, under what conditions does the cell respond positively, i.e., mate? 2) What kinds of perturbations in the cell would result in changing a negative response to a positive one? METHOD: In our model, we classify proteins into two categories: core component proteins (set ψ) and additional proteins (set λ). We randomly generate our model's parameters in repeated simulations. To simulate the pathway, we carry out three different experiments. In the experiments, we simply change the concentration of the additional proteins (λ) available to the cell. The concentration of proteins in ψ is varied consistently from 300 to 400. In Experiment 1, the range of values for λ is set to be 100 to 150. In Experiment 2, it is set to be 151 to 200. In Experiment 3, the set λ is further split into σ and ς, with the idea that proteins in σ are more important than those in ς. The range of values for σ is set to be 151 to 200 while that of ς is 100 to 150. Decision trees were derived from each of the first two experiments to allow us to more easily analyze the conditions under which the pheromone is expressed. CONCLUSION: The simulation results reveal that a cell can overcome the detrimental effects of the conditions by using a higher concentration of additional proteins in λ. The first two experiments provide evidence that employing a higher concentration of proteins might be one of the ways that the cell uses to adapt itself in inhibiting conditions to facilitate mating. The results of the third experiment reveal that in some cases the protein set σ is sufficient in regulating the response of the cell. Results of Experiments 4 and 5 reveal that there are certain conditions (parameters) in the model that are more important in determining whether a cell will respond positively or not.
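
As background on the formalism, a Petri net consists of places holding tokens and transitions that consume tokens from input places and produce tokens in output places. The minimal sketch below shows the firing rule only. The place names (`pheromone`, `receptor`, `complex`) and the single transition are hypothetical and are not the authors' pathway model.

```python
def is_enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= weight
               for place, weight in transition["inputs"].items())


def fire(marking, transition):
    """Return the new marking after firing an enabled transition."""
    if not is_enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, weight in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - weight   # consume
    for place, weight in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + weight   # produce
    return new_marking


# Hypothetical binding step: one pheromone token plus one receptor token
# yields one complex token.
bind = {"inputs": {"pheromone": 1, "receptor": 1}, "outputs": {"complex": 1}}
marking = {"pheromone": 2, "receptor": 1, "complex": 0}
after = fire(marking, bind)
print(after)
```

In a model of this kind, varying the initial token counts for a set of places is one way to represent the different protein concentration ranges described in the experiments.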


Subjects
Pheromones/metabolism; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/physiology; Computer Simulation; Environment; Models, Biological; Mutation; Pheromones/genetics; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics; Signal Transduction
8.
Int J Bioinform Res Appl ; 6(4): 366-83, 2010.
Article in English | MEDLINE | ID: mdl-20940124

ABSTRACT

In this paper, we present a simple and efficient algorithm for multiple genome sequence alignment. Sequences of Maximal Unique Matches (MUMs) are first transformed into a multi-bipartite diagram. The diagram is then converted into a Directed Acyclic Graph (DAG). Therefore, finding the alignment is reduced to finding the longest path in the DAG, which is solvable in linear time. The experiments show that the algorithm can correctly find the alignment, and runs faster than MGA and EMAGEN. In addition, our algorithm can handle the alignments with overlapping MUMs and has both weighted and unweighted options. It provides the flexibility for the alignments depending on different needs.
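
The reduction to a longest-path problem works because a DAG can be processed in topological order, so each node's best score is final before its successors are visited. The sketch below is a generic linear-time longest-path routine, not the paper's implementation. Treating nodes as MUMs weighted by length, and the example graph itself, are assumptions for illustration.

```python
from collections import deque


def longest_path(num_nodes, edges, node_weight):
    """Return (score, path) of the heaviest path in a DAG in O(V + E) time."""
    adj = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1

    best = list(node_weight)            # best score of a path ending at each node
    prev = [None] * num_nodes           # predecessor on that best path
    queue = deque(i for i in range(num_nodes) if indegree[i] == 0)
    processed = 0
    while queue:                        # Kahn's topological traversal
        u = queue.popleft()
        processed += 1
        for v in adj[u]:
            if best[u] + node_weight[v] > best[v]:
                best[v] = best[u] + node_weight[v]
                prev[v] = u
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if processed != num_nodes:
        raise ValueError("graph contains a cycle")

    end = max(range(num_nodes), key=best.__getitem__)
    path = []
    node = end
    while node is not None:             # walk predecessors back to the start
        path.append(node)
        node = prev[node]
    return best[end], path[::-1]


# Hypothetical graph: four nodes with positive weights (e.g. match lengths).
weights = [5, 3, 4, 2]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
score, path = longest_path(4, edges, weights)
print(score, path)
```

Setting every node weight to 1 gives an unweighted variant, which loosely mirrors the weighted and unweighted options the abstract mentions.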


Subjects
Algorithms; Genome; Genomics/methods; Sequence Alignment/methods; Base Sequence; Sequence Analysis, DNA
9.
Biol Direct ; 1: 10, 2006 Apr 03.
Article in English | MEDLINE | ID: mdl-16584568

ABSTRACT

BACKGROUND: Prediction and proper ranking of canonical splice sites (SSs) is a challenging problem in the bioinformatics and machine learning communities. Any progress in SS recognition will lead to a better understanding of the splicing mechanism. We introduce several new approaches to combining a priori knowledge for improved SS detection. First, we design our new Bayesian SS sensor based on oligonucleotide counting. To further enhance prediction quality, we applied our new de novo motif detection tool MHMMotif to intronic ends and exons. We combine the elements found with sensor information using a Naive Bayesian Network, as implemented in our new tool SpliceScan. RESULTS: According to our tests, the Bayesian sensor outperforms the contemporary Maximum Entropy sensor for 5' SS detection. We report a number of putative Exonic (ESE) and Intronic (ISE) Splicing Enhancers found by the MHMMotif tool. T-test statistics on mouse/rat intronic alignments indicate that the detected elements are on average more conserved than other oligos, which supports our assumption of their functional importance. The tool has been shown to outperform the SpliceView, GeneSplicer, NNSplice, Genio and NetUTR tools for the test set of human genes. SpliceScan outperforms all contemporary ab initio gene structural prediction tools on the set of 5' UTR gene fragments. CONCLUSION: The designed methods have many attractive properties compared to existing approaches. The Bayesian sensor, MHMMotif program and SpliceScan tools are freely available on our web site. REVIEWERS: This article was reviewed by Manyuan Long, Arcady Mushegian and Mikhail Gelfand.

10.
Article in English | MEDLINE | ID: mdl-16448011

ABSTRACT

The recognition of regulatory motifs of co-regulated genes is essential for understanding the regulatory mechanisms. However, the automatic extraction of regulatory motifs from a given data set of the upstream non-coding DNA sequences of a family of co-regulated genes is difficult because regulatory motifs are often subtle and inexact. This problem is further complicated by the corruption of the data sets. In this paper, a new approach called Mismatch-allowed Probabilistic Suffix Tree Motif Extraction (MISAE) is proposed. It combines a mismatch-allowed probabilistic suffix tree, which is a probabilistic model, with local prediction for the extraction of regulatory motifs. The proposed approach is tested on 15 co-regulated gene families and compares favorably with other state-of-the-art approaches. Moreover, MISAE performs well on "corrupted" data sets. It is able to extract the motif from a "corrupted" data set in which fewer than one fourth of the sequences contain the real motif.


Subjects
DNA/genetics; Gene Expression Profiling/methods; Gene Expression/genetics; Models, Genetic; Regulatory Sequences, Nucleic Acid/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Algorithms; Amino Acid Motifs; Cluster Analysis; Computer Simulation; Entropy; Models, Statistical; Pattern Recognition, Automated/methods