Results 1 - 20 of 30
1.
China CDC Wkly ; 6(21): 478-486, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38854463

ABSTRACT

Background: This study provides a detailed analysis of the daily fluctuations in coronavirus disease 2019 (COVID-19) case numbers in London from January 31, 2020 to February 24, 2022. The primary objective was to enhance understanding of the interactions among government pandemic responses, viral mutations, and the subsequent changes in COVID-19 case incidence. Methods: We employed the adaptive Fourier decomposition (AFD) method to analyze diurnal changes and further segmented the AFD into novel multi-component groups consisting of one to three elements. These restructured components were rigorously evaluated using Pearson correlation, and their effectiveness was compared with that of other signal analysis techniques. This study introduced a novel approach to differentiate individual components across various time-frequency scales using basis decomposition methods. Results: Analysis of London's daily COVID-19 data using AFD revealed a strong correlation between the "stay at home" directive and high-frequency components during the first epidemic wave. This indicates the need for sustained implementation of vaccination policies to maintain their effectiveness. Discussion: The AFD component method provides a comprehensive analysis of the immediate and prolonged impact of governmental policies on the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This robust tool has proven invaluable for analyzing COVID-19 pandemic data, offering critical insights that guide the formulation of future preventive and public health strategies.
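The band-splitting idea behind this analysis can be sketched with a plain, non-adaptive Fourier decomposition (AFD itself uses adaptive basis functions; the synthetic series, band boundaries, and policy indicator below are assumptions for illustration only):

```python
import numpy as np

def band_components(series, n_bands=3):
    """Split a 1-D signal into frequency bands via the real FFT: the
    rFFT bins are partitioned into contiguous groups and each group is
    inverse-transformed, so the components sum back to the signal."""
    spec = np.fft.rfft(series)
    edges = np.linspace(0, len(spec), n_bands + 1, dtype=int)
    comps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]
        comps.append(np.fft.irfft(masked, n=len(series)))
    return comps  # ordered low- to high-frequency

# Synthetic daily case series: a slow wave plus a weekly oscillation
t = np.arange(200)
cases = 100 + 30 * np.sin(2 * np.pi * t / 100) + 10 * np.sin(2 * np.pi * t / 7)
low, mid, high = band_components(cases, n_bands=3)

# Pearson correlation between a component and a policy-on indicator,
# analogous to relating the "stay at home" directive to a component:
policy = (t > 120).astype(float)
r = np.corrcoef(high, policy)[0, 1]
```

Because the masked spectra partition the full spectrum, the components reconstruct the original series exactly (up to floating-point error).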

2.
ArXiv ; 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38903738

ABSTRACT

Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they represent a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at slide-level, instead of tile-level. Not only are medical diagnoses recorded at the specimen level; the detection of oncogene mutations is also experimentally obtained, and recorded by initiatives like The Cancer Genome Atlas (TCGA), at the slide level. This configures a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out what cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). This approach was explored for tumor detection at low magnification levels and for TP53 mutations at various levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures identify distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, TP53 mutation was most sensitive to features at the higher magnifications, where cellular morphology is resolved.

3.
J Invasive Cardiol ; 36(3)2024 Mar.
Article in English | MEDLINE | ID: mdl-38441988

ABSTRACT

OBJECTIVES: Coronary angiography (CAG)-derived physiology methods have been developed in an attempt to simplify and increase the usage of coronary physiology, based mostly on computational fluid dynamics algorithms. We aimed to develop a different approach based on artificial intelligence methods, which has seldom been explored. METHODS: Consecutive patients undergoing invasive instantaneous wave-free ratio (iFR) measurements were included. We developed artificial intelligence (AI) models capable of classifying target lesions as positive (iFR ≤ 0.89) or negative (iFR > 0.89). The predictions were then compared to the true measurements. RESULTS: Two hundred and fifty measurements were included, and 3 models were developed. Model 3 had the best overall performance: accuracy, negative predictive value (NPV), positive predictive value (PPV), sensitivity, and specificity were 69%, 88%, 44%, 74%, and 67%, respectively. Performance differed per target vessel. For the left anterior descending artery (LAD), model 3 had the highest accuracy (66%), while model 2 had the highest NPV (86%) and sensitivity (91%). PPV was always low/modest. Model 1 had the highest specificity (68%). For the right coronary artery, model 1's accuracy was 86%, NPV was 97%, and specificity was 87%, but all models had low PPV (maximum 25%) and low/modest sensitivity (maximum 60%). For the circumflex, model 1 performed best: accuracy, NPV, PPV, sensitivity, and specificity were 69%, 96%, 24%, 80%, and 68%, respectively. CONCLUSIONS: We developed 3 AI models capable of binary iFR estimation from CAG images. Despite modest accuracy, the consistently high NPV is of potential clinical significance, as it would enable avoiding further invasive maneuvers after CAG. This pivotal study offers proof of concept for further development.
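The reported metrics all follow from the confusion matrix of a binary classifier thresholded at iFR ≤ 0.89; a minimal sketch (the iFR values and predictions below are toy data, not the study's):

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for a binary classifier: accuracy,
    sensitivity, specificity, PPV and NPV, as reported in the abstract.
    Positive (1) = physiologically significant lesion (iFR <= 0.89)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

# Toy example: invasive iFR measurements dichotomized at 0.89
ifr = [0.80, 0.92, 0.85, 0.95, 0.88, 0.90]
y_true = [1 if v <= 0.89 else 0 for v in ifr]
y_pred = [1, 0, 0, 0, 1, 1]        # hypothetical model output
m = binary_metrics(y_true, y_pred)
```

A high NPV, as stressed in the conclusion, means a negative prediction makes a missed significant lesion unlikely.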


Subjects
Artificial Intelligence, Deep Learning, Humans, Pilot Projects, X-Rays, Coronary Angiography
4.
Diagnostics (Basel) ; 13(24)2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38132189

ABSTRACT

Accurately predicting functional outcomes in stroke patients remains challenging yet clinically relevant. While brain CTs provide prognostic information, their practical value for outcome prediction is unclear. We analyzed a multi-center cohort of 743 ischemic stroke patients (<72 h onset), including their admission brain NCCT and CTA scans as well as their clinical data. Our goal was to predict the patients' future functional outcome, measured by the 3-month post-stroke modified Rankin Scale (mRS), dichotomized into good (mRS ≤ 2) and poor (mRS > 2). To this end, we developed deep learning models to predict the outcome from CT data only, and models that incorporate other patient variables. Three deep learning architectures were tested in the image-only prediction, achieving 0.779 ± 0.005 AUC. In addition, we created a model fusing imaging and tabular data by feeding the output of a deep learning model trained to detect occlusions on CT angiograms into our prediction framework, which achieved an AUC of 0.806 ± 0.082. These findings highlight how further refinement of prognostic models incorporating both image biomarkers and clinical data could enable more accurate outcome prediction for ischemic stroke patients.

5.
Front Public Health ; 11: 1259084, 2023.
Article in English | MEDLINE | ID: mdl-38106897

ABSTRACT

Background: As China amends its "zero COVID" strategy, a sudden increase in the number of infections may overwhelm medical resources, and its impact has not been quantified. Specific mitigation strategies are needed to minimize disruption to the healthcare system and to prepare for the next possible epidemic in advance. Method: We develop a stochastic compartmental model to project the burden on China's medical system (that is, the number of fever clinic visits and admission beds) after the adjustment to COVID-19 policy, which considers the epidemiological characteristics of the Omicron variant, the age composition of the population, and vaccine effectiveness against infection and severe COVID-19. We also estimate the effect of four-dose vaccinations (heterologous and homologous), antipyretic drug supply, non-pharmacological interventions (NPIs), and triage treatment on mitigating the domestic infection peak. Result: This epidemic is projected to result in 398.02 million fever clinic visits and 16.58 million hospitalizations, with disruption periods for the healthcare system of 18 and 30 days, respectively. Antipyretic drug supply and booster vaccination could reduce the burden of emergency visits and hospitalization, respectively, although neither alone could bring demand down to current capacity. The synergy of several different strategies suggests that increasing the heterologous booster vaccination rate for older adults to over 90% is a key measure to alleviate the bed burden for respiratory diseases, on the basis of expanded healthcare resource allocation. Conclusion: The Omicron epidemic that followed the adjustment to COVID-19 policy overloaded many local health systems across the country at the end of 2022. The combined effect of vaccination, antipyretic drug supply, triage treatment, and NPIs could prevent medical resources from being overwhelmed.
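The core mechanics of a stochastic compartmental projection can be sketched with a toy discrete-time SEIR using binomial transitions (the parameters below are illustrative and the model is far simpler than the age-structured, vaccine-aware model the study describes):

```python
import numpy as np

def stochastic_seir(n=1_000_000, e0=100, beta=0.8, sigma=1 / 3,
                    gamma=1 / 5, days=120, seed=1):
    """Discrete-time stochastic SEIR: each day, S->E, E->I and I->R
    transitions are drawn from binomial distributions, giving one
    random realization of the daily-new-infections curve."""
    rng = np.random.default_rng(seed)
    S, E, I, R = n - e0, e0, 0, 0
    daily_new = []
    for _ in range(days):
        p_inf = 1 - np.exp(-beta * I / n)              # force of infection
        new_e = rng.binomial(S, p_inf)                 # S -> E
        new_i = rng.binomial(E, 1 - np.exp(-sigma))    # E -> I
        new_r = rng.binomial(I, 1 - np.exp(-gamma))    # I -> R
        S, E, I, R = S - new_e, E + new_e - new_i, I + new_i - new_r, R + new_r
        daily_new.append(int(new_i))
    return daily_new

curve = stochastic_seir()
peak_day = int(np.argmax(curve))   # timing of the infection peak
```

Running many seeds and summarizing the spread of `curve` is how such models turn into projection intervals for clinic visits and bed demand.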


Subjects
Antipyretics, COVID-19, Humans, Aged, Antipyretics/therapeutic use, COVID-19/epidemiology, COVID-19/prevention & control, SARS-CoV-2, China/epidemiology, Fever, Policies
6.
Catheter Cardiovasc Interv ; 102(4): 631-640, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37579212

ABSTRACT

BACKGROUND: Visual assessment of the percentage diameter stenosis (%DSVE) of lesions is essential in coronary angiography (CAG) interpretation. We have previously developed an artificial intelligence (AI) model capable of accurate CAG segmentation. We aim to compare operators' %DSVE in angiography versus AI-segmented images. METHODS: Quantitative coronary analysis (QCA) %DS (%DSQCA) was previously performed in our published validation dataset. Operators were asked to estimate the %DSVE of lesions in angiography versus AI-segmented images in separate sessions, and differences were assessed using angiography %DSQCA as reference. RESULTS: A total of 123 lesions were included. %DSVE was significantly higher in both the angiography (77% ± 20% vs. 56% ± 13%, p < 0.001) and segmentation groups (59% ± 20% vs. 56% ± 13%, p < 0.001), with a much smaller absolute %DS difference in the latter. For lesions with %DSQCA of 50%-70% (60% ± 5%), an even higher discrepancy was found (angiography: 83% ± 13% vs. 60% ± 5%, p < 0.001; segmentation: 63% ± 15% vs. 60% ± 5%, p < 0.001). Similar, though less pronounced, findings were observed for %DSQCA < 50% lesions, but not for %DSQCA > 70% lesions. Agreement between %DSQCA/%DSVE across %DSQCA strata (<50%, 50%-70%, >70%) was approximately twice as high in the segmentation group (60.4% vs. 30.1%; p < 0.001). %DSVE inter-operator differences were smaller with segmentation. CONCLUSION: %DSVE was much less discrepant with segmentation than with angiography. Overestimation of %DSQCA < 70% lesions with angiography was especially common. Segmentation may reduce %DSVE overestimation and thus unwarranted revascularization.
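The quantity being compared, percentage diameter stenosis, is derived from two diameters; a small sketch with the strata used in the abstract (the diameters below are invented for illustration):

```python
def percent_diameter_stenosis(minimal_lumen_diameter, reference_diameter):
    """%DS as used in QCA: lumen narrowing relative to the reference
    vessel diameter, expressed as a percentage."""
    return 100.0 * (1.0 - minimal_lumen_diameter / reference_diameter)

def stratum(ds):
    """The three %DS strata used in the abstract's agreement analysis."""
    if ds < 50:
        return "<50%"
    return "50-70%" if ds <= 70 else ">70%"

# A 1.2 mm minimal lumen in a 3.0 mm reference vessel:
ds = percent_diameter_stenosis(1.2, 3.0)   # 60.0, i.e. the 50-70% stratum
```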

7.
Int J Cardiovasc Imaging ; 39(7): 1385-1396, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37027105

ABSTRACT

INTRODUCTION: We previously developed an artificial intelligence (AI) model for automatic coronary angiography (CAG) segmentation, using deep learning. To validate this approach, the model was applied to a new dataset and results are reported. METHODS: Retrospective selection of patients undergoing CAG and percutaneous coronary intervention or invasive physiology assessment over a one-month period from four centers. A single frame was selected from images containing a lesion with a 50-99% stenosis (visual estimation). Automatic Quantitative Coronary Analysis (QCA) was performed with a validated software package. Images were then segmented by the AI model. Lesion diameters, area overlap [based on true positive (TP) and true negative (TN) pixels], and a global segmentation score (GSS, 0-100 points), previously developed and published, were measured. RESULTS: 123 regions of interest from 117 images across 90 patients were included. There were no significant differences in lesion diameter, percentage diameter stenosis or distal border diameter between the original/segmented images. There was a statistically significant albeit minor difference [0.19 mm (0.09-0.28)] regarding proximal border diameter. Overlap accuracy ((TP + TN)/(TP + TN + FP + FN)), sensitivity (TP / (TP + FN)) and Dice Score (2TP / (2TP + FN + FP)) between original/segmented images were 99.9%, 95.1% and 94.8%, respectively. The GSS was 92 (87-96), similar to the value previously obtained in the training dataset. CONCLUSION: The AI model was capable of accurate CAG segmentation across multiple performance metrics when applied to a multicentric validation dataset. This paves the way for future research on its clinical uses.
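The overlap formulas quoted in the abstract can be computed directly from binary masks; a numpy sketch on a toy 2×3 mask:

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Pixel-wise agreement between a predicted and a reference mask,
    using exactly the formulas given in the abstract."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.sum(pred & truth)      # true positive pixels
    tn = np.sum(~pred & ~truth)    # true negative pixels
    fp = np.sum(pred & ~truth)     # false positive pixels
    fn = np.sum(~pred & truth)     # false negative pixels
    return {
        "accuracy": (tp + tn) / pred.size,
        "sensitivity": tp / (tp + fn),
        "dice": 2 * tp / (2 * tp + fn + fp),
    }

truth = np.array([[1, 1, 0], [0, 1, 0]])   # reference vessel mask
pred  = np.array([[1, 0, 0], [0, 1, 0]])   # model output, one pixel missed
m = overlap_metrics(pred, truth)           # dice = 2*2/(2*2+1+0) = 0.8
```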


Subjects
Coronary Stenosis, Deep Learning, Humans, Coronary Stenosis/diagnostic imaging, Coronary Stenosis/therapy, Artificial Intelligence, Constriction, Pathologic, Retrospective Studies, X-Rays, Predictive Value of Tests, Coronary Angiography/methods
8.
Rev Port Cardiol ; 42(7): 643-651, 2023 07.
Article in English, Portuguese | MEDLINE | ID: mdl-37001583

ABSTRACT

INTRODUCTION: Pulmonary embolism (PE) is a life-threatening condition in which diagnostic uncertainty remains high, given the lack of specificity in clinical presentation. It requires confirmation by computed tomography pulmonary angiography (CTPA). Electrocardiographic (ECG) signals can be analyzed with precision by artificial intelligence (AI). The purpose of this study was to develop an AI model for predicting PE using a 12-lead ECG. METHODS: We extracted 1014 ECGs from patients admitted to the emergency department who underwent CTPA due to suspected PE: 911 ECGs were used for development of the AI model and 103 ECGs for validation. An AI algorithm based on an ensemble neural network was developed. The performance of the AI model was compared against the guideline-recommended clinical prediction rules for PE (Wells and Geneva scores combined with a standard D-dimer cut-off of 500 ng/mL and an age-adjusted cut-off, and the PEGeD and YEARS algorithms). RESULTS: The AI model achieved greater specificity in detecting PE than the commonly used clinical prediction rules. The AI model showed a specificity of 100% (95% confidence interval (CI): 94-100) and a sensitivity of 50% (95% CI: 33-67). The AI model performed significantly better than the other models (area under the curve 0.75; 95% CI 0.66-0.82; p<0.001), which had nearly no discriminative power. The incidence of typical PE ECG features was similar in patients with and without PE. CONCLUSION: We developed and validated a deep learning-based AI model for PE diagnosis using a 12-lead ECG, and it demonstrated high specificity.
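The age-adjusted D-dimer cut-off used as a comparator follows a widely used rule (age × 10 ng/mL above age 50); a sketch, with the caveat that assay units and clinical application vary and this is illustrative, not medical guidance:

```python
def d_dimer_cutoff(age_years, age_adjusted=True):
    """D-dimer positivity threshold in ng/mL: either the conventional
    fixed 500 ng/mL cut-off, or the widely used age-adjusted rule
    (age x 10 ng/mL for patients over 50)."""
    if age_adjusted and age_years > 50:
        return age_years * 10
    return 500

# A 78-year-old's age-adjusted threshold is 780 ng/mL instead of 500:
assert d_dimer_cutoff(40) == 500
assert d_dimer_cutoff(78) == 780
assert d_dimer_cutoff(78, age_adjusted=False) == 500
```

Raising the threshold with age is what lets the adjusted rule rule out PE in more older patients without imaging.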


Subjects
Artificial Intelligence, Pulmonary Embolism, Humans, Pulmonary Embolism/diagnosis, Machine Learning, Electrocardiography/methods, Retrospective Studies
9.
Sci Rep ; 13(1): 467, 2023 01 10.
Article in English | MEDLINE | ID: mdl-36627317

ABSTRACT

Given the inherent complexity of the human nervous system, insight into the dynamics of brain activity can be gained from studying smaller and simpler organisms. While some of the potential target organisms are simple enough that their behavioural and structural biology might be well-known and understood, others might still lead to computationally intractable models that require extensive resources to simulate. Since such organisms are frequently only acting as proxies to further our understanding of underlying phenomena or functionality, often one is not interested in the detailed evolution of every single neuron in the system. Instead, it is sufficient to observe the subset of neurons that capture the effect that the profound nonlinearities of the neuronal system have in response to different stimuli. In this paper, we consider the well-known nematode Caenorhabditis elegans and seek to investigate the possibility of generating lower complexity models that capture the system's dynamics with low error using only measured or simulated input-output information. Such models are often termed black-box models. We show how the nervous system of C. elegans can be modelled and simulated with data-driven models using different neural network architectures. Specifically, we target the use of state-of-the-art recurrent neural network architectures such as Long Short-Term Memory and Gated Recurrent Units and compare these architectures in terms of their properties and their accuracy (Root Mean Square Error), as well as the complexity of the resulting models. We show that Gated Recurrent Unit models with a hidden layer size of 4 are able to accurately reproduce the system response to very different stimuli. We furthermore explore the relative importance of their inputs as well as scalability to more scenarios.
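A single GRU step, the recurrence at the heart of the compared architectures, can be written out in numpy; the weights below are random and untrained (hidden size 4 is taken from the abstract, everything else is an assumption for the sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One Gated Recurrent Unit step: update gate z, reset gate r,
    candidate state h~, then a convex combination of old and candidate
    state. W/U/b stack the z, r and h~ parameters along axis 0."""
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    z = sigmoid(Wz @ x + Uz @ h + bz)
    r = sigmoid(Wr @ x + Ur @ h + br)
    h_cand = np.tanh(Wh @ x + Uh @ (r * h) + bh)
    return (1 - z) * h + z * h_cand

def rmse(y, y_hat):
    """Root Mean Square Error, the accuracy metric used in the paper."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

# Hidden size 4 (as in the paper), 2 inputs, random untrained weights:
rng = np.random.default_rng(0)
n_in, n_h = 2, 4
W = rng.normal(size=(3, n_h, n_in))
U = rng.normal(size=(3, n_h, n_h))
b = np.zeros((3, n_h))
h = np.zeros(n_h)
for x in rng.normal(size=(10, n_in)):   # run a 10-step input sequence
    h = gru_step(x, h, W, U, b)
```

Training would adjust W, U and b to minimize the RMSE between the model's outputs and the recorded neuron responses.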


Subjects
Caenorhabditis elegans, Nervous System Physiological Phenomena, Animals, Humans, Caenorhabditis elegans/physiology, Neural Networks, Computer, Neurons/physiology, Learning
10.
Rev Port Cardiol ; 41(12): 1011-1021, 2022 12.
Article in English, Portuguese | MEDLINE | ID: mdl-36511271

ABSTRACT

INTRODUCTION AND OBJECTIVES: Although automatic artificial intelligence (AI) coronary angiography (CAG) segmentation is arguably the first step toward future clinical application, it is underexplored. We aimed to (1) develop AI models for CAG segmentation and (2) assess the results using similarity scores and a set of criteria defined by expert physicians. METHODS: Patients undergoing CAG were randomly selected in a retrospective study at a single center. Per incidence, an ideal frame was segmented, forming a baseline human dataset (BH), used for training a baseline AI model (BAI). Enhanced human segmentation (EH) was created by combining the best of both. An enhanced AI model (EAI) was trained using the EH. Results were assessed by experts using 11 weighted criteria, combined into a Global Segmentation Score (GSS: 0-100 points). The Generalized Dice Score (GDS) and Dice Similarity Coefficient (DSC) were also used for AI model assessment. RESULTS: 1664 processed images were generated. GSS for BH, EH, BAI and EAI were 96.9±5.7, 98.9±3.1, 86.1±10.1 and 90.0±7.6, respectively (95% confidence interval, p<0.001 for both paired and global differences). The GDS for the BAI and EAI was 0.9234±0.0361 and 0.9348±0.0284, respectively. The DSC for the coronary tree was 0.8904±0.0464 and 0.9134±0.0410 for the BAI and EAI, respectively. The EAI outperformed the BAI in all coronary segmentation tasks, but performed less well in some catheter segmentation tasks. CONCLUSIONS: We successfully developed AI models capable of CAG segmentation, with good performance as assessed by all scores.


Subjects
Deep Learning, Humans, Tomography, X-Ray Computed, Artificial Intelligence, Retrospective Studies, X-Rays, Coronary Angiography
11.
Proc Natl Acad Sci U S A ; 119(23): e2205971119, 2022 06 07.
Article in English | MEDLINE | ID: mdl-35609191
12.
Biotechnol J ; 14(8): e1800613, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30927505

ABSTRACT

Developments in biotechnology are increasingly dependent on the extensive use of big data, generated by modern high-throughput instrumentation technologies and stored in thousands of databases, public and private. Future developments in this area depend, critically, on the ability of biotechnology researchers to master the skills required to effectively integrate their own contributions with the large amounts of information available in these databases. This article offers a perspective on the relations that exist between the fields of big data and biotechnology, including the related technologies of artificial intelligence and machine learning, and describes how data integration, data exploitation, and process optimization correspond to three essential steps in any future biotechnology project. The article also lists a number of application areas where the ability to use big data will become a key factor, including drug discovery, drug recycling, drug safety, functional and structural genomics, proteomics, pharmacogenetics, and pharmacogenomics, among others.


Subjects
Artificial Intelligence, Big Data, Biotechnology/methods, Animals, Data Mining, Databases, Factual, Humans, Machine Learning
13.
IEEE/ACM Trans Comput Biol Bioinform ; 15(6): 1953-1959, 2018.
Article in English | MEDLINE | ID: mdl-29994736

ABSTRACT

Ischemic stroke is a leading cause of disability and death worldwide among adults. The individual prognosis after stroke depends heavily on the treatment decisions physicians take during the acute phase. In the last five years, several scores such as the ASTRAL, DRAGON, and THRIVE have been proposed as tools to help physicians predict the patient's functional outcome after a stroke. These scores are rule-based classifiers that use features available when the patient is admitted to the emergency room. In this paper, we apply machine learning techniques to the problem of predicting the functional outcome of ischemic stroke patients, three months after admission. We show that a pure machine learning approach achieves only a marginally superior Area Under the ROC Curve (AUC) (0.808±0.085) compared with that of the best score (0.771±0.056) when using the features available at admission. However, we observed that by progressively adding features available at later points in time, we can significantly increase the AUC to a value above 0.90. We conclude that the results obtained validate the use of the scores at the time of admission, but also point to the importance of using more features, which require more advanced methods, when possible.
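AUC, the metric compared throughout, equals the probability that a randomly drawn positive case is scored above a randomly drawn negative one; a dependency-free sketch of that definition (toy scores, not the study's data):

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC via its probabilistic (Mann-Whitney) definition: the
    fraction of positive/negative pairs where the positive outranks
    the negative, counting ties as one half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Perfect separation gives 1.0; identical scores give 0.5 (chance)
assert auc_mann_whitney([0.9, 0.8], [0.2, 0.1]) == 1.0
assert auc_mann_whitney([0.5], [0.5]) == 0.5
```

This is why an AUC near 0.5, like the weakest scores here, means "nearly no discriminative power".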


Subjects
Brain Ischemia, Diagnosis, Computer-Assisted/methods, Machine Learning, Algorithms, Area Under Curve, Brain Ischemia/diagnosis, Brain Ischemia/epidemiology, Brain Ischemia/physiopathology, Brain Ischemia/therapy, Humans, Treatment Outcome
14.
PLoS One ; 8(10): e76300, 2013.
Article in English | MEDLINE | ID: mdl-24194833

ABSTRACT

It is widely agreed that complex diseases are typically caused by the joint effects of multiple genetic variations rather than a single one. These genetic variations may show stronger effects when considered together than when considered individually, a phenomenon known as epistasis or multilocus interaction. In this work, we explore the applicability of information interaction to discover pairwise epistatic effects related to complex diseases. We start by showing that traditional approaches such as classification methods or greedy feature selection methods (such as the Fleuret method) do not perform well on this problem. We then compare our information interaction method with BEAM and SNPHarvester on artificial datasets simulating epistatic interactions and show that our method is more powerful in detecting pairwise epistatic interactions than its competitors. We show results of the application of the information interaction method to the WTCCC breast cancer dataset. Our results are validated using permutation tests. We were able to find 89 statistically significant pairwise interactions with a p-value lower than 10^-3. Even though many recent algorithms have been designed to find epistasis with low marginals, we observed that all (except one) of the SNPs involved in statistically significant interactions have moderate or high marginals. We also report that the interactions found in this work were not present in the STRING gene-gene interaction network.
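Pairwise information interaction can be sketched via entropies; the XOR toy data below is an assumption chosen to show the synergy signature, not WTCCC data:

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy (bits) of a sequence of hashable outcomes."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def interaction_information(xs, ys, zs):
    """II(X;Y;Z) = I(X;Y|Z) - I(X;Y), written in entropy form.
    Positive values indicate synergy: the pair (X, Y) says more about
    Z together than separately -- the signature of pairwise epistasis."""
    h = entropy
    return (h(list(zip(xs, ys))) + h(list(zip(xs, zs))) + h(list(zip(ys, zs)))
            - h(xs) - h(ys) - h(zs) - h(list(zip(xs, ys, zs))))

# Pure XOR interaction: neither "SNP" alone predicts the phenotype,
# but together they determine it -> one full bit of synergy.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
zs = [x ^ y for x, y in zip(xs, ys)]
ii = interaction_information(xs, ys, zs)   # 1.0 bit
```

A permutation test, as used in the paper, would shuffle the phenotype column and recompute `ii` many times to estimate how often a value this large arises by chance.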


Subjects
Breast Neoplasms/epidemiology, Causality, Disease Susceptibility/epidemiology, Epistasis, Genetic/genetics, Models, Theoretical, Computational Biology/methods, Humans, Polymorphism, Single Nucleotide/genetics
15.
J Bioinform Comput Biol ; 9(5): 613-30, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21976379

ABSTRACT

In this study we address the problem of finding a quantitative mathematical model for the genetic network regulating the stress response of the yeast Saccharomyces cerevisiae to the agricultural fungicide mancozeb. An S-system formalism was used to model the interactions of a five-gene network encoding four transcription factors (Yap1, Yrr1, Rpn4 and Pdr3) regulating the transcriptional activation of the FLR1 gene. Parameter estimation was accomplished by decoupling the resulting system of nonlinear ordinary differential equations into a larger nonlinear algebraic system, and using the Levenberg-Marquardt algorithm to fit the model's predictions to experimental data. The introduction of constraints in the model, related to the putative topology of the network, was explored. The results show that forcing the network connectivity to adhere to this topology did not lead to better results than those obtained using an unrestricted network topology. Overall, the modeling approach achieved partial success when trained on the nonmutant datasets, although further work is required if one wishes to obtain more accurate prediction of the time courses.
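The S-system form itself can be illustrated with a tiny explicit-Euler simulation; the two-gene parameters below are invented for the sketch and are not the fitted FLR1-network values:

```python
import numpy as np

def s_system_step(x, alpha, g, beta, h, dt):
    """One explicit-Euler step of an S-system:
    dx_i/dt = alpha_i * prod_j x_j^g_ij - beta_i * prod_j x_j^h_ij,
    i.e. a power-law production term minus a power-law degradation term."""
    x = np.asarray(x, float)
    prod_g = np.prod(x ** g, axis=1)   # production kinetic orders
    prod_h = np.prod(x ** h, axis=1)   # degradation kinetic orders
    return x + dt * (alpha * prod_g - beta * prod_h)

# Toy two-gene system where gene 1 activates gene 2:
alpha = np.array([1.0, 0.5])
beta = np.array([0.5, 0.5])
g = np.array([[0.0, 0.0], [0.8, 0.0]])   # x2 production driven by x1
h = np.array([[1.0, 0.0], [0.0, 1.0]])   # first-order degradation
x = np.array([0.2, 0.2])
for _ in range(2000):                     # integrate to t = 20
    x = s_system_step(x, alpha, g, beta, h, dt=0.01)
# Steady state: x1 -> alpha1/beta1 = 2; x2 -> 2**0.8
```

Fitting, as in the paper, estimates alpha, beta, g and h from time-course data rather than assuming them.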


Subjects
Models, Genetic, Organic Anion Transporters/genetics, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae/genetics, Computational Biology, DNA-Binding Proteins/genetics, Fungicides, Industrial/pharmacology, Gene Regulatory Networks, Genes, Fungal/drug effects, Maneb/pharmacology, Nonlinear Dynamics, Saccharomyces cerevisiae/drug effects, Saccharomyces cerevisiae/metabolism, Stress, Physiological, Transcription Factors/genetics, Transcriptional Activation/drug effects, Zineb/pharmacology
16.
Bioinformatics ; 27(22): 3149-57, 2011 Nov 15.
Article in English | MEDLINE | ID: mdl-21965816

ABSTRACT

MOTIVATION: Uncovering the mechanisms underlying gene expression control is crucial to understand complex cellular responses. Studies in gene regulation often aim to identify regulatory players involved in a biological process of interest, either transcription factors coregulating a set of target genes or genes eventually controlled by a set of regulators. These are frequently prioritized with respect to a context-specific relevance score. Current approaches rely on relevance measures accounting exclusively for direct transcription factor-target interactions, namely overrepresentation of binding sites or target ratios. Gene regulation has, however, intricate behavior, with overlapping, indirect effects that should not be neglected. In addition, the rapid accumulation of regulatory data already enables the prediction of large-scale networks suitable for higher-level exploration by methods based on graph theory. A paradigm shift is thus emerging, where isolated and constrained analyses will likely be replaced by whole-network, systemic-aware strategies. RESULTS: We present TFRank, a graph-based framework to prioritize regulatory players involved in transcriptional responses within the regulatory network of an organism, whereby every regulatory path containing genes of interest is explored and incorporated into the analysis. TFRank selected important regulators of yeast adaptation to stress induced by quinine and acetic acid, which were missed by a direct-effect approach. Notably, they reportedly confer resistance toward the chemicals. In a preliminary study in human, TFRank unveiled regulators involved in breast tumor growth and metastasis when applied to genes whose expression signatures correlated with a short interval to metastasis.
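Whole-network prioritization of this flavor can be sketched with a random-walk-with-restart (personalized-PageRank) score; this generic numpy version is an assumed stand-in for illustration, not TFRank's exact formulation:

```python
import numpy as np

def personalized_pagerank(adj, seeds, damping=0.85, iters=100):
    """Random-walk-with-restart scores on a directed graph: walks
    follow edges and restart at the seed nodes, so score accumulates
    along every path reachable from the genes of interest."""
    adj = np.asarray(adj, float)
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    P = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    restart = np.zeros(n)
    restart[list(seeds)] = 1.0 / len(seeds)
    r = restart.copy()
    for _ in range(iters):
        r = (1 - damping) * restart + damping * (P.T @ r)
    return r

# Tiny chain TF0 -> gene1 -> gene2, seeded at TF0. To rank upstream
# regulators of a gene set instead, one would walk the reversed graph.
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]])
scores = personalized_pagerank(adj, seeds=[0])
```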


Subjects
Gene Expression Regulation, Gene Regulatory Networks, Transcription Factors/metabolism, Transcription, Genetic, Acetic Acid/pharmacology, Binding Sites, Humans, Neoplasm Metastasis, Quinine/pharmacology, Saccharomyces cerevisiae/drug effects, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/metabolism, Transcription, Genetic/drug effects
17.
BMC Bioinformatics ; 12: 163, 2011 May 16.
Article in English | MEDLINE | ID: mdl-21672185

ABSTRACT

BACKGROUND: Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties, mostly due to the huge volume of data produced, but also because of some of their specific characteristics, such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects. RESULTS: We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454) system against a reference sequence. Our approach exploits the characteristics of the data in these re-sequencing applications and uses state-of-the-art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm which needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on execution time. CONCLUSIONS: The proposed methodology was implemented in a software tool called TAPyR--Tool for the Alignment of Pyrosequencing Reads--which is publicly available from http://www.tapyr.net.
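The seed-based mapping idea can be sketched with a hash index plus mismatch-counting extension (a toy stand-in for the compressed full-text indexes real mappers like TAPyR use; sequences and k are invented for the example):

```python
def build_index(reference, k=4):
    """Hash every k-mer of the reference to its start positions."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index

def map_read(read, reference, index, k=4):
    """Seed with the read's first k-mer, then extend each candidate
    position and count mismatches; return (position, mismatches) of
    the best end-to-end placement, or None if no seed hits."""
    best = None
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue                      # read would run off the end
        mism = sum(a != b for a, b in zip(read, window))
        if best is None or mism < best[1]:
            best = (pos, mism)
    return best

ref = "ACGTACGTTTGACCAGT"
idx = build_index(ref)
hit = map_read("TTGACCAG", ref, idx)   # (8, 0): exact match at offset 8
```

Real mappers replace the dict with a suffix-array/FM-index and the naive extension with banded alignment to tolerate the indels typical of pyrosequencing.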


Subjects
Sequence Analysis, DNA/methods, Algorithms, Animals, Base Sequence, High-Throughput Nucleotide Sequencing, Humans, Sequence Alignment, Software
18.
Algorithms Mol Biol ; 6: 13, 2011 Apr 22.
Article in English | MEDLINE | ID: mdl-21513505

ABSTRACT

BACKGROUND: Position-specific priors (PSPs) have been used with success to boost EM and Gibbs-sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied. RESULTS: We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than those approaches. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote. CONCLUSIONS: The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.

19.
J Comput Biol ; 17(8): 969-92, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20726791

ABSTRACT

Given a set of genotypes from a population, the process of recovering the haplotypes that explain the genotypes is called haplotype inference. The haplotype inference problem under the assumption of pure parsimony consists in finding the smallest number of haplotypes that explain a given set of genotypes. This problem is NP-hard. The original formulations for solving the Haplotype Inference by Pure Parsimony (HIPP) problem were based on integer linear programming and branch-and-bound techniques. More recently, solutions based on Boolean satisfiability, pseudo-Boolean optimization, and answer set programming have been shown to be remarkably more efficient. HIPP can now be regarded as a feasible approach for haplotype inference, which can be competitive with other different approaches. This article provides an overview of the methods for solving the HIPP problem, including preprocessing, bounding techniques, and heuristic approaches. The article also presents an empirical evaluation of exact HIPP solvers on a comprehensive set of synthetic and real problem instances. Moreover, the bounding techniques to the exact problem are evaluated. The final section compares and discusses the HIPP approach with a well-established statistical method that represents the reference algorithm for this problem.
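The core feasibility test in HIPP, whether a pair of haplotypes explains a genotype, is easy to state in code (using the common 0/1/2 site encoding from the pure-parsimony literature, with 2 marking heterozygous sites):

```python
def explains(h1, h2, genotype):
    """True if haplotypes h1 and h2 explain the genotype: at each
    site, genotype 0 or 1 means both haplotypes carry that allele,
    and 2 (heterozygous) means the two haplotypes must disagree."""
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:
                return False
        elif not (a == b == g):
            return False
    return True

# Genotype (0, 2, 1, 2) is explained by the pair 0011 / 0110,
# but not by two copies of 0011 (the heterozygous sites must differ):
assert explains([0, 0, 1, 1], [0, 1, 1, 0], [0, 2, 1, 2])
assert not explains([0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 1, 2])
```

Pure parsimony then asks for the smallest haplotype set such that every genotype is explained by some pair, which is what the SAT and pseudo-Boolean encodings surveyed here optimize.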


Subjects
Haplotypes, Models, Genetic, Algorithms, Animals, Genotype, Humans
20.
Article in English | MEDLINE | ID: mdl-20150677

ABSTRACT

Although most biclustering formulations are NP-hard, in time series expression data analysis, it is reasonable to restrict the problem to the identification of maximal biclusters with contiguous columns, which correspond to coherent expression patterns shared by a group of genes in consecutive time points. This restriction leads to a tractable problem. We propose an algorithm that finds and reports all maximal contiguous column coherent biclusters in time linear in the size of the expression matrix. The linear time complexity of CCC-Biclustering relies on the use of a discretized matrix and efficient string processing techniques based on suffix trees. We also propose a method for ranking biclusters based on their statistical significance and a methodology for filtering highly overlapping and, therefore, redundant biclusters. We report results in synthetic and real data showing the effectiveness of the approach and its relevance in the discovery of regulatory modules. Results obtained using the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress show not only the ability of the proposed methodology to extract relevant information compatible with documented biological knowledge but also the utility of using this algorithm in the study of other environmental stresses and of regulatory modules in general.
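The object being enumerated, a contiguous-column coherent bicluster in a discretized matrix, can be shown with a naive quadratic sketch (the paper's suffix-tree algorithm finds the maximal ones in linear time; the U/D/N discretization below is an illustrative convention):

```python
from collections import defaultdict

def ccc_biclusters(matrix, min_rows=2, min_cols=2):
    """Enumerate contiguous-column coherent biclusters naively: for
    every interval of columns, rows sharing the same symbol pattern
    over that interval form a bicluster (rows, col_lo, col_hi)."""
    n_cols = len(matrix[0])
    found = set()
    for lo in range(n_cols):
        for hi in range(lo + min_cols, n_cols + 1):
            groups = defaultdict(list)
            for r, row in enumerate(matrix):
                groups[tuple(row[lo:hi])].append(r)
            for rows in groups.values():
                if len(rows) >= min_rows:
                    found.add((tuple(rows), lo, hi))
    return found

# U = up, D = down, N = no change; rows 0 and 1 co-express over the
# first three time points but diverge at the last one:
m = [list("NUDU"),
     list("NUDD"),
     list("DNNN")]
bics = ccc_biclusters(m)   # includes ((0, 1), 0, 3)
```

This O(n²m) enumeration also reports non-maximal sub-intervals; the suffix-tree formulation is precisely what prunes those and reaches linear time.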


Subjects
Algorithms, Gene Expression Profiling/methods, Gene Expression Regulation/physiology, Pattern Recognition, Automated/methods, Proteome/metabolism, Regulatory Elements, Transcriptional/physiology, Signal Transduction/physiology, Cluster Analysis