Results 1 - 20 of 26
1.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38676570

ABSTRACT

MOTIVATION: Bacterial genomes present more variability than human genomes, which requires important adjustments in computational tools that are developed for human data. In particular, bacteria exhibit a mosaic structure due to homologous recombinations, but this fact is not sufficiently captured by standard read mappers that align against linear reference genomes. The recent introduction of pangenomics provides some insights in that context, as a pangenome graph can represent the variability within a species. However, the concept of sequence-to-graph alignment that captures the presence of recombinations has not been previously investigated. RESULTS: In this paper, we present the extension of the notion of sequence-to-graph alignment to a variation graph that incorporates recombinations, so that the latter are explicitly represented and evaluated in an alignment. Moreover, we present a dynamic programming approach for the special case where there is at most one recombination; we implement this case as RecGraph. From a modelling point of view, a recombination corresponds to identifying a new path of the variation graph, where the new path is composed of two halves, each extracted from an original path, possibly joined by a new arc. Our experiments show that RecGraph accurately aligns simulated recombinant bacterial sequences that have at most one recombination, providing evidence for the presence of recombination events. AVAILABILITY AND IMPLEMENTATION: Our implementation is open source and available at https://github.com/AlgoLab/RecGraph.


Subjects
Algorithms , Genome, Bacterial , Recombination, Genetic , Sequence Alignment , Sequence Alignment/methods , Humans , Software , Sequence Analysis, DNA/methods , Genomics/methods
2.
BMC Bioinformatics ; 22(Suppl 15): 625, 2022 Apr 19.
Article in English | MEDLINE | ID: mdl-35439933

ABSTRACT

BACKGROUND: Being able to efficiently call variants from the increasing amount of sequencing data produced daily from multiple viral strains is of the utmost importance, as demonstrated during the COVID-19 pandemic, in order to track the spread of the viral strains across the globe. RESULTS: We present MALVIRUS, an easy-to-install and easy-to-use application that assists users in multiple tasks required for the analysis of a viral population, such as SARS-CoV-2. MALVIRUS allows users to: (1) construct a variant catalog consisting of a set of variations (SNPs/indels) from the population sequences, (2) efficiently genotype and annotate variants of the catalog supported by a read sample, and (3) when the considered viral species is SARS-CoV-2, assign the input sample to the most likely Pango lineages using the genotyped variations. CONCLUSIONS: Tests on Illumina and Nanopore samples proved the efficiency and the effectiveness of MALVIRUS in analyzing SARS-CoV-2 strain samples with respect to publicly available data provided by NCBI and the more complete dataset provided by GISAID. A comparison with state-of-the-art tools showed that MALVIRUS is always more precise and often has better recall.


Subjects
COVID-19 , Genome, Viral , High-Throughput Nucleotide Sequencing , Humans , Mutation , Pandemics , Phylogeny , SARS-CoV-2/genetics
3.
Bioinformatics ; 37(2): 178-184, 2021 04 19.
Article in English | MEDLINE | ID: mdl-32730595

ABSTRACT

MOTIVATION: The latest advances in cancer sequencing, and the availability of a wide range of methods to infer the evolutionary history of tumors, have made it important to evaluate, reconcile and cluster different tumor phylogenies. Recently, several notions of distance or similarity have been proposed in the literature, but none of them has emerged as the gold standard. Moreover, none of the known similarity measures is able to manage mutations occurring multiple times in the tree, a circumstance often occurring in real cases. RESULTS: To overcome these limitations, in this article, we propose MP3, the first similarity measure for tumor phylogenies able to effectively manage cases where multiple mutations can occur at the same time and mutations can occur multiple times. Moreover, a comparison of MP3 with other measures shows that it is able to correctly classify similar and dissimilar trees, both on simulated and on real data. AVAILABILITY AND IMPLEMENTATION: An open source implementation of MP3 is publicly available at https://github.com/AlgoLab/mp3treesim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Trees , Biological Evolution , Phylogeny , Sequence Analysis , Software
4.
Bioinformatics ; 37(4): 464-472, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32926128

ABSTRACT

MOTIVATION: Recent advances in high-throughput RNA-Seq technologies allow the production of massive datasets. When a study focuses only on a handful of genes, most reads are not relevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset leads to improved efficiency without compromising the results of the study. RESULTS: We introduce a novel computational problem, called gene assignment, and we propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists in extracting from the sample the reads that were most probably sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark is able to significantly improve the performance of RNA-Seq analysis tools without having any impact on the final results. AVAILABILITY AND IMPLEMENTATION: The tool is distributed as a stand-alone module and the software is freely available at https://github.com/AlgoLab/shark. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Sharks , Alternative Splicing , Animals , RNA-Seq , Sequence Analysis, RNA , Sharks/genetics , Software
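The gene assignment problem described in entry 4 can be illustrated with a minimal alignment-free sketch. This is not Shark's actual algorithm: the function names, the k-mer length `k`, and the `min_shared` threshold are all assumptions made for illustration. A read is kept when a large enough fraction of its k-mers also occurs in one of the panel genes.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, genes, k=5, min_shared=0.6):
    """Map each retained read to the gene sharing most of its k-mers.

    Hypothetical illustration: a read is retained only if at least
    `min_shared` of its k-mers occur in the best-matching gene.
    """
    gene_index = {name: kmers(seq, k) for name, seq in genes.items()}
    kept = {}
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k
            continue
        best_name, best_frac = None, 0.0
        for name, index in gene_index.items():
            frac = len(read_kmers & index) / len(read_kmers)
            if frac > best_frac:
                best_name, best_frac = name, frac
        if best_frac >= min_shared:
            kept[read] = best_name
    return kept


# Toy panel and reads (made-up sequences).
genes = {"geneA": "ACGTACGTTAGCCGATAGGCT", "geneB": "TTTTGGGGCCCCAAAATTTTGG"}
reads = ["CGTTAGCCGATA", "GGGGCCCCAAAA", "ATATATATATAT"]
result = assign_reads(reads, genes)
```

In this toy input the first two reads are exact substrings of a gene and are retained, while the third shares no 5-mer with the panel and is discarded. A practical tool would replace the Python sets with a compact index and tolerate sequencing errors and splicing.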
5.
Bioinformatics ; 37(3): 326-333, 2021 04 20.
Article in English | MEDLINE | ID: mdl-32805010

ABSTRACT

MOTIVATION: In recent years, the well-known Infinite Sites Assumption has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions. However, recent studies leveraging single-cell sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. While there exist established computational methods that infer phylogenies with mutation losses, there remain some advancements to be made. RESULTS: We present Simulated Annealing Single-Cell inference (SASC): a new and robust approach based on simulated annealing for the inference of cancer progression from SCS datasets. In particular, we introduce an extension of the model of evolution where mutations are only accumulated, by allowing also a limited amount of mutation loss in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real datasets and in comparison with some other available methods. AVAILABILITY AND IMPLEMENTATION: The SASC tool is open source and available at https://github.com/sciccolella/sasc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Neoplasms , Single-Cell Analysis , Humans , Mutation , Neoplasms/genetics , Phylogeny , Sequence Analysis , Software
6.
Nat Comput ; 21(1): 81-108, 2022 Mar.
Article in English | MEDLINE | ID: mdl-36969737

ABSTRACT

Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations, thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.

7.
BMC Bioinformatics ; 21(Suppl 1): 413, 2020 Dec 09.
Article in English | MEDLINE | ID: mdl-33297943

ABSTRACT

BACKGROUND: Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, the reconstruction of the phylogeny representing the evolutionary history presents some peculiar aspects that depend on the technology used to obtain the data to analyze: Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there has been some recent evidence of back mutations in cancer: this phenomenon is currently widely ignored. RESULTS: We present a new tool, gpps, that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gpps. CONCLUSIONS: gpps provides new insights into the analysis of intra-tumor heterogeneity by proposing a new progression model to the field of cancer phylogeny reconstruction on Single Cell data.


Subjects
Computational Biology/methods , DNA Mutational Analysis , Disease Progression , Mutation , Neoplasms/genetics , Neoplasms/pathology , Base Sequence , Evolution, Molecular , Humans , Phylogeny , Single-Cell Analysis
8.
BMC Bioinformatics ; 19(1): 252, 2018 07 03.
Article in English | MEDLINE | ID: mdl-29970002

ABSTRACT

BACKGROUND: Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Long reads, which are nowadays cheaper to produce and more widely available than ever before, have been used to reduce the fragmentation of the assembled haplotypes thanks to their ability to span several variants along the genome. These long reads are also characterized by a high error rate, an issue which may be mitigated, however, with larger sets of reads, when this error rate is uniform across genome positions. Unfortunately, current state-of-the-art dynamic programming approaches designed for long reads deal only with limited coverages. RESULTS: Here, we propose a new method for assembling haplotypes which combines and extends the features of previous approaches to deal with long reads and higher coverages. In particular, our algorithm is able to dynamically adapt the estimated number of errors at each variant site, while minimizing the total number of error corrections necessary for finding a feasible solution. This allows our method to significantly reduce the required computational resources, making it possible to consider datasets with higher coverages. The algorithm has been implemented in a freely available tool, HapCHAT: Haplotype Assembly Coverage Handling by Adapting Thresholds. An experimental analysis on sequencing reads with up to 60× coverage reveals improvements in accuracy and recall achieved by considering a higher coverage with lower runtimes. CONCLUSIONS: Our method leverages the long-range information of sequencing reads, which yields assembled haplotypes fragmented into a lower number of unphased haplotype blocks. At the same time, our method is also able to deal with higher coverages to better correct the errors in the original reads and to obtain more accurate haplotypes as a result.
AVAILABILITY: HapCHAT is available at http://hapchat.algolab.eu under the GNU General Public License (GPL).


Subjects
Haplotypes/genetics , Sequence Analysis, DNA/methods , Algorithms , Humans
9.
BMC Genomics ; 15 Suppl 6: S10, 2014.
Article in English | MEDLINE | ID: mdl-25572381

ABSTRACT

BACKGROUND: The perfect phylogeny is an often used model in phylogenetics since it provides an efficient basic procedure for representing the evolution of genomic binary characters in several frameworks, such as, for example, haplotype inference. The model, which is conceptually the simplest, is based on the infinite sites assumption, that is, no character can mutate more than once in the whole tree. A main open problem regarding the model is finding generalizations that retain the computational tractability of the original model but are more flexible in modeling biological data when the infinite sites assumption is violated because of, e.g., back mutations. A special case of back mutations that has been considered in the study of the evolution of protein domains (where a domain is acquired and then lost) is persistency, that is, the fact that a character is allowed to return to the ancestral state. In this model characters can be gained and lost at most once. In this paper we consider the computational problem of explaining binary data by the Persistent Perfect Phylogeny model (referred to as PPP) and for this purpose we investigate the problem of reconstructing an evolution where some constraints are imposed on the paths of the tree. RESULTS: We define a natural generalization of the PPP problem obtained by requiring that for some pairs (character, species), neither the species nor any of its ancestors can have the character. In other words, some characters cannot be persistent for some species. This new problem is called Constrained PPP (CPPP). Based on a graph formulation of the CPPP problem, we are able to provide a polynomial time solution for the CPPP problem for matrices whose conflict graph has no edges. Using this result, we develop a parameterized algorithm for solving the CPPP problem where the parameter is the number of characters.
CONCLUSIONS: A preliminary experimental analysis shows that the constrained persistent perfect phylogeny model can efficiently explain data that do not conform to the classical perfect phylogeny model.


Subjects
Evolution, Molecular , Models, Genetic , Phylogeny , Algorithms
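Entry 9 refers to matrices "whose conflict graph has no edges". The sketch below builds a conflict graph for a binary character matrix under one common definition, which is an assumption here because the abstract does not define it: two characters conflict when some rows show the pairs (0,1), (1,0) and (1,1) in their two columns. The function name `conflict_graph` is hypothetical.

```python
from itertools import combinations


def conflict_graph(matrix):
    """Return the set of conflicting column pairs of a binary matrix.

    Assumed definition: columns a and b conflict when the value pairs
    (0, 1), (1, 0) and (1, 1) all appear in some rows of the matrix.
    Rows are species, columns are characters.
    """
    n_chars = len(matrix[0]) if matrix else 0
    edges = set()
    for a, b in combinations(range(n_chars), 2):
        seen = {(row[a], row[b]) for row in matrix}
        if {(0, 1), (1, 0), (1, 1)} <= seen:
            edges.add((a, b))
    return edges


# Toy matrices (made-up data): the first has one conflict, the second none.
with_conflict = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]
conflict_free = [
    [1, 0],
    [1, 1],
    [0, 0],
]
edges_a = conflict_graph(with_conflict)
edges_b = conflict_graph(conflict_free)
```

Under this definition, an empty edge set means that no pair of characters violates the classical perfect phylogeny condition for an all-zero root.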
10.
Eur J Clin Pharmacol ; 70(9): 1129-37, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24951915

ABSTRACT

PURPOSE: Osteoporosis is a chronic disease of the bone, whose incidence increases progressively with aging. The main consequences of osteoporosis are fragility fractures, which have considerable medical, social, and economic implications. Adequate treatment of osteoporosis must be considered as a compelling public health intervention. Bisphosphonates (BPs) represent the most significant advance in this field in the past decade, and they are widely used in the treatment of osteoporosis. However, evidence for their effectiveness is limited to secondary prevention, whereas their effect in primary prevention is uncertain and needs further investigation. METHODS: Using administrative data collected in the "Biphosphonates Efficacy-Safety Tradeoff" (BEST) study, a nested case-control study was conducted by including 56,058 participants, aged 55 years or older, who were started on oral BPs from 2003 to 2005. Cases were the 1,710 participants who were hospitalized for osteoporotic fractures until 2007. Up to 20 controls were randomly selected for each case. A conditional logistic regression model was used to estimate the odds ratio of fracture associated with categories of treatment duration. RESULTS: Compared with participants taking BPs for less than 1 year, those who remained on therapy for at least 2 years had a 21% (95% confidence interval (CI) 7 to 33%) fracture risk reduction. CONCLUSION: This study provides evidence that BPs, dispensed for primary prevention of osteoporotic fractures, are associated with a reduced risk of osteoporotic fractures after at least 2 years of treatment.


Subjects
Bone Density Conservation Agents/therapeutic use , Diphosphonates/therapeutic use , Osteoporotic Fractures/prevention & control , Case-Control Studies , Humans , Italy/epidemiology , Middle Aged , Odds Ratio , Osteoporotic Fractures/epidemiology , Primary Prevention , Treatment Outcome
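Entry 10 reports its result as a percentage risk reduction with a confidence interval, while the method estimates an odds ratio. The sketch below shows the usual arithmetic linking the two, reduction = (1 - OR) x 100. The odds ratio 0.79 and interval 0.67 to 0.93 are not stated in the abstract; they are back-computed here on the assumption that the reported 21% (7 to 33%) was derived this way, and the function name is hypothetical.

```python
def risk_reduction_from_or(odds_ratio, ci_low, ci_high):
    """Convert an odds ratio and its CI into percent reductions.

    Returns (estimate, lower bound, upper bound) of the reduction.
    The bounds swap: the upper OR limit gives the smallest reduction.
    """
    def to_pct(value):
        return round((1.0 - value) * 100.0, 1)

    return to_pct(odds_ratio), to_pct(ci_high), to_pct(ci_low)


# Assumed inputs, back-computed from the percentages in the abstract.
estimate, lower, upper = risk_reduction_from_or(0.79, 0.67, 0.93)
```

With these assumed inputs the function returns an estimate of 21.0 with bounds 7.0 and 33.0, matching the figures quoted in the abstract.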
11.
Pharmacoepidemiol Drug Saf ; 23(8): 859-67, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24911392

ABSTRACT

PURPOSE: Different strategies for controlling confounding by indication in observational studies were compared in a large population-based study of the effect of bisphosphonates (BPs) for secondary prevention of fractures. METHODS: The cohort was drawn from healthcare utilization databases of 13 Italian territorial units. Patients aged 55 years or more who were hospitalized for fracture during 2003-2005 entered the cohort. A nested case-control design was used to compare BP use in cohort members who did (cases) and who did not (controls) experience a new fracture until 2007 (outcome). Three designs were employed: conventional-matching (D1), propensity score-matching (D2), and user-only (D3) designs. They differed in (i) cohort composition, restricted to patients who received BPs straight after cohort entry (D3); (ii) use of the propensity score for case-control matching (D2); and (iii) the groups compared: BP users versus non-users (D1 and D2) and long-term versus short-term users (D3). RESULTS: Bisphosphonate users had odds ratios (95% confidence interval) of 1.20 (1.01 to 1.44) and 0.95 (0.74 to 1.24) under the D1 and D2 designs, respectively. Statistical evidence that long-term BP use protects against the outcome relative to short-term use was observed for the user-only design (D3), with a corresponding odds ratio (95% confidence interval) of 0.64 (0.44 to 0.93). CONCLUSIONS: The user-only design yielded results closer to those seen in RCTs. This approach is one possible strategy to account for confounding by indication.


Subjects
Databases, Factual/statistics & numerical data , Diphosphonates/therapeutic use , Fractures, Bone/prevention & control , Observational Studies as Topic/methods , Aged , Aged, 80 and over , Bone Density Conservation Agents/administration & dosage , Bone Density Conservation Agents/therapeutic use , Case-Control Studies , Confounding Factors, Epidemiologic , Diphosphonates/administration & dosage , Female , Hospitalization , Humans , Italy , Male , Middle Aged , Propensity Score , Research Design , Retrospective Studies , Secondary Prevention/methods , Time Factors
12.
BMC Bioinformatics ; 13 Suppl 5: S2, 2012 Apr 12.
Article in English | MEDLINE | ID: mdl-22537006

ABSTRACT

BACKGROUND: A challenging issue in designing computational methods for predicting the gene structure into exons and introns from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy as well as efficiency in time and space, when large clusters of more than 20,000 ESTs and genes longer than 1 Mb are processed. Traditionally, the problem has been faced by combining different tools, not specifically designed for this task. RESULTS: We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm of provably small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those that allow inferring splice site junctions that are largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure, called embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the length of P and T and in the size of the output. The method was implemented in the PIntron package. PIntron requires as input a genomic sequence or region and a set of EST and/or mRNA sequences. Besides the prediction of the full-length transcript isoforms potentially expressed by the gene, the PIntron package includes a module for the CDS annotation of the predicted transcripts. CONCLUSIONS: PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron under GNU AGPL. PIntron has been shown to outperform state-of-the-art methods, and to quickly process some critical genes. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when benchmarked with ENCODE annotations.


Subjects
Algorithms , Alternative Splicing , Expressed Sequence Tags , Animals , Exons , Genomics , Humans , Introns , Sequence Alignment , Software
13.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-36576129

ABSTRACT

BACKGROUND: Since the beginning of the coronavirus disease 2019 pandemic, there has been an explosion of sequencing of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, making it the most widely sequenced virus in history. Several databases and tools have been created to keep track of genome sequences and variants of the virus; most notably, the GISAID platform hosts millions of complete genome sequences, and it is continuously expanding every day. A challenging task is the development of fast and accurate tools that are able to distinguish between the different SARS-CoV-2 variants and assign them to a clade. RESULTS: In this article, we leverage the frequency chaos game representation (FCGR) and convolutional neural networks (CNNs) to develop an original method that learns how to classify genome sequences, which we implement in CouGaR-g, a tool for the clade assignment problem on SARS-CoV-2 sequences. On a testing subset of GISAID, CouGaR-g achieved a 96.29% overall accuracy, while a similar tool, Covidex, obtained a 77.12% overall accuracy. As far as we know, our method is the first using deep learning and FCGR for intraspecies classification. Furthermore, by using some feature importance methods, CouGaR-g makes it possible to identify k-mers that match SARS-CoV-2 marker variants. CONCLUSIONS: By combining FCGR and CNNs, we develop a method that achieves a better accuracy than Covidex (which is based on random forest) for clade assignment of SARS-CoV-2 genome sequences, also thanks to our training on a much larger dataset, with comparable running times. Our method implemented in CouGaR-g is able to detect k-mers that capture relevant biological information that distinguishes the clades, known as marker variants. AVAILABILITY: The trained models can be tested online providing a FASTA file (with 1 or multiple sequences) at https://huggingface.co/spaces/BIASLab/sars-cov-2-classification-fcgr.
CouGaR-g is also available at https://github.com/AlgoLab/CouGaR-g under the GPL.


Subjects
COVID-19 , Deep Learning , Puma , Animals , SARS-CoV-2/genetics , Puma/genetics , Genome, Viral
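The frequency chaos game representation (FCGR) used in entry 13 turns a sequence into a 2^k by 2^k grid of k-mer counts that a CNN can treat as an image. The sketch below is a minimal pure-Python version, not the CouGaR-g code: the corner layout in `CORNER_BITS` is one of several conventions in use and is an assumption, as is the choice to skip k-mers containing ambiguous bases.

```python
# Assumed corner layout as (x_bit, y_bit); other conventions exist.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Return a 2**k x 2**k grid (list of rows) of k-mer counts.

    Each k-mer is mapped to one cell. As in the chaos game, the last
    base of the k-mer selects the coarsest quadrant, so bases are read
    in reverse when building the cell coordinates.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if any(base not in CORNER_BITS for base in kmer):
            continue  # skip k-mers with ambiguous symbols such as N
        x = y = 0
        for base in reversed(kmer):
            bit_x, bit_y = CORNER_BITS[base]
            x = (x << 1) | bit_x
            y = (y << 1) | bit_y
        grid[y][x] += 1
    return grid


grid_k1 = fcgr("ACGTN", 1)     # one count per unambiguous base
grid_k2 = fcgr("ACGTACGT", 2)  # seven overlapping 2-mers
```

For k = 1 each of the four bases falls in its own cell. For k = 2 the seven overlapping 2-mers of the toy sequence are spread over the 4 by 4 grid, with the repeated 2-mer AC counted twice in a single cell.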
14.
BMC Bioinformatics ; 12: 394, 2011 Oct 10.
Article in English | MEDLINE | ID: mdl-21985453

ABSTRACT

BACKGROUND: Population levels of microbial phylotypes can be examined using a hybridization-based method that utilizes a small set of computationally-designed DNA probes targeted to a gene common to all. Our previous algorithm attempts to select a set of probes such that each training sequence manifests a unique theoretical hybridization pattern (a binary fingerprint) to a probe set. It does so without taking into account similarity between training gene sequences or their putative taxonomic classifications, however. We present an improved algorithm for probe set selection that utilizes the available taxonomic information of training gene sequences and attempts to choose probes such that the resultant binary fingerprints cluster into real taxonomic groups. RESULTS: Gene sequences manifesting identical fingerprints with probes chosen by the new algorithm are more likely to be from the same taxonomic group than probes chosen by the previous algorithm. In cases where they are from different taxonomic groups, underlying DNA sequences of identical fingerprints are more similar to each other in probe sets made with the new versus the previous algorithm. Complete removal of large taxonomic groups from training data does not greatly decrease the ability of probe sets to distinguish those groups. CONCLUSIONS: Probe sets made from the new algorithm create fingerprints that more reliably cluster into biologically meaningful groups. The method can readily distinguish microbial phylotypes that were excluded from the training sequences, suggesting novel microbes can also be detected.


Subjects
Algorithms , Bacteria/classification , Bacterial Typing Techniques/methods , DNA Probes/genetics , Bacteria/genetics , Cluster Analysis , Sequence Analysis, DNA
15.
IEEE J Biomed Health Inform ; 25(11): 4068-4078, 2021 11.
Article in English | MEDLINE | ID: mdl-34003758

ABSTRACT

Single cell sequencing (SCS) technologies provide a level of resolution that makes them indispensable for inferring, from a sequenced tumor, evolutionary trees or phylogenies representing an accumulation of cancerous mutations. A drawback of SCS is elevated false negative and missing value rates, resulting in a large space of possible solutions, which in turn makes inference difficult, and sometimes infeasible, with current approaches and tools. One possible solution is to reduce the size of an SCS instance (usually represented as a matrix of presence, absence, and uncertainty of the mutations found in the different sequenced cells) and to infer the tree from this reduced-size instance. In this work, we present a new clustering procedure, called celluloid, aimed at clustering such categorical vector or matrix data, here representing SCS instances. We show that celluloid clusters mutations with high precision, never pairing too many mutations that are unrelated in the ground truth, and also obtains accurate results in terms of the phylogeny inferred downstream from the reduced instance produced by this method. We demonstrate the usefulness of a clustering step by applying the entire pipeline (clustering + inference method) to a real dataset, showing a significant reduction in runtime and considerably raising the upper bound on the size of SCS instances that can be solved in practice. Our approach, celluloid (clustering single cell sequencing data around centroids), is available at https://github.com/AlgoLab/celluloid/ under an MIT license, as well as on the Python Package Index (PyPI) at https://pypi.org/project/celluloid-clust/.


Subjects
Algorithms , Neoplasms , Cluster Analysis , Humans , Mutation/genetics , Neoplasms/genetics , Phylogeny , Software
16.
J Comput Biol ; 26(9): 948-961, 2019 09.
Article in English | MEDLINE | ID: mdl-31140836

ABSTRACT

Indexing huge collections of strings, such as those produced by the widespread sequencing technologies, heavily relies on multistring generalizations of the Burrows-Wheeler transform (BWT) and the longest common prefix (LCP) array, since efficiently computing both is an essential ingredient of several algorithms on a collection of strings, such as those for genome assembly. In this article, we explore a multithread computational strategy for building the BWT and LCP array. Our algorithm applies a divide and conquer approach that leads to parallel computation of multistring BWT and LCP array.


Subjects
Algorithms , Computational Biology/methods , Sequence Analysis/methods
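For readers unfamiliar with the two structures named in entry 16, the sketch below computes the BWT and the LCP array of a single string by naive suffix sorting. It is only a definition-level illustration: it is neither multistring nor multithreaded, and it bears no relation to the divide and conquer algorithm of the article. The `$` sentinel and the function names are assumptions.

```python
def suffix_array(text):
    """Return the starting positions of the suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_and_lcp(text, sentinel="$"):
    """Return (BWT string, LCP array) of text + sentinel.

    Naive quadratic construction for illustration only. The sentinel
    must not occur in text and must sort before every other symbol.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    text += sentinel
    sa = suffix_array(text)
    # The BWT symbol is the one preceding each sorted suffix
    # (index -1 wraps around to the sentinel for the full string).
    bwt = "".join(text[i - 1] for i in sa)
    lcp = [0] * len(sa)
    for rank in range(1, len(sa)):
        prev, curr = text[sa[rank - 1]:], text[sa[rank]:]
        length = 0
        while length < min(len(prev), len(curr)) and prev[length] == curr[length]:
            length += 1
        lcp[rank] = length
    return bwt, lcp


bwt, lcp = bwt_and_lcp("banana")
```

On `banana` this yields the BWT `annb$aa` and the LCP array `[0, 0, 1, 3, 0, 0, 2]`, where each LCP entry is the length of the prefix shared by a suffix and its predecessor in sorted order.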
17.
Article in English | MEDLINE | ID: mdl-17975265

ABSTRACT

In this paper, we investigate the computational and approximation complexity of the Exemplar Longest Common Subsequence of a set of sequences (ELCS problem), a generalization of the Longest Common Subsequence problem, where the input sequences are over the union of two disjoint sets of symbols, a set of mandatory symbols and a set of optional symbols. We show that different versions of the problem are APX-hard even for instances with two sequences. Moreover, we show that the related problem of determining the existence of a feasible solution of the Exemplar Longest Common Subsequence of two sequences is NP-hard. On the positive side, we first present an efficient algorithm for the ELCS problem over instances of two sequences where each mandatory symbol can appear in total at most three times in the sequences. Furthermore, we present two fixed-parameter algorithms for the ELCS problem over instances of two sequences where the parameter is the number of mandatory symbols.


Subjects
Computational Biology/methods , Algorithms , Computers , Data Interpretation, Statistical , Models, Statistical , Models, Theoretical , Sequence Analysis, DNA , Software
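Entry 17 studies a generalization of the Longest Common Subsequence problem. As background, the sketch below shows the classical dynamic program for two sequences, plus a small helper that checks the exemplar condition as it is read here (an assumption): a candidate is feasible only if it contains every mandatory symbol. It does not implement any of the algorithms from the article, and the function names are hypothetical.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back from the bottom-right corner to recover one solution.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def covers_mandatory(subsequence, mandatory):
    """Check that every mandatory symbol occurs in the subsequence."""
    return set(mandatory) <= set(subsequence)


best = lcs("AGGTAB", "GXTXAYB")
feasible = covers_mandatory(best, "GT")
infeasible = covers_mandatory(best, "GX")
```

The plain LCS runs in time proportional to the product of the two lengths. The hardness results in the abstract indicate that no comparably simple procedure is expected once mandatory symbols must be covered.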
18.
Methods Mol Biol ; 1269: 173-88, 2015.
Article in English | MEDLINE | ID: mdl-25577379

ABSTRACT

Alternative Splicing (AS) is the molecular phenomenon whereby multiple transcripts are produced from the same gene locus. As a consequence, it is responsible for the expansion of eukaryotic transcriptomes. Aberrant AS is involved in the onset and progression of several human diseases. Therefore, the characterization of the exon-intron structure of a gene and the detection of the corresponding transcript isoforms is an extremely relevant biological task. Nonetheless, the computational prediction of AS events and the repertoire of alternative transcripts is still a challenging issue. Here we introduce PIntron, a software package to predict the exon-intron structure and the full-length isoforms of a gene given a genomic region and a set of transcripts (ESTs and/or mRNAs). The software is open source and available at http://pintron.algolab.eu. PIntron has been designed for (and extensively tested on) a standard workstation without requiring dedicated expensive hardware. It easily manages large genomic regions and more than 20,000 ESTs, achieving good accuracy as shown in an experimental evaluation performed on 112 well-annotated genes selected from the ENCODE human regions used as training set in the EGASP competition.


Subjects
Alternative Splicing/genetics , Software , Transcriptome/genetics
19.
Appl Bioinformatics ; 2(2): 117-21, 2003.
Article in English | MEDLINE | ID: mdl-15130828

ABSTRACT

In this paper we review some of the existing projects available in the bioinformatics field for facilitating the development of programs, but for which minimising the running time is not of primary importance. We point out the advantages of open source libraries for such tasks and we discuss some of the open source licenses available. Finally, we present the project ALiBio, which is aimed at facilitating the development of efficient programs in bioinformatics.


Subjects
Algorithms , Computational Biology/methods , Information Storage and Retrieval/methods , Libraries , Programming Languages , Software Design , Software , Databases, Genetic , Internet
20.
PLoS One ; 8(12): e73159, 2013.
Article in English | MEDLINE | ID: mdl-24348985

ABSTRACT

BACKGROUND: Oral bisphosphonates (BPs) are the primary agents for the treatment of osteoporosis. Although BPs are generally well tolerated, serious gastrointestinal adverse events have been observed. AIM: To assess the risk of severe upper gastrointestinal complications (UGIC) among BP users by means of a large study based on a network of Italian healthcare utilization databases. METHODS: A nested case-control study was carried out by including 110,220 patients aged 45 years or older who, from 2003 until 2005, were treated with oral BPs. Cases were the 862 patients who experienced the outcome (hospitalization for UGIC) until 2007. Up to 20 controls were randomly selected for each case. Conditional logistic regression model was used to estimate odds ratio (OR) associated with current use of BPs after adjusting for several covariates. A set of sensitivity analyses was performed in order to account for sources of systematic uncertainty. RESULTS: The adjusted OR for current use of BPs with respect to past use was 0.94 (95% CI 0.81 to 1.08). There was no evidence that this risk changed either with BP type and regimen, or concurrent use of other drugs or previous hospitalizations. CONCLUSIONS: No evidence was found that current use of BPs increases the risk of severe upper gastrointestinal complications compared to past use.


Subjects
Bone Density Conservation Agents/administration & dosage , Bone Density Conservation Agents/adverse effects , Diphosphonates/administration & dosage , Diphosphonates/adverse effects , Gastrointestinal Diseases/chemically induced , Administration, Oral , Aged , Aged, 80 and over , Bone Density Conservation Agents/therapeutic use , Case-Control Studies , Diphosphonates/therapeutic use , Female , Humans , Logistic Models , Male , Middle Aged , Odds Ratio , Osteoporosis/drug therapy