Results 1 - 14 of 14
1.
Algorithms Mol Biol ; 19(1): 15, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38600518

ABSTRACT

FM-indexes are crucial data structures in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [1] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word boundaries, however, it is necessary to parse it somehow before applying word-based FM-indexing. In 2022, Deng et al. [2] proposed parsing genomic data by induced suffix sorting, and showed that the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing (which takes parameters that let us tune the average length of the phrases) instead of induced suffix sorting gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate it is between 3 and 18 times faster than competing methods on queries to GRCh38, and is consistently faster on queries made to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, it seems our method accelerates count queries relative to all state-of-the-art methods, with a moderate increase in memory. The source code for PFP-FM is available at https://github.com/AaronHong1024/afm.
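
The "one random access per character" cost mentioned in this abstract can be illustrated with a minimal sketch of a plain character-based FM-index and its backward-search count query. This is not the PFP-FM implementation; the function names `build_fm_index` and `count` are hypothetical, the suffix array is built naively, and the text is assumed not to contain the sentinel `$`.

```python
def build_fm_index(text):
    """Build a toy FM-index (C array and occurrence table) for `text`."""
    text += "$"  # sentinel, assumed absent from the input and smaller than all symbols
    # Naive suffix array: fine for a sketch, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps around to the sentinel
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def count(index, pattern):
    """Count occurrences of `pattern` by backward search."""
    C, occ, n = index
    lo, hi = 0, n  # half-open interval of suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        # One pair of rank lookups per pattern character: this is the
        # per-character random access that word-based indexes try to reduce.
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count(build_fm_index("GATTACAGATTACA"), "ATTA")` walks the pattern right to left and performs four update steps, one per character; a word-based index would instead step once per parsed phrase.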

2.
Res Sq ; 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37961504

ABSTRACT

FM-indexes are a crucial data structure in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [1] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word boundaries, however, it is necessary to parse it somehow before applying word-based FM-indexing. Last year, Deng et al. [2] proposed parsing genomic data by induced suffix sorting, and showed that the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing (which takes parameters that let us tune the average length of the phrases) instead of induced suffix sorting gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate it is between 3 and 18 times faster than competing methods on queries to GRCh38, and is consistently faster on queries made to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, it seems our method accelerates count queries relative to all state-of-the-art methods, with a minor increase in memory. The source code for PFP-FM is available at https://github.com/marco-oliva/afm.

3.
Int J Emerg Med ; 16(1): 40, 2023 Jun 23.
Article in English | MEDLINE | ID: mdl-37353768

ABSTRACT

BACKGROUND: While emergency medicine (ER)-based emergency care is prevalent in many countries, in Japan, the "department-specific emergency care model" and the "emergency center model" are mainstream. We hypothesized that many secondary emergency medical institutions in Japan have inadequate systems. Using a questionnaire, we investigated the status of and problems in the emergency medical services system in secondary emergency medical institutions in Japan. To date, there has not been an exhaustive survey of emergency facilities on a countrywide scale. The main objective of this study was to investigate problems in the Japanese emergency medical services system and thereby improve optimal care for emergency patients. RESULTS: A nationwide questionnaire survey involving 4063 facilities (all government-approved emergency medical facilities certified by prefectural governors) in Japan was conducted. Of the facilities that responded, all secondary emergency facilities were included in the analysis. Responses from 1289 facilities without a tertiary emergency medical care center were analyzed. Among them, 61% (792/1289) had ≤ 199 beds, and 8% were emergency department specialty training program core facilities. Moreover, 42% had an annual patient acceptance number of ≤ 500, 19% did not calculate the number of acceptances, 29% had an acceptance rate of ≥ 81%, and 25% had an acceptance rate of 61-80%. Pregnant women (63%) and children (56%) were the major types of patients that affected the acceptance rate. Factors affecting facilities with a response rate of 81% or higher were "hospitals designated for residency training" and "facilities making some efforts to improve the quality of emergency care and the emergency medical system" (logistic analysis, P < .001).
CONCLUSION: Relevant authorities and core regional facilities should consider and implement specific measures for regions and hospitals with a shortage of emergency medicine specialists and physicians (e.g., development of ER-based emergency medicine and provision of education). This study may lead to further improvement in the optimal care of emergency patients through the nationwide establishment of the proposed measures as well as through grouping and integrating the structures and systems in emergency and other medical facilities.

4.
Mol Biol Evol ; 40(3), 2023 Mar 04.
Article in English | MEDLINE | ID: mdl-36790822

ABSTRACT

Genomic regions under positive selection harbor variation linked, for example, to adaptation. Most tools for detecting positively selected variants have computational resource requirements that render them impractical on population genomic datasets with hundreds of thousands of individuals or more. We have developed and implemented an efficient haplotype-based approach able to scan large datasets and accurately detect positive selection. We achieve this by combining a pattern matching approach based on the positional Burrows-Wheeler transform with model-based inference which only requires the evaluation of closed-form expressions. We evaluate our approach with simulations, and find it to be both sensitive and specific. The computational resource requirements quantified using UK Biobank data indicate that our implementation is scalable to population genomic datasets with millions of individuals. Our approach may serve as an algorithmic blueprint for the era of "big data" genomics: a combinatorial core coupled with statistical inference in closed form.


Subjects
Population Genetics, Metagenomics, Genomics, Genome, Haplotypes
5.
Algorithms Mol Biol ; 15: 2, 2020.
Article in English | MEDLINE | ID: mdl-32055252

ABSTRACT

Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. (Advances in bioinformatics and computational biology: 11th Brazilian symposium on bioinformatics, BSB 2018, Niterói, Brazil, October 30 - November 1, 2018, Proceedings, 2018. 10.1007/978-3-030-01722-4_3) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. It was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound, one conceptually very simple one using suffix trees and a second one using the positional Burrows-Wheeler Transform, that is very efficient also in practice.
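
As background for the second algorithm mentioned in this abstract, the following is a minimal sketch of the ordering that underlies the positional Burrows-Wheeler Transform: at each site, haplotypes are kept sorted by their reversed prefixes using one stable partition per column. It is not the block-finding algorithm of the paper; the function name `positional_prefix_arrays` and the binary list-of-lists input are assumptions made for illustration.

```python
def positional_prefix_arrays(haps):
    """Return, for k = 0..N, the haplotype order sorted by reversed prefix of length k.

    `haps` is a list of M equal-length lists over {0, 1} (hypothetical input format).
    """
    m = len(haps)
    n = len(haps[0]) if m else 0
    order = list(range(m))      # with an empty prefix, keep the input order
    arrays = [order[:]]
    for k in range(n):
        # Stable partition on the allele at site k: zeros first, then ones.
        zeros = [i for i in order if haps[i][k] == 0]
        ones = [i for i in order if haps[i][k] == 1]
        order = zeros + ones
        arrays.append(order[:])
    return arrays
```

Each column is processed in time linear in the number of haplotypes, so the whole pass is linear in the size of the input matrix. Haplotypes sharing a long match that ends at site k sit next to each other in the order for that site, which is the property that makes the transform useful for finding shared haplotype segments.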

6.
Int J Hematol ; 89(1): 24-33, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19050837

ABSTRACT

Microarray analysis has made it feasible to carry out extensive gene expression profiling in a single assay. Various hematopoietic stem cell (HSC) populations have been subjected to microarray analyses and their profiles of gene expression have been reported. However, this approach is not suitable to identify novel transcripts or for profiling of genes with low expression levels. To obtain a detailed gene expression profile of CD34(-)c-Kit(+)Sca-1(+)lineage marker-negative (Lin(-)) (CD34(-)KSL) HSCs, we constructed a CD34(-)KSL cDNA library, performed high-throughput sequencing, and compared the generated profile with that of another HSC fraction, side population (SP) Lin(-) (SP Lin(-)) cells. Sequencing of the 5'-termini of about 9,500 cDNAs from each HSC library identified 1,424 and 2,078 different genes from the CD34(-)KSL and SP Lin(-) libraries, respectively. To exclude ubiquitously expressed genes including housekeeping genes, digital subtraction was successfully performed against EST databases of other organs, leaving 25 HSC-specific genes including five novel genes. Among 4,450 transcripts from the CD34(-)KSL cDNA library that showed no homology to the presumable protein-coding genes, 29 were identified as strong candidates for mRNA-like non-coding RNAs by in silico analyses. Our cyclopedic approaches may contribute to understanding of novel molecular aspects of HSC function.


Subjects
Gene Expression Profiling/methods, Hematopoietic Stem Cells, Gene Library, Genomics/methods, Humans, DNA Sequence Analysis
7.
J Exp Zool A Comp Exp Biol ; 305(9): 787-98, 2006 Sep 01.
Article in English | MEDLINE | ID: mdl-16902950

ABSTRACT

Fish endocrinologists are commonly motivated to pursue their research driven by their own interests in these aquatic animals. However, the data obtained in fish studies not only satisfy their own interests but often contribute more generally to the studies of other vertebrates, including mammals. The life of fishes is characterized by the aquatic habitat, which demands many physiological adjustments distinct from the terrestrial life. Among them, body fluid regulation is of particular importance as the body fluids are exposed to media of varying salinities only across the thin respiratory epithelia of the gills. Endocrine systems play pivotal roles in the homeostatic control of body fluid balance. Judging from the habitat-dependent control mechanisms, some osmoregulatory hormones of fish should have undergone functional and molecular evolution during the ecological transition to the terrestrial life. In fact, water-regulating hormones such as vasopressin are essential for survival on the land, whereas ion-regulating hormones such as natriuretic peptides, guanylins and adrenomedullins are diversified and exhibit more critical functions in aquatic species. In this short review, we introduce some examples illustrating how comparative fish studies contribute to general endocrinology by taking advantage of such differences between fishes and tetrapods. In a functional context, fish studies often afford a deeper understanding of the essential actions of a hormone across vertebrate taxa. Using the natriuretic peptide family as an example, we suggest that more functional studies on fishes will bring similar rewards of understanding. At the molecular level, recent establishment of genome databases in fishes and mammals brings clues to the evolutionary history of hormone molecules via a comparative genomic approach. 
Because of the functional and molecular diversification of ion-regulating hormones in fishes, this approach sometimes leads to the discovery of new hormones in tetrapods as exemplified by adrenomedullin 2.


Subjects
Fishes/physiology, Hormones/physiology, Water-Electrolyte Balance/physiology, Adrenomedullin, Amino Acid Sequence, Animals, Molecular Evolution, Fishes/genetics, Fresh Water, Hormones/genetics, Humans, Molecular Sequence Data, Natriuretic Agents/genetics, Natriuretic Agents/physiology, Peptides/genetics, Peptides/physiology, Seawater, Sequence Alignment
8.
J Bioinform Comput Biol ; 3(6): 1295-313, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16374908

ABSTRACT

Since microarray gene expression data do not contain sufficient information for estimating accurate gene networks, other biological information has been considered to improve the estimated networks. Recent studies have revealed that highly conserved proteins that exhibit similar expression patterns in different organisms have almost the same function in each organism. Such conserved proteins are also known to play similar roles in terms of the regulation of genes. Therefore, this evolutionary information can be used to refine regulatory relationships among genes, which are estimated from gene expression data. We propose a statistical method for estimating gene networks from gene expression data by utilizing evolutionarily conserved relationships between genes. Our method simultaneously estimates two gene networks of two distinct organisms, with a Bayesian network model utilizing the evolutionary information so that gene expression data of one organism helps to estimate the gene network of the other. We show the effectiveness of the method through the analysis of Saccharomyces cerevisiae and Homo sapiens cell cycle gene expression data. Our method was successful in estimating gene networks that capture many known relationships as well as several unknown relationships which are likely to be novel. Supplementary information is available at http://bonsai.ims.u-tokyo.ac.jp/~tamada/bayesnet/.


Subjects
Molecular Evolution, Gene Expression Profiling/methods, cdc Genes/physiology, Biological Models, Proteome/metabolism, Saccharomyces cerevisiae/physiology, Signal Transduction/physiology, Bayes Theorem, Computer Simulation, Gene Expression Regulation/physiology, Humans, Statistical Models, Proteome/genetics
9.
J Bioinform Comput Biol ; 2(2): 273-88, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15297982

ABSTRACT

We present an efficient algorithm for detecting putative regulatory elements in the upstream DNA sequences of genes, using gene expression information obtained from microarray experiments. Based on a generalized suffix tree, our algorithm looks for motif patterns whose appearance in the upstream region is most correlated with the expression levels of the genes. We are able to find the optimal pattern, in time linear in the total length of the upstream sequences. We implement and apply our algorithm to publicly available microarray gene expression data, and show that our method is able to discover biologically significant motifs, including various motifs which have been reported previously using the same data set. We further discuss applications for which the efficiency of the method is essential, as well as possible extensions to our algorithm.


Subjects
Algorithms, Amino Acid Motifs/genetics, Gene Expression Profiling/methods, Regulator Genes/genetics, Oligonucleotide Array Sequence Analysis/methods, Sequence Alignment/methods, DNA Sequence Analysis/methods, Base Sequence, Molecular Sequence Data, Automated Pattern Recognition, Nucleic Acid Sequence Homology, Statistics as Topic
10.
FEBS Lett ; 556(1-3): 53-8, 2004 Jan 02.
Article in English | MEDLINE | ID: mdl-14706825

ABSTRACT

We have identified cDNA encoding a new member of the adrenomedullin (AM) family, AM2, for the first time in mammals (mouse, rat and human). The predicted precursor carried mature AM2 in the C-terminus, which had an intramolecular ring formed by an S-S bond and a possibly amidated C-terminus. Phylogenetic analyses clustered AM2 and AM into two distinct but closely related groups. Similarity of exon-intron structure and synteny of neighboring genes showed that mammalian AM2 is an ortholog of pufferfish AM2 and a paralog of mammalian AM. AM2 mRNA was expressed in submaxillary gland, kidney, stomach, ovary, lymphoid tissues and pancreas of mice, but not in adrenal and testis. Intravenous injection of synthetic mature AM2 decreased arterial pressure more potently than AM, and induced antidiuresis and antinatriuresis in mice. These results show that at least two peptides, AM and AM2, comprise an adrenomedullin family in mammals, and that AM2 may play pivotal roles in cardiovascular and body fluid regulation.


Subjects
Peptides/genetics, Peptides/pharmacology, Vasoconstrictor Agents/pharmacology, Adrenomedullin, Amino Acid Sequence, Animals, Blood Pressure/drug effects, Molecular Cloning, Complementary DNA/genetics, Female, Heart Rate/drug effects, Humans, Male, Mice, Inbred ICR Mice, Molecular Sequence Data, Peptides/chemical synthesis, Phylogeny, Protein Isoforms, Rats, Wistar Rats, Amino Acid Sequence Homology, Tetraodontiformes, Urodynamics/drug effects, Vasoconstrictor Agents/chemical synthesis
11.
Article in English | MEDLINE | ID: mdl-17051698

ABSTRACT

We consider the problem of finding the optimal combination of string patterns, which characterizes a given set of strings that have a numeric attribute value assigned to each string. Pattern combinations are scored based on the correlation between their occurrences in the strings and the numeric attribute values. The aim is to find the combination of patterns which is best with respect to an appropriate scoring function. We present an O(N^2) time algorithm for finding the optimal pair of substring patterns combined with Boolean functions, where N is the total length of the sequences. The algorithm looks for all possible Boolean combinations of the patterns, e.g., patterns of the form p and not q, which indicates that the pattern pair is considered to occur in a given string s, if p occurs in s, AND q does NOT occur in s. An efficient implementation using suffix arrays is presented, and we further show that the algorithm can be adapted to find the best k-pattern Boolean combination in O(N^k) time. The algorithm is applied to mRNA sequence data sets of moderate size combined with their turnover rates for the purpose of finding regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing mRNA decay.
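
The "p and not q" semantics described in this abstract can be made concrete with a brute-force sketch. It enumerates short substrings as candidate patterns and scores each ordered pair by the absolute difference in mean attribute value between strings where the pair "occurs" and strings where it does not. This exhaustive search is far slower than the O(N^2) suffix-array algorithm of the paper; the scoring function, the `max_len` cap and all function names are assumptions chosen for illustration.

```python
from itertools import product


def candidate_patterns(strings, max_len):
    """All distinct substrings of length 1..max_len, in sorted order."""
    return sorted({s[i:j]
                   for s in strings
                   for i in range(len(s))
                   for j in range(i + 1, min(len(s), i + max_len) + 1)})


def mean_gap(values, hits):
    """Absolute difference between the mean value of hit and non-hit strings."""
    inside = [v for v, h in zip(values, hits) if h]
    outside = [v for v, h in zip(values, hits) if not h]
    if not inside or not outside:
        return 0.0  # a split that leaves one side empty carries no information
    return abs(sum(inside) / len(inside) - sum(outside) / len(outside))


def best_and_not_pair(strings, values, max_len=3):
    """Return (score, p, q) maximizing the score of the rule 'p and not q'."""
    patterns = candidate_patterns(strings, max_len)
    occurs = {p: [p in s for s in strings] for p in patterns}
    best = (0.0, None, None)
    for p, q in product(patterns, repeat=2):
        # The pair occurs in s iff p occurs in s AND q does NOT occur in s.
        hits = [a and not b for a, b in zip(occurs[p], occurs[q])]
        score = mean_gap(values, hits)
        if score > best[0]:
            best = (score, p, q)
    return best
```

Other Boolean functions of the two occurrence flags (for example "p or q", or "not p and not q") can be tried by changing the single line that computes `hits`.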


Subjects
Computational Biology/methods, DNA/genetics, Messenger RNA/genetics, DNA Sequence Analysis/methods, 3' Untranslated Regions, Algorithms, Base Sequence, Fungal Genes, Humans, Statistical Models, Theoretical Models, Automated Pattern Recognition, Software
12.
Bioinformatics ; 19 Suppl 2: ii227-36, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14534194

ABSTRACT

We present a statistical method for estimating gene networks and detecting promoter elements simultaneously. When estimating a network from gene expression data alone, a common problem is that the number of microarrays is limited compared to the number of variables in the network model, making accurate estimation a difficult task. Our method overcomes this problem by integrating the microarray gene expression data and the DNA sequence information into a Bayesian network model. The basic idea of our method is that, if a parent gene is a transcription factor, its children may share a consensus motif in their promoter regions of the DNA sequences. Our method detects consensus motifs based on the structure of the estimated network, then re-estimates the network using the result of the motif detection. We continue this iteration until the network becomes stable. To show the effectiveness of our method, we conducted Monte Carlo simulations and applied our method to Saccharomyces cerevisiae data as a real application.


Subjects
Gene Expression Profiling/methods, Gene Expression/physiology, Biological Models, Genetic Promoter Regions/genetics, Proteome/metabolism, DNA Sequence Analysis/methods, Signal Transduction/physiology, Bayes Theorem, Computer Simulation, Statistical Data Interpretation
13.
Bioinformatics ; 18(2): 298-305, 2002 Feb.
Article in English | MEDLINE | ID: mdl-11847077

ABSTRACT

MOTIVATION: The prediction of localization sites of various proteins is an important and challenging problem in the field of molecular biology. TargetP, by Emanuelsson et al. (J. Mol. Biol., 300, 1005-1016, 2000) is a neural network based system which is currently the best predictor in the literature for N-terminal sorting signals. One drawback of neural networks, however, is that it is generally difficult to understand and interpret how and why they make such predictions. In this paper, we aim to generate simple and interpretable rules as predictors, and still achieve a practical prediction accuracy. We adopt an approach which consists of an extensive search for simple rules and various attributes which is partially guided by human intuition. RESULTS: We have succeeded in finding rules whose prediction accuracies come close to that of TargetP, while still retaining a very simple and interpretable form. We also discuss and interpret the discovered rules.


Subjects
Computational Biology, Proteins/genetics, Proteins/metabolism, Amino Acid Sequence, Neural Networks (Computer), Protein Sorting Signals/genetics, Software, Subcellular Fractions/metabolism
14.
Genome Inform ; 13: 3-11, 2002.
Article in English | MEDLINE | ID: mdl-14571369

ABSTRACT

We present a new approach to pattern discovery called string pattern regression, where we are given a data set that consists of a string attribute and an objective numerical attribute. The problem is to find the best string pattern that divides the data set in such a way that the distribution of the numerical attribute values of the set for which the pattern matches the string attribute is most distinct, with respect to some appropriate measure, from the distribution of the numerical attribute values of the set for which the pattern does not match the string attribute. By solving this problem, we are able to discover, at the same time, a subset of the data whose objective numerical attributes are significantly different from the rest of the data, as well as the splitting rule in the form of a string pattern that is conserved in the subset. Although the problem can be solved in linear time for the substring pattern class, the problem is NP-hard in the general case (i.e. more complex patterns), and we present an exact but efficient branch-and-bound algorithm which is applicable to various pattern classes. We apply our algorithm to intron sequences of human, mouse, fly, and zebrafish, and show the practicality of our approach and algorithm. We also discuss possible extensions of our algorithm, as well as promising applications, such as microarray gene expression data.
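
A small sketch may help fix the problem statement for the substring pattern class. It tries every short substring as a splitting rule and keeps the one that minimizes the summed squared error of the two resulting groups, which is one possible choice for the "appropriate measure" left open in the abstract. It is an exhaustive search rather than the linear-time or branch-and-bound algorithms the paper refers to; the name `best_substring_split` and the `max_len` cap are hypothetical.

```python
def sum_squared_error(xs):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def best_substring_split(strings, values, max_len=4):
    """Return (pattern, cost) for the substring whose match/no-match split
    gives the lowest total within-group squared error."""
    candidates = sorted({s[i:j]
                         for s in strings
                         for i in range(len(s))
                         for j in range(i + 1, min(len(s), i + max_len) + 1)})
    best_pattern, best_cost = None, sum_squared_error(values)  # cost of no split
    for p in candidates:
        matched = [v for s, v in zip(strings, values) if p in s]
        rest = [v for s, v in zip(strings, values) if p not in s]
        if not matched or not rest:
            continue  # the pattern does not divide the data set
        cost = sum_squared_error(matched) + sum_squared_error(rest)
        if cost < best_cost:
            best_pattern, best_cost = p, cost
    return best_pattern, best_cost
```

The returned pattern plays the role of the splitting rule, and the strings it matches form the subset whose numerical attribute differs from the rest of the data.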


Subjects
Algorithms, Statistical Data Interpretation, Regression Analysis, DNA Sequence Analysis/methods, Animals, Computer Simulation, Humans, Introns/genetics, Mice