Results 1 - 9 of 9
1.
NPJ Digit Med ; 4(1): 146, 2021 Oct 08.
Article in English | MEDLINE | ID: mdl-34625656

ABSTRACT

The COVID-19 pandemic has highlighted the global need for reliable models of disease spread. We propose an AI-augmented forecast modeling framework that provides daily predictions of the expected number of confirmed COVID-19 deaths, cases, and hospitalizations during the following 4 weeks. We present an international, prospective evaluation of our models' performance across all states and counties in the USA and prefectures in Japan. Nationally, incident mean absolute percentage error (MAPE) for predicting COVID-19 associated deaths during prospective deployment remained consistently <8% (US) and <29% (Japan), while cumulative MAPE remained <2% (US) and <10% (Japan). We show that our models perform well even during periods of considerable change in population behavior, and are robust to demographic differences across different geographic locations. We further demonstrate that our framework provides meaningful explanatory insights, with the models accurately adapting to local and national policy interventions. Our framework enables counterfactual simulations, which indicate that continuing non-pharmaceutical interventions alongside vaccinations is essential for faster recovery from the pandemic and that delaying the application of interventions has a detrimental effect; the simulations also allow exploration of the consequences of different vaccination strategies. The COVID-19 pandemic remains a global emergency. In the face of substantial challenges ahead, the approach presented here has the potential to inform critical decisions.
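
The abstract reports both incident and cumulative MAPE without giving a formula. As a rough illustration only, the sketch below uses the textbook definition of MAPE and shows why the cumulative figure tends to be smaller than the incident one for the same forecasts. The function name `mape` and all numbers are invented for this example; the paper's exact evaluation procedure may differ.

```python
from itertools import accumulate


def mape(actual, predicted):
    """Mean absolute percentage error in percent (textbook definition).

    Days with a zero actual value are skipped, which is one common
    convention and an assumption of this sketch.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Hypothetical daily (incident) death counts, invented for illustration.
incident_actual = [100, 120, 80, 100]
incident_pred = [110, 108, 88, 95]

# Incident MAPE compares day-by-day counts.
incident_mape = mape(incident_actual, incident_pred)

# Cumulative MAPE compares running totals, so daily over- and
# under-predictions partly cancel out.
cumulative_mape = mape(
    list(accumulate(incident_actual)),
    list(accumulate(incident_pred)),
)

print(f"incident MAPE:   {incident_mape:.2f}%")
print(f"cumulative MAPE: {cumulative_mape:.2f}%")
```

With these made-up numbers the cumulative error comes out lower than the incident error, which matches the direction of the gap between the two figures quoted in the abstract.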

2.
Bioinformatics ; 28(9): 1276-7, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22419785

ABSTRACT

SUMMARY: Existing SAM visualization tools like 'samtools tview' (Li et al., 2009) are limited to a small region of the genome, and tools like Tablet (Milne et al., 2010) are limited to a relatively small number of reads and may fail outright on large datasets. We need to visualize complex ChIP-Seq and RNA-Seq features such as polarity as well as coverage across whole 3 Gbp genomes such as the human genome. We have addressed these problems in a lightweight visualization system called SAMSCOPE, accelerated by OpenGL. The extensive pre-processing and fast OpenGL interface of SAMSCOPE provide instantaneous and intuitive browsing of complex data at all levels of detail across multiple experiments. AVAILABILITY AND IMPLEMENTATION: The SAMSCOPE software, implemented in C++ for Linux, is freely available with source code, binary packages and documentation from http://samscope.dna.bio.keio.ac.jp.


Subjects
Computer Graphics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Animals; Bacillus subtilis/genetics; Genome; Humans; Programming Languages
3.
Bioinformatics ; 28(5): 745-6, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22257668

ABSTRACT

UNLABELLED: Since tens of millions of chemical compounds have been accumulated in public chemical databases, fast comprehensive computational methods to predict interactions between chemical compounds and proteins are needed for virtual screening of lead compounds. Previously, we proposed a novel method for predicting protein-chemical interactions using two-layer Support Vector Machine classifiers that require only readily available biochemical data, i.e. amino acid sequences of proteins and structure formulas of chemical compounds. In this article, the method has been implemented as the COPICAT web service, with an easy-to-use front-end interface. Users can simply submit a protein-chemical interaction prediction job using a pre-trained classifier, or can even train their own classification model by uploading training data. COPICAT's fast and accurate computational prediction has enhanced lead compound discovery against a database of tens of millions of chemical compounds, implying that the search space for drug discovery is extended by >1000 times compared with currently well-used high-throughput screening methodologies. AVAILABILITY: The COPICAT server is available at http://copicat.dna.bio.keio.ac.jp. All functions, including the prediction function are freely available via anonymous login without registration. Registered users, however, can use the system more intensively.


Subjects
Databases, Factual; Ligands; Proteins/metabolism; Software; Support Vector Machine; Protein Binding; Proteins/chemistry
4.
PLoS One ; 5(9): e12651, 2010 Sep 24.
Article in English | MEDLINE | ID: mdl-20885980

ABSTRACT

BACKGROUND: With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow. METHODOLOGY/PRINCIPAL FINDINGS: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy.
CONCLUSIONS/SIGNIFICANCE: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with computational efficiency significantly greater than existing methods. Murasaki is available under GPL at http://murasaki.sourceforge.net.
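
To make the idea of spaced seeds concrete, here is a minimal sketch of anchor finding across several sequences with a mismatch pattern, where `1` marks a position that must match and `0` marks a position that is ignored. It is not Murasaki's implementation: it relies on an ordinary Python dictionary rather than adaptive hash function generation, handles only the forward strand, and the function names and toy sequences are invented for illustration.

```python
from collections import defaultdict


def spaced_keys(seq, pattern):
    """Yield (start, key) for every window of len(pattern) in seq.

    The key keeps only the bases under a '1' in the pattern, so windows
    that differ only at '0' positions map to the same key.
    """
    keep = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        yield start, "".join(window[i] for i in keep)


def find_anchors(seqs, pattern):
    """Return {key: [positions in seq 0, positions in seq 1, ...]}.

    Only keys that occur at least once in every input sequence are kept;
    these shared seed hits play the role of candidate anchors.
    """
    index = defaultdict(lambda: [[] for _ in seqs])
    for n, seq in enumerate(seqs):
        for pos, key in spaced_keys(seq, pattern):
            index[key][n].append(pos)
    return {key: hits for key, hits in index.items() if all(hits)}


# Toy sequences, invented for illustration: ACGT, ACAT and ACTT agree
# everywhere except the third base, which the pattern 1101 ignores.
toy = ["GGACGTTT", "CCACATAA", "ACTTGG"]
print(find_anchors(toy, "1101"))
```

Because every sequence is scanned once and hits are grouped by key, the indexing work grows roughly linearly with total input length, which is the general reason hashing-based anchoring can avoid all-against-all pairwise comparison.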


Subjects
Algorithms; Conserved Sequence; Genome; Sequence Alignment/methods; Animals; Bacteria/chemistry; Bacteria/genetics; Cattle; Dogs; Humans; Mammals/genetics; Mice; Rats
5.
Genome Res ; 20(9): 1219-28, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20534883

ABSTRACT

The centromere is essential for faithful chromosome segregation by providing the site for kinetochore assembly. Although the role of the centromere is conserved throughout evolution, the DNA sequences associated with centromere regions are highly divergent among species and it remains to be determined how centromere DNA directs kinetochore formation. Despite the active use of chicken DT40 cells in studies of chromosome segregation, the sequence of the chicken centromere was unclear. Here, we performed a comprehensive analysis of chicken centromere DNA which revealed unique features of chicken centromeres compared with previously studied vertebrates. Centromere DNA sequences from the chicken macrochromosomes, with the exception of chromosome 5, contain chromosome-specific homogenous tandem repetitive arrays that span several hundred kilobases. In contrast, the centromeres of chromosomes 5, 27, and Z do not contain tandem repetitive sequences and span non-tandem-repetitive sequences of only approximately 30 kb. To test the function of these centromere sequences, we conditionally removed the centromere from the Z chromosome using genetic engineering and have shown that the non-tandem-repeat sequence of chromosome Z is a functional centromere.


Subjects
Centromere/genetics; Chickens/genetics; Chromosomes/genetics; Repetitive Sequences, Nucleic Acid; Tandem Repeat Sequences; Animals; Base Sequence; DNA/chemistry; In Situ Hybridization, Fluorescence; Molecular Sequence Data; Physical Chromosome Mapping
6.
BMC Genomics ; 11: 243, 2010 Apr 16.
Article in English | MEDLINE | ID: mdl-20398357

ABSTRACT

BACKGROUND: Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. RESULTS: We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for gamma-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacks a polyketide synthesis operon, has a disrupted plipastatin production operon, and possesses previously unidentified transposases. CONCLUSIONS: The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks. Multiple genome-level comparisons among five closely related Bacillus species were also carried out. The determined genome sequence of B. subtilis natto and gene annotations are available from the Natto genome browser http://natto-genome.org/.


Subjects
Bacillus subtilis/classification; Bacillus subtilis/genetics; Genome, Bacterial; Sequence Analysis, DNA/methods; Soy Foods/microbiology
7.
Bioinformatics ; 25(7): 853-60, 2009 Apr 01.
Article in English | MEDLINE | ID: mdl-19188192

ABSTRACT

MOTIVATION: The accurate detection of orthologous segments (also referred to as syntenic segments) plays a key role in comparative genomics, as it is useful for inferring genome rearrangement scenarios and computing whole-genome alignments. Although a number of algorithms for detecting orthologous segments have been proposed, none of them contain a framework for optimizing their parameter values. METHODS: In the present study, we propose an algorithm, named OSfinder (Orthologous Segment finder), which uses a novel scoring scheme based on stochastic models. OSfinder takes as input the positions of short homologous regions (also referred to as anchors) and explicitly discriminates orthologous anchors from non-orthologous anchors by using Markov chain models which represent respective geometric distributions of lengths of orthologous and non-orthologous anchors. Such stochastic modeling makes it possible to optimize parameter values by maximizing the likelihood of the input dataset, and to automate the setting of the optimal parameter values. RESULTS: We validated the accuracies of orthology-mapping algorithms on the basis of their consistency with the orthology annotation of genes. Our evaluation tests using mammalian and bacterial genomes demonstrated that OSfinder shows higher accuracy than previous algorithms. AVAILABILITY: The OSfinder software was implemented as a C++ program. The software is freely available at http://osfinder.dna.bio.keio.ac.jp under the GNU General Public License. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
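
The abstract describes modeling lengths with separate geometric distributions and choosing parameters by maximum likelihood. The toy sketch below illustrates only that general statistical idea: it fits one geometric parameter per class and scores a length by its log-odds. It is not OSfinder's scoring scheme, whose details are not given here, and the function names and example lengths are hypothetical.

```python
import math


def fit_geometric(lengths):
    """Maximum-likelihood estimate of p for P(L = l) = (1 - p)**(l - 1) * p.

    For lengths l = 1, 2, ... the estimate is the number of observations
    divided by their sum (the reciprocal of the mean length).
    """
    if not lengths or min(lengths) < 1:
        raise ValueError("lengths must be a non-empty list of integers >= 1")
    return len(lengths) / sum(lengths)


def log_geometric(length, p):
    """Log-probability of a length under a geometric distribution (0 < p < 1)."""
    return (length - 1) * math.log(1.0 - p) + math.log(p)


def log_odds(length, p_orth, p_non):
    """Positive when the 'orthologous' model explains the length better."""
    return log_geometric(length, p_orth) - log_geometric(length, p_non)


# Hypothetical training lengths, invented for illustration.
p_orth = fit_geometric([4, 6, 5, 5])   # long runs  -> small p
p_non = fit_geometric([1, 2, 1, 1])    # short runs -> large p

for length in (1, 3, 5):
    print(length, round(log_odds(length, p_orth, p_non), 3))
```

Because both parameters come straight from the data, no threshold has to be tuned by hand, which is the kind of automation the abstract attributes to stochastic modeling.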


Subjects
Algorithms; Computational Biology/methods; Genome; Synteny; Genomics; Sequence Alignment; Software
8.
J Bioinform Comput Biol ; 5(5): 1103-22, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17933013

ABSTRACT

Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA, and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from nonmembers and hence detect noncoding RNA regions from genome sequences. A novel kernel function, stem kernel, for the discrimination and detection of functional RNA sequences using support vector machines (SVMs) is proposed. The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient algorithm is developed to calculate the stem kernels based on dynamic programming. The stem kernels are then applied to discriminate members of an RNA family from nonmembers using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Furthermore, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures. This is because the string kernel is proven to work for the remote homology detection of protein sequences. These experimental results have convinced us to apply the stem kernel in order to find novel RNA families from genome sequences.
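
The abstract presents the stem kernel as an extension of the all-subsequences string kernel computed by dynamic programming. As background, the sketch below implements only the plain all-subsequences kernel with its standard recursion. It does not model base pairs or stem structures, so it is not the stem kernel itself, and the function name is chosen for this example.

```python
def all_subsequences_kernel(s, t):
    """Count matching pairs of (possibly non-contiguous) subsequences.

    K[i][j] holds the kernel value for the prefixes s[:i] and t[:j].
    The empty subsequence always matches, so every border cell is 1.
    Recursion: appending a symbol a to s adds, for each position k in t
    where t[k] == a, the kernel of s without a and t[:k].
    """
    K = [[1] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        a = s[i - 1]
        for j in range(1, len(t) + 1):
            K[i][j] = K[i - 1][j] + sum(
                K[i - 1][k] for k in range(j) if t[k] == a
            )
    return K[len(s)][len(t)]


# "AB" vs "AB" share the subsequences "", "A", "B" and "AB".
print(all_subsequences_kernel("AB", "AB"))
# Toy RNA strings, invented for illustration.
print(all_subsequences_kernel("GACUC", "GAGUC"))
```

The resulting value is an inner product of subsequence counts, so it can be passed to an SVM as a precomputed kernel. This direct version takes time proportional to |s| * |t|^2; a running-sum table reduces that to |s| * |t|.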


Subjects
Nucleic Acid Conformation; RNA/chemistry; RNA/genetics; Sequence Analysis, RNA/statistics & numerical data; Base Sequence; Chromosome Mapping/statistics & numerical data; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; RNA, Transfer/chemistry; RNA, Transfer/genetics; RNA, Viral/chemistry; RNA, Viral/genetics; Stochastic Processes; Tymovirus/genetics
9.
Nihon Hansenbyo Gakkai Zasshi ; 76(3): 251-6, 2007 Sep.
Article in Japanese | MEDLINE | ID: mdl-17877037

ABSTRACT

As the number of whole genome sequences available continues to increase rapidly, the raw scale of the sequence data being used in analysis is the first hurdle for comparative genome analysis. When performing whole genome alignments, large-scale rearrangements make it necessary to first find out roughly which short well-conserved segments correspond to what other segments (termed anchors). Successful results have been achieved by adapting tools like BLAT and BLASTZ on a problem-to-problem basis, but the work required to perform a single alignment is considerable. Recently, new programs such as Mauve and Pattern-Hunter can handle slightly larger inputs, but the memory/time requirements for sequences like Human and Chimp X chromosomes are prohibitive for most computational environments. Our novel algorithm, which we have implemented in a program called Murasaki (available at http://murasaki.dna.bio.keio.ac.jp), makes it possible to identify anchors of multiple large sequences on the scale of several hundred megabases (e.g. three mammal chromosomes) in a matter of minutes. We also demonstrate an application of Murasaki to the comparative analysis of multiple mycobacteria genomes.


Subjects
Genome, Bacterial/genetics; Genomics/methods; Mycobacterium/genetics; Animals; Humans