Results 1 - 16 of 16
1.
Article in English | MEDLINE | ID: mdl-29994538

ABSTRACT

The Burrows-Wheeler transform (BWT) of short-read data has unexplored potential utilities, such as for efficient and sensitive variation analysis against multiple reference genome sequences, because it does not depend on any particular reference genome sequence, unlike conventional mapping-based methods. However, since the amount of read data is generally much larger than the size of the reference sequence, computation of the BWT of reads is not easy, and this hampers development of potential applications. To alleviate this problem, a new method of computing the BWT of reads in parallel is proposed. The BWT, corresponding to a sorted list of suffixes of reads, is constructed incrementally by successively including longer and longer suffixes. The working data is divided into more than 10,000 "blocks" corresponding to sublists of suffixes with the same prefixes. Thousands of groups of blocks can be processed in parallel while making exclusive writes and concurrent reads into a shared memory. Reads and writes are basically sequential, and the read concurrency is limited to two. Thus, a fine-grained parallelism, referred to as prefix parallelism, is expected to work efficiently. The time complexity for processing n reads of length l is O(nl^2). On actual biological DNA sequence data of about 100 Gbp with a read length of 100 bp (base pairs), a tentative implementation of the proposed method took less than an hour on a single-node computer; i.e., it was about three times faster than one of the fastest programs developed so far.
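
As a point of reference for the construction described in this abstract, the following is a minimal Python sketch of the BWT of a small read collection, built by naively sorting all suffixes. It is not the proposed parallel method and does not achieve its time complexity. The function names, the `$` end marker, and the prefix length `k` are illustrative assumptions that do not come from the paper. The second function only illustrates the idea of blocks: suffixes grouped by a shared prefix can be sorted independently, and concatenating the blocks in prefix order reproduces the global sorted order.

```python
from collections import defaultdict


def all_suffixes(reads):
    """Return (suffix, read_index, offset) for every suffix of every read.

    Assumption: each read is terminated by '$', which sorts before A, C, G, T.
    """
    out = []
    for r_idx, read in enumerate(reads):
        text = read + "$"
        for pos in range(len(text)):
            out.append((text[pos:], r_idx, pos))
    return out


def bwt_from_sorted(reads, sorted_suffixes):
    """Emit, for each sorted suffix, the character that precedes it in its read."""
    chars = []
    for _, r_idx, pos in sorted_suffixes:
        text = reads[r_idx] + "$"
        chars.append(text[pos - 1])  # pos == 0 wraps around to the '$' marker
    return "".join(chars)


def bwt_of_reads(reads):
    """Naive BWT: sort every suffix globally (ties broken by read index)."""
    return bwt_from_sorted(reads, sorted(all_suffixes(reads)))


def bwt_by_prefix_blocks(reads, k=2):
    """Same result, but suffixes are first grouped into blocks by their
    first k characters (hypothetical block key); each block is sorted
    independently and the blocks are concatenated in prefix order."""
    blocks = defaultdict(list)
    for item in all_suffixes(reads):
        blocks[item[0][:k]].append(item)
    ordered = []
    for prefix in sorted(blocks):
        ordered.extend(sorted(blocks[prefix]))
    return bwt_from_sorted(reads, ordered)
```

Because the blocks do not share suffixes, each one could in principle be handled by a separate worker; the sketch processes them sequentially for simplicity.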


Subject(s)
Algorithms , Data Compression/methods , Genetic Databases , DNA Sequence Analysis/methods , Genomics , Humans , Time Factors
2.
Hepatol Res ; 46(12): 1247-1255, 2016 Nov.
Article in English | MEDLINE | ID: mdl-26880049

ABSTRACT

AIM: The present study investigated the effect of sarcopenia on short- and long-term surgical outcomes and identified potential prognostic factors for hepatocellular carcinoma (HCC) following hepatectomy among patients 70 years of age and older. METHODS: Patient data were retrospectively collected for 296 consecutive patients who underwent hepatectomy for HCC with curative intent. Patients were assigned to two groups according to age (younger than 70 years, and 70 years and older), and the presence of sarcopenia. The clinicopathological, surgical outcome, and long-term survival data were analyzed. RESULTS: Sarcopenia was present in 112 of 296 (37.8%) patients with HCC, and 35% of patients aged 70 years and older. Elderly patients had significantly lower serum albumin levels, prognostic nutrition index, percentage of liver cirrhosis, and histological intrahepatic metastasis compared with patients younger than 70 years. Overall survival and disease-free survival rates in patients with sarcopenia correlated with significantly poor prognosis in the group aged 70 years and older. Multivariate analysis revealed that sarcopenia was predictive of an unfavorable prognosis. CONCLUSION: This retrospective analysis revealed that sarcopenia was predictive of worse overall survival and recurrence-free survival after hepatectomy in patients 70 years of age and older with HCC.

3.
BMC Bioinformatics ; 16 Suppl 18: S5, 2015.
Article in English | MEDLINE | ID: mdl-26678411

ABSTRACT

BACKGROUND: The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied. The BWT basically serves as a lossless dictionary of reads, unlike the heuristic and lossy reads-to-genome mapping results conventionally obtained in the first step of sequence analysis. Thus, it is naturally expected to lead to development of sensitive methods for analysis of short-read data. Recently, one of the most active areas of research in sequence analysis is sensitive detection of rare genomic rearrangements from whole-genome sequencing (WGS) data of heterogeneous cancer samples. The application of the BWT of reads to the analysis of genomic rearrangements is addressed in this study. RESULTS: A new method for sensitive detection of genomic rearrangements by using the BWT of reads in the following three steps is proposed: first, breakpoint regions, which contain breakpoints and are joined together by rearrangement, are predicted from the distribution of so-called discordant pairs by using a kind of conjugate gradient method; second, reads partially matching the breakpoint regions are collected from the BWT of reads; and third, breakpoints are detected as branching points among the collected reads, and their precise positions are determined. The method was experimentally implemented, and its performance (i.e., sensitivity and specificity) was evaluated by using simulated data with known artificial rearrangements. It was applied to publicly available real biological WGS data of cancer patients, and the detection results were compared with published results. CONCLUSIONS: Serving as a lossless dictionary of reads, the BWT of short reads enables sensitive analysis of genomic rearrangements in heterogeneous cancer-genome samples when used in conjunction with breakpoint-region predictions based on a conjugate gradient method.
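
To make the "lossless dictionary" idea concrete, here is a small Python sketch that counts how many times a query string occurs in a read collection by backward search on the BWT of the reads. It is a generic FM-index style lookup under simplifying assumptions, not the authors' implementation: the BWT is built naively, the `$` end marker and both function names are hypothetical, and occurrence counts are computed by scanning the BWT string rather than with a rank structure.

```python
def bwt_of_reads(reads):
    """Naive BWT of a read collection; each read ends with '$' (assumed marker)."""
    suffixes = []
    for r_idx, read in enumerate(reads):
        text = read + "$"
        for pos in range(len(text)):
            suffixes.append((text[pos:], r_idx, pos))
    suffixes.sort()
    # The BWT character is the one preceding each sorted suffix in its read.
    return "".join((reads[r] + "$")[p - 1] for _, r, p in suffixes)


def count_in_reads(bwt, pattern):
    """Count occurrences of `pattern` (which must not contain '$') in the reads."""
    # c_table[s] = number of symbols in the BWT that are strictly smaller than s.
    c_table = {}
    total = 0
    for sym in sorted(set(bwt)):
        c_table[sym] = total
        total += bwt.count(sym)
    lo, hi = 0, len(bwt)  # current range of sorted suffixes, half-open
    for sym in reversed(pattern):
        if sym not in c_table:
            return 0
        lo = c_table[sym] + bwt.count(sym, 0, lo)
        hi = c_table[sym] + bwt.count(sym, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_in_reads(bwt_of_reads(["ACGT", "CGTA", "GGCG"]), "CG")` counts one occurrence in each of the three reads.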


Subject(s)
Algorithms , Genomics , Genetic Databases , Human Genome , High-Throughput Nucleotide Sequencing , Humans , Single Nucleotide Polymorphism , DNA Sequence Analysis , Software
4.
Bioinformatics ; 31(10): 1577-83, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25609790

ABSTRACT

MOTIVATION: Sequence-variation analysis is conventionally performed on mapping results that are highly redundant and occasionally contain undesirable heuristic biases. A straightforward approach to single-nucleotide polymorphism (SNP) analysis, using the Burrows-Wheeler transform (BWT) of short-read data, is proposed. RESULTS: The BWT makes it possible to simultaneously process collections of read fragments of the same sequences; accordingly, SNPs were found from the BWT much faster than from the mapping results. It took only a few minutes to find SNPs from the BWT (with supplementary data, the fragment depth of coverage [FDC]) using a desktop workstation in the case of human exome or transcriptome sequencing data and 20 min using a dual-CPU server in the case of human genome sequencing data. The SNPs found with the proposed method largely agreed with those found by a time-consuming state-of-the-art tool, except for the cases in which the use of fragments of reads led to sensitivity loss or sequencing depth was not sufficient. These exceptions were predictable in advance on the basis of minimum length for uniqueness (MLU) and FDC defined on the reference genome. Moreover, BWT and FDC were computed in less time than it took to get the mapping results, provided that the data were large enough.


Subject(s)
Algorithms , Exome/genetics , Human Genome , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Computational Biology/methods , Humans
5.
J Bioinform Comput Biol ; 10(4): 1250002, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22809415

ABSTRACT

Myers' elegant and powerful bit-parallel dynamic programming algorithm for approximate string matching has a restriction that the query length should be within the word size of the computer, typically 64. We propose a modification of Myers' algorithm whose restriction is not on the query length but on the maximum number of mismatches (substitutions, insertions, or deletions), which should be less than half of the word size. The time complexity is O(m log |Σ|), where m is the query length and |Σ| is the size of the alphabet Σ. Thus, it is particularly suited for sequences on a small alphabet such as DNA sequences. In particular, it is useful in quickly extending a large number of seed alignments against a reference genome for high-throughput short-read data produced by next-generation DNA sequencers.
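
For readers unfamiliar with the baseline, the sketch below shows the original bit-parallel algorithm of Myers in a commonly used formulation, not the modification proposed in this abstract. It reports every text position where the pattern matches with at most `k` edits. The function name and return format are assumptions made for illustration. Python integers are unbounded, so the machine word is simulated with an `m`-bit mask; in a fixed-width implementation `m` would have to fit within the word size, which is exactly the restriction the abstract refers to.

```python
def myers_search(pattern, text, k):
    """Return (end_index, edit_distance) for every text position where
    `pattern` matches a substring ending there with distance <= k."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1   # simulated machine word of m bits
    high = 1 << (m - 1)   # bit for the last pattern row
    # peq[ch] has bit i set when pattern[i] == ch.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    pv, mv, score = mask, 0, m   # vertical +1 / -1 deltas, bottom-row score
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift without setting bit 0: a match may start anywhere in the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append((j, score))
    return hits
```

Each text character costs a constant number of word operations, which is where the speed-up over the cell-by-cell dynamic programming table comes from.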


Subject(s)
Algorithms , Base Sequence , DNA/chemistry , Computational Biology , Genome , Sequence Alignment , DNA Sequence Analysis
6.
DNA Res ; 16(6): 371-83, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19880432

ABSTRACT

We analyzed diversity of mRNA produced as a result of alternative splicing in order to evaluate gene function. First, we predicted the number of human genes transcribed into protein-coding mRNAs by using the sequence information of full-length cDNAs and 5'-ESTs and obtained 23 241 of such human genes. Next, using these genes, we analyzed the mRNA diversity and consequently sequenced and identified 11 769 human full-length cDNAs whose predicted open reading frames were different from other known full-length cDNAs. Notably, 30% of the cDNAs we identified contained variation in the transcription start site (TSS). Our analysis, which particularly focused on multiple variable first exons (FEVs) formed due to the alternative utilization of TSSs, led to the identification of 261 FEVs expressed in a tissue-specific manner. Quantification of the expression profiles of 13 genes by real-time PCR analysis further confirmed the tissue-specific expression of FEVs; e.g., OXR1 had specific TSSs in brain and tumor tissues. Finally, based on the results of our mRNA diversity analysis, we have created the FLJ Human cDNA Database. Our results shed light on the mechanisms by which one gene produces suitable protein-coding transcripts in response to the situation and the environment.


Subject(s)
Alternative Splicing , Complementary DNA/genetics , Complementary DNA/metabolism , Proteins , Messenger RNA , Chromosome Mapping , Computational Biology/methods , Genetic Databases , Exons , Expressed Sequence Tags , Genetic Variation , Humans , Organ Specificity , Proteins/genetics , Proteins/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , DNA Sequence Analysis , Structure-Activity Relationship , Transcription Initiation Site
7.
J Comput Biol ; 16(11): 1601-13, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19772398

ABSTRACT

We have developed efficient in-practice algorithms for computing rank and select functions on a binary string, based on a novel data structure, a hierarchical binary string with hierarchical accumulatives. It efficiently stores decomposed information on partial summations over various scales of subregions of a given binary string, so that the required space overhead ratio is only about 3.5% irrespective of the string length. Values of rank and select functions are computed hierarchically in [(log_2 n)/8] iterations, where n is the string length. For example, for an unbiased random binary string of 64 G bits, each value of these functions can be computed in about a microsecond, on average, on a single 3.0-GHz CPU using 8+ GB of memory. We also present their applications to genome mapping problems for large-scale short-read DNA sequence data, especially produced by ultra-high-throughput new-generation DNA sequencers. The algorithms are applied to the binarization of the Burrows-Wheeler transform of the human genome DNA sequence. For the sake of high-speed performance, we adopted a somewhat stringent mapping condition that allows at most a single-base mismatch (either a substitution, insertion, or deletion of a single base) per query sequence. An experimentally implemented program mapped several thousands of sequences per second on a single 3.0-GHz CPU, several times faster than ELAND, a widely used mapping program with the Illumina-Solexa 1G analyser.
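
The rank and select functions mentioned in this abstract can be illustrated with a much simpler structure than the hierarchical one the authors describe. The Python sketch below keeps a single level of precomputed block counts; the class name, method names, and block size are assumptions for illustration, and the sketch makes no attempt to reproduce the reported space overhead or speed.

```python
from bisect import bisect_left


class RankSelect:
    """Single-level rank/select directory over a sequence of 0/1 values."""

    def __init__(self, bits, block=64):
        self.bits = list(bits)
        self.block = block
        # cum[b] = number of 1 bits in all blocks before block b.
        self.cum = [0]
        for start in range(0, len(self.bits), block):
            self.cum.append(self.cum[-1] + sum(self.bits[start:start + block]))

    def rank1(self, i):
        """Number of 1 bits in bits[0:i]."""
        b, r = divmod(i, self.block)
        start = b * self.block
        return self.cum[b] + sum(self.bits[start:start + r])

    def select1(self, k):
        """Index of the k-th 1 bit (k is 1-based)."""
        if not 1 <= k <= self.cum[-1]:
            raise ValueError("k out of range")
        # Last block boundary with fewer than k ones before it.
        b = bisect_left(self.cum, k) - 1
        remaining = k - self.cum[b]
        for i in range(b * self.block, len(self.bits)):
            remaining -= self.bits[i]
            if remaining == 0:
                return i
```

A hierarchical design adds coarser levels of counts on top of this one, so that both the directory lookup and the final scan stay short even for very long strings.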


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Human Genome/genetics , Algorithms , Base Sequence , Humans , Oligonucleotide Array Sequence Analysis , Time Factors
8.
Pediatr Int ; 51(4): 502-6, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19400816

ABSTRACT

BACKGROUND: While recent advances in asthma management have enabled adequate control to be frequently achieved in outpatient settings, children whose asthma remains poorly controlled despite outpatient treatment are often referred to extended-stay hospitals. The aim of the present study was to examine trends concerning extended-stay hospitalization and to evaluate the present status of this approach. METHODS: A retrospective study was conducted to assess changes in the number of admissions among 408 children with extended stays at Kamiamakusa General Hospital between 1989 and 2005. Medical and laboratory data of 236 patients admitted since 1994 were obtained from clinical records. RESULTS: The number of children with extended-stay hospitalizations since 2000 declined dramatically compared with the early 1990s, while the percentage of patients with complications of childhood asthma, such as severe atopic dermatitis, school absenteeism, and obesity, has increased significantly in the recent past. Practical benefits of extended-stay hospitalization were demonstrated by significant improvement of exercise performance and measurement of pulmonary function parameters and serum IgE concentrations by time of discharge. In addition to improvement in asthmatic symptoms, maintenance drug requirements and frequency of school absenteeism were reduced. CONCLUSIONS: The medical mission of extended-stay hospitalizations is currently limited due to the availability of improved pharmacotherapy. Some patients, however, with exceptionally severe asthma or psychological problems that interact with their medical condition still fare poorly under outpatient care and could benefit from group care. Further study is needed to identify the components of long-term programs essential to produce change.


Subject(s)
Asthma/therapy , Length of Stay/statistics & numerical data , Adolescent , Asthma/physiopathology , Asthma/prevention & control , Child , Female , Humans , Immunoglobulin E/blood , Japan , Length of Stay/trends , Male , Respiratory Function Tests , Retrospective Studies , Treatment Outcome
9.
Genome Inform ; 23(1): 60-71, 2009 Oct.
Article in English | MEDLINE | ID: mdl-20180262

ABSTRACT

We introduce a new data structure, a localized suffix array, based on which occurrence information is dynamically represented as the combination of global positional information and local lexicographic order information in text search applications. For the search of a pair of words within a given distance, many candidate positions that share a coarse-grained global position can be compactly represented in terms of local lexicographic orders as in the conventional suffix array, and they can be simultaneously examined for violation of the distance constraint at the coarse-grained resolution. Trade-off between the positional and lexicographical information is progressively shifted towards finer positional resolution, and the distance constraint is reexamined accordingly. Thus the paired search can be efficiently performed even if there are a large number of occurrences for each word. The localized suffix array itself is in fact a reordering of bits inside the conventional suffix array, and their memory requirements are essentially the same. We demonstrate an application to genome mapping problems for paired-end short reads generated by new-generation DNA sequencers. When paired reads are highly repetitive, it is time-consuming to naïvely calculate, sort, and compare all of the coordinates. For human genome re-sequencing data of 36 base pairs, speedups of more than 10 times over the naïve method were observed in almost half of the cases where the sums of redundancies (number of individual occurrences) of paired reads were greater than 2,000.
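
The naive baseline that this abstract compares against (calculate, sort, and compare all coordinates) can be sketched as follows. This is not the localized suffix array itself; it uses a conventional suffix array built by plain sorting, and the function names and the convention that the second word starts at or after the first within `max_dist` positions are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text, sa, word):
    """Sorted start positions of `word`, found by binary search on the suffix array."""
    n = len(word)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose n-character prefix is >= word
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] < word:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose n-character prefix is > word
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] <= word:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def paired_hits(text, word1, word2, max_dist):
    """All (p, q) with word1 at p, word2 at q, and 0 <= q - p <= max_dist."""
    sa = suffix_array(text)
    pos1 = occurrences(text, sa, word1)
    pos2 = occurrences(text, sa, word2)
    hits = []
    for p in pos1:
        i = bisect_left(pos2, p)
        j = bisect_right(pos2, p + max_dist)
        hits.extend((p, q) for q in pos2[i:j])
    return hits
```

The cost of this baseline grows with the number of occurrences of each word, because every coordinate is materialized and sorted before the distance constraint is checked.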


Subject(s)
Genome , Algorithms , DNA Sequence Analysis
10.
Nucleic Acids Res ; 37(Database issue): D762-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19073703

ABSTRACT

Completion of human genome sequencing has greatly accelerated functional genomic research. Full-length cDNA clones are essential experimental tools for functional analysis of human genes. In one of the projects of the New Energy and Industrial Technology Development Organization (NEDO) in Japan, the full-length human cDNA sequencing project (FLJ project), nucleotide sequences of approximately 30 000 human cDNA clones have been analyzed. The Gateway system is a versatile framework to construct a variety of expression clones for various experiments. We have constructed 33 275 human Gateway entry clones from full-length cDNAs, representing to our knowledge the largest collection in the world. Utilizing these clones with a highly efficient cell-free protein synthesis system based on wheat germ extract, we have systematically and comprehensively produced and analyzed human proteins in vitro. Sequence information for both amino acids and nucleotides of open reading frames of cDNAs cloned into Gateway entry clones and in vitro expression data using those clones can be retrieved from the Human Gene and Protein Database (HGPD, http://www.HGPD.jp). HGPD is a unique database that stores the information of a set of human Gateway entry clones and protein expression data and helps the user to search the Gateway entry clones.


Subject(s)
Protein Databases , Proteins/genetics , Proteins/metabolism , Proteomics , Molecular Cloning , Complementary DNA/chemistry , Polyacrylamide Gel Electrophoresis , Genes , Humans , Internet , Protein Biosynthesis , Proteins/chemistry , User-Computer Interface
11.
Nat Methods ; 5(12): 1011-7, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19054851

ABSTRACT

Appropriate resources and expression technology necessary for human proteomics on a whole-proteome scale are being developed. We prepared a foundation for simple and efficient production of human proteins using the versatile Gateway vector system. We generated 33,275 human Gateway entry clones for protein synthesis, developed mRNA expression protocols for them and improved the wheat germ cell-free protein synthesis system. We applied this protein expression system to the in vitro expression of 13,364 human proteins and assessed their biological activity in two functional categories. Of the 75 tested phosphatases, 58 (77%) showed biological activity. Several cytokines containing disulfide bonds were produced in an active form in a nonreducing wheat germ cell-free expression system. We also manufactured protein microarrays by direct printing of unpurified in vitro-synthesized proteins and demonstrated their utility. Our 'human protein factory' infrastructure includes the resources and expression technology for in vitro proteome research.


Subject(s)
Molecular Cloning/methods , Human Genome/genetics , Protein Engineering/methods , Proteome/genetics , Proteome/metabolism , Recombinant Proteins/metabolism , Cell-Free System , Humans
12.
Genome Res ; 16(1): 55-65, 2006 Jan.
Article in English | MEDLINE | ID: mdl-16344560

ABSTRACT

By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with the composition of one CpG-island-containing promoter per 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that the PAP-containing promoters were enriched in the genes encoding signal transduction-related proteins and were rarer in the genes encoding extracellular proteins, possibly reflecting the varied functional requirement for and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus partly produced by the PAPs, suggesting that a wide variety of transcripts can be achieved by this mechanism. Our findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans.
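
The clustering criterion in this abstract (clusters separated from each other by more than 500 bp) can be illustrated with a short Python sketch. It assumes simple single-linkage grouping of positions along one coordinate axis; the function name and the handling of chromosomes and strands (ignored here) are illustrative assumptions, and the authors' actual procedure may differ.

```python
def cluster_positions(positions, min_gap=500):
    """Group positions so that neighbouring clusters are more than `min_gap` apart.

    A position joins the current cluster when it lies within `min_gap` of the
    cluster's last member; otherwise it starts a new cluster.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= min_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters
```

For instance, `cluster_positions([100, 150, 700, 1300, 1310])` yields three clusters, because the gaps of 550 and 600 both exceed the 500 bp threshold.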


Subject(s)
CpG Islands/genetics , Gene Library , Multigene Family/genetics , Genetic Promoter Regions/genetics , Quantitative Trait Loci/genetics , Genetic Transcription/genetics , Base Sequence , Exons/genetics , Humans , Molecular Sequence Data , Organ Specificity , Signal Transduction/genetics
13.
DNA Res ; 12(2): 117-26, 2005.
Article in English | MEDLINE | ID: mdl-16303743

ABSTRACT

We have developed an in silico method of selection of human full-length cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. Fullness rates were increased to about 80% by combination of the oligo-capping method and ATGpr, software for prediction of translation start point and the coding potential. Then, using 5'-end single-pass sequences, cDNAs having the signal sequence were selected by PSORT ('signal sequence trap'). We also applied 'secretion or membrane protein-related keyword trap' based on the result of BLAST search against the SWISS-PROT database for the cDNAs which could not be selected by PSORT. Using the above procedures, 789 cDNAs were primarily selected and subjected to full-length sequencing, and 334 of these cDNAs were finally selected as novel. Most of the cDNAs (295 cDNAs: 88.3%) were predicted to encode secretion or membrane proteins. In particular, 165 (80.5%) of the 205 cDNAs selected by PSORT were predicted to have signal sequences, while 70 (54.2%) of the 129 cDNAs selected by 'keyword trap' preserved the secretion or membrane protein-related keywords. Many important cDNAs were obtained, including transporters, receptors, and ligands, involved in significant cellular functions. Thus, an efficient method of selecting secretion or membrane protein-encoding cDNAs was developed by combining the above four procedures.


Subject(s)
Gene Library , Membrane Proteins/genetics , Protein Sorting Signals , 5' Flanking Region , Tumor Cell Line , Molecular Cloning , Humans , Oligonucleotides/genetics
14.
Kurume Med J ; 52(1-2): 53-6, 2005.
Article in English | MEDLINE | ID: mdl-16119613

ABSTRACT

Occult bacteremia with Streptococcus pneumoniae (S. pneumoniae) is sometimes experienced in general clinics, while that with Haemophilus influenzae type b (Hib) is less common and mostly develops into serious central nervous system infection. Recently we encountered a patient with bacteremia due to Hib, in whom bacteremia recovered spontaneously without intravenous antibiotic therapy. A previously healthy 17-month-old girl was brought to our hospital with the complaint of high fever. Although her clinical condition did not present any meningeal signs, the laboratory data on the first day showed prominent leukocytosis and a sepsis work-up was done. Two days later (third day of illness), blood culture grew Haemophilus influenzae sensitive to ampicillin and the strain isolated from blood was identified as Hib. The febrile condition soon disappeared and bacteremia resolved, with a negative result on the next blood culture. The precise mechanisms of this phenomenon are not clear; however, it is extremely rare for Hib bacteremia to resolve spontaneously.


Subject(s)
Bacteremia/etiology , Haemophilus Infections/etiology , Haemophilus influenzae type b/isolation & purification , Female , Humans , Infant
15.
Nat Genet ; 36(1): 40-5, 2004 Jan.
Article in English | MEDLINE | ID: mdl-14702039

ABSTRACT

As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at approximately 58% compared with a peak at approximately 42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at approximately 42%, relatively low compared with that of protein-coding cDNAs.
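
GC content, the quantity whose distribution is compared in this abstract, is the fraction of bases in a sequence that are G or C. A minimal Python sketch follows; the function name and the choice to ignore ambiguous bases such as N are assumptions for illustration, not details taken from the study.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) in `seq` that are G or C."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0
```

Computing this value per cDNA and plotting a histogram would give a distribution of the kind whose peaks are reported above.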


Subject(s)
Complementary DNA , DNA Sequence Analysis , Human Chromosomes 21-22 and Y , Human Chromosomes Pair 20 , Computational Biology , Humans , Open Reading Frames , Messenger RNA
16.
FEMS Immunol Med Microbiol ; 34(4): 289-97, 2002 Dec 13.
Article in English | MEDLINE | ID: mdl-12443829

ABSTRACT

Shiga toxin 2 (Stx2) variants have been found to exhibit not only antigenic divergence, but also differences in toxicity for tissue culture cells and animals. To clarify whether all or just a subset of Stx2 variants are important for the virulence of Shiga toxin-producing Escherichia coli, we designed PCR primers to detect and type all reported variants. We classified them into four groups according to the nucleotide sequences of the Stx2 family; for example, group 1 (G1) contains VT2vha and group 2 (G2) contains VT2d-Ount. The 120 strains of Shiga toxin-producing E. coli used in this study were isolated from humans in Japan between 1986 and 1999. Among the four variant groups, the G1 gene only was detected in 23 of the 120 clinical strains (19.2%) and all belonged to the O157 serotype. G1 is considered the most important Stx2 variant group in terms of human pathogenicity. A multiplex PCR that can detect the stx1, stx2, and G1 genes was developed as a means of rapid and easy typing to better understand the roles of the different types of Stx.


Subject(s)
Escherichia coli Infections/microbiology , Escherichia coli/classification , Genetic Variation , Polymerase Chain Reaction/methods , Shiga Toxin 2/classification , Shiga Toxin 2/genetics , Bacterial Typing Techniques , Base Sequence , DNA Primers , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli O157/classification , Escherichia coli O157/metabolism , Humans , Japan , Molecular Sequence Data , Protein Subunits/chemistry , Protein Subunits/genetics , Protein Subunits/metabolism , Sequence Alignment , DNA Sequence Analysis , Shiga Toxin 1/chemistry , Shiga Toxin 1/genetics , Shiga Toxin 1/metabolism , Shiga Toxin 2/chemistry , Shiga Toxin 2/metabolism