1.
BMC Bioinformatics ; 24(1): 139, 2023 Apr 08.
Article in English | MEDLINE | ID: mdl-37031189

ABSTRACT

BACKGROUND: Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is "large p and small n", in that the data contain a small number of subjects but a large number of genes. This may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification. RESULTS: This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA hybridizes the manifold learning algorithm Isomap with the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies-Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected. CONCLUSIONS: The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
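
The Davies-Bouldin index named in this abstract scores a clustering without training a classifier, which is presumably why it sidesteps classifier dependency. The sketch below shows the standard index only. It assumes, as one plausible reading, that the class labels serve as cluster assignments for points in an embedded space; the function name `davies_bouldin` and the toy data are illustrative and are not taken from the Iso-GA paper.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index; lower means tighter, better-separated clusters.

    Assumption for illustration: `labels` are the known class labels and
    `points` are low-dimensional coordinates (e.g. an embedding).
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centroids, scatter = {}, {}
    for label, members in clusters.items():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        centroids[label] = centroid
        # Scatter: mean distance of the cluster's points to its centroid.
        scatter[label] = sum(math.dist(m, centroid) for m in members) / len(members)

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        # Note: raises ZeroDivisionError if two centroids coincide.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two compact groups that are far apart.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["tumor"] * 4 + ["normal"] * 4
print(davies_bouldin(points, labels))  # close to 0.1
```

A gene subset whose embedding separates the classes cleanly would yield a lower value than one whose classes overlap, so the index can act as a fitness score to minimize.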


Subject(s)
Algorithms, Microarray Analysis, Neoplasms, Humans, Gene Expression Profiling/methods, Genetic Techniques, Microarray Analysis/methods, Neoplasms/classification, Neoplasms/genetics, Probability
2.
J Comput Biol ; 30(2): 176-188, 2023 02.
Article in English | MEDLINE | ID: mdl-36374238

ABSTRACT

To promote the use of personal genome information in medicine, it is important to analyze the relationship between diseases and the human genomes. Therefore, statistical analysis using genomic data is often conducted, but there is a privacy concern with respect to releasing the statistics as they are. Existing methods to address this problem using the concept of differential privacy cannot provide accurate outputs under strong privacy guarantees, making them less practical. In this study, for the first time, we investigate the application of a compressive mechanism to genomic statistical data and propose two approaches. The first is to apply the normal compressive mechanism to the statistics vector along with an algorithm to determine the number of nonzero entries in a sparse representation. The second is to alter the mechanism based on the data, aiming to release significant single nucleotide polymorphisms with a high probability. In this algorithm, we apply the compressive mechanism with the input as a sparse vector for significant data and the Laplace mechanism for nonsignificant data. By using the Haar wavelet transform for the compressive mechanism, we can determine the number of nonzero elements and the amount of noise. In addition, we give theoretical guarantees that our proposed methods achieve ϵ-differential privacy. We evaluated our methods in terms of accuracy and rank error compared with the Laplace and exponential mechanisms. The results show that our second method in particular can guarantee high privacy assurance as well as utility.
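
The Laplace mechanism used as a baseline in this abstract adds noise drawn from a Laplace distribution with scale sensitivity/ϵ to each released value. The sketch below is a minimal standard-library version. The function name, its parameters, and the sample values are illustrative assumptions, and the compressive mechanism proposed in the paper is not reproduced here.

```python
import random


def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Return `values` perturbed with Laplace(0, sensitivity / epsilon) noise.

    Assumption: `sensitivity` is the L1 sensitivity of the whole vector,
    i.e. the largest total change one individual's data can cause.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    rate = epsilon / sensitivity  # inverse of the Laplace scale
    # The difference of two independent exponentials with the same rate
    # follows a Laplace distribution centered at zero.
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


# Hypothetical per-SNP statistics; a smaller epsilon gives noisier output.
stats = [12.4, 0.8, 3.1]
print(laplace_mechanism(stats, sensitivity=1.0, epsilon=0.5, rng=random.Random(0)))
```

Because the noise scale grows as ϵ shrinks, strong privacy guarantees can swamp small statistics, which is the accuracy problem the abstract attributes to existing methods.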


Subject(s)
Data Compression, Privacy, Humans, Wavelet Analysis, Genomics, Algorithms
3.
Pac Symp Biocomput ; 27: 85-96, 2022.
Article in English | MEDLINE | ID: mdl-34890139

ABSTRACT

To achieve the provision of personalized medicine, it is very important to investigate the relationship between diseases and human genomes. For this purpose, large-scale genetic studies such as genome-wide association studies are often conducted, but there is a risk of identifying individuals if the statistics are released as they are. In this study, we propose new efficient differentially private methods for a transmission disequilibrium test, which is a family-based association test. Existing methods are computationally intensive and take a long time even for a small cohort. Moreover, for approximation methods, sensitivity of the obtained values is not guaranteed. We present an exact algorithm with a time complexity of 𝒪(nm) for a dataset containing n families and m single nucleotide polymorphisms (SNPs). We also propose an approximation algorithm that is faster than the exact one and prove that the obtained scores' sensitivity is 1. From our experimental results, we demonstrate that our exact algorithm is 10,000 times faster than existing methods for a small cohort with 5,000 SNPs. The results also indicate that the proposed method is the first in the world that can be applied to a large cohort, such as those with 10^6 SNPs. In addition, we examine a suitable dataset to apply our approximation algorithm. Supplementary materials are available at https://github.com/ay0408/DP-trio-TDT.
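
For orientation, the classical (non-private) transmission disequilibrium test compares how often heterozygous parents transmit an allele to affected children (b) with how often they do not (c), using the statistic (b - c)^2 / (b + c), which is approximately chi-square distributed with one degree of freedom. The sketch below covers only that textbook statistic. The function names and the counts are illustrative assumptions, and the differentially private algorithms of the paper are not shown.

```python
import math


def tdt_statistic(transmitted, not_transmitted):
    """Classical TDT statistic (b - c)^2 / (b + c) for one SNP."""
    b, c = transmitted, not_transmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one transmission")
    return (b - c) ** 2 / (b + c)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
stat = tdt_statistic(60, 40)
print(stat, chi2_sf_df1(stat))  # statistic is 4.0; p-value is roughly 0.0455
```

Evaluating this statistic once per SNP over n families is consistent with the 𝒪(nm) scale mentioned in the abstract, although the paper's exact algorithm addresses the harder private-release problem.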


Subject(s)
Computational Biology, Genome-Wide Association Study, Algorithms, Human Genome, Humans, Linkage Disequilibrium, Single Nucleotide Polymorphism
4.
Bioinform Adv ; 1(1): vbab004, 2021.
Article in English | MEDLINE | ID: mdl-36700105

ABSTRACT

Motivation: Analyses of datasets that contain personal genomic information are very important for revealing associations between diseases and genomes. Genome-wide association studies, which are large-scale genetic statistical analyses, often involve tests with contingency tables. However, if the statistics obtained by these tests are made public as they are, sensitive information of individuals could be leaked. Existing studies have proposed privacy-preserving methods for statistics in the χ² test with a 3 × 2 contingency table, but they do not cover all the tests used in association studies. In addition, existing methods for releasing differentially private P-values are not practical. Results: In this work, we propose methods for releasing statistics in the χ² test, Fisher's exact test and the Cochran-Armitage trend test while preserving both personal privacy and utility. Our methods for releasing P-values are the first to achieve practicality under the concept of differential privacy by considering their base 10 logarithms. We make theoretical guarantees by showing the sensitivity of the above statistics. From our experimental results, we evaluate the utility of the proposed methods and show appropriate thresholds with high accuracy for using the private statistics in actual tests. Availability and implementation: A Python implementation of our experiments is available at https://github.com/ay0408/DP-statistics-GWAS. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
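
As background for the 3 × 2 case mentioned in this abstract, the sketch below computes the ordinary (non-private) Pearson χ² statistic for a genotype-by-status table and the base 10 logarithm of its P-value. With two degrees of freedom the upper-tail probability is exp(-x/2), so the logarithm has a closed form. The function names and the table are illustrative assumptions; this is not the code in the linked repository, and no privacy noise is added.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a positive total")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def log10_p_value_df2(stat):
    """log10 of the chi-square upper-tail probability for 2 degrees of freedom.

    For df = 2 the tail is exp(-stat / 2), so log10(p) = -stat / (2 ln 10).
    """
    return -stat / (2.0 * math.log(10.0))


# Hypothetical 3 x 2 table: rows are genotypes AA, Aa, aa; columns are cases, controls.
table = [[10, 20], [20, 20], [30, 10]]
stat = chi_square_statistic(table)
print(stat, log10_p_value_df2(stat))
```

Working on the logarithmic scale keeps very small P-values in a numerically manageable range, which may be part of why the abstract reports it as the practical choice.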

5.
BMC Bioinformatics ; 21(Suppl 3): 136, 2020 Apr 23.
Article in English | MEDLINE | ID: mdl-32321433

ABSTRACT

BACKGROUND: Nanopore sequencing is a rapidly developing third-generation sequencing technology, which can generate long nucleotide reads of molecules within a portable device in real time. Genotypes are determined by detecting the change in ionic current signals as a DNA/RNA fragment passes through a nanopore. Currently, nanopore basecalling has a higher error rate than the basecalling of short-read sequencing. By utilizing deep neural networks, state-of-the-art nanopore basecallers achieve basecalling accuracy in a range from 85% to 95%. RESULT: In this work, we proposed a novel basecalling approach from a perspective of instance segmentation. Different from previous approaches of doing typical sequence labeling, we formulated the basecalling problem as a multi-label segmentation task. Meanwhile, we proposed a refined U-net model, which we call UR-net, that can model sequential dependencies for a one-dimensional segmentation task. The experimental results show that the proposed basecaller URnano achieves competitive results on the in-species data, compared to the recently proposed CTC-featured basecallers. CONCLUSION: Our results show that formulating the basecalling problem as a one-dimensional segmentation task is a promising approach, which does basecalling and segmentation jointly.


Subject(s)
Nanopore Sequencing/methods, DNA/genetics, Computer Neural Networks, RNA/genetics
6.
PLoS One ; 12(4): e0176530, 2017.
Article in English | MEDLINE | ID: mdl-28445522

ABSTRACT

Genome-wide scans for positive selection have become important for genomic medicine, and many studies aim to find genomic regions affected by positive selection that are associated with risk allele variations among populations. Most such studies are designed to detect recent positive selection. However, we hypothesize that ancient positive selection is also important for adaptation to pathogens, and has affected current immune-mediated common diseases. Based on this hypothesis, we developed a novel linkage disequilibrium-based pipeline, which aims to detect regions associated with ancient positive selection across populations from single nucleotide polymorphism (SNP) data. By applying this pipeline to the genotypes in the International HapMap project database, we show that genes in the detected regions are enriched in pathways related to the immune system and infectious diseases. The detected regions also contain SNPs reported to be associated with cancers and metabolic diseases, obesity-related traits, type 2 diabetes, and allergic sensitization. These SNPs were further mapped to biological pathways to determine the associations between phenotypes and molecular functions. Assessments of candidate regions to identify functions associated with variations in incidence rates of these diseases are needed in the future.


Subject(s)
Human Genome, Genome-Wide Association Study, Genetic Databases, Population Genetics, Genotype, HapMap Project, Haplotypes, Humans, Linkage Disequilibrium, Metabolic Diseases/genetics, Metabolic Diseases/pathology, Monte Carlo Method, Multigene Family, Neoplasms/genetics, Neoplasms/pathology, Neurodegenerative Diseases/genetics, Neurodegenerative Diseases/pathology, Phenotype, Single Nucleotide Polymorphism
7.
Sci Rep ; 6: 26011, 2016 05 24.
Article in English | MEDLINE | ID: mdl-27217144

ABSTRACT

Germline mutations in the tumor suppressor gene APC are associated with familial adenomatous polyposis (FAP). Here we applied whole-genome sequencing (WGS) to the DNA of a sporadic FAP patient in which we did not find any pathological APC mutations by direct sequencing. WGS identified a promoter deletion of approximately 10 kb encompassing promoter 1B and exon1B of APC. Additional allele-specific expression analysis by deep cDNA sequencing revealed that the deletion reduced the expression of the mutated APC allele to as low as 11.2% in the total APC transcripts, suggesting that the residual mutant transcripts were driven by other promoter(s). Furthermore, cap analysis of gene expression (CAGE) demonstrated that the deleted promoter 1B region is responsible for the great majority of APC transcription in many tissues except the brain. The deletion decreased the transcripts of APC-1B to 39-45% in the patient compared to the healthy controls, but it did not decrease those of APC-1A. Different deletions including promoter 1B have been reported in FAP patients. Taken together, our results strengthen the evidence that analysis of structural variations in promoter 1B should be considered for the FAP patients whose pathological mutations are not identified by conventional direct sequencing.


Subject(s)
Adenomatous Polyposis Coli Protein/genetics, Adenomatous Polyposis Coli/genetics, High-Throughput Nucleotide Sequencing/methods, Tumor Suppressor Proteins/genetics, Whole Genome Sequencing/methods, Adult, Gene Expression Regulation, Germ-Line Mutation/genetics, Humans, Male, Organ Specificity/genetics, Pedigree, Genetic Promoter Regions, Protein Isoforms/genetics, Sequence Deletion/genetics
9.
Nat Genet ; 48(5): 500-9, 2016 05.
Article in English | MEDLINE | ID: mdl-27064257

ABSTRACT

Liver cancer, which is most often associated with virus infection, is prevalent worldwide, and its underlying etiology and genomic structure are heterogeneous. Here we provide a whole-genome landscape of somatic alterations in 300 liver cancers from Japanese individuals. Our comprehensive analysis identified point mutations, structural variations (STVs), and virus integrations, in noncoding and coding regions. We discovered mutational signatures related to liver carcinogenesis and recurrently mutated coding and noncoding regions, such as long intergenic noncoding RNA genes (NEAT1 and MALAT1), promoters, CTCF-binding sites, and regulatory regions. STV analysis found a significant association with replication timing and identified known (CDKN2A, CCND1, APC, and TERT) and new (ASH1L, NCOR1, and MACROD2) cancer-related genes that were recurrently affected by STVs, leading to altered expression. These results emphasize the value of whole-genome sequencing analysis in discovering cancer driver mutations and understanding comprehensive molecular profiles of liver cancer, especially with regard to STVs and noncoding mutations.


Subject(s)
Human Genome, Liver Neoplasms/genetics, Mutation, DNA Mutational Analysis, Neoplasm DNA, Genetic Structures, Humans, Neoplasm Proteins/genetics, Prognosis, Nucleic Acid Regulatory Sequences, DNA Sequence Analysis, Virus Integration
11.
Article in English | MEDLINE | ID: mdl-26357315

ABSTRACT

In genome assembly graphs, motifs such as tips, bubbles, and cross links are studied in order to find sequencing errors and to understand the nature of the genome. Superbubble, a complex generalization of bubbles, was recently proposed as an important subgraph class for analyzing assembly graphs. At present, a quadratic time algorithm is known. This paper gives an O(m log m)-time algorithm to solve this problem for a graph with m edges.


Subject(s)
Algorithms, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Humans
12.
J Hum Genet ; 60(5): 227-31, 2015 May.
Article in English | MEDLINE | ID: mdl-25716913

ABSTRACT

Familial adenomatous polyposis (FAP) of the colon is characterized by multiple polyps in the intestine and extra-colonic manifestations. Most FAP cases are caused by a germline mutation in the tumor-suppressor gene APC, but some cases of adenomatous polyposis result from germline mutations in MUTYH, POLD1 or POLE. Although sequence analysis of APC by the Sanger method is routinely performed for genetic testing, there remain cases whose mutations are not detected by the analysis. Next-generation sequencing has enabled us to analyze the comprehensive human genome, improving the chance of identifying disease causative variants. In this study, we conducted whole-genome sequencing of a sporadic FAP patient in which we did not find any pathogenic APC mutations by the conventional Sanger sequencing. Whole-genome sequencing and subsequent deep sequencing identified a mosaic mutation of c.3175G>T, p.E1059X in ~12% of his peripheral leukocytes. Additional deep sequencing of his buccal mucosa, hair follicles, non-cancerous mucosa of the stomach and colon disclosed that these tissues harbored the APC mutation at different frequencies. Our data implied that genetic analysis by next-generation sequencing is an effective strategy to identify genetic mosaicism in hereditary diseases.


Subject(s)
Adenomatous Polyposis Coli Protein/genetics, Adenomatous Polyposis Coli/genetics, Mosaicism, Adult, Base Sequence, DNA Mutational Analysis, Gene Frequency, Germ-Line Mutation, High-Throughput Nucleotide Sequencing, Humans, Male
13.
Nat Commun ; 6: 6120, 2015 Jan 30.
Article in English | MEDLINE | ID: mdl-25636086

ABSTRACT

Intrahepatic cholangiocarcinoma and combined hepatocellular cholangiocarcinoma show varying degrees of biliary epithelial differentiation, which can be defined as liver cancer displaying biliary phenotype (LCB). LCB is second in incidence among liver cancers with and without chronic hepatitis background and is more aggressive than hepatocellular carcinoma (HCC). To gain insight into its molecular alterations, we performed whole-genome sequencing analysis on 30 LCBs. Here we show that the genome-wide substitution patterns of LCBs developed in chronic hepatitis livers overlapped with those of 60 HCCs, whereas those of hepatitis-negative LCBs diverged. The subsequent validation study on 68 LCBs identified recurrent mutations in TERT promoter, chromatin regulators (BAP1, PBRM1 and ARID2), a synapse organization gene (PCLO), IDH genes and KRAS. The frequencies of KRAS and IDH mutations, which are associated with poor disease-free survival, were significantly higher in hepatitis-negative LCBs. This study reveals the strong impact of chronic hepatitis on the mutational landscape in liver cancer and the genetic diversity among LCBs.


Subject(s)
Liver Neoplasms/genetics, Aged, Aged 80 and over, Bile Duct Neoplasms/genetics, Intrahepatic Bile Ducts, Hepatocellular Carcinoma/genetics, Cholangiocarcinoma/genetics, Cytoskeletal Proteins/genetics, DNA-Binding Proteins, Female, Hepatitis/genetics, Hepatitis/physiopathology, Humans, Male, Middle Aged, Mutation/genetics, Neuropeptides/genetics, Nuclear Proteins/genetics, Genetic Promoter Regions/genetics, Proto-Oncogene Proteins/genetics, Proto-Oncogene Proteins p21(ras), Telomerase/genetics, Transcription Factors/genetics, Tumor Suppressor Proteins/genetics, Ubiquitin Thiolesterase/genetics, ras Proteins/genetics
14.
Hum Genome Var ; 2: 15011, 2015.
Article in English | MEDLINE | ID: mdl-27081525

ABSTRACT

We present here a case of attenuated familial adenomatous polyposis (AFAP) with a family history of desmoids and thyroid tumors. This patient had no colonic polyps but did have multiple desmoids. Genetic analysis identified a 4-bp deletion in codon 2644 (c.7932_7935delTTAT: p.Tyr2645LysfsX14) of the adenomatous polyposis coli (APC) gene. In cases with limited numbers of colonic polyps and desmoids, AFAP may be caused by a mutation in the 3' region of APC.

15.
PLoS One ; 9(12): e114263, 2014.
Article in English | MEDLINE | ID: mdl-25526364

ABSTRACT

Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, the transcriptional consequences of these genomic alterations in cancer genomes remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on the affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with or without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate that genomic alterations in cancer genomes have diverse transcriptomic effects, and that integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genomes.


Subject(s)
Hepatocellular Carcinoma/genetics, Clonal Evolution, Human Genome, Liver Neoplasms/genetics, Mutation, Transcriptome, Hepatocellular Carcinoma/metabolism, Case-Control Studies, Neoplastic Gene Expression Regulation, Humans, Liver Neoplasms/metabolism, Oncogenes/genetics
17.
J Nippon Med Sch ; 81(3): 179-85, 2014.
Article in English | MEDLINE | ID: mdl-24998966

ABSTRACT

The patient, a 56-year-old woman, was found during routine checkup to have a disorder of hepatic function. Abdominal ultrasonography showed an ill-defined hypoechoic mass in the head and body of the pancreas; however, no blood-flow signal was observed within the tumor on Doppler ultrasonography. Abdominal computed tomography showed a low-density area in the arterial and portal venous phases. The lesion was visualized as an area of low signal intensity on both T1- and T2-weighted magnetic resonance images, whereas fluorodeoxyglucose positron emission tomography showed fluorodeoxyglucose accumulation in the tumor. Although a preoperative diagnosis was difficult to make, a rapid cytologic examination revealed evidence of a pancreatic endocrine tumor, and subtotal stomach-preserving pancreaticoduodenectomy with portal vein resection was performed. Histopathological examination showed tumor cell nests scattered in abundant fibrotic tissue; the tumor cells had proliferated in a cord-like fashion and showed immunostaining for chromogranin A. Staining for fibroblast activation protein α was seen in the fibroblastic cells contained within the fibrous stroma surrounding the tumor cell nests, whereas both the fibroblastic cells in the tumor and those in the stroma showed a high rate of staining for thrombospondin. We presume that tumor-associated fibroblasts were involved in the fibrosis of the tumor stroma.


Subject(s)
Diagnostic Imaging/methods, Fibroblasts/pathology, Pancreas/pathology, Pancreatic Neoplasms/diagnosis, Tumor Biomarkers/metabolism, CD56 Antigen/metabolism, Chromogranin A/metabolism, Endopeptidases, Female, Fibroblasts/metabolism, Fibrosis, Gelatinases/metabolism, Humans, Immunohistochemistry, Membrane Proteins/metabolism, Middle Aged, Pancreas/metabolism, Pancreas/surgery, Pancreatic Neoplasms/metabolism, Pancreatic Neoplasms/surgery, Pancreaticoduodenectomy/methods, Phosphopyruvate Hydratase/metabolism, Serine Endopeptidases/metabolism, Thrombospondins/metabolism
18.
Surg Today ; 44(6): 1104-8, 2014 Jun.
Article in English | MEDLINE | ID: mdl-23880964

ABSTRACT

PURPOSE: Elevation of the serum total bilirubin (STB) level not stemming from hepatic dysfunction or biliary obstruction may be seen in cases of acute appendicitis. This paper deals with the clinical significance of such elevations. METHODS: Data from 410 appendectomized patients classified into two groups (a high preoperative STB group and a normal preoperative STB group) were analyzed to reveal the significance of preoperative hyperbilirubinemia. We also examined whether the preoperative STB level might serve as a risk factor for gangrenous appendicitis by a multivariate analysis. RESULTS: Gangrenous appendicitis was more common in the high preoperative STB group (p < 0.001). The multivariate analysis revealed that an elevated preoperative STB level (odds ratio 1.7919) was a risk factor for gangrenous appendicitis. CONCLUSION: In patients with an elevated preoperative STB level, it is very likely that the inflammation is severe and that the disease has progressed to a severe condition histopathologically; therefore, meticulous attention should be paid to the selection of the surgical procedure, as well as to the postoperative clinical course.


Subject(s)
Appendicitis/diagnosis, Appendix/pathology, Bilirubin/blood, Acute Disease, Adolescent, Adult, Appendectomy, Appendicitis/surgery, Biomarkers/blood, Disease Progression, Female, Gangrene/diagnosis, Humans, Hyperbilirubinemia, Male, Middle Aged, Multivariate Analysis, Predictive Value of Tests, Preoperative Period, Risk Factors, Severity of Illness Index, Young Adult
19.
J Infect Chemother ; 19(1): 118-27, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22971935

ABSTRACT

Enterobacteriaceae carrying the New Delhi metallo-β-lactamase-1 (NDM-1) gene (blaNDM-1) have emerged and posed a threat since 2006. In Japan, blaNDM-1-carrying Escherichia coli was first described in 2010. In this study, we characterized NDM-1-positive Klebsiella pneumoniae strain 419 in Japan, which was isolated from the urine of a 90-year-old Japanese patient who had never been to the Indian subcontinent. K. pneumoniae 419 belonged to ST42. It possessed a surface capsule (with untypeable capsular PCR types) and was resistant to serum killing. K. pneumoniae 419 cells were occasionally flagellated or piliated and autoaggregated. K. pneumoniae 419 was resistant to β-lactams (including carbapenems), aminoglycosides, and fluoroquinolones, and was susceptible to imipenem (or biapenem), aztreonam, polymyxin B, and colistin. It possessed at least eight plasmids; of those, a 74-kb plasmid (pKPJ1) of the replicon FIIA carried blaNDM-1 and was conjugally transferred to E. coli strains, with a 71-kb transferable azithromycin-resistant (mphA+) plasmid of the replicon F (pKPJ2), as a large (145-kb) plasmid (pKPJF100) through a transposition event. In addition to blaNDM-1, pKPJ1 carried arr-2, pKPJ2 carried mphA, and pKPJF100 carried both. They were negative for the 16S rRNA methylase gene, which is frequently associated with blaNDM-1. The data demonstrate that K. pneumoniae 419 possessed virulence- and fitness-associated surface structures, was resistant to serum killing, and possessed a unique (or rare) genetic background in terms of ST type and blaNDM-1-carrying plasmid.


Subject(s)
Klebsiella pneumoniae/genetics, Klebsiella pneumoniae/ultrastructure, Plasmids/genetics, beta-Lactamases/biosynthesis, Adult, Anti-Bacterial Agents/pharmacology, Azithromycin/pharmacology, Blood Bactericidal Activity, Genetic Conjugation, Bacterial Drug Resistance/genetics, Multiple Bacterial Drug Resistance/genetics, Humans, Japan/epidemiology, Klebsiella Infections/epidemiology, Klebsiella Infections/microbiology, Klebsiella pneumoniae/isolation & purification, Microbial Sensitivity Tests, Electron Microscopy, Urinary Tract Infections/microbiology, Urine/microbiology, beta-Lactam Resistance, beta-Lactamases/genetics
20.
BMC Res Notes ; 5: 243, 2012 May 16.
Article in English | MEDLINE | ID: mdl-22591859

ABSTRACT

BACKGROUND: Genome assembly is considered to be a challenging problem in computational biology, and has been studied extensively by many researchers. It is extremely difficult to build a general assembler that is able to reconstruct the original sequence instead of many contigs. However, we believe that creating specific assemblers, for solving specific cases, will be much more fruitful than creating general assemblers. FINDINGS: In this paper, we present Arapan-S, a whole-genome assembly program dedicated to handling small genomes. It provides only one contig (along with the reverse complement of this contig) in many cases. Although genomes consist of a number of segments, the implemented algorithm can detect all the segments, as we demonstrate for Influenza Virus A. The Arapan-S program is based on the de Bruijn graph. We have implemented a very sophisticated and fast method to reconstruct the original sequence and neglect erroneous k-mers. The method explores the graph by using neither the shortest nor the longest path, but rather a specific and reliable path based on the coverage level or k-mers' lengths. Arapan-S uses short reads, and it was tested on raw data downloaded from the NCBI Trace Archive. CONCLUSIONS: Our findings show that the accuracy of the assembly was very high; the result was checked against the European Bioinformatics Institute (EBI) database using the NCBI BLAST Sequence Similarity Search. The identity and the genome coverage were more than 99%. We also compared the efficiency of Arapan-S with other well-known assemblers. In dealing with small genomes, the accuracy of Arapan-S is significantly higher than the accuracy of other assemblers. The assembly process is very fast and requires only a few seconds. Arapan-S is available for free to the public. The binary files for Arapan-S are available through http://sourceforge.net/projects/dnascissor/files/.
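
The general idea of a coverage-guided walk through a de Bruijn graph can be sketched in a few lines. The example below is a toy illustration and not the Arapan-S algorithm: it assumes error-free overlaps, ignores reverse complements, drops k-mers seen fewer than `min_coverage` times, and at each branch follows the unused edge with the highest coverage. All function names and the reads are hypothetical.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmer_counts, min_coverage):
    """Nodes are (k-1)-mers; each retained k-mer is an edge weighted by its coverage."""
    graph = defaultdict(list)
    for kmer, coverage in kmer_counts.items():
        if coverage >= min_coverage:  # low-coverage k-mers are treated as errors
            graph[kmer[:-1]].append((kmer[1:], coverage))
    return graph


def assemble(reads, k, min_coverage=2):
    """Walk from a node with no incoming edge, preferring high-coverage edges."""
    graph = build_graph(count_kmers(reads, k), min_coverage)
    targets = {dst for edges in graph.values() for dst, _ in edges}
    starts = [node for node in graph if node not in targets]
    if not starts:
        return ""  # circular or empty graph: no obvious starting point
    node = starts[0]
    contig = node
    used = set()
    while True:
        candidates = [
            (coverage, dst)
            for dst, coverage in graph.get(node, [])
            if (node, dst) not in used
        ]
        if not candidates:
            return contig
        _, dst = max(candidates)
        used.add((node, dst))
        contig += dst[-1]
        node = dst


# Four overlapping reads plus one read containing a sequencing error.
reads = ["ATGGCGTG", "GCGTGCAATC", "ATGGCGTGCA", "GTGCAATC", "ATGGAGTG"]
print(assemble(reads, k=4))  # ATGGCGTGCAATC
```

In this toy input the k-mers that exist only in the erroneous read appear once, so the coverage threshold removes them before the walk begins.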


Subject(s)
Computational Biology, Viral DNA/analysis, Genetic Databases, Viral Genome, Influenza A virus/genetics, Software, Algorithms, Contig Mapping, Reproducibility of Results, Time Factors