Results 1 - 20 of 54
1.
Cell ; 158(4): 929-944, 2014 Aug 14.
Article in English | MEDLINE | ID: mdl-25109877

ABSTRACT

Recent genomic analyses of pathologically defined tumor types identify "within-a-tissue" disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.


Subjects
Neoplasms/classification; Neoplasms/genetics; Cluster Analysis; Humans; Neoplasms/pathology; Transcriptome
2.
BMC Genomics ; 25(1): 542, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38822237

ABSTRACT

OBJECTIVES: Homopolymer (HP) sequencing is error-prone in next-generation sequencing (NGS) assays and may induce false insertions/deletions and substitutions. This study aimed to evaluate the performance of dichromatic and tetrachromatic fluorogenic NGS platforms when sequencing homopolymeric regions. RESULTS: An HP-containing plasmid was constructed and diluted to serial frequencies (3%, 10%, 30%, 60%) to determine the performance of the MGISEQ-2000, MGISEQ-200, and NextSeq 2000 in HP sequencing. An evident negative correlation was observed between the detected frequencies of the four nucleotide HPs and HP length. Significantly decreased rates (P < 0.01) were found for all 8-mer HPs on all three NGS systems at all four expected frequencies, except on the NextSeq 2000 at 3%. With the application of a unique molecular identifier (UMI) pipeline, there were no differences between the detected and expected frequencies of any HP, except for poly-G 8-mers on the MGISEQ-200 platform. UMIs improved the performance of all three NGS platforms in HP sequencing. CONCLUSIONS: We first constructed an HP-containing plasmid based on an EGFR gene backbone to evaluate the performance of NGS platforms when sequencing homopolymeric regions. Highly comparable performance was observed between the MGISEQ-2000 and NextSeq 2000, and introducing UMIs is a promising approach to improve the performance of NGS platforms in sequencing homopolymeric regions.


Subjects
High-Throughput Nucleotide Sequencing; High-Throughput Nucleotide Sequencing/methods; Plasmids/genetics; Humans; Sequence Analysis, DNA/methods
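The UMI correction in this entry rests on a general idea: reads that carry the same molecular barcode derive from one original molecule, so a spurious homopolymer indel in a single read can be outvoted by its siblings. The sketch below illustrates only that idea and is not the study's pipeline. It assumes reads are already aligned and tagged; the tuple input format, the function name `umi_consensus` and the `min_family_size` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, int, str]  # (UMI, alignment start, read sequence); hypothetical format

def umi_consensus(reads: Iterable[Read], min_family_size: int = 2) -> Dict[Tuple[str, int], str]:
    """Collapse reads into one consensus sequence per UMI family.

    A family is all reads sharing the same UMI and alignment start. Families
    smaller than min_family_size are dropped, because a lone read cannot be
    corrected by voting; families without a strict majority are also discarded.
    """
    families = defaultdict(list)
    for umi, start, seq in reads:
        families[(umi, start)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        # Vote on whole read sequences: a read with a spurious homopolymer
        # indel differs from its siblings and is outvoted by the correct copies.
        seq, count = Counter(seqs).most_common(1)[0]
        if 2 * count > len(seqs):
            consensus[key] = seq
    return consensus

if __name__ == "__main__":
    reads = [
        ("AAC", 100, "GT" + "A" * 8 + "CT"),
        ("AAC", 100, "GT" + "A" * 8 + "CT"),
        ("AAC", 100, "GT" + "A" * 7 + "CT"),  # one read lost an A in the 8-mer
        ("GGT", 200, "CCCC"),                  # singleton family, dropped
    ]
    print(umi_consensus(reads))
```

Voting on whole reads rather than column by column is a deliberate simplification: an indel shifts every downstream base, so a naive per-position vote on unaligned reads would compare unrelated columns. Production UMI tools typically build a per-position consensus from reads that are already aligned to the reference.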
3.
BMC Genomics ; 25(1): 227, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429743

ABSTRACT

BACKGROUND: Hybridization capture-based targeted next generation sequencing (NGS) is gaining importance in routine cancer clinical practice. DNA library preparation is a fundamental step to produce high-quality sequencing data. Numerous unexpected, low variant allele frequency calls were observed in libraries using sonication fragmentation and enzymatic fragmentation. In this study, we investigated the characteristics of the artifact reads induced by sonication and enzymatic fragmentation. We also developed a bioinformatic algorithm to filter these sequencing errors. RESULTS: We used pairwise comparisons of somatic single nucleotide variants (SNVs) and insertions and deletions (indels) of the same tumor DNA samples prepared using both ultrasonic and enzymatic fragmentation protocols. Our analysis revealed that the number of artifact variants was significantly greater in the samples generated using enzymatic fragmentation than using sonication. Most of the artifacts derived from the sonication-treated libraries were chimeric artifact reads containing both cis- and trans-inverted repeat sequences of the genomic DNA. In contrast, chimeric artifact reads of endonuclease-treated libraries contained palindromic sequences with mismatched bases. Based on these distinctive features, we proposed a mechanistic hypothesis model, PDSM (pairing of partial single strands derived from a similar molecule), by which these sequencing errors derive from ultrasonication and enzymatic fragmentation library preparation. We developed a bioinformatic algorithm to generate a custom mutation "blacklist" in the BED region to reduce errors in downstream analyses. CONCLUSIONS: We first proposed a mechanistic hypothesis model (PDSM) of sequencing errors caused by specific structures of inverted repeat sequences and palindromic sequences in the natural genome. 
This new hypothesis predicts the existence of chimeric reads that could not be explained by previous models, and provides a new direction for further improving NGS analysis accuracy. A bioinformatic algorithm, ArtifactsFinder, was developed and used to reduce the sequencing errors in libraries produced using sonication and enzymatic fragmentation.


Subjects
Artifacts; Genome, Human; Humans; Gene Library; Sequence Analysis, DNA/methods; DNA, Neoplasm; High-Throughput Nucleotide Sequencing/methods
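The two artifact signatures named in this abstract can be made concrete with small helpers: a palindromic sequence equals its own reverse complement, and an inverted repeat is a pair of segments in which the downstream one is the reverse complement of the upstream one. A minimal sketch, assuming exact matches on uppercase DNA; it does not reproduce ArtifactsFinder or the PDSM model, and the function names and the k-mer length `k` are illustrative.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(seq: str) -> bool:
    """True if seq equals its own reverse complement, e.g. GAATTC."""
    return seq == revcomp(seq)

def inverted_repeats(seq: str, k: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where the k-mer at j is the reverse complement of the k-mer at i.

    Only non-overlapping pairs (j >= i + k) are reported, so the two arms may be
    separated by a spacer, as in a hairpin-forming inverted repeat.
    """
    positions: Dict[str, List[int]] = {}
    for j in range(len(seq) - k + 1):
        positions.setdefault(seq[j:j + k], []).append(j)
    hits = []
    for i in range(len(seq) - k + 1):
        for j in positions.get(revcomp(seq[i:i + k]), []):
            if j >= i + k:
                hits.append((i, j))
    return hits
```

For example, `inverted_repeats("AACCTTTTGGTT", 4)` pairs the arm `AACC` with its reverse complement `GGTT` four bases downstream. Hits from such a scan could be used to flag candidate artifact reads; the study's own approach, a blacklist built over the BED region, is not specified in enough detail here to reproduce.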
4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36058206

ABSTRACT

Updated and expert-quality knowledge bases are fundamental to biomedical research. A knowledge base established with human participation and subject to multiple inspections is needed to support clinical decision making, especially in the growing field of precision oncology. The number of original publications in this field has risen dramatically with the advances in technology and the evolution of in-depth research. Consequently, the issue of how to gather and mine these articles accurately and efficiently now requires close consideration. In this study, we present OncoPubMiner (https://oncopubminer.chosenmedinfo.com), a free and powerful system that combines text mining, data structure customisation, publication search with online reading and project-centred and team-based data collection to form a one-stop 'keyword in-knowledge out' oncology publication mining platform. The platform was constructed by integrating all open-access abstracts from PubMed and full-text articles from PubMed Central, and it is updated daily. OncoPubMiner makes obtaining precision oncology knowledge from scientific articles straightforward and will assist researchers in efficiently developing structured knowledge base systems and bring us closer to achieving precision oncology goals.


Subjects
Neoplasms; Data Mining; Humans; Medical Oncology; Precision Medicine; PubMed; Publications
5.
Methods ; 216: 39-50, 2023 08.
Article in English | MEDLINE | ID: mdl-37330158

ABSTRACT

Assessing the quality of sequencing data plays a crucial role in downstream data analysis. However, existing tools often achieve sub-optimal efficiency, especially when dealing with compressed files or performing complicated quality control operations such as over-representation analysis and error correction. We present RabbitQCPlus, an ultra-efficient quality control tool for modern multi-core systems. RabbitQCPlus uses vectorization, memory copy reduction, parallel (de)compression, and optimized data structures to achieve substantial performance gains. It is 1.1 to 5.4 times faster when performing basic quality control operations compared to state-of-the-art applications yet requires fewer compute resources. Moreover, RabbitQCPlus is at least 4 times faster than other applications when processing gzip-compressed FASTQ files and 1.3 times faster with the error correction module turned on. Furthermore, it takes less than 4 minutes to process 280 GB of plain FASTQ sequencing data, while other applications take at least 22 minutes on a 48-core server when enabling the per-read over-representation analysis. C++ sources are available at https://github.com/RabbitBio/RabbitQCPlus.


Subjects
Data Compression; Software; High-Throughput Nucleotide Sequencing; Quality Control; Algorithms; Sequence Analysis, DNA
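The basic quality-control operations this entry benchmarks (parsing FASTQ, decoding Phred scores, filtering reads) can be sketched in plain Python. This is a minimal, single-threaded illustration assuming four-line FASTQ records with Phred+33 qualities; the thresholds and function names are hypothetical, and none of RabbitQCPlus's vectorization or parallel (de)compression is reflected here.

```python
import gzip
from typing import Dict, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)

def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual

def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0

def filter_reads(handle: TextIO, min_len: int = 30,
                 min_mean_q: float = 20.0) -> Tuple[List[Record], Dict[str, float]]:
    """Keep reads that pass length and mean-quality thresholds; report simple statistics."""
    kept: List[Record] = []
    total = gc = bases = 0
    for name, seq, qual in read_fastq(handle):
        total += 1
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            kept.append((name, seq, qual))
            gc += seq.count("G") + seq.count("C")
            bases += len(seq)
    stats = {"total": total, "kept": len(kept), "gc_fraction": gc / bases if bases else 0.0}
    return kept, stats

def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file, chosen by the .gz extension."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

A typical call would be `kept, stats = filter_reads(open_fastq("sample.fastq.gz"))`, where `sample.fastq.gz` is a placeholder path. Because every record is decoded and scanned character by character, a loop like this is the kind of hot path that optimized tools accelerate.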
6.
J Biomed Inform ; 152: 104625, 2024 04.
Article in English | MEDLINE | ID: mdl-38479675

ABSTRACT

Cross-sample contamination is one of the major issues in next-generation sequencing (NGS)-based molecular assays. This type of contamination, even at very low levels, can significantly impact the results of an analysis, especially in the detection of somatic alterations in tumor samples. Several contamination identification tools have been developed and implemented as a crucial quality-control step in the routine NGS bioinformatic pipeline. However, no published study has comprehensively and systematically investigated, evaluated, and compared these computational methods in cancer NGS analysis. In this study, we comprehensively investigated nine state-of-the-art computational methods for detecting cross-sample contamination. To explore their application in cancer NGS analysis, we further compared the performance of five representative tools by qualitative and quantitative analyses using in silico and simulated experimental NGS data. The results showed that Conpair achieved the best performance for identifying contamination and predicting the level of contamination in NGS analysis of solid tumors. Moreover, based on Conpair, we developed a Python script, Contamination Source Predictor (ConSPr), to identify the source of contamination. We anticipate that this comprehensive survey and the proposed tool for predicting the source of contamination will assist researchers in selecting appropriate cross-contamination detection tools in cancer NGS analysis and inspire the future development of computational methods for detecting sample cross-contamination and identifying its source.


Subjects
Computational Biology; Neoplasms; Humans; Computational Biology/methods; High-Throughput Nucleotide Sequencing; Neoplasms/diagnosis; Neoplasms/genetics; Quality Control
7.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32510555

ABSTRACT

Next-generation sequencing (NGS) technology has revolutionised human cancer research, particularly via detection of genomic variants with its ultra-high-throughput sequencing and increasing affordability. However, the inundation of rich cancer genomics data has resulted in significant challenges in its exploration and translation into biological insights. One of the difficulties in cancer genome sequencing is software selection. Currently, multiple tools are widely used to process NGS data in four stages: raw sequence data pre-processing and quality control (QC), sequence alignment, variant calling, and annotation and visualisation. However, the differences between these NGS tools, including their installation, merits, drawbacks and application, have not been fully appreciated. Therefore, a systematic review of the functionality and performance of NGS tools is required to provide cancer researchers with guidance on software and strategy selection. Another challenge is the multidimensional QC of sequencing data: QC not only reports varied sequence data characteristics but also reveals deviations in diverse features, and it is essential for a meaningful and successful study. However, monitoring of QC metrics in specific steps, including alignment and variant calling, is neglected in certain pipelines such as the 'Best Practices Workflows' in GATK. In this review, we investigated the most widely used software for the fundamental analysis and QC of cancer genome sequencing data and provided instructions for selecting the most appropriate software and pipelines to ensure precise and efficient conclusions. We further discussed the prospects and new research directions for cancer genomics.


Subjects
Genome; High-Throughput Nucleotide Sequencing/methods; Neoplasms/genetics; Quality Control; Computational Biology/methods; Humans; Molecular Sequence Annotation; Software
8.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33461213

ABSTRACT

MOTIVATION: Microsatellite instability (MSI) is a promising biomarker for cancer prognosis and chemosensitivity. Techniques are rapidly evolving for the detection of MSI from tumor-normal paired or tumor-only sequencing data. However, tumor tissues are often insufficient, unavailable, or otherwise difficult to procure. Increasing clinical evidence indicates the enormous potential of plasma circulating cell-free DNA (cfDNA) technology as a noninvasive MSI detection approach. RESULTS: We developed MSIsensor-ct, a bioinformatics tool based on a machine learning protocol, dedicated to detecting MSI status using cfDNA sequencing data with a potentially stable MSIscore threshold of 20%. Evaluation of MSIsensor-ct on independent testing datasets with various levels of circulating tumor DNA (ctDNA) and sequencing depth showed 100% accuracy within the limit of detection (LOD) of 0.05% ctDNA content. MSIsensor-ct requires only BAM files as input, making it user-friendly and easy to integrate into next-generation sequencing (NGS) analysis pipelines. AVAILABILITY: MSIsensor-ct is freely available at https://github.com/niu-lab/MSIsensor-ct. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.


Subjects
Circulating Tumor DNA/genetics; Machine Learning; Microsatellite Instability; Neoplasms/genetics; Software; Circulating Tumor DNA/blood; Computational Biology/methods; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Limit of Detection; Microsatellite Repeats; Neoplasms/blood; Neoplasms/diagnosis; Neoplasms/pathology; Sequence Analysis, DNA
9.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33851200

ABSTRACT

Internal tandem duplication (ITD) of FMS-like tyrosine kinase 3 (FLT3-ITD) constitutes an independent indicator of poor prognosis in acute myeloid leukaemia (AML). AML with FLT3-ITD usually presents with poor treatment outcomes, high recurrence rate and short overall survival. Currently, polymerase chain reaction and capillary electrophoresis are widely adopted for the clinical detection of FLT3-ITD, whereas the length and mutation frequency of ITD are evaluated using fragment analysis. With the development of sequencing technology and the high incidence of FLT3-ITD mutations, a multitude of bioinformatics tools and pipelines have been developed to detect FLT3-ITD using next-generation sequencing data. However, systematic comparison and evaluation of the methods or software have not been performed. In this study, we provided a comprehensive review of the principles, functionality and limitations of the existing methods for detecting FLT3-ITD. We further compared the qualitative and quantitative detection capabilities of six representative tools using simulated and biological data. Our results will provide practical guidance for researchers and clinicians to select the appropriate FLT3-ITD detection tools and highlight the direction of future developments in this field. Availability: A Docker image with several programs pre-installed is available at https://github.com/niu-lab/docker-flt3-itd to facilitate the application of FLT3-ITD detection tools.


Subjects
Biomarkers, Tumor/genetics; Computational Biology/methods; Gene Duplication; Leukemia, Myeloid/genetics; Tandem Repeat Sequences/genetics; fms-Like Tyrosine Kinase 3/genetics; Acute Disease; High-Throughput Nucleotide Sequencing/methods; Humans; Leukemia, Myeloid/diagnosis; Mutation
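At the sequence level, an internal tandem duplication is a segment immediately followed by a copy of itself. A toy detector for exact duplications within a single read is sketched below, assuming error-free sequence and a hypothetical minimum unit length `min_len`. Real detectors generally also need to handle sequencing errors, soft-clipped alignments and allele-frequency estimation, none of which is modeled here, and the function names are illustrative.

```python
from typing import Optional, Tuple

def longest_tandem_duplication(seq: str, min_len: int = 6) -> Optional[Tuple[int, int]]:
    """Return (start, unit_length) of the longest exact tandem duplication in seq.

    A tandem duplication is a segment seq[start:start+L] immediately followed by
    an identical copy. Longer units are tried first; returns None if no unit of
    at least min_len bases is duplicated. Brute force, adequate for single reads.
    """
    n = len(seq)
    for length in range(n // 2, min_len - 1, -1):
        for start in range(n - 2 * length + 1):
            if seq[start:start + length] == seq[start + length:start + 2 * length]:
                return start, length
    return None

def is_novel_itd(read: str, reference: str, min_len: int = 6) -> bool:
    """Flag a tandem duplication in the read that is not already present in the reference."""
    hit = longest_tandem_duplication(read, min_len)
    if hit is None:
        return False
    start, length = hit
    return read[start:start + 2 * length] not in reference

if __name__ == "__main__":
    reference = "ACGTTGCAGGATCCTTAG"
    read = reference[:12] + reference[4:]  # the 8-bp segment at 4..11 duplicated in tandem
    print(longest_tandem_duplication(read), is_novel_itd(read, reference))
```

The novelty check matters because a duplication-like pattern may already exist in the reference; only a tandem copy absent from the reference suggests an insertion in the sample. In the example the 8-bp unit is reported from position 3 rather than 4: the base before the inserted copy matches the unit's last base, so the duplication is positionally ambiguous, a situation real ITD callers also have to resolve.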
10.
Dis Colon Rectum ; 66(11): 1481-1491, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37643197

ABSTRACT

BACKGROUND: Stage II/III disease is the predominant form of colorectal cancer, accounting for approximately 70% of cases. Furthermore, approximately 15% to 20% of patients with stage II/III disease have deficient mismatch repair or microsatellite instability-high colorectal cancer. However, no significant prognostic biomarkers have been identified for this disease. OBJECTIVE: To identify prognostic markers for patients with deficient mismatch repair/microsatellite instability-high stage II/III colon cancer. DESIGN: Retrospective study. SETTING: The study was conducted at a high-volume colorectal center, the Cancer Hospital, Chinese Academy of Medical Sciences. PATIENTS: Patients diagnosed with stage II/III deficient mismatch repair/microsatellite instability-high colon cancer who underwent curative surgery at the Cancer Hospital, Chinese Academy of Medical Sciences, between July 2015 and November 2018 were included. MAIN OUTCOME MEASURES: The primary outcome measure was the influence of differentially mutated genes on progression-free survival. RESULTS: The retrospective deficient mismatch repair/microsatellite instability-high cohort included 32 patients, and The Cancer Genome Atlas microsatellite instability-high cohort included 45 patients. Patients with deficient mismatch repair/microsatellite instability-high colon cancer had higher mutational frequencies of MKI67, TPR, and TCHH than patients with microsatellite-stable colon cancer. MKI67, TPR, TCHH, and the gene combination were significantly correlated with prognosis. The biomarker mutation-type colon cancer group had a higher risk of recurrence or death than the wild-type group. Moreover, biomarker mutation-type tumors had more mutations in the DNA damage repair pathway and a higher tumor mutational burden than biomarker wild-type tumors. LIMITATIONS: This study was limited by its retrospective nature.
CONCLUSIONS: MKI67, TPR, and TCHH may serve as potential diagnostic and prognostic biomarkers for deficient mismatch repair/microsatellite instability-high stage II/III colon cancer.


Subjects
Colonic Neoplasms; DNA Mismatch Repair; Humans; Antigens; Colonic Neoplasms/genetics; Colonic Neoplasms/surgery; DNA Mismatch Repair/genetics; Intermediate Filament Proteins; Microsatellite Instability; Mutation; Neoplasm Staging; Prognosis; Retrospective Studies; Ki-67 Antigen/genetics
11.
Bioinformatics ; 37(4): 573-574, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32790850

ABSTRACT

MOTIVATION: Modern sequencing technologies continue to revolutionize many areas of biology and medicine. Since the generated datasets are error-prone, downstream applications usually require quality control methods to pre-process FASTQ files. However, existing tools for this task are currently not able to fully exploit the capabilities of computing platforms leading to slow runtimes. RESULTS: We present RabbitQC, an extremely fast integrated quality control tool for FASTQ files, which can take full advantage of modern hardware. It includes a variety of operations and supports different sequencing technologies (Illumina, Oxford Nanopore and PacBio). RabbitQC achieves speedups between one and two orders-of-magnitude compared to other state-of-the-art tools. AVAILABILITY AND IMPLEMENTATION: C++ sources and binaries are available at https://github.com/ZekunYin/RabbitQC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Nanopores; Software; High-Throughput Nucleotide Sequencing; Quality Control; Sequence Analysis, DNA
12.
Bioinformatics ; 36(12): 3944-3946, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32315389

ABSTRACT

MOTIVATION: HotSpot3D is a widely used software tool for identifying mutation hotspots on the 3D structures of proteins. To further assist users, we developed a new HotSpot3D web server to make this software more versatile, convenient and interactive. RESULTS: The HotSpot3D web server performs data pre-processing, clustering, visualization and log viewing in one place. Users can interactively explore each cluster and easily re-visualize the mutational clusters within browsers. We also provide a database that allows users to search and visualize proximal mutations from 33 cancers in The Cancer Genome Atlas. AVAILABILITY AND IMPLEMENTATION: http://niulab.scgrid.cn/HotSpot3D/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins; Software; Computers; Databases, Factual; Internet; Mutation; Proteins/genetics
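Identifying mutation hotspots on a 3D structure amounts to grouping mutated residues that lie close together in space. A minimal single-linkage sketch follows, assuming one (x, y, z) coordinate per mutated residue and a hypothetical 10 Å cutoff; HotSpot3D's own pairing and clustering procedure is not reproduced here, and the residue label format is illustrative.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

def proximity_clusters(residues: Dict[str, Coord], cutoff: float = 10.0) -> List[List[str]]:
    """Single-linkage clusters of mutated residues within `cutoff` angstroms of each other.

    residues maps a label (hypothetical format such as "TP53:R175") to an (x, y, z)
    coordinate. Clusters are the connected components of the "closer than cutoff"
    graph, found with a union-find structure; singletons are not reported.
    """
    labels = list(residues)
    parent = {label: label for label in labels}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if math.dist(residues[a], residues[b]) <= cutoff:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    # Largest clusters first; members sorted for stable output.
    return sorted((sorted(g) for g in groups.values() if len(g) > 1), key=len, reverse=True)
```

Single linkage can chain residues that are far apart through intermediate neighbours, which is one reason dedicated tools may apply stricter pairing or significance criteria. `math.dist` requires Python 3.8 or later.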
13.
Genome Res ; 27(8): 1450-1459, 2017 08.
Article in English | MEDLINE | ID: mdl-28522612

ABSTRACT

Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional "download and analyze" paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets.


Subjects
Cloud Computing; Genetic Variation; Genome, Human; Genomics/methods; Neoplasms/genetics; Software; Databases, Genetic; High-Throughput Nucleotide Sequencing/methods; Humans
14.
Nature ; 502(7471): 333-339, 2013 Oct 17.
Article in English | MEDLINE | ID: mdl-24132290

ABSTRACT

The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.


Subjects
Carcinogenesis/genetics; Mutation/genetics; Neoplasms/classification; Neoplasms/genetics; Cell Cycle/genetics; Clone Cells/metabolism; Clone Cells/pathology; Cohort Studies; DNA Repair/genetics; Humans; INDEL Mutation/genetics; Mitogen-Activated Protein Kinases/genetics; Models, Genetic; Neoplasms/metabolism; Neoplasms/pathology; Oncogenes/genetics; Phosphatidylinositol 3-Kinases/genetics; Point Mutation/genetics; Receptor Protein-Tyrosine Kinases/metabolism; Survival Analysis; Time Factors
15.
Bioinformatics ; 33(7): 1090-1092, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28065898

ABSTRACT

Summary: With the advent of next-generation sequencing, traditional bioinformatics tools are challenged by massive raw metagenomic datasets. One of the bottlenecks of metagenomic studies is the lack of large-scale data analysis tools suited to cloud computing. In this paper, we propose a Spark-based tool, called MetaSpark, to recruit metagenomic reads to reference genomes. MetaSpark benefits from Spark's Resilient Distributed Dataset (RDD), which allows it to cache datasets in memory across cluster nodes and to scale well with dataset size. Compared with previous metagenomics recruitment tools, MetaSpark recruited significantly more reads than programs such as SOAP2, BWA and LAST, and increased recruited reads by ∼4% compared with FR-HIT when there were 1 million reads and 0.75 GB of references. Different test cases demonstrate MetaSpark's scalability and overall high performance. Availability: https://github.com/zhouweiyg/metaspark. Contact: bniu@sccas.cn, jingluo@ynu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Software; Algorithms; Genome; High-Throughput Nucleotide Sequencing/standards; Humans; Metagenomics/standards; Reference Standards
16.
Bioinformatics ; 30(7): 1015-6, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24371154

ABSTRACT

MOTIVATION: Microsatellite instability (MSI) is an important indicator of larger genome instability and has been linked to many genetic diseases, including Lynch syndrome. MSI status is also an independent prognostic factor for favorable survival in multiple cancer types, such as colorectal and endometrial. It also informs the choice of chemotherapeutic agents. However, the current PCR-electrophoresis-based detection procedure is laborious and time-consuming, often requiring visual inspection to categorize samples. We developed MSIsensor, a C++ program for automatically detecting somatic microsatellite changes. It computes length distributions of microsatellites per site in paired tumor and normal sequence data, subsequently using these to statistically compare observed distributions in both samples. Comprehensive testing indicates MSIsensor is an efficient and effective tool for deriving MSI status from standard tumor-normal paired sequence data. AVAILABILITY AND IMPLEMENTATION: https://github.com/ding-lab/msisensor


Subjects
Microsatellite Instability; Sequence Analysis, DNA/methods; Automation, Laboratory; Genome, Human; Humans; Neoplasms/genetics; Polymerase Chain Reaction; Software
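The per-site comparison described in this abstract can be illustrated with a chi-square test of homogeneity on tumor and normal repeat-length histograms, followed by the percentage of unstable sites as an overall score. This is a minimal sketch, not MSIsensor's code: the p-value cutoff `alpha`, the histogram input format and the function names are assumptions, no multiple-testing correction is applied, and the chi-square tail probability is computed in closed form to avoid a SciPy dependency.

```python
import math
from typing import Dict, List, Optional, Tuple

Histogram = Dict[int, int]  # repeat length -> number of reads (hypothetical input format)

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution with integer df."""
    if df < 1 or x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_i x^(i-1) / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def site_pvalue(tumor: Histogram, normal: Histogram) -> Optional[float]:
    """Pearson chi-square p-value comparing tumor and normal length distributions at one site.

    Returns None when either sample has no reads (site not evaluable).
    """
    lengths = sorted(n for n in set(tumor) | set(normal) if tumor.get(n, 0) + normal.get(n, 0) > 0)
    rows = [[tumor.get(n, 0) for n in lengths], [normal.get(n, 0) for n in lengths]]
    row_tot = [sum(r) for r in rows]
    if min(row_tot) == 0:
        return None
    if len(lengths) < 2:
        return 1.0  # both samples show the same single repeat length
    col_tot = [t + n for t, n in zip(*rows)]
    grand = sum(row_tot)
    stat = 0.0
    for r in range(2):
        for c, ct in enumerate(col_tot):
            expected = row_tot[r] * ct / grand
            stat += (rows[r][c] - expected) ** 2 / expected
    return chi2_sf(stat, len(lengths) - 1)

def msi_score(sites: List[Tuple[Histogram, Histogram]], alpha: float = 0.05) -> float:
    """Percentage of evaluable sites whose tumor length distribution differs from normal."""
    pvals = [p for p in (site_pvalue(t, n) for t, n in sites) if p is not None]
    if not pvals:
        return 0.0
    return 100.0 * sum(1 for p in pvals if p < alpha) / len(pvals)
```

A sample would then be called MSI-high when its score exceeds a chosen cutoff; that cutoff, like `alpha`, would need calibration against samples of known status and is deliberately left out of the sketch.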
17.
Brief Bioinform ; 13(6): 656-68, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22772836

ABSTRACT

The rapid advances of high-throughput sequencing technologies have dramatically promoted metagenomic studies of microbial communities in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and complexity of these sequence data pose tremendous challenges in data analysis. These challenges include, but are not limited to, ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. Clustering analysis also addresses the challenges in metagenomics: a large redundant data set can be represented with a small non-redundant set, where each cluster is represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering, and errors can be identified, filtered or corrected by using the consensus of sequences within clusters.


Subjects
Algorithms; Metagenome; Cluster Analysis; Metagenomics; Sequence Analysis, DNA
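One common way to group similar sequences into families, each represented by a single entry, is greedy incremental clustering: sequences are processed longest first, and each either joins the most similar existing representative or founds a new cluster. The sketch below assumes k-mer containment as the similarity measure, a cheap stand-in for the alignment identity used by production tools; `k`, `threshold` and the function names are illustrative.

```python
from typing import Dict, List, Set

def kmers(seq: str, k: int) -> Set[str]:
    """Set of all overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def greedy_cluster(seqs: Dict[str, str], k: int = 5, threshold: float = 0.8) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longer sequences become representatives first.

    Similarity is k-mer containment, |shared k-mers| / |k-mers of the query|.
    Returns {representative id: [member ids, representative first]}.
    """
    order = sorted(seqs, key=lambda sid: (-len(seqs[sid]), sid))
    reps: Dict[str, Set[str]] = {}
    clusters: Dict[str, List[str]] = {}
    for sid in order:
        query = kmers(seqs[sid], k)
        best, best_sim = None, 0.0
        for rid, rep_kmers in reps.items():
            sim = len(query & rep_kmers) / len(query) if query else 0.0
            if sim > best_sim:
                best, best_sim = rid, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(sid)  # redundant sequence joins an existing family
        else:
            reps[sid] = query  # sequence founds a new family as its representative
            clusters[sid] = [sid]
    return clusters
```

Processing the longest sequences first means a shorter fragment is compared against a representative that can fully contain it, which is what makes containment a sensible similarity measure here. Each query is compared with every representative, so the cost grows with the number of clusters; large-scale tools add indexing and filtering to avoid most of these comparisons.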
18.
Bioinformatics ; 29(1): 122-3, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23044549

ABSTRACT

SUMMARY: Numerous metagenomics projects have produced tremendous amounts of sequencing data. Aligning these sequences to reference genomes is an essential analysis in metagenomics studies. Large-scale alignment data call for intuitive and efficient visualization tools. However, current tools such as genome browsers are highly specialized for intraspecies mapping results and are not suitable for metagenomic alignment data, which often consist of interspecies alignments. We have developed a web browser-based desktop application for interactively visualizing alignment data of metagenomic sequences. The viewer is easy to use on any computer system with a modern web browser and requires no software installation. AVAILABILITY: http://weizhongli-lab.org/mgaviewer


Subjects
Metagenomics/methods , Sequence Alignment/methods , Software , Computer Graphics , Genome , Humans , Internet
19.
Aging (Albany NY) ; 16(9): 8110-8141, 2024 05 08.
Article in English | MEDLINE | ID: mdl-38728242

ABSTRACT

The management of patients with advanced non-small cell lung cancer (NSCLC) presents significant challenges due to the intricate and heterogeneous nature of cancer cells. Programmed cell death (PCD) pathways are crucial in diverse biological processes. Nevertheless, the prognostic significance of cell death in NSCLC remains incompletely understood. Our study aims to investigate the prognostic importance of PCD genes and their ability to precisely stratify and evaluate the survival outcomes of patients with advanced NSCLC. We employed Weighted Gene Co-expression Network Analysis (WGCNA), Least Absolute Shrinkage and Selection Operator (LASSO), and univariate and multivariate Cox regression analyses for prognostic gene screening. Ultimately, we identified seven PCD-related genes to establish the PCD-related risk score for the advanced NSCLC model (PRAN), which effectively stratified overall survival (OS) in patients with advanced NSCLC. Multivariate Cox regression analysis revealed that the PRAN was an independent prognostic factor, independent of clinical baseline factors. The PRAN was positively related to specific metabolic pathways, including the hexosamine biosynthesis pathway, which plays a crucial role in reprogramming cancer cell metabolism. Furthermore, drug prediction for the different PRAN risk groups identified several sensitive drugs specifically targeting the cell death pathway. Molecular docking analysis suggested the potential therapeutic efficacy of navitoclax in NSCLC, as it demonstrated strong binding with the amino acid residues of the C-C motif chemokine ligand 14 (CCL14), carboxypeptidase A3 (CPA3), and C-X3-C motif chemokine receptor 1 (CX3CR1) proteins. The PRAN provides a robust tool for personalized treatment and survival assessment in patients with advanced NSCLC. Furthermore, identifying sensitive drugs for distinct PRAN risk groups holds promise for advancing targeted therapies in NSCLC.
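A gene-signature risk score of the kind described above is commonly computed as a weighted sum of gene expression values, with weights taken from the LASSO-Cox model, followed by a split into high- and low-risk groups. The abstract does not list the seven genes, their coefficients, or the cutoff, so the gene keys, coefficient values, and median cutoff below are hypothetical placeholders; the sketch only shows the general linear form, not the PRAN itself.

```python
import statistics


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the signature genes.

    `coefficients` maps hypothetical gene keys to model weights; every signature
    gene must be present in `expression` (a KeyError is raised otherwise).
    """
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(patients, coefficients):
    """Label each patient 'high' or 'low' risk relative to the cohort median score."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = statistics.median(scores.values())  # assumed cutoff, not from the study
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}
```

A median split is only one common choice; studies also use optimal-cutoff searches, so the grouping rule should be taken from the original paper when reproducing its results.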


Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Carcinoma, Non-Small-Cell Lung/mortality , Humans , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Lung Neoplasms/mortality , Lung Neoplasms/drug therapy , Prognosis , Apoptosis/genetics , Gene Expression Regulation, Neoplastic , Male , Female , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Molecular Docking Simulation , Gene Regulatory Networks , Middle Aged , Gene Expression Profiling
20.
Bioinformatics ; 28(23): 3150-2, 2012 Dec 01.
Article in English | MEDLINE | ID: mdl-23060610

ABSTRACT

SUMMARY: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup from parallelization for up to ∼24 cores and quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT can handle very large datasets in much less time than previous versions. AVAILABILITY: http://cd-hit.org. CONTACT: liwz@sdsc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
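CD-HIT is based on greedy incremental clustering: sequences are sorted by decreasing length, the longest seeds the first cluster, and each subsequent sequence either joins the first representative it matches at or above the identity threshold or becomes a new representative. The sketch below is serial and, as a simplifying assumption, replaces CD-HIT's short-word filter and alignment with a longest-common-subsequence identity (matching residues divided by the length of the shorter sequence). It illustrates the clustering logic only, not the parallelization strategy described in the summary; the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming, two rows)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def identity(a, b):
    """Simplified identity: matched residues over the shorter sequence length."""
    return lcs_length(a, b) / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering of a {sequence_id: sequence} mapping.

    Returns {representative_id: [member ids]} with representatives in seed order.
    """
    # Longest sequences first; ties broken by id for a deterministic order.
    order = sorted(seqs, key=lambda sid: (-len(seqs[sid]), sid))
    clusters = {}
    for sid in order:
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)  # join the first matching representative
                break
        else:
            clusters[sid] = [sid]  # no match: this sequence seeds a new cluster
    return clusters
```

Because each sequence is compared only against the current representatives rather than against all other sequences, the number of comparisons grows with the number of clusters, which is what makes the greedy approach attractive for large, redundant datasets.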


Subjects
Computational Biology/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Cluster Analysis