Results 1 - 20 of 37
1.
Methods Mol Biol ; 2809: 115-126, 2024.
Article in English | MEDLINE | ID: mdl-38907894

ABSTRACT

Human leukocyte antigen (HLA) typing is of great importance in clinical applications such as organ transplantation, blood transfusion, disease diagnosis and treatment, and forensic analysis. In recent years, nanopore sequencing technology has emerged as a rapid and cost-effective option for HLA typing. However, owing to the principles and data characteristics of nanopore sequencing, robust and generalizable bioinformatics tools for its downstream analysis have been scarce, which makes it difficult to resolve the thousands of HLA alleles present in the human population. To address this challenge, we developed NanoHLA, a tool for high-resolution typing of HLA class I genes from nanopore sequencing data without error correction. The method integrates HLA type coverage analysis with the data conversion technique of Nano2NGS, which applies nanopore sequencing data to NGS-like data analysis pipelines. In validation on public nanopore sequencing datasets, NanoHLA showed an overall concordance rate of 84.34% for HLA-A, HLA-B, and HLA-C, and outperformed existing tools such as HLA-LA. NanoHLA provides tools and solutions for HLA typing-related fields, and we look forward to further expanding the application of nanopore sequencing technology in both research and clinical settings. The code is available at https://github.com/langjidong/NanoHLA.


Subjects
Alleles, Histocompatibility Testing, Nanopore Sequencing, Humans, Histocompatibility Testing/methods, Nanopore Sequencing/methods, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, Software, Histocompatibility Antigens Class I/genetics, HLA Antigens/genetics, DNA Sequence Analysis/methods, MHC Class I Genes/genetics
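The 84.34% concordance above depends on how matches are counted. As a minimal sketch, assuming concordance means the fraction of expected alleles reproduced at two-field resolution, pooled over HLA-A, -B, and -C (NanoHLA's exact definition may differ), the calculation could look like this; the `truncate` and `concordance` helpers and the example genotypes are hypothetical.

```python
from collections import Counter


def truncate(allele: str, fields: int) -> str:
    """Trim an allele name to its first `fields` colon-separated fields,
    e.g. "A*02:01:01" -> "A*02:01" when fields == 2."""
    gene, _, digits = allele.partition("*")
    return gene + "*" + ":".join(digits.split(":")[:fields])


def concordance(called: dict, expected: dict, fields: int = 2) -> float:
    """Fraction of expected alleles that the caller reproduced at the
    requested field resolution, pooled over all loci (hypothetical metric)."""
    matched = total = 0
    for locus, truth in expected.items():
        truth_counts = Counter(truncate(a, fields) for a in truth)
        call_counts = Counter(truncate(a, fields) for a in called.get(locus, []))
        # Multiset intersection: a homozygous truth needs two matching calls.
        matched += sum((truth_counts & call_counts).values())
        total += sum(truth_counts.values())
    return matched / total if total else 0.0


# Hypothetical reference and called genotypes.
expected = {
    "HLA-A": ["A*02:01:01", "A*24:02"],
    "HLA-B": ["B*40:01", "B*46:01"],
    "HLA-C": ["C*01:02", "C*07:02"],
}
called = {
    "HLA-A": ["A*02:01", "A*24:02"],
    "HLA-B": ["B*40:01", "B*46:02"],
    "HLA-C": ["C*01:02", "C*07:02"],
}
print(round(concordance(called, expected), 4))  # 5 of 6 alleles match -> 0.8333
```

Matching on multisets rather than sets means a homozygous reference genotype is only fully concordant when both called alleles agree.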
4.
Front Immunol ; 14: 1100816, 2023.
Article in English | MEDLINE | ID: mdl-36875075

ABSTRACT

Background: Autism spectrum disorders (ASD) are a group of pervasive neurodevelopmental disorders, and the heterogeneity in the symptomatology and etiology of ASD has long been recognized. Altered immune function and gut microbiota have been found in ASD populations, and immune dysfunction has been hypothesized to be involved in the pathophysiology of a subtype of ASD. Methods: A cohort of 105 children with ASD was recruited and grouped based on IFN-γ levels derived from ex vivo stimulated γδT cells. Fecal samples were collected and analyzed with a metagenomic approach. Autistic symptoms and gut microbiota composition were compared between subgroups. Enriched KEGG orthologue markers and metagenome-based pathogen-host interactions were also analyzed to reveal differences in functional features. Results: Autistic behavioral symptoms were more severe in children in the IFN-γ-high group, especially in the body and object use, social and self-help, and expressive language performance domains. LEfSe analysis of the gut microbiota revealed an overrepresentation of Selenomonadales, Negativicutes, Veillonellaceae, and Verrucomicrobiaceae and an underrepresentation of Bacteroides xylanisolvens and Bifidobacterium longum in children with higher IFN-γ levels. Decreased carbohydrate, amino acid, and lipid metabolism functions of the gut microbiota were found in the IFN-γ-high group. Additional functional profile analyses revealed significant differences between the two groups in the abundances of genes encoding carbohydrate-active enzymes. Enriched phenotypes related to infection and gastroenteritis, and an underrepresentation of one gut-brain module associated with histamine degradation, were also found in the IFN-γ-high group. Multivariate analyses showed relatively good separation between the two groups.
Conclusions: IFN-γ levels derived from γδT cells could serve as a candidate biomarker for subtyping individuals with ASD, reducing the heterogeneity associated with ASD and producing subgroups that are more likely to share similar phenotypes and etiologies. A better understanding of the associations among immune function, gut microbiota composition, and metabolic abnormalities in ASD would facilitate the development of individualized biomedical treatments for this complex neurodevelopmental disorder.


Subjects
Autism Spectrum Disorder, Autistic Disorder, Gastrointestinal Microbiome, Humans, Behavioral Symptoms, Amino Acids
5.
Front Mol Biosci ; 10: 1093519, 2023.
Article in English | MEDLINE | ID: mdl-36743210

ABSTRACT

Short tandem repeats (STRs) are widely present in the human genome. Studies have confirmed that STRs are associated with more than 30 diseases, and STRs are also used in forensic identification and paternity testing. However, few STR detection methods are based on nanopore sequencing because of the challenges posed by its sequencing principles and data characteristics. We developed NanoSTR to detect target STR loci based on the length-number-rank (LNR) information of reads. NanoSTR can be used for STR detection and genotyping from nanopore long-read data, with improved accuracy and efficiency compared with existing methods such as Tandem-Genotypes and TRiCoLOR. NanoSTR showed 100% concordance with the expected genotypes on error-free simulated data and achieved >85% concordance on standard samples (containing autosomal and Y-chromosomal loci) sequenced on the MinION platform. NanoSTR thus performed well in detecting target STR markers. Although NanoSTR needs further optimization and development, it is useful as an analytical method for detecting STR loci by nanopore sequencing. This method adds to the toolbox for nanopore-based STR analysis and expands the applications of nanopore sequencing in scientific research and clinical scenarios. The main code and the data are available at https://github.com/langjidong/NanoSTR.
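The length-number-rank idea can be illustrated with a simplified, hypothetical caller. This sketch assumes each spanning read consists of fixed flanks plus the repeat tract, so read length maps directly to a repeat count; real nanopore reads carry indel errors, and NanoSTR's actual algorithm may differ. `call_str_genotype`, `flank_len`, and `min_ratio` are illustrative names, not NanoSTR parameters.

```python
from collections import Counter


def call_str_genotype(read_lengths, flank_len, motif_len, min_ratio=0.3):
    """Hypothetical length-number-rank style STR caller (not NanoSTR's code).

    read_lengths: lengths of reads spanning the locus with fixed flanks.
    flank_len:    combined length of the flanking sequence in each read.
    motif_len:    length of the repeat unit, e.g. 4 for a tetranucleotide STR.
    """
    if not read_lengths:
        raise ValueError("no spanning reads")
    # Length: turn each read length into an estimated repeat count.
    repeats = [round((n - flank_len) / motif_len) for n in read_lengths]
    # Number: count how many reads support each repeat count.
    support = Counter(repeats)
    # Rank: order repeat counts by read support, best-supported first.
    ranked = support.most_common()
    top_allele, top_reads = ranked[0]
    # Report a heterozygote only if the runner-up has enough support.
    if len(ranked) > 1 and ranked[1][1] >= min_ratio * top_reads:
        return tuple(sorted((top_allele, ranked[1][0])))
    return (top_allele, top_allele)


reads = [140] * 5 + [141] + [144] * 4 + [152]  # hypothetical read lengths
print(call_str_genotype(reads, flank_len=100, motif_len=4))  # (10, 11)
```

The single read implying 13 repeats is outranked and ignored, which is the kind of noise a rank-based call is meant to absorb.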

6.
Front Microbiol ; 13: 1004664, 2022.
Article in English | MEDLINE | ID: mdl-36312946

ABSTRACT

Background: Human papillomavirus (HPV) infection is the leading cause of cervical cancer. A growing number of studies have found that cervical microbiota (CM) composition is correlated with HPV infection and the development of cervical cancer. However, further studies are needed to clarify the complex interactions between the microbiota and the mechanisms of disease development, especially in specific regions of China. Materials and methods: In this study, 16S rDNA sequencing was applied to 276 ThinPrep cytologic test (TCT) samples from patients in the Sanmenxia area. We systematically analyzed the microbiota structure, diversity, and group and functional differences across HPV infection groups and age groups, as well as the co-occurrence relationships of the microbiota. Results: The major microbiota components across all patients were Lactobacillus iners, Escherichia coli, Enterococcus faecalis, and Atopobium vaginae at the species level, and Staphylococcus, Lactobacillus, Gardnerella, Bosea, Streptococcus, and Sneathia at the genus level. Microbiota diversity differed significantly between the HPV-negative group (Chao1 index: 83.5299) and the HPV-positive (98.8869, p < 0.01), unique-268-infected (infection with one of HPV genotypes 52, 56, or 58; 107.3885, p < 0.01), multi-268-infected (infection with two or more of HPV genotypes 52, 56, and 58; 97.5337, p = 0.1012), and other1 (94.9619, p < 0.05) groups. Women older than 60 years had higher microbiota diversity (108.8851, p < 0.01, n = 255) than younger women (87.0171, n = 21). The abundances of Gardnerella and Atopobium vaginae were significantly higher in the HPV-positive group than in the HPV-negative group, while Burkholderiaceae and Mycoplasma were more abundant in the unique-268 group than in the negative group. Gammaproteobacteria and Pseudomonas were more abundant in patients older than 60 than in younger patients.
Kyoto Encyclopedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups (COG) analyses of the microbiota's effects on metabolism showed that cell-, protein-, and genetic information-related metabolic pathways differed significantly between the HPV-negative and HPV-positive groups. In contrast, lipid metabolism, signal transduction, and cell cycle pathways differed significantly between the multi-268 and negative groups. Conclusion: HPV infection status and age were related to the diversity and functional pathways of the CM. The complex co-occurrence relationships within the CM and their roles in disease development need further investigation.
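Chao1, the richness estimator quoted above, extrapolates unseen taxa from the numbers of singletons (F1) and doubletons (F2). The abstract does not say which variant or software was used; the sketch below assumes the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), whereas the classic form is S_obs + F1² / (2F2). The input counts are hypothetical.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    # S_obs + F1(F1 - 1) / (2(F2 + 1)); stays finite even when F2 == 0.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


print(chao1([10, 5, 1, 1, 1, 2, 2, 0]))  # 7 observed taxa + 1.0 -> 8.0
```

A sample with many singletons relative to doubletons gets a larger correction, which is why Chao1 can rank samples differently from the raw count of observed taxa.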

7.
Front Genet ; 13: 1008792, 2022.
Article in English | MEDLINE | ID: mdl-36186464

ABSTRACT

Nanopore sequencing technology (NST) has become a rapid and cost-effective method for the diagnosis and epidemiological surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the coronavirus disease 2019 (COVID-19) pandemic. Compared with short-read sequencing platforms (e.g., Illumina), nanopore long-read sequencing platforms effectively shorten the time required to complete detection. However, owing to the principles and data characteristics of NST, the sequencing data are less accurate, which limits the monitoring and lineage analysis of SARS-CoV-2. In this study, we developed NanoCoV19, an analytical pipeline for rapid SARS-CoV-2 detection and lineage identification that integrates phylogenetic-tree and hotspot mutation analyses. The method can not only distinguish and trace the lineages within the alpha, beta, delta, gamma, lambda, and omicron variants of SARS-CoV-2 but is also rapid and efficient, completing the overall analysis within 1 h. We hope that NanoCoV19 can serve as an auxiliary tool for rapid subtyping and lineage analysis of SARS-CoV-2 and, more importantly, that it can promote further applications of NST in public health and safety plans similar to those formulated to address the COVID-19 outbreak.
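Hotspot-based lineage calling can be reduced to comparing the mutations found in a sample with each lineage's defining mutations. The sketch below is a hypothetical simplification of that step only (no phylogenetic tree); the lineage names and mutation labels are placeholders, not real SARS-CoV-2 definitions, and scoring by covered fraction alone favors lineages with short signatures, which a production tool would need to handle.

```python
def assign_lineage(observed, signatures, min_fraction=0.8):
    """Pick the lineage whose defining mutations are best covered.

    observed:   set of mutations called from the sample.
    signatures: {lineage name: set of lineage-defining mutations}.
    Returns (lineage or None, fraction of that lineage's mutations seen).
    """
    best, best_frac = None, 0.0
    for lineage, defining in signatures.items():
        if not defining:
            continue  # skip empty signatures to avoid dividing by zero
        frac = len(observed & defining) / len(defining)
        if frac > best_frac:
            best, best_frac = lineage, frac
    if best_frac < min_fraction:
        return None, best_frac
    return best, best_frac


# Hypothetical placeholder signatures, not real lineage definitions.
signatures = {
    "lineage_X": {"S:mut1", "S:mut2", "ORF1ab:mut3"},
    "lineage_Y": {"S:mut1", "S:mut4", "N:mut5", "N:mut6"},
}
sample = {"S:mut1", "S:mut2", "ORF1ab:mut3", "N:mut5"}
print(assign_lineage(sample, signatures))  # ('lineage_X', 1.0)
```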

8.
Front Cell Infect Microbiol ; 12: 886196, 2022.
Article in English | MEDLINE | ID: mdl-35800387

ABSTRACT

Autism is a biologically based neurodevelopmental condition, and coexisting atopic dermatitis (AD) is not uncommon. Given that the gut microbiota plays an important role in the development of both conditions, we aimed to explore differences in the gut microbiota, and their correlations with urinary organic acids, between autistic children with and without AD. We enrolled 61 autistic children, 36 with AD and 25 without. The gut microbiota was profiled by metagenomic shotgun sequencing, and its diversity, composition, and functional pathways were analyzed. Urinary organic acids were assayed by gas chromatography-mass spectrometry and analyzed with univariate and multivariate methods. Spearman correlation analysis was used to explore their relationships. In our study, individuals with AD had more prominent gastrointestinal disorders. The alpha diversity of the gut microbiota was lower in the AD group. LEfSe analysis showed higher abundances of Anaerostipes caccae, Eubacterium hallii, and Bifidobacterium bifidum in individuals with AD, and of Akkermansia muciniphila, Roseburia intestinalis, Haemophilus parainfluenzae, and Rothia mucilaginosa in controls. Functional profiles showed that the lipid metabolism pathway had a higher proportion in the AD group, whereas the xenobiotics biodegradation pathway was abundant in controls. Among urinary organic acids, adipic acid, 3-hydroxyglutaric acid, tartaric acid, homovanillic acid, 2-hydroxyphenylacetic acid, aconitic acid, and 2-hydroxyhippuric acid were richer in the AD group. However, only adipic acid remained significant in the multivariate analysis (OR = 1.513, 95% CI [1.042, 2.198], P = 0.030). In the correlation analysis, Roseburia intestinalis was negatively correlated with aconitic acid (r = -0.14, P = 0.02), and the latter was positively correlated with adipic acid (r = 0.41, P = 0.006).
In addition, the xenobiotics biodegradation pathway appeared to be inversely correlated with adipic acid (r = -0.42, P = 0.18). The gut microbiota plays an important role in the development of AD in autistic children, and more well-designed studies are warranted to explore the underlying mechanisms.


Subjects
Autistic Disorder, Atopic Dermatitis, Gastrointestinal Microbiome, Aconitic Acid/analysis, Adipates/analysis, Child, Clostridiales, Atopic Dermatitis/complications, Atopic Dermatitis/microbiology, Feces/microbiology, Humans
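The correlations reported in this study are Spearman coefficients, which are Pearson correlations computed on ranks. A minimal stdlib sketch, with tied values given their average rank (p-values are not computed here, and the paired measurements are hypothetical):

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements for five children.
taxon_abundance = [0.1, 0.4, 0.4, 0.9, 1.3]
organic_acid = [5, 3, 4, 2, 1]
print(round(spearman(taxon_abundance, organic_acid), 4))  # -0.9747
```

Because only ranks enter the calculation, a single extreme abundance value cannot dominate the coefficient the way it can with Pearson's r.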
9.
PLoS One ; 17(5): e0267066, 2022.
Article in English | MEDLINE | ID: mdl-35594250

ABSTRACT

Nanopore sequencing produces long reads and offers unique advantages over next-generation sequencing, especially for assembling draft bacterial genomes with improved completeness. However, assembly errors can arise from the data characteristics and the assembly algorithms. To address these issues, we developed MAECI, a pipeline for generating a consensus sequence from multiple assemblies of the same nanopore sequencing data and for error correction. Systematic evaluation showed that MAECI is an efficient and effective pipeline for improving the accuracy and completeness of bacterial genome assemblies. The code and implementation are available at https://github.com/langjidong/MAECI.


Subjects
Nanopore Sequencing, Consensus Sequence, Genomics, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Software
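MAECI's consensus step combines several assemblies of the same reads. Assuming the assemblies have already been aligned to equal length with `-` as the gap character (alignment itself is not shown), a per-column majority vote gives a simple consensus. This is an illustrative sketch with hypothetical sequences, not MAECI's implementation.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Column-wise majority vote over pre-aligned, equal-length sequences.
    Columns won by the gap character are dropped from the consensus;
    ties go to the symbol seen first in that column."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need pre-aligned sequences of equal length")
    consensus = []
    for column in zip(*aligned):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


assemblies = ["ACGT-A", "ACGTTA", "AGGT-A"]  # hypothetical aligned contigs
print(majority_consensus(assemblies))  # ACGTA
```

With an odd number of assemblies a strict majority always exists for two-way disagreements, which is one reason to combine three or more assemblers.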
10.
NAR Genom Bioinform ; 4(2): lqac033, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35464239

ABSTRACT

Nanopore sequencing, also known as single-molecule real-time sequencing, is a third/fourth-generation sequencing technology that can decipher single DNA/RNA molecules without the polymerase chain reaction. Although nanopore sequencing has made significant progress in scientific research and clinical practice, its application has been limited compared with next-generation sequencing (NGS) because of its specific design principles and data characteristics, especially in hotspot mutation detection. We therefore developed Nano2NGS-Muta, a data analysis framework for hotspot mutation detection based on long reads from nanopore sequencing. Nano2NGS-Muta applies nanopore sequencing data to NGS-like data analysis pipelines: long reads are converted into short reads and then processed through existing NGS analysis pipelines, combined with statistical methods, for hotspot mutation detection. Nano2NGS-Muta effectively avoids false-positive and false-negative results caused by non-random errors and unexpected insertions-deletions (indels) in nanopore sequencing data, improves the detection accuracy of hotspot mutations compared with conventional nanopore data analysis algorithms, and breaks down the barriers between the data analysis methods of short-read and long-read sequencing. We hope that Nano2NGS-Muta can serve as a reference method for nanopore sequencing data analysis and broaden the application of nanopore sequencing technology in scientific research and clinical practice.
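The long-read-to-short-read conversion described above can be sketched as a sliding window over each read. The 150 bp window and 50 bp step are assumed values, not Nano2NGS-Muta defaults, and `split_long_read` and `to_fastq` are hypothetical helpers.

```python
def split_long_read(seq, qual, read_len=150, step=50):
    """Cut one long read into overlapping fixed-length pseudo short reads.

    Yields (start, sub_seq, sub_qual). The last window is shifted so the
    3' end of the long read is always covered.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) <= read_len:
        yield 0, seq, qual
        return
    last_start = len(seq) - read_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    for s in starts:
        yield s, seq[s:s + read_len], qual[s:s + read_len]


def to_fastq(name, chunks):
    """Format pseudo reads as FASTQ records, tagging each with its offset."""
    return "".join(f"@{name}_{s}\n{sq}\n+\n{q}\n" for s, sq, q in chunks)


chunks = list(split_long_read("ACGT" * 100, "I" * 400, read_len=150, step=100))
print([s for s, _, _ in chunks])  # [0, 100, 200, 250]
```

Overlapping windows mean each base appears in several pseudo reads, so allele frequencies computed downstream should be interpreted with that redundancy in mind.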

11.
BMC Genomics ; 23(Suppl 1): 316, 2022 Apr 20.
Article in English | MEDLINE | ID: mdl-35443609

ABSTRACT

BACKGROUND: Drug-resistant bacteria are important carriers of antibiotic resistance genes (ARGs). This is crucial for the development of precise clinical drug treatment strategies. Long-read sequencing platforms such as the Oxford Nanopore sequencer can improve genome assembly efficiency, particularly when combined with short-read sequencing data. RESULTS: Alcaligenes faecalis PGB1 was isolated and identified as resistant to penicillin and three other antibiotics. After sequencing on the Nanopore MinION and an Illumina sequencer, its entire genome was hybrid-assembled. One chromosome and one plasmid were assembled and annotated with 4,433 genes (including 91 RNA genes). Functional annotation and comparison between strains were performed. Phylogenetic analysis revealed that it was most closely related to A. faecalis ZD02. Resistome-related sequences were explored, including ARGs, insertion sequences, and phages. Two plasmid aminoglycoside genes were determined to be acquired ARGs. The main ARG category was antibiotic efflux, and the β-lactamase (EC 3.5.2.6) of PGB1 was assigned to Class A, Subclass A1b, and Cluster LSBL3. CONCLUSIONS: The present study identified the newly isolated bacterium A. faecalis PGB1 and systematically annotated its genome sequence and ARGs.


Subjects
Alcaligenes faecalis, Nanopores, Alcaligenes faecalis/genetics, Anti-Bacterial Agents/pharmacology, High-Throughput Nucleotide Sequencing, Phylogeny, Prostaglandins B, DNA Sequence Analysis
12.
Imeta ; 1(4): e47, 2022 Dec.
Article in English | MEDLINE | ID: mdl-38867910

ABSTRACT

DNA-based biological ingredient identification and downstream network pharmacology analysis are commonly used in research on Traditional Chinese Medicine preparations (TCM formulas). Advances in bioinformatics tools and the accumulation of related data have driven progress in this field. However, the lack of a platform integrating biological ingredient identification with downstream network pharmacology analysis hinders a deep understanding of TCM. In this study, we developed the TCM-Suite platform, composed of two sub-databases, Holmes-Suite and Watson-Suite, for TCM biological ingredient identification and network pharmacology investigation, respectively; both are among the most complete of their kind. In Holmes-Suite, we collected and processed six types of marker gene sequences, totaling 1,251,548 sequences. In Watson-Suite, we curated and integrated a large number of entries from more than 10 public databases. Importantly, we developed a comprehensive pipeline that integrates TCM biological ingredient identification with downstream network pharmacology research, allowing users to simultaneously identify the components of a TCM formula and analyze its potential pharmacological mechanisms. Furthermore, we designed search engines and a user-friendly platform to search and visualize these rich resources. TCM-Suite is a comprehensive and holistic platform for TCM-based drug discovery and repurposing. TCM-Suite website: http://TCM-Suite.AImicrobiome.cn.

13.
Front Oncol ; 11: 738222, 2021.
Article in English | MEDLINE | ID: mdl-34868931

ABSTRACT

Tamoxifen (TAM) is the most commonly used adjuvant endocrine drug for patients with hormone receptor-positive (HR+) breast cancer. However, accurately evaluating the risk of breast cancer recurrence and metastasis after adjuvant TAM therapy remains a major concern. In recent years, many studies have shown that the clinical outcomes of TAM-treated breast cancer patients are influenced by the activity of cytochrome P450 (CYP) enzymes that catalyze the formation of active TAM metabolites such as endoxifen and 4-hydroxytamoxifen. In this study, we aimed to develop and validate an algorithm combining CYP gene polymorphisms and clinicopathological signatures to identify the subpopulation of breast cancer patients who might benefit most from adjuvant TAM therapy, and to evaluate the major risk factors related to TAM resistance. A total of 256 patients with invasive breast cancer who received adjuvant endocrine therapy were selected. Genotypes at 10 loci in three TAM metabolism-related CYP genes were detected by time-of-flight mass spectrometry and multiplex long PCR. Combining the 10 loci with nine clinicopathological characteristics yielded 19 features, whose association with cancer recurrence was assessed by random forest importance scores. A logistic regression model was then trained to calculate a TAM risk-of-recurrence score (TAM RORs), which is used to assess a patient's risk of recurrence after TAM treatment. The sensitivity and specificity of the model in an independent test cohort were 86.67% and 64.56%, respectively. Patients with high TAM RORs were less sensitive to TAM treatment and showed more invasive characteristics, whereas those with low TAM RORs were highly sensitive to TAM treatment and remained stable during follow-up. Several risk factors had a significant effect on the efficacy of TAM.
These were tissue classification (tumor Grade < 2 vs. Grade ≥ 2, p = 2.2e-16), the number of lymph node metastases (Node-Negative vs. Node < 4, p = 5.3e-07; Node < 4 vs. Node ≥ 4, p = 0.003; Node-Negative vs. Node ≥ 4, p = 7.2e-15), and the expression levels of estrogen receptor (ER) and progesterone receptor (PR) (ER < 50% vs. ER ≥ 50%, p = 1.3e-12; PR < 50% vs. PR ≥ 50%, p = 2.6e-08). Notably, different CYP2D6*10 (C188T) genotypes showed significant differences in predictive function (CYP2D6*10 CC vs. TT, p < 0.019; CYP2D6*10 CT vs. TT, p < 0.037). Because more than 50% of Chinese individuals carry the CYP2D6*10 mutation, the CYP2D6*10 (C188T) genotype should be tested before TAM therapy.
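The TAM RORs is described as the output of a logistic regression model. The sketch below shows only the scoring step, with made-up coefficients and an assumed 0.5 cut-off; the fitted weights, exact features, and threshold of the published model are not given in the abstract, and the feature names here are hypothetical.

```python
import math

# Made-up coefficients for illustration only; they are NOT the fitted
# TAM RORs model, whose weights the abstract does not report.
WEIGHTS = {"grade_ge_2": 1.2, "node_ge_4": 1.5, "er_ge_50": -0.8}
INTERCEPT = -1.0


def risk_score(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Logistic-regression risk score in (0, 1); missing features count as 0."""
    z = intercept + sum(w * features.get(name, 0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_group(score, threshold=0.5):
    """Dichotomize the score; the 0.5 cut-off is an assumption."""
    return "high" if score >= threshold else "low"


patient = {"grade_ge_2": 1, "node_ge_4": 1, "er_ge_50": 0}  # hypothetical
score = risk_score(patient)
print(round(score, 3), risk_group(score))  # 0.846 high
```

In practice the threshold trades sensitivity against specificity, so the 86.67% / 64.56% pair reported above corresponds to one particular choice of cut-off.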

14.
Front Genet ; 12: 730519, 2021.
Article in English | MEDLINE | ID: mdl-34777467

ABSTRACT

Illumina is the leading sequencing platform in the global next-generation sequencing (NGS) market. In recent years, MGI Tech has released a series of new sequencers, including the DNBSEQ-T7, MGISEQ-2000, and MGISEQ-200. As a complex application of NGS, cancer-detection panels place increasing demands on the accuracy and sensitivity of sequencing and data analysis. In this study, we used the same capture DNA libraries, constructed with the Illumina protocol, to evaluate the performance of the Illumina NextSeq 500 and MGISEQ-2000 sequencing platforms. The two platforms were highly consistent in hotspot mutation analysis; more importantly, we found a significant loss of fragments in the 101-133 bp size range on the MGISEQ-2000 platform for Illumina libraries, but not for capture DNA libraries prepared with the MGISEQ protocol. This may indicate fragment selection or low fragment ligation efficiency during the DNA circularization step, which is unique to the MGISEQ-2000 sequencing platform. In conclusion, these sequencing libraries and their corresponding platforms are compatible with each other, but the choice of protocol and platform should be carefully evaluated according to the research purpose.
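A fragment-loss check like the one above comes down to comparing the share of fragments in the 101-133 bp window between platforms. Assuming insert sizes have already been extracted (for example, from the TLEN field of properly paired SAM records) and treating both bounds as inclusive, a minimal sketch with hypothetical values:

```python
def fraction_in_range(insert_sizes, lo=101, hi=133):
    """Fraction of fragments whose insert size lies in [lo, hi] bp."""
    sizes = list(insert_sizes)
    if not sizes:
        raise ValueError("no insert sizes given")
    return sum(lo <= s <= hi for s in sizes) / len(sizes)


# Hypothetical insert sizes for the same library on two platforms.
platform_a = [95, 105, 110, 120, 128, 150, 170, 200]
platform_b = [95, 140, 150, 160, 128, 170, 180, 200]
print(fraction_in_range(platform_a), fraction_in_range(platform_b))  # 0.5 0.125
```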

15.
Front Genet ; 11: 674, 2020.
Article in English | MEDLINE | ID: mdl-32760423

ABSTRACT

Patients with carcinoma of unknown primary (CUP) account for 3-5% of all cancer cases. Many metastatic cancers require further diagnosis to determine their tissue of origin. However, diagnosing CUP and identifying its primary site are challenging. Previous studies have suggested that molecular profiling of tissue-specific genes could help infer the primary tissue of a tumor. The purpose of this study was to evaluate the performance of somatic mutations detected in a tumor for identifying the cancer tissue of origin. We downloaded somatic mutation datasets from the International Cancer Genome Consortium project. The random forest algorithm was used to extract features, and a classifier was built using logistic regression. Specifically, somatic mutations in 300 genes were extracted; these genes are significantly enriched in functions such as cell-to-cell adhesion. The prediction accuracy of tissue-of-origin inference across 3,374 cancer samples of 13 cancer types reached 81% in 10-fold cross-validation. Our method could be useful for identifying the cancer tissue of origin and for the diagnosis and treatment of cancers.
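The 10-fold cross-validation used above partitions the samples into ten disjoint test folds. A minimal, unstratified sketch (the study may have stratified by cancer type, which the abstract does not say); `k_fold_indices` is a hypothetical helper:

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


print([len(test) for _, test in k_fold_indices(25, k=10)])
# [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

For an honest accuracy estimate, the random forest feature extraction should be refit inside each training fold; selecting features on all samples first would leak test information into the model.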

16.
Biochim Biophys Acta Mol Basis Dis ; 1866(11): 165916, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32771416

ABSTRACT

Carcinoma of unknown primary (CUP), defined as metastatic cancer of unknown origin, occurs in 3-5 of every 100 cancer patients in the United States. The heterogeneity and metastasis of cancer make the diagnosis and treatment of CUP very difficult. Multiple methods have been proposed to find the tissue of origin (TOO) of CUP. However, the accuracies of computed tomography (CT) and positron emission tomography (PET) in identifying the TOO were only 20%-27% and 24%-40%, respectively, which is insufficient for determining targeted therapies. In this study, we provide a machine learning framework that traces tumor tissue of origin using gene length-normalized somatic mutation sequencing data. Somatic mutation data were downloaded from the Data Portal (Release 28) of the International Cancer Genome Consortium (ICGC), and 4,909 samples from 13 cancer types were used to identify the primary site. Optimal results were obtained with a 600-gene set using the random forest algorithm with 10-fold cross-validation; the average accuracy and F1-score across the 13 cancer types were 0.8822 and 0.8886, respectively. In conclusion, we provide an effective computational framework for inferring cancer tissue of origin by combining DNA sequencing and machine learning, which is promising for assisting the clinical diagnosis of cancers.


Subjects
DNA/genetics, Machine Learning, Unknown Primary Neoplasms/genetics, Algorithms, Mutation/genetics, Positron Emission Tomography, DNA Sequence Analysis
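Gene length normalization keeps long genes from dominating simply because they offer more bases to mutate. The abstract does not give the exact formula; this sketch assumes mutations per kilobase of gene length, with hypothetical gene names and lengths.

```python
def length_normalized_counts(mutated_genes, gene_lengths, per_bp=1000):
    """Somatic mutations per `per_bp` bases of gene length, per gene.

    mutated_genes: one gene symbol per somatic mutation in a sample.
    gene_lengths:  {gene symbol: length in bp}; other genes are ignored.
    """
    counts = {}
    for gene in mutated_genes:
        if gene in gene_lengths:
            counts[gene] = counts.get(gene, 0) + 1
    return {g: counts.get(g, 0) * per_bp / length for g, length in gene_lengths.items()}


# Hypothetical genes and lengths.
lengths = {"GENE1": 2000, "GENE2": 500}
print(length_normalized_counts(["GENE1", "GENE1", "GENE2", "GENE3"], lengths))
# {'GENE1': 1.0, 'GENE2': 2.0}
```

Raw counts would rank GENE1 above GENE2; after normalization the ranking flips, which changes the feature vector a classifier sees.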
17.
Mol Ther Nucleic Acids ; 21: 676-686, 2020 Sep 04.
Article in English | MEDLINE | ID: mdl-32759058

ABSTRACT

In this study, we proposed an ensemble learning method that simultaneously integrates a low-rank matrix completion model and a ridge regression model to predict anticancer drug response in cancer cell lines. The model was applied to two benchmark datasets: the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC). Previous studies suggest that the dual-layer integrated cell line-drug network model is one of the best models to date and outperforms most state-of-the-art models. We therefore performed a head-to-head comparison between the dual-layer integrated cell line-drug network model and our model using 10-fold cross-validation. On the CCLE dataset, our model had a higher Pearson correlation coefficient between predicted and observed drug responses than the dual-layer integrated cell line-drug network model for 18 of 23 drugs. On the GDSC dataset, our model performed better for 26 of 28 drugs in the phosphatidylinositol 3-kinase (PI3K) pathway and for 26 of 30 drugs in the extracellular signal-regulated kinase (ERK) signaling pathway. Based on the prediction results, we carried out two types of case studies, which further verified the effectiveness of the proposed model for drug-response prediction. In addition, our model is more biologically interpretable than the compared method because it explicitly outputs the genes involved in the prediction, which are enriched in functions such as transcription, Src homology 2/3 (SH2/3) domain, cell cycle, ATP binding, and zinc finger.
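The head-to-head comparison counts the drugs for which one model's predictions correlate better with the observed responses. A minimal sketch of that tally, with hypothetical drugs and response values; `count_wins` is an illustrative helper, not part of either published model.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def count_wins(observed, pred_a, pred_b):
    """Number of drugs where model A's predictions correlate better with the
    observed responses than model B's, plus the total number of drugs."""
    wins = sum(
        pearson(pred_a[d], obs) > pearson(pred_b[d], obs)
        for d, obs in observed.items()
    )
    return wins, len(observed)


# Hypothetical per-drug responses across four cell lines.
observed = {"drug1": [1, 2, 3, 4], "drug2": [4, 3, 2, 1]}
model_a = {"drug1": [1, 2, 3, 4], "drug2": [1, 2, 3, 4]}
model_b = {"drug1": [1, 3, 2, 4], "drug2": [4, 2, 3, 1]}
print(count_wins(observed, model_a, model_b))  # (1, 2)
```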

18.
Article in English | MEDLINE | ID: mdl-32850691

ABSTRACT

Sequencing-based identification of tumor tissue of origin (TOO) is critical for patients with cancer of unknown primary. Even when the TOO of a tumor can be diagnosed by clinicopathological observation, reevaluation by computational methods can help avoid misdiagnosis. In this study, we developed a neural network (NN) framework that uses the expression of a 150-gene panel to infer tumor TOO for 15 common solid tumor types: lung, breast, liver, colorectal, gastroesophageal, ovarian, cervical, endometrial, pancreatic, bladder, head and neck, thyroid, prostate, kidney, and brain cancers. First, we downloaded RNA-Seq data for 7,460 primary tumor samples across these 15 cancer types, with 142 to 1,052 samples per type, from The Cancer Genome Atlas. We then performed feature selection with the Pearson correlation method to obtain a 150-gene panel; the genes were significantly enriched in GO:2001242 (regulation of intrinsic apoptotic signaling pathway), GO:0009755 (hormone-mediated signaling pathway), and other similar functions. Next, we developed a novel NN model using the 150 genes to predict tumor TOO for the 15 cancer types. In 10-fold cross-validation on the 7,460 tumor samples, the average prediction sensitivity and precision of the framework were 93.36% and 94.07%, respectively, and for a few specific cancers, such as prostate cancer, both reached 100%. We also tested the trained model on an independent dataset of 20 metastatic tumor samples and achieved 80% accuracy. In summary, we present a highly accurate method for inferring tumor TOO that has potential for clinical implementation.
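Sensitivity and precision in a 15-class problem must be averaged across classes. Assuming a macro average (the abstract does not specify the averaging scheme), they can be computed from labels as below; classes are taken from the true labels, and the example labels are hypothetical.

```python
def macro_sensitivity_precision(y_true, y_pred):
    """Macro-averaged sensitivity (recall) and precision over the classes
    present in y_true; a class that is never predicted gets precision 0."""
    classes = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    sens, prec = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        sens.append(tp / (tp + fn))
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
    return sum(sens) / len(sens), sum(prec) / len(prec)


y_true = ["lung", "lung", "breast", "breast", "prostate"]
y_pred = ["lung", "breast", "breast", "breast", "prostate"]
sens_avg, prec_avg = macro_sensitivity_precision(y_true, y_pred)
print(round(sens_avg, 4), round(prec_avg, 4))  # 0.8333 0.8889
```

A macro average weights every cancer type equally regardless of sample count, so a small class with poor recall pulls the average down as much as a large one.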

19.
Article in English | MEDLINE | ID: mdl-32850708

ABSTRACT

Data quality control and preprocessing are often the first steps in processing next-generation sequencing (NGS) data from tumors. They not only help evaluate the quality of the sequencing data but also yield high-quality data for downstream analysis. However, by comparing the analysis results obtained after preprocessing with Cutadapt, fastp, and Trimmomatic against those obtained from raw sequencing data, we found that mutation detection frequencies fluctuated and differed, and that human leukocyte antigen (HLA) typing produced erroneous results. We believe our research demonstrates the impact of data preprocessing steps on downstream analysis results, and we hope it will promote the development or optimization of better data preprocessing methods so that downstream analysis can be more accurate.
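To see how a single preprocessing parameter can change what reaches downstream tools, consider a simplified sliding-window quality trimmer in the spirit of Trimmomatic's SLIDINGWINDOW step. This is a sketch, not a reimplementation of Cutadapt, fastp, or Trimmomatic: it cuts at the start of the first window whose mean Phred+33 quality falls below `min_avg`, and the real tools' cut positions can differ. The read is hypothetical.

```python
def sliding_window_trim(seq, qual, window=4, min_avg=20, offset=33):
    """Cut the read at the start of the first window whose mean Phred
    quality is below min_avg. Reads shorter than the window are kept."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = [ord(ch) - offset for ch in qual]  # Phred+33 decoding
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual


read, qual = "ACGTACGTAC", "IIIIII####"  # 'I' = Q40, '#' = Q2
print(sliding_window_trim(read, qual))              # ('ACGTA', 'IIIII')
print(sliding_window_trim(read, qual, min_avg=10))  # ('ACGTAC', 'IIIIII')
```

Changing only the quality threshold shifts the cut by one base here; across millions of reads, such differences can alter allele frequencies near read ends.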

20.
Article in English | MEDLINE | ID: mdl-32850745

ABSTRACT

Circulating tumor cells (CTCs) derived from primary and/or metastatic tumors are markers of tumor prognosis and can also be used to monitor therapeutic efficacy and tumor recurrence. CTC enrichment and screening can be automated, but the final counting of CTCs currently requires manual intervention. This not only requires the participation of experienced pathologists but also easily leads to human misjudgment. Medical image recognition based on machine learning can effectively reduce this workload and improve automation, so we used machine learning to identify CTCs. First, we collected the CTC test results of 600 patients. After immunofluorescence staining, each image showed one positive CTC nucleus and several negative controls. The CTC images were then segmented using image denoising, image filtering, edge detection, and dilation and erosion techniques with Python's OpenCV library. Subsequently, traditional image recognition methods and machine learning were used to identify CTCs. The machine learning model was a convolutional neural network trained by deep learning. In total, 2,300 cells from 600 patients were used, about 1,300 for training and the rest for testing. The sensitivity and specificity of recognition reached 90.3% and 91.3%, respectively. We will further revise our models in the hope of achieving higher sensitivity and specificity.
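The reported sensitivity and specificity follow from the binary confusion matrix of CTC versus non-CTC calls on the test cells. A minimal sketch with hypothetical labels:

```python
def sensitivity_specificity(y_true, y_pred, positive="CTC"):
    """Return (sensitivity, specificity) for a binary classifier."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical test-set labels: 4 true CTCs and 6 non-CTC cells.
truth = ["CTC"] * 4 + ["other"] * 6
calls = ["CTC", "CTC", "CTC", "other"] + ["other"] * 5 + ["CTC"]
sens, spec = sensitivity_specificity(truth, calls)
print(round(sens, 3), round(spec, 3))  # 0.75 0.833
```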
