1.
BMC Genomics ; 23(Suppl 1): 316, 2022 Apr 20.
Article in English | MEDLINE | ID: mdl-35443609

ABSTRACT

BACKGROUND: Drug-resistant bacteria are important carriers of antibiotic resistance genes (ARGs), a fact that is crucial for developing precise clinical drug treatment strategies. Long-read sequencing platforms such as the Oxford Nanopore sequencer can improve genome assembly efficiency, particularly when combined with short-read sequencing data. RESULTS: Alcaligenes faecalis PGB1 was isolated and found to be resistant to penicillin and three other antibiotics. After sequencing on Nanopore MinION and Illumina instruments, its entire genome was hybrid-assembled. One chromosome and one plasmid were assembled and annotated with 4,433 genes (including 91 RNA genes). Functional annotation and comparison between strains were performed. A phylogenetic analysis revealed that it was closest to A. faecalis ZD02. Resistome-related sequences were explored, including ARGs, insertion sequences, and phages. Two plasmid aminoglycoside genes were determined to be acquired ARGs. The main ARG category was antibiotic efflux resistance, and the β-lactamase (EC 3.5.2.6) of PGB1 was assigned to Class A, Subclass A1b, and Cluster LSBL3. CONCLUSIONS: The present study identified the newly isolated bacterium A. faecalis PGB1 and systematically annotated its genome sequence and ARGs.


Subject(s)
Alcaligenes faecalis , Nanopores , Alcaligenes faecalis/genetics , Anti-Bacterial Agents/pharmacology , High-Throughput Nucleotide Sequencing , Phylogeny , Prostaglandins B , Sequence Analysis, DNA
2.
Proc Natl Acad Sci U S A ; 115(30): E7091-E7100, 2018 07 24.
Article in English | MEDLINE | ID: mdl-29987045

ABSTRACT

Worldwide, myopia is the leading cause of visual impairment. It results from inappropriate extension of the ocular axis and concomitant declines in scleral strength and thickness caused by extracellular matrix (ECM) remodeling. However, the identities of the initiators and signaling pathways that induce scleral ECM remodeling in myopia are unknown. Here, we used single-cell RNA-sequencing to identify pathways activated in the sclera during myopia development. We found that the hypoxia-signaling, the eIF2-signaling, and mTOR-signaling pathways were activated in murine myopic sclera. Consistent with the role of hypoxic pathways in mouse model of myopia, nearly one third of human myopia risk genes from the genome-wide association study and linkage analyses interact with genes in the hypoxia-inducible factor-1α (HIF-1α)-signaling pathway. Furthermore, experimental myopia selectively induced HIF-1α up-regulation in the myopic sclera of both mice and guinea pigs. Additionally, hypoxia exposure (5% O2) promoted myofibroblast transdifferentiation with down-regulation of type I collagen in human scleral fibroblasts. Importantly, the antihypoxia drugs salidroside and formononetin down-regulated HIF-1α expression as well as the phosphorylation levels of eIF2α and mTOR, slowing experimental myopia progression without affecting normal ocular growth in guinea pigs. Furthermore, eIF2α phosphorylation inhibition suppressed experimental myopia, whereas mTOR phosphorylation induced myopia in normal mice. Collectively, these findings defined an essential role of hypoxia in scleral ECM remodeling and myopia development, suggesting a therapeutic approach to control myopia by ameliorating hypoxia.


Subject(s)
Extracellular Matrix/metabolism , Hypoxia , Myopia/therapy , Sclera/metabolism , Signal Transduction , Animals , Disease Models, Animal , Eukaryotic Initiation Factor-2/metabolism , Extracellular Matrix/pathology , Eye Proteins/metabolism , Guinea Pigs , Humans , Hypoxia-Inducible Factor 1, alpha Subunit/metabolism , Male , Mice , Myopia/metabolism , Myopia/pathology , Sclera/blood supply , Sclera/pathology , TOR Serine-Threonine Kinases/metabolism
3.
Environ Sci Technol ; 48(3): 1499-507, 2014.
Article in English | MEDLINE | ID: mdl-24456276

ABSTRACT

Particulate matter (PM) air pollution poses a formidable public health threat to the city of Beijing. Among the various hazards of PM pollutants, microorganisms in PM2.5 and PM10 are thought to be responsible for various allergies and for the spread of respiratory diseases. While the physical and chemical properties of PM pollutants have been extensively studied, much less is known about the inhalable microorganisms. Most existing data on airborne microbial communities using 16S or 18S rRNA gene sequencing to categorize bacteria or fungi into the family or genus levels do not provide information on their allergenic and pathogenic potentials. Here we employed metagenomic methods to analyze the microbial composition of Beijing's PM pollutants during a severe January smog event. We show that with sufficient sequencing depth, airborne microbes including bacteria, archaea, fungi, and dsDNA viruses can be identified at the species level. Our results suggested that the majority of the inhalable microorganisms were soil-associated and nonpathogenic to human. Nevertheless, the sequences of several respiratory microbial allergens and pathogens were identified and their relative abundance appeared to have increased with increased concentrations of PM pollution. Our findings may serve as an important reference for environmental scientists, health workers, and city planners.


Subject(s)
Air Microbiology , Air Pollutants/analysis , Bacteria/classification , Environmental Monitoring/methods , Fungi/classification , Smog/analysis , Air Microbiology/standards , Air Pollutants/chemistry , Bacteria/isolation & purification , China , Cities , Fungi/isolation & purification , Humans , Inhalation Exposure , Particle Size , Phylogeny , Public Health
4.
Methods Mol Biol ; 2809: 115-126, 2024.
Article in English | MEDLINE | ID: mdl-38907894

ABSTRACT

Human leukocyte antigen (HLA) typing is of great importance in clinical applications such as organ transplantation, blood transfusion, disease diagnosis and treatment, and forensic analysis. In recent years, nanopore sequencing technology has emerged as a rapid and cost-effective option for HLA typing. However, due to the principles and data characteristics of nanopore sequencing, robust and generalizable bioinformatics tools for its downstream analysis are scarce, posing a significant challenge in deciphering the thousands of HLA alleles present in the human population. To address this challenge, we developed NanoHLA, a tool for high-resolution typing of HLA class I genes from nanopore sequencing data without error correction. The method integrates HLA type coverage analysis with the data conversion techniques employed in Nano2NGS, which applies nanopore sequencing data to NGS-like data analysis pipelines. In validation with public nanopore sequencing datasets, NanoHLA showed an overall concordance rate of 84.34% for HLA-A, HLA-B, and HLA-C and outperformed existing tools such as HLA-LA. NanoHLA provides tools and solutions for HLA typing and related fields, and we look forward to further expanding the application of nanopore sequencing technology in both research and clinical settings. The code is available at https://github.com/langjidong/NanoHLA .


Subject(s)
Alleles , Histocompatibility Testing , Nanopore Sequencing , Humans , Histocompatibility Testing/methods , Nanopore Sequencing/methods , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Software , Histocompatibility Antigens Class I/genetics , HLA Antigens/genetics , Sequence Analysis, DNA/methods , Genes, MHC Class I/genetics
5.
Front Mol Biosci ; 10: 1093519, 2023.
Article in English | MEDLINE | ID: mdl-36743210

ABSTRACT

Short tandem repeats (STRs) are widely present in the human genome. Studies have confirmed that STRs are associated with more than 30 diseases, and they have also been used in forensic identification and paternity testing. However, there are few methods for STR detection based on nanopore sequencing because of the challenges posed by its sequencing principles and data characteristics. We developed NanoSTR for detection of target STR loci based on the length-number-rank (LNR) information of reads. NanoSTR can be used for STR detection and genotyping based on long-read data from nanopore sequencing, with improved accuracy and efficiency compared with existing methods such as Tandem-Genotypes and TRiCoLOR. NanoSTR showed 100% concordance with the expected genotypes on error-free simulated data and achieved >85% concordance on standard samples (containing autosomal and Y-chromosomal loci) sequenced on the MinION platform. NanoSTR showed high performance for detection of target STR markers. Although NanoSTR needs further optimization and development, it is useful as an analytical method for the detection of STR loci by nanopore sequencing. This method adds to the toolbox for nanopore-based STR analysis and expands the applications of nanopore sequencing in scientific research and clinical scenarios. The main code and the data are available at https://github.com/langjidong/NanoSTR.

6.
Front Immunol ; 14: 1100816, 2023.
Article in English | MEDLINE | ID: mdl-36875075

ABSTRACT

Background: Autism Spectrum Disorders (ASD) are a group of pervasive neurodevelopmental disorders, and the heterogeneity in the symptomatology and etiology of ASD has long been recognized. Altered immune function and gut microbiota have been found in ASD populations. Immune dysfunction has been hypothesized to be involved in the pathophysiology of a subtype of ASD. Methods: A cohort of 105 children with ASD was recruited and grouped based on IFN-γ levels derived from ex vivo stimulated γδT cells. Fecal samples were collected and analyzed with a metagenomic approach. Autistic symptoms and gut microbiota composition were compared between subgroups. Enriched KEGG orthologue markers and pathogen-host interactions based on the metagenome were also analyzed to reveal differences in functional features. Results: Autistic behavioral symptoms were more severe in children in the IFN-γ-high group, especially in the body and object use, social and self-help, and expressive language performance domains. LEfSe analysis of gut microbiota revealed an overrepresentation of Selenomonadales, Negativicutes, Veillonellaceae, and Verrucomicrobiaceae and an underrepresentation of Bacteroides xylanisolvens and Bifidobacterium longum in children with higher IFN-γ levels. Decreased carbohydrate, amino acid, and lipid metabolism functions of the gut microbiota were found in the IFN-γ-high group. Additional functional profile analyses revealed significant differences in the abundances of genes encoding carbohydrate-active enzymes between the two groups. Enriched phenotypes related to infection and gastroenteritis and underrepresentation of one gut-brain module associated with histamine degradation were also found in the IFN-γ-high group. Multivariate analyses revealed relatively good separation between the two groups. Conclusions: Levels of IFN-γ derived from γδT cells could serve as a potential candidate biomarker for subtyping individuals with ASD, reducing the heterogeneity associated with ASD and producing subgroups that are more likely to share a similar phenotype and etiology. A better understanding of the associations among immune function, gut microbiota composition, and metabolic abnormalities in ASD would facilitate the development of individualized biomedical treatment for this complex neurodevelopmental disorder.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Gastrointestinal Microbiome , Humans , Behavioral Symptoms , Amino Acids
7.
PLoS One ; 17(5): e0267066, 2022.
Article in English | MEDLINE | ID: mdl-35594250

ABSTRACT

Nanopore sequencing produces long reads and offers unique advantages over next-generation sequencing, especially for the assembly of draft bacterial genomes with improved completeness. However, assembly errors can occur due to data characteristics and assembly algorithms. To address these issues, we developed MAECI, a pipeline for generating consensus sequences from multiple assemblies of the same nanopore sequencing data and error correction. Systematic evaluation showed that MAECI is an efficient and effective pipeline to improve the accuracy and completeness of bacterial genome assemblies. The available codes and implementation are at https://github.com/langjidong/MAECI.


Subject(s)
Nanopore Sequencing , Consensus Sequence , Genomics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software
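
As a rough illustration of the consensus idea described in entry 7, the sketch below takes a per-column majority vote over assemblies that are assumed to be already aligned to the same length. It is not MAECI's actual algorithm, which the abstract does not detail; the function name `majority_consensus` and the gap handling are hypothetical choices made for this example.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Per-column majority vote over equal-length, pre-aligned sequences.

    Assumption: the input assemblies were aligned beforehand, with `gap`
    marking missing bases. Ties go to the base seen first in the column.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:  # a winning gap means the column is dropped
            consensus.append(base)
    return "".join(consensus)
```

For instance, `majority_consensus(["ACGT", "ACGA", "ACTT"])` keeps the base supported by two of the three inputs at every position. A real pipeline would also need the alignment step and a polishing stage, which are out of scope here.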
8.
Front Genet ; 13: 1008792, 2022.
Article in English | MEDLINE | ID: mdl-36186464

ABSTRACT

Nanopore sequencing technology (NST) has become a rapid and cost-effective method for the diagnosis and epidemiological surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the coronavirus disease 2019 (COVID-19) pandemic. Compared with short-read sequencing platforms (e.g., Illumina's), nanopore long-read sequencing platforms effectively shorten the time required to complete the detection process. However, due to the principles and data characteristics of NST, the accuracy of sequencing data has been reduced, thereby limiting monitoring and lineage analysis of SARS-CoV-2. In this study, we developed an analytical pipeline for SARS-CoV-2 rapid detection and lineage identification that integrates phylogenetic-tree and hotspot mutation analysis, which we have named NanoCoV19. This method not only can distinguish and trace the lineages contained in the alpha, beta, delta, gamma, lambda, and omicron variants of SARS-CoV-2 but is also rapid and efficient, completing overall analysis within 1 h. We hope that NanoCoV19 can be used as an auxiliary tool for rapid subtyping and lineage analysis of SARS-CoV-2 and, more importantly, that it can promote further applications of NST in public-health and -safety plans similar to those formulated to address the COVID-19 outbreak.

9.
Front Microbiol ; 13: 1004664, 2022.
Article in English | MEDLINE | ID: mdl-36312946

ABSTRACT

Background: Human papillomavirus (HPV) infection is the leading cause of cervical cancer. A growing number of studies have found that cervical microbiota (CM) composition correlates with HPV infection and the development of cervical cancer. However, more studies are needed to clarify the complex interaction between the microbiota and the mechanism of disease development, especially in specific areas of China. Materials and methods: In this study, 16S rDNA sequencing was applied to 276 Thin-prep Cytologic Test (TCT) samples from patients in the Sanmenxia area. We systematically analyzed the microbiota structure, diversity, group and functional differences between HPV infection groups and age groups, and co-occurrence relationships of the microbiota. Results: The major microbiota components across all patients were Lactobacillus iners, Escherichia coli, Enterococcus faecalis, and Atopobium vaginae at the species level, and Staphylococcus, Lactobacillus, Gardnerella, Bosea, Streptococcus, and Sneathia at the genus level. Microbiota diversity was found to differ significantly between the HPV-negative group (Chao1 index: 83.5299) and the HPV-positive (98.8869, p < 0.01), unique-268 infected (infection with one of HPV genotypes 52, 56, or 58; 107.3885, p < 0.01), multi-268 infected (infection with two or more of HPV genotypes 52, 56, and 58; 97.5337, p = 0.1012), and other1 (94.9619, p < 0.05) groups. Women older than 60 years had higher microbiota diversity (108.8851, p < 0.01, n = 255) than younger women (87.0171, n = 21). The abundance of Gardnerella and Atopobium vaginae was significantly higher in the HPV-positive group than in the HPV-negative group, while Burkholderiaceae and Mycoplasma were more abundant in the unique-268 group than in the negative group. Gamma-proteobacteria and Pseudomonas were more abundant in patients older than 60 than in younger groups. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups (COG) analyses revealed effects of the microbiota on metabolism: pathways related to the metabolism of cells, proteins, and genetic information differed significantly between the HPV-negative and HPV-positive groups, whereas lipid metabolism, signal transduction, and cell cycle metabolism pathways differed significantly between the multi-268 and negative groups. Conclusion: HPV infection status and women's age were related to CM diversity and functional pathways. The complex CM co-occurrence relationships and their mechanisms in disease development need further investigation.
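
The Chao1 index reported in entry 9 estimates total richness from how many taxa are seen exactly once or twice. The abstract does not say which variant was used, so the sketch below assumes the commonly used bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)); the function name `chao1` and the input layout (one read count per taxon) are illustrative assumptions.

```python
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts.

    Assumption: `counts` holds one non-negative integer per taxon for a
    single sample; taxa with a count of zero are ignored.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)          # number of taxa actually seen
    freq = Counter(observed)
    f1 = freq[1]                   # singletons: taxa seen exactly once
    f2 = freq[2]                   # doubletons: taxa seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

With counts `[1, 1, 1, 2, 2, 5, 0]` there are six observed taxa, three singletons, and two doubletons, so the estimate is 6 + (3 × 2) / (2 × 3) = 7.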

10.
NAR Genom Bioinform ; 4(2): lqac033, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35464239

ABSTRACT

Nanopore sequencing, also known as single-molecule real-time sequencing, is a third/fourth generation sequencing technology that enables deciphering single DNA/RNA molecules without the polymerase chain reaction. Although nanopore sequencing has made significant progress in scientific research and clinical practice, its application has been limited compared with next-generation sequencing (NGS) because of its specific design principles and data characteristics, especially in hotspot mutation detection. Therefore, we developed Nano2NGS-Muta as a data analysis framework for hotspot mutation detection based on long reads from nanopore sequencing. Nano2NGS-Muta is characterized by applying nanopore sequencing data to NGS-like data analysis pipelines. Long reads can be converted into short reads and then processed through existing NGS analysis pipelines in combination with statistical methods for hotspot mutation detection. Nano2NGS-Muta not only avoids false positive/negative results caused by non-random errors and unexpected insertions-deletions (indels) in nanopore sequencing data and improves the detection accuracy of hotspot mutations compared with conventional nanopore data analysis algorithms, but also breaks down the barriers between data analysis methods for short-read and long-read sequencing. We hope Nano2NGS-Muta can serve as a reference method for nanopore sequencing data and promote broader application of nanopore sequencing technology in scientific research and clinical practice.
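
The long-read to short-read conversion mentioned in entry 10 can be pictured as cutting each read into overlapping fixed-length windows. The sketch below is only one plausible way to do this and is not taken from the Nano2NGS-Muta code; the function name `fragment_read` and the `read_len` and `step` defaults are assumptions chosen for illustration.

```python
def fragment_read(seq, qual, read_len=150, step=75):
    """Cut one long read into overlapping fixed-length pseudo-short reads.

    Returns (start, sequence, quality) tuples. A read shorter than
    `read_len` is returned as a single fragment. Trailing bases that do
    not fill a whole window after the last step are not covered.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    fragments = []
    last_start = max(len(seq) - read_len, 0)
    for start in range(0, last_start + 1, step):
        end = start + read_len
        fragments.append((start, seq[start:end], qual[start:end]))
    return fragments
```

Each fragment keeps its offset, so a downstream short-read aligner's output could in principle be traced back to the originating long read. Whether the real tool overlaps windows at all is not stated in the abstract.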

11.
Imeta ; 1(4): e47, 2022 Dec.
Article in English | MEDLINE | ID: mdl-38867910

ABSTRACT

DNA-based biological ingredient identification and downstream pharmacology network analysis are commonly used in research on Traditional Chinese Medicine preparations (TCM formulas). Advancements in bioinformatics tools and the accumulation of related data have become driving forces for progress in this field. However, the lack of a platform integrating biological ingredient identification and downstream pharmacology network analysis hinders a deep understanding of TCM. In this study, we developed the TCM-Suite platform, composed of two sub-databases, Holmes-Suite and Watson-Suite, for TCM biological ingredient identification and network pharmacology investigation, respectively; both are among the most complete of their kind. In the Holmes-Suite, we collected and processed six types of marker gene sequences, totaling 1,251,548 marker gene sequences. In the Watson-Suite, we curated and integrated a massive number of entries from more than 10 public databases. Importantly, we developed a comprehensive pipeline to integrate TCM biological ingredient identification and downstream network pharmacology research, allowing users to simultaneously identify components of a TCM formula and analyze its potential pharmacology mechanism. Furthermore, we designed search engines and a user-friendly platform to better search and visualize these rich resources. TCM-Suite is a comprehensive and holistic platform for TCM-based drug discovery and repurposing. TCM-Suite website: http://TCM-Suite.AImicrobiome.cn.

12.
Front Cell Infect Microbiol ; 12: 886196, 2022.
Article in English | MEDLINE | ID: mdl-35800387

ABSTRACT

Autism is a kind of biologically based neurodevelopmental condition, and the coexistence of atopic dermatitis (AD) is not uncommon. Given that the gut microbiota plays an important role in the development of both diseases, we aimed to explore the differences of gut microbiota and their correlations with urinary organic acids between autistic children with and without AD. We enrolled 61 autistic children including 36 with AD and 25 without AD. The gut microbiota was sequenced by metagenomic shotgun sequencing, and the diversity, compositions, and functional pathways were analyzed further. Urinary organic acids were assayed by gas chromatography-mass spectrometry, and univariate/multivariate analyses were applied. Spearman correlation analysis was conducted to explore their relationships. In our study, AD individuals had more prominent gastrointestinal disorders. The alpha diversity of the gut microbiota was lower in the AD group. LEfSe analysis showed a higher abundance of Anaerostipes caccae, Eubacterium hallii, and Bifidobacterium bifidum in AD individuals, with Akkermansia muciniphila, Roseburia intestinalis, Haemophilus parainfluenzae, and Rothia mucilaginosa in controls. Meanwhile, functional profiles showed that the pathway of lipid metabolism had a higher proportion in the AD group, and the pathway of xenobiotics biodegradation was abundant in controls. Among urinary organic acids, adipic acid, 3-hydroxyglutaric acid, tartaric acid, homovanillic acid, 2-hydroxyphenylacetic acid, aconitic acid, and 2-hydroxyhippuric acid were richer in the AD group. However, only adipic acid remained significant in the multivariate analysis (OR = 1.513, 95% CI [1.042, 2.198], P = 0.030). In the correlation analysis, Roseburia intestinalis had a negative correlation with aconitic acid (r = -0.14, P = 0.02), and the latter was positively correlated with adipic acid (r = 0.41, P = 0.006). 
Besides, the pathway of xenobiotics biodegradation seems to inversely correlate with adipic acid (r = -0.42, P = 0.18). The gut microbiota plays an important role in the development of AD in autistic children, and more well-designed studies are warranted to explore the underlying mechanism.


Subject(s)
Autistic Disorder , Dermatitis, Atopic , Gastrointestinal Microbiome , Aconitic Acid/analysis , Adipates/analysis , Child , Clostridiales , Dermatitis, Atopic/complications , Dermatitis, Atopic/microbiology , Feces/microbiology , Humans
13.
Front Oncol ; 11: 738222, 2021.
Article in English | MEDLINE | ID: mdl-34868931

ABSTRACT

Tamoxifen (TAM) is the most commonly used adjuvant endocrine drug for hormone receptor-positive (HR+) breast cancer patients. However, accurately evaluating the risk of breast cancer recurrence and metastasis after adjuvant TAM therapy remains a major concern. In recent years, many studies have shown that the clinical outcomes of TAM-treated breast cancer patients are influenced by the activity of some cytochrome P450 (CYP) enzymes that catalyze the formation of active TAM metabolites such as endoxifen and 4-hydroxytamoxifen. In this study, we aimed to develop and validate an algorithm combining polymorphisms in CYP genes with clinicopathological signatures to identify a subpopulation of breast cancer patients who might benefit most from TAM adjuvant therapy, and to evaluate major risk factors related to TAM resistance. Specifically, a total of 256 patients with invasive breast cancer who received adjuvant endocrine therapy were selected. The genotypes at 10 loci from three TAM metabolism-related CYP genes were detected by time-of-flight mass spectrometry and multiplex long PCR. Combining the 10 loci with nine clinicopathological characteristics, we obtained 19 features whose association with cancer recurrence was assessed by random forest importance scores. A logistic regression model was then trained to calculate the TAM risk-of-recurrence score (TAM RORs), which is used to assess a patient's risk of recurrence after TAM treatment. The sensitivity and specificity of the model in an independent test cohort were 86.67% and 64.56%, respectively. Patients with high TAM RORs were less sensitive to TAM treatment and showed more invasive characteristics, whereas those with low TAM RORs were highly sensitive to TAM treatment, and their conditions were stable during the follow-up period. Several risk factors had a significant effect on the efficacy of TAM: tissue classification (tumor Grade < 2 vs. Grade ≥ 2, p = 2.2e-16), the number of lymph node metastases (Node-Negative vs. Node < 4, p = 5.3e-07; Node < 4 vs. Node ≥ 4, p = 0.003; Node-Negative vs. Node ≥ 4, p = 7.2e-15), and the expression levels of estrogen receptor (ER) and progesterone receptor (PR) (ER < 50% vs. ER ≥ 50%, p = 1.3e-12; PR < 50% vs. PR ≥ 50%, p = 2.6e-08). Notably, different genotypes of CYP2D6*10 (C188T) showed significant differences in prediction function (CYP2D6*10 CC vs. TT, p < 0.019; CYP2D6*10 CT vs. TT, p < 0.037). More than 50% of Chinese individuals carry the CYP2D6*10 mutation, so the CYP2D6*10 (C188T) genotype should be tested before TAM therapy.
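
Entry 13 reports a logistic-regression risk score together with its sensitivity and specificity. The sketch below shows how such figures are generally computed from scores and known outcomes. The weights, threshold, and function names are hypothetical and do not reproduce the published TAM RORs model.

```python
import math


def logistic_score(features, weights, bias):
    """Logistic-regression probability for one patient (hypothetical weights)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when calling `score >= threshold` positive.

    `labels` are 1 for recurrence and 0 for no recurrence.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted = score >= threshold
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the fraction of true recurrences that the score flags, and specificity is the fraction of non-recurrences it correctly leaves unflagged; moving the threshold trades one against the other.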

14.
Front Genet ; 12: 730519, 2021.
Article in English | MEDLINE | ID: mdl-34777467

ABSTRACT

Illumina is the leading sequencing platform in the next-generation sequencing (NGS) market globally. In recent years, MGI Tech has presented a series of new sequencers, including DNBSEQ-T7, MGISEQ-2000 and MGISEQ-200. As a complex application of NGS, cancer-detecting panels place increasing demands on the accuracy and sensitivity of sequencing and data analysis. In this study, we used the same capture DNA libraries constructed based on the Illumina protocol to evaluate the performance of the Illumina Nextseq500 and MGISEQ-2000 sequencing platforms. We found that the two platforms had high consistency in the results of hotspot mutation analysis; more importantly, we found that there was a significant loss of fragments in the 101-133 bp size range on the MGISEQ-2000 sequencing platform for Illumina libraries, but not for the capture DNA libraries prepared based on the MGISEQ protocol. This phenomenon may indicate fragment selection or low fragment ligation efficiency during the DNA circularization step, which is unique to the MGISEQ-2000 sequencing platform. In conclusion, these different sequencing libraries and corresponding sequencing platforms are compatible with each other, but protocol and platform selection need to be carefully evaluated in light of the research purpose.

15.
Biochim Biophys Acta Mol Basis Dis ; 1866(11): 165916, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32771416

ABSTRACT

Carcinoma of unknown primary (CUP), defined as metastatic cancer with unknown origin, occurs in 3-5 per 100 cancer patients in the United States. The heterogeneity and metastasis of cancer bring great difficulties to the follow-up diagnosis and treatment of CUP. Multiple methods have been proposed to find the tissue-of-origin (TOO) of CUP. However, the accuracies of computed tomography (CT) and positron emission tomography (PET) in identifying the TOO were 20%-27% and 24%-40%, respectively, which is not enough for determining targeted therapies. In this study, we provide a machine learning framework to trace tumor tissue origin by using gene length-normalized somatic mutation sequencing data. Somatic mutation data were downloaded from the Data Portal (Release 28) of the International Cancer Genome Consortium (ICGC), and 4909 samples from 13 cancers were used to identify the primary site of cancers. Optimal results were obtained based on a 600-gene set by using the random forest algorithm with 10-fold cross-validation, and the average accuracy and F1-score were 0.8822 and 0.8886, respectively, across 13 types of cancer. In conclusion, we provide an effective computational framework to infer cancer tissue-of-origin by combining DNA sequencing and machine learning techniques, which is promising for assisting clinical diagnosis of cancers.


Subject(s)
DNA/genetics , Machine Learning , Neoplasms, Unknown Primary/genetics , Algorithms , Mutation/genetics , Positron-Emission Tomography , Sequence Analysis, DNA
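
Entry 15 uses gene length-normalized somatic mutation data as classifier input. The abstract does not give the exact normalization, so the sketch below assumes the simplest reading: mutations per fixed number of bases for each gene. The function name, the `per_bp` scale, and the gene identifiers are hypothetical.

```python
def length_normalized_features(mutation_counts, gene_lengths_bp, per_bp=1000):
    """Mutations per `per_bp` bases for every gene in one sample.

    Assumption: `mutation_counts` maps gene name to the number of somatic
    mutations in the sample; genes missing from it count as zero.
    """
    features = {}
    for gene, length in gene_lengths_bp.items():
        if length <= 0:
            raise ValueError(f"non-positive length for gene {gene}")
        features[gene] = mutation_counts.get(gene, 0) * per_bp / length
    return features
```

Dividing by length keeps long genes, which accumulate more mutations by chance, from dominating the feature vector that would then be passed to a classifier such as a random forest.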
16.
Mol Ther Nucleic Acids ; 21: 676-686, 2020 Sep 04.
Article in English | MEDLINE | ID: mdl-32759058

ABSTRACT

In this study, we proposed an ensemble learning method that simultaneously integrates a low-rank matrix completion model and a ridge regression model to predict anticancer drug response in cancer cell lines. The model was applied to two benchmark datasets, the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC). As previous studies suggest, the dual-layer integrated cell line-drug network model was one of the best models so far and outperformed most state-of-the-art models. Thus, we performed a head-to-head comparison between the dual-layer integrated cell line-drug network model and our model in a 10-fold cross-validation study. For the CCLE dataset, our model has a higher Pearson correlation coefficient between predicted and observed drug responses than the dual-layer integrated cell line-drug network model in 18 out of 23 drugs. For the GDSC dataset, our model is better in 26 out of 28 drugs in the phosphatidylinositol 3-kinase (PI3K) pathway and 26 out of 30 drugs in the extracellular signal-regulated kinase (ERK) signaling pathway. Based on the prediction results, we carried out two types of case studies, which further verified the effectiveness of the proposed model for drug-response prediction. In addition, our model is more biologically interpretable than the compared method, since it explicitly outputs the genes involved in the prediction, which are enriched in functions such as transcription, Src homology 2/3 (SH2/3) domain, cell cycle, ATP binding, and zinc finger.
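
Entry 16 compares models by the Pearson correlation between predicted and observed drug responses. A minimal sketch of that metric follows; the function name `pearson` and the plain-list inputs are assumptions for this example, not part of the published code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Computing this per drug for two models and counting the drugs where one coefficient is higher gives the kind of "18 out of 23" tally quoted in the abstract.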

17.
Front Genet ; 11: 660, 2020.
Article in English | MEDLINE | ID: mdl-32714374

ABSTRACT

During the carcinogenesis of cervical cancer, the DNA of human papillomavirus (HPV) is frequently integrated into the human genome, which might be a biomarker for the early diagnosis of cervical cancer. Although the Illumina sequencing platform has significantly increased the sensitivity of detecting virus infection status, some disadvantages remain, including limited detection accuracy and difficulty in identifying complex integrated genome structures. Nanopore sequencing has been proven to be a fast yet accurate technique for detecting pathogens in clinical samples, with significantly longer read lengths. However, virus integration sites, especially HPV integration sites, have seldom been identified using the nanopore platform. In this study, we evaluated the feasibility of identifying HPV integration sites with a nanopore sequencer. Specifically, we re-sequenced the integration sites of a previously published sample by both nanopore and Illumina sequencing. After analyzing the results, we drew three conclusions: first, 13 out of 19 previously published integration sites were found in all three datasets (i.e., nanopore, Illumina, and the published data), indicating a high overlap rate and comparability among the three platforms; second, our pipeline for nanopore and Illumina data identified 66 unique integration sites compared with the previously published paper, 13 of which were verified by Sanger sequencing, indicating higher integration site detection sensitivity than the published data; third, we established a pipeline that can be used for HPV integration site detection from nanopore sequencing data without error correction. In summary, a new nanopore data analysis method was tested and proved to be reliable for integration site detection compared with existing Illumina data analysis pipelines, while requiring less sequencing data. It provides solid evidence and a tool to support the potential application of nanopore sequencing in virus status identification.

18.
Article in English | MEDLINE | ID: mdl-32509741

ABSTRACT

Metastatic cancers require further diagnosis to determine their primary tumor sites. However, according to statistics from the United States, the tissue-of-origin of around 5% of tumors cannot be identified by routine medical diagnosis. With the development of machine learning techniques and the accumulation of large cancer datasets in The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), it is now feasible to predict cancer tissue-of-origin with computational tools. A metastatic tumor inherits characteristics from its tissue-of-origin, and both gene expression profiles and somatic mutations are tissue specific. Thus, we developed a computational framework to infer tumor tissue-of-origin by integrating both gene mutation and expression (TOOme). Specifically, we first perform feature selection on both gene expression and mutations by a random forest method. The selected features are then used to build a multi-label classification model to infer cancer tissue-of-origin. We adopt a few popular multi-label classification methods and compare them by 10-fold cross-validation. We applied TOOme to TCGA data containing 7,008 non-metastatic samples across 20 solid tumors. Seventy-four genes by gene expression profile and six genes by gene mutation are selected by the random forest process, and they fall into two categories: (1) cancer-type-specific genes and (2) genes expressed or mutated in several cancers with different expression levels or mutation rates. Function analysis indicates that the selected genes are significantly enriched in gland development, urogenital system development, hormone metabolic process, thyroid hormone generation, prostate hormone generation, and so on. Among the multi-label classification methods, random forest performs best, with a 10-fold cross-validation prediction accuracy of 96%. We also use the 19 metastatic samples from TCGA and 256 cancer samples downloaded from GEO as independent testing data, for which TOOme achieves a prediction accuracy of 89%. The cross-validation accuracy is better than that obtained using gene expression (i.e., 95%) or gene mutation (53%) alone. In conclusion, TOOme provides a quick yet accurate alternative to traditional medical methods for inferring cancer tissue-of-origin. In addition, methods combining somatic mutation and gene expression outperform those using gene expression or mutation alone.
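The 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the TOOme implementation: the abstract does not give code, so the function names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_val_accuracy`) are hypothetical, the data are toy values, and a simple nearest-centroid classifier stands in for the random forest so that the example needs only the standard library.

```python
import random
from collections import defaultdict


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_centroids(rows, labels):
    """Stand-in classifier (assumption, not random forest): mean vector per class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest to the row."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_val_accuracy(rows, labels, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the hits."""
    correct = 0
    for fold in k_fold_indices(len(rows), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(rows)) if i not in held_out]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Toy data: two well-separated "tissues" with two features each.
rows = [[float(i), 0.0] for i in range(10)] + [[100.0 + i, 5.0] for i in range(10)]
labels = ["liver"] * 10 + ["lung"] * 10
accuracy = cross_val_accuracy(rows, labels, k=10)
```

Every sample is held out exactly once, so the pooled accuracy is an estimate on data the model did not see during fitting, which is the property the reported cross-validation figures rely on.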

19.
Biomed Res Int ; 2020: 6782046, 2020.
Article in English | MEDLINE | ID: mdl-32462012

ABSTRACT

Gene coexpression analysis is widely used to infer gene modules associated with diseases and other clinical traits. However, a systematic view and comparison of gene coexpression networks and modules across a cohort of tissues has been largely neglected. In this study, we first construct gene coexpression networks and modules for 52 GTEx tissues and cell lines. The network modules are enriched in many tissue-common functions, such as organelle membrane, as well as tissue-specific functions. We then study the correlation of tissues from the network point of view. The network modules of most tissues are significantly correlated, indicating a generally similar network pattern across tissues. However, the level of similarity differs among tissues. Tissues that are physically close to each other seem to be more similar in their coexpression networks. For example, the two adjacent tissues fallopian tube and bladder have the most significant Fisher's exact test p value (8.54E-291) among all tissue pairs. It is known that immune-associated modules are frequently identified among coexpression modules. In this study, we found immune modules in many tissues, such as liver, kidney cortex, lung, uterus, adipose subcutaneous, and adipose visceral omentum. However, not all tissues have immune-associated modules; brain cerebellum, for example, does not. Finally, by clique analysis, we identify the largest clique of modules, in which the genes in each module overlap significantly with those in the other modules. We are able to find a clique of size 40 (out of 52 tissues), indicating a strong correlation of modules across tissues. It is not surprising that the 40 modules are most commonly enriched in immune-related functions.
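The module comparison described here rests on Fisher's exact test for the overlap between two gene sets. The following is a minimal sketch of a one-sided version of that test, assuming a shared background of `universe_size` genes; the function name `overlap_p_value` and the toy gene identifiers are hypothetical, and the abstract does not state which variant of the test or which background the authors used.

```python
from math import comb


def overlap_p_value(module_a, module_b, universe_size):
    """One-sided Fisher's exact test: P(overlap >= observed) by chance.

    Treats module_b as a random draw of len(module_b) genes from a
    background of universe_size genes, len(module_a) of which belong
    to module_a (hypergeometric null model).
    """
    a, b = set(module_a), set(module_b)
    observed = len(a & b)
    n_a, n_b = len(a), len(b)
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(observed, min(n_a, n_b) + 1)
    )
    return tail / total


# Toy example: two five-gene modules sharing four genes, background of 10 genes.
p_shared = overlap_p_value({1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}, universe_size=10)
# Disjoint modules: any overlap >= 0 is certain, so the p value is 1.
p_disjoint = overlap_p_value({1, 2, 3}, {7, 8, 9}, universe_size=10)
```

A small p value means the two modules share more genes than expected by chance. Pairs of modules that pass a chosen significance threshold could then be treated as edges of a graph in which a clique search is run, although the abstract does not specify that step in detail.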


Subject(s)
Cluster Analysis , Gene Expression Regulation , Gene Regulatory Networks , Adipose Tissue , Brain , Female , Gene Expression Profiling , Gene Ontology , Humans , Kidney , Liver , Lung , Uterus
20.
Article in English | MEDLINE | ID: mdl-32637398

ABSTRACT

With the development of high-throughput technologies, more and more protein-protein interaction (PPI) networks are becoming available, which creates a need for efficient computational tools for network alignment. Network alignment is widely used to predict the functions of certain proteins, identify conserved network modules, and study evolutionary relationships across species or biological entities. However, network alignment is an NP-complete problem, and previous algorithms are usually slow or less accurate when aligning large networks such as human vs. yeast. In this study, we proposed a fast yet accurate algorithm called Network Alignment by Integrating Biological Process (NAIGO). Specifically, we first divided the networks into subnets by taking advantage of prior knowledge, such as gene ontology. For each subnet pair, we then developed a novel method to align them by considering both protein orthology information and local structural information. After that, we expanded the obtained local network alignments in a greedy manner. Taking the aligned pairs as seeds, we formulated the global network alignment problem as an assignment problem based on a similarity matrix, which was solved by the Hungarian method. We applied NAIGO to align the human and Saccharomyces cerevisiae S288c PPI networks and compared the results with those of other popular methods such as IsoRank, GRAAL, SANA, and NABEECO. Our method outperformed the competitors by aligning more orthologous proteins or matched interactions. In addition, we found a few potential functional orthologous proteins, such as RRM2B in human and DNA2 in S. cerevisiae S288c, which are related to DNA repair. We also identified a conserved subnet with six orthologous proteins (EXO1, MSH3, MSH2, MLH1, MLH3, and MSH6) and six aligned interactions. All these proteins are associated with mismatch repair. Finally, we predicted a few proteins of S. cerevisiae S288c potentially involved in certain biological processes such as autophagosome assembly.
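The final step named in this abstract, solving an assignment problem on a similarity matrix with the Hungarian method, can be sketched as follows. This is a generic textbook formulation, not the NAIGO code: the function names (`hungarian_min_cost`, `align_by_similarity`) and the toy similarity values are hypothetical, and the matrix is assumed to have no more rows than columns.

```python
def hungarian_min_cost(cost):
    """Hungarian method with row/column potentials, O(n^2 * m).

    cost is an n x m matrix with n <= m. Returns a list where entry i
    is the column assigned to row i in a minimum-cost matching.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (m + 1)      # column potentials (1-based)
    match = [0] * (m + 1)    # match[j] = row matched to column j, 0 = free
    way = [0] * (m + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        min_slack = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = match[j0]
            delta = inf
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    slack = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if slack < min_slack[j]:
                        min_slack[j] = slack
                        way[j] = j0
                    if min_slack[j] < delta:
                        delta = min_slack[j]
                        j1 = j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    min_slack[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if match[j]:
            assignment[match[j] - 1] = j - 1
    return assignment


def align_by_similarity(similarity):
    """Maximize total similarity by minimizing the negated matrix."""
    cost = [[-s for s in row] for row in similarity]
    assignment = hungarian_min_cost(cost)
    score = sum(similarity[i][j] for i, j in enumerate(assignment))
    return assignment, score


# Toy similarity matrix: rows are proteins of one network, columns of the other.
similarity = [
    [0.9, 0.8, 0.1],
    [0.9, 0.2, 0.1],
    [0.1, 0.3, 0.7],
]
assignment, score = align_by_similarity(similarity)
```

In this toy matrix, picking the best column row by row would pair row 0 with column 0 and leave row 1 with a poor match, whereas the optimal assignment pairs row 0 with column 1 and row 1 with column 0. That difference is the reason to solve the global step as an assignment problem rather than greedily.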
