Results 1 - 20 of 81
1.
iScience ; 27(6): 110009, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38868206

ABSTRACT

Continuous assessment of the impact of SARS-CoV-2 on the host at the cell-type level is crucial for understanding key mechanisms involved in host defense responses to viral infection. We investigated host response to ancestral-strain and Alpha-variant SARS-CoV-2 infections within air-liquid-interface human nasal epithelial cells from younger adults (26-32 Y) and older children (12-14 Y) using single-cell RNA-sequencing. Ciliated and secretory-ciliated cells formed the majority of highly infected cell-types, with the latter derived from ciliated lineages. Strong innate immune responses were observed across lowly infected and uninfected bystander cells and heightened in Alpha-infection. Alpha highly infected cells showed increased expression of protein-refolding genes compared with ancestral-strain-infected cells in children. Furthermore, oxidative phosphorylation-related genes were down-regulated in bystander cells versus infected and mock-control cells, underscoring the importance of these biological functions for viral replication. Overall, this study highlights the complexity of cell-type-, age- and viral strain-dependent host epithelial responses to SARS-CoV-2.

2.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38573185

ABSTRACT

BACKGROUND: Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequences at the point of sequencing, typically involving the use of resource-constrained devices. Existing benchmarks have largely focused on the use of standardized databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity. RESULTS: We benchmarked host removal pipelines on simulated and artificial real Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained improved specificity and sensitivity, compared to the standard databases for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was superior to most standard approaches, allowing them to be executed on a laptop device. CONCLUSIONS: Customized pangenome databases provide the best balance of accuracy and computational efficiency when compared to standard databases for the task of human read removal and M. tuberculosis read classification from metagenomic samples. Such databases allow for execution on a laptop, without sacrificing accuracy, an especially important consideration in low-resource settings. We make all customized databases and pipelines freely available.


Subject(s)
Mycobacterium tuberculosis , Humans , Mycobacterium tuberculosis/genetics , Benchmarking , Databases, Factual , Genome, Human , Metagenome
3.
Bioinform Adv ; 4(1): vbae035, 2024.
Article in English | MEDLINE | ID: mdl-38549946

ABSTRACT

Motivation: PE/PPE proteins, highly abundant in the Mycobacterium genome, play a vital role in virulence and immune modulation. Understanding their functions is key to comprehending the internal mechanisms of Mycobacterium. However, a lack of dedicated resources has limited research into PE/PPE proteins. Results: Addressing this gap, we introduce MycobactERIal PE/PPE proTeinS (MERITS), a comprehensive 3D structure database specifically designed for PE/PPE proteins. MERITS hosts 22 353 non-redundant PE/PPE proteins, encompassing details like physicochemical properties, subcellular localization, post-translational modification sites, protein functions, and measures of antigenicity, toxicity, and allergenicity. MERITS also includes data on their secondary and tertiary structure, along with other relevant biological information. MERITS is designed to be user-friendly, offering interactive search and data browsing features to aid researchers in exploring the potential functions of PE/PPE proteins. MERITS is expected to become a crucial resource in the field, aiding in developing new diagnostics and vaccines by elucidating the sequence-structure-functional relationships of PE/PPE proteins. Availability and implementation: MERITS is freely accessible at http://merits.unimelb-biotools.cloud.edu.au/.

4.
Lancet Child Adolesc Health ; 8(5): 325-338, 2024 May.
Article in English | MEDLINE | ID: mdl-38513681

ABSTRACT

BACKGROUND: Sepsis is defined as dysregulated host response to infection that leads to life-threatening organ dysfunction. Biomarkers characterising the dysregulated host response in sepsis are lacking. We aimed to develop host gene expression signatures to predict organ dysfunction in children with bacterial or viral infection. METHODS: This cohort study was done in emergency departments and intensive care units of four hospitals in Queensland, Australia, and recruited children aged 1 month to 17 years who, upon admission, underwent a diagnostic test, including blood cultures, for suspected sepsis. Whole-blood RNA sequencing of blood was performed with Illumina NovaSeq (San Diego, CA, USA). Samples with completed phenotyping, monitoring, and RNA extraction by March 31, 2020, were included in the discovery cohort; samples collected or completed thereafter and by Oct 27, 2021, constituted the Rapid Paediatric Infection Diagnosis in Sepsis (RAPIDS) internal validation cohort. An external validation cohort was assembled from RNA sequencing gene expression count data from the observational European Childhood Life-threatening Infectious Disease Study (EUCLIDS), which recruited children with severe infection in nine European countries between 2012 and 2016. Feature selection approaches were applied to derive novel gene signatures for disease class (bacterial vs viral infection) and disease severity (presence vs absence of organ dysfunction 24 h post-sampling). The primary endpoint was the presence of organ dysfunction 24 h after blood sampling in the presence of confirmed bacterial versus viral infection. Gene signature performance is reported as area under the receiver operating characteristic curves (AUCs) and 95% CI. FINDINGS: Between Sept 25, 2017, and Oct 27, 2021, 907 patients were enrolled. Blood samples from 595 patients were included in the discovery cohort, and samples from 312 children were included in the RAPIDS validation cohort. 
We derived a ten-gene disease class signature that achieved an AUC of 94·1% (95% CI 90·6-97·7) in distinguishing bacterial from viral infections in the RAPIDS validation cohort. A ten-gene disease severity signature achieved an AUC of 82·2% (95% CI 76·3-88·1) in predicting organ dysfunction within 24 h of sampling in the RAPIDS validation cohort. Used in tandem, the disease class and disease severity signatures predicted organ dysfunction within 24 h of sampling with an AUC of 90·5% (95% CI 83·3-97·6) for patients with predicted bacterial infection and 94·7% (87·8-100·0) for patients with predicted viral infection. In the external EUCLIDS validation dataset (n=362), the disease class and disease severity predicted organ dysfunction at time of sampling with an AUC of 70·1% (95% CI 44·1-96·2) for patients with predicted bacterial infection and 69·6% (53·1-86·0) for patients with predicted viral infection. INTERPRETATION: In children evaluated for sepsis, novel host transcriptomic signatures specific for bacterial and viral infection can identify dysregulated host response leading to organ dysfunction. FUNDING: Australian Government Medical Research Future Fund Genomic Health Futures Mission, Children's Hospital Foundation Queensland, Brisbane Diamantina Health Partners, Emergency Medicine Foundation, Gold Coast Hospital Foundation, Far North Queensland Foundation, Townsville Hospital and Health Services SERTA Grant, and Australian Infectious Diseases Research Centre.


Subject(s)
Bacterial Infections , Sepsis , Virus Diseases , Humans , Child , Cohort Studies , Transcriptome , Multiple Organ Failure/diagnosis , Multiple Organ Failure/genetics , Prospective Studies , Australia , Sepsis/diagnosis , Sepsis/genetics
5.
Brief Bioinform ; 24(6), 2023 09 22.
Article in English | MEDLINE | ID: mdl-37874948

ABSTRACT

Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.


Subject(s)
Machine Learning , Peptide Hydrolases , Peptide Hydrolases/metabolism , Substrate Specificity , Algorithms
6.
Lancet Digit Health ; 5(11): e774-e785, 2023 11.
Article in English | MEDLINE | ID: mdl-37890901

ABSTRACT

BACKGROUND: Differentiating between self-resolving viral infections and bacterial infections in children who are febrile is a common challenge, causing difficulties in identifying which individuals require antibiotics. Studying the host response to infection can provide useful insights and can lead to the identification of biomarkers of infection with diagnostic potential. This study aimed to identify host protein biomarkers for future development into an accurate, rapid point-of-care test that can distinguish between bacterial and viral infections, by recruiting children presenting to health-care settings with fever or a history of fever in the previous 72 h. METHODS: In this multi-cohort machine learning study, patient data were taken from EUCLIDS, the Swiss Pediatric Sepsis study, the GENDRES study, and the PERFORM study, which were all based in Europe. We generated three high-dimensional proteomic datasets (SomaScan and two via liquid chromatography tandem mass spectrometry, referred to as MS-A and MS-B) using targeted and untargeted platforms (SomaScan and liquid chromatography mass spectrometry). Protein biomarkers were then shortlisted using differential abundance analysis, feature selection using forward selection-partial least squares (FS-PLS; 100 iterations), along with a literature search. Identified proteins were tested with Luminex and ELISA and iterative FS-PLS was done again (25 iterations) on the Luminex results alone, and the Luminex and ELISA results together. A sparse protein signature for distinguishing between bacterial and viral infections was identified from the selected proteins. The performance of this signature was finally tested using Luminex assays and by calculating disease risk scores. FINDINGS: 376 children provided serum or plasma samples for use in the discovery of protein biomarkers. 
79 serum samples were collected for the generation of the SomaScan dataset, 147 plasma samples for the MS-A dataset, and 150 plasma samples for the MS-B dataset. Differential abundance analysis, and the first round of feature selection using FS-PLS identified 35 protein biomarker candidates, of which 13 had commercial ELISA or Luminex tests available. 16 proteins with ELISA or Luminex tests available were identified by literature review. Further evaluation via Luminex and ELISA and the second round of feature selection using FS-PLS revealed a six-protein signature: three of the included proteins are elevated in bacterial infections (SELE, NGAL, and IFN-γ), and three are elevated in viral infections (IL18, NCAM1, and LG3BP). Performance testing of the signature using Luminex assays revealed area under the receiver operating characteristic curve values between 89·4% and 93·6%. INTERPRETATION: This study has led to the identification of a protein signature that could be ultimately developed into a blood-based point-of-care diagnostic test for rapidly diagnosing bacterial and viral infections in febrile children. Such a test has the potential to greatly improve care of children who are febrile, ensuring that the correct individuals receive antibiotics. FUNDING: European Union's Horizon 2020 research and innovation programme, the European Union's Seventh Framework Programme (EUCLIDS), Imperial Biomedical Research Centre of the National Institute for Health Research, the Wellcome Trust and Medical Research Foundation, Instituto de Salud Carlos III, Consorcio Centro de Investigación Biomédica en Red de Enfermedades Respiratorias, Grupos de Referencia Competitiva, Swiss State Secretariat for Education, Research and Innovation.


Subject(s)
Bacterial Infections , Virus Diseases , Humans , Child , Proteomics , Bacterial Infections/diagnosis , Biomarkers/metabolism , Virus Diseases/diagnosis , Anti-Bacterial Agents
7.
Med ; 4(9): 635-654.e5, 2023 09 08.
Article in English | MEDLINE | ID: mdl-37597512

ABSTRACT

BACKGROUND: Appropriate treatment and management of children presenting with fever depend on accurate and timely diagnosis, but current diagnostic tests lack sensitivity and specificity and are frequently too slow to inform initial treatment. As an alternative to pathogen detection, host gene expression signatures in blood have shown promise in discriminating several infectious and inflammatory diseases in a dichotomous manner. However, differential diagnosis requires simultaneous consideration of multiple diseases. Here, we show that diverse infectious and inflammatory diseases can be discriminated by the expression levels of a single panel of genes in blood. METHODS: A multi-class supervised machine-learning approach, incorporating clinical consequence of misdiagnosis as a "cost" weighting, was applied to a whole-blood transcriptomic microarray dataset, incorporating 12 publicly available datasets, including 1,212 children with 18 infectious or inflammatory diseases. The transcriptional panel identified was further validated in a new RNA sequencing dataset comprising 411 febrile children. FINDINGS: We identified 161 transcripts that classified patients into 18 disease categories, reflecting individual causative pathogen and specific disease, as well as reliable prediction of broad classes comprising bacterial infection, viral infection, malaria, tuberculosis, or inflammatory disease. The transcriptional panel was validated in an independent cohort and benchmarked against existing dichotomous RNA signatures. CONCLUSIONS: Our data suggest that classification of febrile illness can be achieved with a single blood sample and opens the way for a new approach for clinical diagnosis. FUNDING: European Union's Seventh Framework no. 279185; Horizon2020 no. 668303 PERFORM; Wellcome Trust (206508/Z/17/Z); Medical Research Foundation (MRF-160-0008-ELP-KAFO-C0801); NIHR Imperial BRC.


Subject(s)
Benchmarking , Biomedical Research , Child , Humans , Diagnosis, Differential , Nucleotide Motifs , Fever/diagnosis , Fever/genetics , RNA
8.
Microb Genom ; 9(8), 2023 08.
Article in English | MEDLINE | ID: mdl-37552534

ABSTRACT

Tuberculosis is a global pandemic disease with a rising burden of antimicrobial resistance. As a result, the World Health Organization (WHO) has a goal of enabling universal access to drug susceptibility testing (DST). Given the slowness of and infrastructure requirements for phenotypic DST, whole-genome sequencing, followed by genotype-based prediction of DST, now provides a route to achieving this. Since a central component of genotypic DST is to detect the presence of any known resistance-causing mutations, a natural approach is to use a reference graph that allows encoding of known variation. We have developed DrPRG (Drug resistance Prediction with Reference Graphs) using the bacterial reference graph method Pandora. First, we outline the construction of a Mycobacterium tuberculosis drug resistance reference graph. The graph is built from a global dataset of isolates with varying drug susceptibility profiles, thus capturing common and rare resistance- and susceptible-associated haplotypes. We benchmark DrPRG against the existing graph-based tool Mykrobe and the haplotype-based approach of TBProfiler using 44 709 and 138 publicly available Illumina and Nanopore samples with associated phenotypes. We find that DrPRG has significantly improved sensitivity and specificity for some drugs compared to these tools, with no significant decreases. It uses significantly less computational memory than both tools, and provides significantly faster runtimes, except when runtime is compared to Mykrobe with Nanopore data. We discover and discuss novel insights into resistance-conferring variation for M. tuberculosis - including deletion of genes katG and pncA - and suggest mutations that may warrant reclassification as associated with resistance.


Subject(s)
Mycobacterium tuberculosis , Tuberculosis, Multidrug-Resistant , Tuberculosis , Humans , Antitubercular Agents/pharmacology , Antitubercular Agents/therapeutic use , Tuberculosis, Multidrug-Resistant/genetics , Microbial Sensitivity Tests , Drug Resistance, Multiple, Bacterial/genetics , Tuberculosis/microbiology
9.
Brief Bioinform ; 24(4), 2023 07 20.
Article in English | MEDLINE | ID: mdl-37291763

ABSTRACT

BACKGROUND: Promoters are DNA regions that initiate the transcription of specific genes near the transcription start sites. In bacteria, promoters are recognized by RNA polymerases and associated sigma factors. Effective promoter recognition is essential for synthesizing the gene-encoded products by bacteria to grow and adapt to different environmental conditions. A variety of machine learning-based predictors for bacterial promoters have been developed; however, most of them were designed specifically for a particular species. To date, only a few predictors are available for identifying general bacterial promoters with limited predictive performance. RESULTS: In this study, we developed TIMER, a Siamese neural network-based approach for identifying both general and species-specific bacterial promoters. Specifically, TIMER uses DNA sequences as the input and employs three Siamese neural networks with the attention layers to train and optimize the models for a total of 13 species-specific and general bacterial promoters. Extensive 10-fold cross-validation and independent tests demonstrated that TIMER achieves a competitive performance and outperforms several existing methods on both general and species-specific promoter prediction. As an implementation of the proposed method, the web server of TIMER is publicly accessible at http://web.unimelb-bioinfortools.cloud.edu.au/TIMER/.


Subject(s)
Bacteria , Neural Networks, Computer , Bacteria/genetics , Bacteria/metabolism , DNA-Directed RNA Polymerases/genetics , DNA-Directed RNA Polymerases/metabolism , Base Sequence , Promoter Regions, Genetic
10.
J Pediatric Infect Dis Soc ; 12(6): 322-331, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37255317

ABSTRACT

BACKGROUND: To identify a diagnostic blood transcriptomic signature that distinguishes multisystem inflammatory syndrome in children (MIS-C) from Kawasaki disease (KD), bacterial infections, and viral infections. METHODS: Children presenting with MIS-C to participating hospitals in the United Kingdom and the European Union between April 2020 and April 2021 were prospectively recruited. Whole-blood RNA Sequencing was performed, contrasting the transcriptomes of children with MIS-C (n = 38) to those from children with KD (n = 136), definite bacterial (DB; n = 188) and viral infections (DV; n = 138). Genes significantly differentially expressed (SDE) between MIS-C and comparator groups were identified. Feature selection was used to identify genes that optimally distinguish MIS-C from other diseases, which were subsequently translated into RT-qPCR assays and evaluated in an independent validation set comprising MIS-C (n = 37), KD (n = 19), DB (n = 56), DV (n = 43), and COVID-19 (n = 39). RESULTS: In the discovery set, 5696 genes were SDE between MIS-C and combined comparator disease groups. Five genes were identified as potential MIS-C diagnostic biomarkers (HSPBAP1, VPS37C, TGFB1, MX2, and TRBV11-2), achieving an AUC of 96.8% (95% CI: 94.6%-98.9%) in the discovery set, and were translated into RT-qPCR assays. The RT-qPCR 5-gene signature achieved an AUC of 93.2% (95% CI: 88.3%-97.7%) in the independent validation set when distinguishing MIS-C from KD, DB, and DV. CONCLUSIONS: MIS-C can be distinguished from KD, DB, and DV groups using a 5-gene blood RNA expression signature. The small number of genes in the signature and good performance in both discovery and validation sets should enable the development of a diagnostic test for MIS-C.


Subject(s)
COVID-19 , Mucocutaneous Lymph Node Syndrome , Child , Humans , COVID-19/diagnosis , COVID-19/genetics , Systemic Inflammatory Response Syndrome/diagnosis , Systemic Inflammatory Response Syndrome/genetics , Hospitals , Mucocutaneous Lymph Node Syndrome/diagnosis , Mucocutaneous Lymph Node Syndrome/genetics , COVID-19 Testing
11.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-37150785

ABSTRACT

A-to-I editing is the most prevalent RNA editing event, which refers to the change of adenosine (A) bases to inosine (I) bases in double-stranded RNAs. Several studies have revealed that A-to-I editing can regulate cellular processes and is associated with various human diseases. Therefore, accurate identification of A-to-I editing sites is crucial for understanding RNA-level (i.e. transcriptional) modifications and their potential roles in molecular functions. To date, various computational approaches for A-to-I editing site identification have been developed; however, their performance is still unsatisfactory and needs further improvement. In this study, we developed a novel stacked-ensemble learning model, ATTIC (A-To-I ediTing predICtor), to accurately identify A-to-I editing sites across three species, including Homo sapiens, Mus musculus and Drosophila melanogaster. We first comprehensively evaluated 37 RNA sequence-derived features combined with 14 popular machine learning algorithms. Then, we selected the optimal base models to build a series of stacked ensemble models. The final ATTIC framework was developed based on the optimal models improved by the feature selection strategy for specific species. Extensive cross-validation and independent tests illustrate that ATTIC outperforms state-of-the-art tools for predicting A-to-I editing sites. We also developed a web server for ATTIC, which is publicly available at http://web.unimelb-bioinfortools.cloud.edu.au/ATTIC/. We anticipate that ATTIC can be utilized as a useful tool to accelerate the identification of A-to-I RNA editing events and help characterize their roles in post-transcriptional regulation.


Subject(s)
Drosophila melanogaster , RNA Editing , Animals , Mice , Humans , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , RNA/genetics , Adenosine/genetics , Adenosine/metabolism , Inosine/genetics , Inosine/metabolism
12.
Nat Commun ; 14(1): 1051, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36828918

ABSTRACT

A new variant of Streptococcus pyogenes serotype M1 (designated 'M1UK') has been reported in the United Kingdom, linked with seasonal scarlet fever surges, marked increase in invasive infections, and exhibiting enhanced expression of the superantigen SpeA. The progenitor S. pyogenes 'M1global' and M1UK clones can be differentiated by 27 SNPs and 4 indels, yet the mechanism for speA upregulation is unknown. Here we investigate the previously unappreciated expansion of M1UK in Australia, now isolated from the majority of serious infections caused by serotype M1 S. pyogenes. M1UK sub-lineages circulating in Australia also contain a novel toxin repertoire associated with epidemic scarlet fever causing S. pyogenes in Asia. A single SNP in the 5' transcriptional leader sequence of the transfer-messenger RNA gene ssrA drives enhanced SpeA superantigen expression as a result of ssrA terminator read-through in the M1UK lineage. This represents a previously unappreciated mechanism of toxin expression and urges enhanced international surveillance.


Subject(s)
Scarlet Fever , Streptococcal Infections , Humans , Streptococcus pyogenes/genetics , Scarlet Fever/epidemiology , Superantigens , Bacterial Proteins/genetics , United Kingdom , Exotoxins/genetics , Mutation , Australia
13.
Bioinformatics ; 38(17): 4053-4061, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35799358

ABSTRACT

MOTIVATION: Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification of GSRs. However, it remains a great challenge to identify GSRs as the performance of most existing approaches is unsatisfactory. As such, it is highly desirable to develop more accurate computational methods for GSRs prediction. RESULTS: In this study, we propose a general deep learning framework termed DeepGenGrep, a general predictor for the systematic identification of multiple different GSRs from genomic DNA sequences. DeepGenGrep leverages the power of hybrid neural networks comprising a three-layer convolutional neural network and a two-layer long short-term memory to effectively learn useful feature representations from sequences. Benchmarking experiments demonstrate that DeepGenGrep outperforms several state-of-the-art approaches on identifying polyadenylation signals, translation initiation sites and splice sites across four eukaryotic species including Homo sapiens, Mus musculus, Bos taurus and Drosophila melanogaster. Overall, DeepGenGrep represents a useful tool for the high-throughput and cost-effective identification of potential GSRs in eukaryotic genomes. AVAILABILITY AND IMPLEMENTATION: The webserver and source code are freely available at http://bigdata.biocie.cn/deepgengrep/home and Github (https://github.com/wx-cie/DeepGenGrep/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Mice , Cattle , Animals , Drosophila melanogaster/genetics , Genomics/methods , Genome , Software
15.
Front Immunol ; 13: 832223, 2022.
Article in English | MEDLINE | ID: mdl-35464437

ABSTRACT

Better methods to interrogate host-pathogen interactions during Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections are imperative to help understand and prevent this disease. Here we implemented RNA-sequencing (RNA-seq) using Oxford Nanopore Technologies (ONT) long-reads to measure differential host gene expression, transcript polyadenylation and isoform usage within various epithelial cell lines permissive and non-permissive for SARS-CoV-2 infection. SARS-CoV-2-infected and mock-infected Vero (African green monkey kidney epithelial cells), Calu-3 (human lung adenocarcinoma epithelial cells), Caco-2 (human colorectal adenocarcinoma epithelial cells) and A549 (human lung carcinoma epithelial cells) were analyzed over time (0, 2, 24, 48 hours). Differential polyadenylation was found to occur in both infected Calu-3 and Vero cells during a late time point (48 hpi), with Gene Ontology (GO) terms such as viral transcription and translation shown to be significantly enriched in Calu-3 data. Poly(A) tails showed increased lengths in the majority of the differentially polyadenylated transcripts in Calu-3 and Vero cell lines (up to ~101 nt in mean poly(A) length, padj = 0.029). Of these genes, ribosomal protein genes such as RPS4X and RPS6 also showed downregulation in expression levels, suggesting the importance of ribosomal protein genes during infection. Furthermore, differential transcript usage was identified in Caco-2, Calu-3 and Vero cells, including transcripts of genes such as GSDMB and KPNA2, which have previously been implicated in SARS-CoV-2 infections. Overall, these results highlight the potential role of differential polyadenylation and transcript usage in host immune response or viral manipulation of host mechanisms during infection, and therefore, showcase the value of long-read sequencing in identifying less-explored host responses to disease.


Subject(s)
COVID-19 , Animals , COVID-19/genetics , Caco-2 Cells , Chlorocebus aethiops , Humans , Polyadenylation , RNA, Messenger/metabolism , Ribosomal Proteins/metabolism , SARS-CoV-2 , Sequence Analysis, RNA , Vero Cells
16.
Comput Struct Biotechnol J ; 20: 662-674, 2022.
Article in English | MEDLINE | ID: mdl-35140886

ABSTRACT

Mycobacterium tuberculosis genome comprises approximately 10% of two families of poorly characterised genes due to their high GC content and highly repetitive nature. The largest sub-group, the proline-glutamic acid polymorphic guanine-cytosine-rich sequence (PE_PGRS) family, is thought to be involved in host response and disease pathogenicity. Due to their high genetic variability and complexity of analysis, they are typically disregarded for further research in genomic studies. There are currently limited online resources and homology computational tools that can identify and analyse PE_PGRS proteins. In addition, they are computational-intensive and time-consuming, and lack sensitivity. Therefore, computational methods that can rapidly and accurately identify PE_PGRS proteins are valuable to facilitate the functional elucidation of the PE_PGRS family proteins. In this study, we developed the first machine learning-based bioinformatics approach, termed PEPPER, to allow users to identify PE_PGRS proteins rapidly and accurately. PEPPER was built upon a comprehensive evaluation of 13 popular machine learning algorithms with various sequence and physicochemical features. Empirical studies demonstrated that PEPPER achieved significantly better performance than alignment-based approaches, BLASTP and PHMMER, in both prediction accuracy and speed. PEPPER is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE_PGRS proteins.

17.
Brief Bioinform ; 23(2), 2022 03 10.
Article in English | MEDLINE | ID: mdl-35021193

ABSTRACT

Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to help the research community assess the relative functionality of alternative approaches and to support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoters, 61 for eukaryotic promoters, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.


Subject(s)
Drosophila melanogaster , Eukaryota , Animals , Computational Biology/methods , Drosophila melanogaster/genetics , Eukaryotic Cells , Mice , Prokaryotic Cells , Genetic Promoter Regions
18.
BMC Cancer ; 22(1): 85, 2022 Jan 20.
Article in English | MEDLINE | ID: mdl-35057759

ABSTRACT

BACKGROUND: Circulating cell-free DNA (cfDNA) in the plasma of cancer patients contains cell-free tumour DNA (ctDNA) derived from tumour cells, and it has been widely recognized as a non-invasive source of tumour DNA for diagnosis and prognosis of cancer. Molecular profiling of ctDNA is often performed using targeted sequencing or low-coverage whole genome sequencing (WGS) to identify tumour-specific somatic mutations or somatic copy number aberrations (sCNAs). However, these approaches cannot efficiently detect all tumour-derived genomic changes in ctDNA. METHODS: We performed WGS analysis of cfDNA from 4 breast cancer patients and 2 patients with benign tumours. We sequenced matched germline DNA for all 6 patients and tumour samples from the breast cancer patients. All samples were sequenced on the Illumina HiSeqXTen sequencing platform and achieved approximately 30x, 60x and 100x coverage on germline, tumour and plasma DNA samples, respectively. RESULTS: The mutational burden of the plasma samples (1.44 somatic mutations/Mb of genome) was higher than that of the matched tumour samples. However, 90% of high-confidence somatic cfDNA variants were not detected in matched tumour samples and were found to comprise two background plasma mutational signatures. In contrast, cfDNA from the di-nucleosome fraction (300 bp-350 bp) had a much higher proportion (30%) of variants shared with the tumour. Despite high-coverage sequencing, we were unable to detect sCNAs in plasma samples. CONCLUSIONS: Deep sequencing analysis of plasma samples revealed a higher fraction of unique somatic mutations in plasma samples, which were not detected in matched tumour samples. Sequencing of di-nucleosome-bound cfDNA fragments may increase recovery of tumour mutations from plasma.


Subject(s)
Breast Neoplasms/genetics , Circulating Tumor DNA/blood , DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Adult , Tumor Biomarkers/genetics , Breast Neoplasms/blood , Female , Humans , Mutation , Prognosis
19.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34729589

ABSTRACT

Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications that address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work will serve as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and for the further development of next-generation PU learning frameworks for critical biological applications.


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , Supervised Machine Learning