Results 1 - 20 of 14,259
1.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38752856

ABSTRACT

Enhancing the reproducibility and comprehension of adaptive immune receptor repertoire sequencing (AIRR-seq) data analysis is critical for scientific progress. This study presents guidelines for reproducible AIRR-seq data analysis, and a collection of ready-to-use pipelines with comprehensive documentation. To this end, ten common pipelines were implemented using ViaFoundry, a user-friendly interface for pipeline management and automation. This is accompanied by versioned containers, documentation and archiving capabilities. The automation of pre-processing analysis steps and the ability to modify pipeline parameters according to specific research needs are emphasized. AIRR-seq data analysis is highly sensitive to varying parameters and setups; using the guidelines presented here, the ability to reproduce previously published results is demonstrated. This work promotes transparency, reproducibility, and collaboration in AIRR-seq data analysis, serving as a model for handling and documenting bioinformatics pipelines in other research domains.


Subjects
Computational Biology; Software; Humans; Computational Biology/methods; Reproducibility of Results; Receptors, Immunologic/genetics; High-Throughput Nucleotide Sequencing/methods; Adaptive Immunity/genetics; Guidelines as Topic
3.
Front Cell Infect Microbiol ; 14: 1395239, 2024.
Article in English | MEDLINE | ID: mdl-38774626

ABSTRACT

Background: Traditional microbiological methods for detecting pulmonary infections in people living with HIV (PLHIV) are usually time-consuming and have low sensitivity, leading to delayed treatment. We aimed to evaluate the diagnostic value of metagenomic next-generation sequencing (mNGS) for microbial diagnosis of suspected pulmonary infections in PLHIV. Methods: We retrospectively analyzed PLHIV who were hospitalized due to suspected pulmonary infections at the Sixth People's Hospital of Zhengzhou from November 1, 2021 to June 30, 2022. Bronchoalveolar lavage fluid (BALF) samples of PLHIV were collected and subjected to routine microbiological examination and mNGS detection. The diagnostic performance of the two methods was compared to evaluate the diagnostic value of mNGS for unknown pathogens. Results: This study included a total of 36 PLHIV with suspected pulmonary infections, of which 31 were male. The reporting period of mNGS was significantly shorter than that of conventional microbiological tests (CMTs). The mNGS positive rate of BALF samples in PLHIV was 83.33%, which was significantly higher than that of smear and culture (44.4%, P<0.001). In addition, 11 patients showed consistent results between the two methods. Furthermore, mNGS showed excellent performance in identifying multi-infections in PLHIV, and 27 pathogens were detected in the BALF of 30 PLHIV by mNGS, among which 15 PLHIV were found to have multiple microbial infections (at least 3 pathogens). Pneumocystis jirovecii, human herpesvirus type 5, and human herpesvirus type 4 were the most common pathogen types. Conclusions: For PLHIV with suspected pulmonary infections, mNGS is capable of rapidly and accurately identifying the pathogen causing the pulmonary infection, which supports the timely and accurate implementation of anti-infective treatment.


Subjects
Bronchoalveolar Lavage Fluid; HIV Infections; High-Throughput Nucleotide Sequencing; Metagenomics; Humans; High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Male; Female; HIV Infections/complications; HIV Infections/virology; Retrospective Studies; Bronchoalveolar Lavage Fluid/microbiology; Bronchoalveolar Lavage Fluid/virology; Adult; Middle Aged; China; Coinfection/diagnosis; Coinfection/microbiology; Coinfection/virology; Respiratory Tract Infections/diagnosis; Respiratory Tract Infections/virology; Respiratory Tract Infections/microbiology
4.
Curr Protoc ; 4(5): e1041, 2024 May.
Article in English | MEDLINE | ID: mdl-38774978

ABSTRACT

The detection, validation, and subsequent interpretation of potentially mosaic single-nucleotide variants (SNV) within next-generation sequencing data remains a challenge in both research and clinical laboratory settings. The ability to identify mosaic variants in high genome coverage sequencing data at levels of ≤1% underscores the necessity for developing guidelines and best practices to verify these variants orthogonally. Droplet digital PCR (ddPCR) has proven to be a powerful and precise method that allows for the determination of low-level variant fractions within a given sample. Herein we describe two precise ddPCR methods using either a fluorescent TaqMan hydrolysis probe approach or an EvaGreen fluorescent dye protocol. The TaqMan approach relies on two different fluorescent probes (FAM and HEX/VIC), each designed to amplify selectively only in the presence of a single nucleotide change denoting the variant or reference position. The fractional abundance is then calculated to determine the relative quantities of both alleles in the final sample. The EvaGreen protocol relies on two independent reactions with oligonucleotide primers designed with the single nucleotide change denoting the variant at the penultimate position of the primer. The relative amplification efficiency of both primer sets (reference and variant) can be compared to determine the mosaic level of a given variant. As the cost of high-coverage sequencing continues to decrease, the identification of potentially mosaic variants will also increase. The approaches outlined will allow clinicians and researchers a more precise determination of the true mosaic level of a given variant allowing them to better assess not only its potential pathogenicity but also its possible recurrence risk when offering genetic counseling to families. © 2024 Wiley Periodicals LLC. Basic Protocol: Droplet digital PCR (ddPCR) with TaqMan hydrolysis probes Alternate Protocol: EvaGreen oligonucleotide-specific ddPCR.


Subjects
Polymerase Chain Reaction; Polymorphism, Single Nucleotide; Polymorphism, Single Nucleotide/genetics; Humans; Polymerase Chain Reaction/methods; Mosaicism; Fluorescent Dyes/chemistry; High-Throughput Nucleotide Sequencing/methods
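
The fractional abundance calculation mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the protocol's own procedure: it assumes the Poisson correction commonly applied to droplet counts, and the function names and droplet counts are hypothetical.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean number of target copies per droplet.

    Assumption: targets are distributed randomly across droplets, so the
    fraction of negative droplets equals exp(-lambda).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def fractional_abundance(variant_pos: int, reference_pos: int, total: int) -> float:
    """Variant copies divided by (variant + reference) copies."""
    lam_variant = copies_per_droplet(variant_pos, total)
    lam_reference = copies_per_droplet(reference_pos, total)
    if lam_variant + lam_reference == 0:
        raise ValueError("no positive droplets for either allele")
    return lam_variant / (lam_variant + lam_reference)


# Hypothetical counts: 20 variant-positive and 1800 reference-positive
# droplets out of 15000 accepted droplets.
print(f"{fractional_abundance(20, 1800, 15000):.4%}")
```

Instrument software may apply additional corrections (for example for double-positive droplets), so values from this sketch should be read as approximate.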
5.
PLoS One ; 19(5): e0303171, 2024.
Article in English | MEDLINE | ID: mdl-38768113

ABSTRACT

The tumor microenvironment (TME) is a complex dynamic system with many tumor-interacting components including tumor-infiltrating leukocytes (TILs), cancer associated fibroblasts, blood vessels, and other stromal constituents. It intrinsically affects tumor development and pharmacology of oncology therapeutics, particularly immuno-oncology (IO) treatments. Accurate measurement of TME is therefore of great importance for understanding the tumor immunity, identifying IO treatment mechanisms, developing predictive biomarkers, and ultimately, improving the treatment of cancer. Here, we introduce a mouse-IO NGS-based (NGSmIO) assay for accurately detecting and quantifying the mRNA expression of 1080 TME related genes in mouse tumor models. The NGSmIO panel was shown to be superior to the commonly used microarray approach by hosting 300 more relevant genes to better characterize various lineages of immune cells; it exhibits improved mRNA and protein expression correlation to flow cytometry, shows stronger correlation with mRNA expression than RNAseq with 10x higher sequencing depth, and demonstrates higher sensitivity in measuring low-expressed genes. We describe two studies: first, detecting the pharmacodynamic change of interferon-γ expression levels upon anti-PD-1: anti-CD4 combination treatment in MC38 and Hepa 1-6 tumors; and second, benchmarking baseline TILs in 14 syngeneic tumors using transcript level expression of lineage specific genes. Together, these demonstrate effective and robust applications of the NGSmIO panel.


Subjects
High-Throughput Nucleotide Sequencing; Tumor Microenvironment; Animals; Mice; Tumor Microenvironment/immunology; High-Throughput Nucleotide Sequencing/methods; Interferon-gamma/genetics; Interferon-gamma/metabolism; Cell Line, Tumor; Gene Expression Regulation, Neoplastic; Disease Models, Animal; Mice, Inbred C57BL; RNA, Messenger/genetics; Programmed Cell Death 1 Receptor/genetics; Programmed Cell Death 1 Receptor/metabolism; Neoplasms/genetics; Neoplasms/immunology; Female; Lymphocytes, Tumor-Infiltrating/immunology; Lymphocytes, Tumor-Infiltrating/metabolism; Gene Expression Profiling/methods
6.
Genes Chromosomes Cancer ; 63(5): e23238, 2024 May.
Article in English | MEDLINE | ID: mdl-38722224

ABSTRACT

Pleomorphic rhabdomyosarcoma (PRMS) is a rare and highly aggressive sarcoma, occurring mostly in the deep soft tissues of middle-aged adults and showing a variable degree of skeletal muscle differentiation. The diagnosis is challenging as pathologic features overlap with embryonal rhabdomyosarcoma (ERMS), malignant Triton tumor, and other pleomorphic sarcomas. As recurrent genetic alterations underlying PRMS have not been described to date, ancillary molecular diagnostic testing is not useful in subclassification. Herein, we perform genomic profiling of a well-characterized cohort of 14 PRMS, compared to a control group of 23 ERMS and other pleomorphic sarcomas (undifferentiated pleomorphic sarcoma and pleomorphic liposarcoma) using clinically validated DNA-targeted next-generation sequencing (NGS) panels (MSK-IMPACT). The PRMS cohort included eight males and six females, with a median age of 53 years (range 31-76 years). Despite similar tumor mutation burdens, the genomic landscape of PRMS, with a high frequency of TP53 (79%) and RB1 (43%) alterations, stood in stark contrast to ERMS, with 4% and 0%, respectively. CDKN2A deletions were more common in PRMS (43%), compared to ERMS (13%). In contrast, ERMS harbored somatic driver mutations in the RAS pathway and loss of function mutations in BCOR, which were absent in PRMS. Copy number variations in PRMS showed multiple chromosomal arm-level changes, most commonly gains of chr17p and chr22q and loss of chr6q. Notably, gain of chr8, commonly seen in ERMS (61%), was conspicuously absent in PRMS. The genomic profiles of other pleomorphic sarcomas were overall analogous to PRMS, showing shared alterations in TP53, RB1, and CDKN2A. Overall survival and progression-free survival of PRMS were significantly worse (p < 0.0005) than those of ERMS. Our findings revealed that the molecular landscape of PRMS aligns with other adult pleomorphic sarcomas and is distinct from that of ERMS. Thus, NGS assays may be applied in select challenging cases toward a refined classification. Finally, our data corroborate the inclusion of PRMS in the therapeutic bracket of pleomorphic sarcomas, given that their clinical outcomes are comparable.


Subjects
Rhabdomyosarcoma, Embryonal; Humans; Male; Female; Adult; Middle Aged; Aged; Rhabdomyosarcoma, Embryonal/genetics; Rhabdomyosarcoma, Embryonal/pathology; Rhabdomyosarcoma/genetics; Rhabdomyosarcoma/pathology; Rhabdomyosarcoma/classification; Mutation; High-Throughput Nucleotide Sequencing/methods; Genomics/methods; Biomarkers, Tumor/genetics; Retinoblastoma Binding Proteins/genetics; Ubiquitin-Protein Ligases
7.
BMC Bioinformatics ; 25(1): 180, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720249

ABSTRACT

BACKGROUND: High-throughput sequencing (HTS) has become the gold standard approach for variant analysis in cancer research. However, somatic variants may occur at low fractions due to contamination from normal cells or tumor heterogeneity; this poses a significant challenge for standard HTS analysis pipelines. The problem is exacerbated in scenarios with minimal tumor DNA, such as circulating tumor DNA in plasma. Assessing sensitivity and detection of HTS approaches in such cases is paramount, but time-consuming and expensive: specialized experimental protocols and a sufficient quantity of samples are required for processing and analysis. To overcome these limitations, we propose a new computational approach specifically designed for the generation of artificial datasets suitable for this task, simulating ultra-deep targeted sequencing data with low-fraction variants and demonstrating their effectiveness in benchmarking low-fraction variant calling. RESULTS: Our approach enables the generation of artificial raw reads that mimic real data without relying on pre-existing data by using NEAT, a fine-grained read simulator that generates artificial datasets using models learned from multiple different datasets. Then, it incorporates low-fraction variants to simulate somatic mutations in samples with minimal tumor DNA content. To prove the suitability of the created artificial datasets for low-fraction variant calling benchmarking, we used them as ground truth to evaluate the performance of widely-used variant calling algorithms: they allowed us to define tuned parameter values of major variant callers, considerably improving their detection of very low-fraction variants. CONCLUSIONS: Our findings highlight both the pivotal role of our approach in creating adequate artificial datasets with low tumor fraction, facilitating rapid prototyping and benchmarking of algorithms for such dataset types, and the important need to advance low-fraction variant calling techniques.


Subjects
Benchmarking; High-Throughput Nucleotide Sequencing; Neoplasms; High-Throughput Nucleotide Sequencing/methods; Humans; Neoplasms/genetics; Mutation; Algorithms; DNA, Neoplasm/genetics; Sequence Analysis, DNA/methods; Computational Biology/methods
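
The idea of spiking a low-fraction variant into simulated reads, as described in the abstract above, can be illustrated with a toy example. This sketch does not use NEAT and is not the authors' method: it assumes error-free reads, a single substitution, and hypothetical function and parameter names.

```python
import random


def simulate_reads(reference: str, pos: int, alt: str, vaf: float,
                   depth: int, read_len: int, seed: int = 0) -> list[tuple[int, str]]:
    """Return `depth` (start, read) pairs that all cover `pos`.

    Each read carries the `alt` base at `pos` with probability `vaf`,
    which stands in for a low-fraction somatic variant.
    """
    if not 0 <= pos < len(reference) or read_len > len(reference):
        raise ValueError("pos must lie in the reference and read_len must fit")
    rng = random.Random(seed)
    lowest_start = max(0, pos - read_len + 1)
    highest_start = min(pos, len(reference) - read_len)
    reads = []
    for _ in range(depth):
        start = rng.randint(lowest_start, highest_start)
        bases = list(reference[start:start + read_len])
        if rng.random() < vaf:
            bases[pos - start] = alt  # spike in the variant allele
        reads.append((start, "".join(bases)))
    return reads


def observed_vaf(reads: list[tuple[int, str]], pos: int, alt: str) -> float:
    """Fraction of reads showing `alt` at reference position `pos`."""
    hits = sum(1 for start, read in reads if read[pos - start] == alt)
    return hits / len(reads)


reads = simulate_reads("ACGT" * 50, pos=101, alt="T", vaf=0.01, depth=20000, read_len=75)
print(observed_vaf(reads, 101, "T"))
```

Because the true variant fraction is set by the caller, the observed fraction reported by a variant caller can be compared against it, which is the benchmarking use the abstract describes.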
8.
Microbiome ; 12(1): 84, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38725076

ABSTRACT

BACKGROUND: Emergence of antibiotic resistance in bacteria is an important threat to global health. Antibiotic resistance genes (ARGs) are some of the key components to define bacterial resistance and their spread in different environments. Identification of ARGs, particularly from high-throughput sequencing data of the specimens, is the state-of-the-art method for comprehensively monitoring their spread and evolution. Current computational methods to identify ARGs mainly rely on alignment-based sequence similarities with known ARGs. Such approaches are limited by choice of reference databases and may potentially miss novel ARGs. The similarity thresholds are usually simple and cannot accommodate variations across different gene families and regions. It is also difficult to scale up when sequence data are increasing. RESULTS: In this study, we developed ARGNet, a deep neural network that incorporates an unsupervised learning autoencoder model to identify ARGs and a multiclass classification convolutional neural network to classify ARGs, neither of which depends on sequence alignment. This approach enables a more efficient discovery of both known and novel ARGs. ARGNet accepts both amino acid and nucleotide sequences of variable lengths, from partial (30-50 aa; 100-150 nt) sequences to full-length protein or genes, allowing its application in both target sequencing and metagenomic sequencing. Our performance evaluation showed that ARGNet outperformed other deep learning models including DeepARG and HMD-ARG in most of the application scenarios, especially the quasi-negative test and the analysis of prediction consistency with the phylogenetic tree. ARGNet reduces inference runtime by up to 57% relative to DeepARG. CONCLUSIONS: ARGNet is flexible, efficient, and accurate at predicting a broad range of ARGs from the sequencing data. ARGNet is freely available at https://github.com/id-bioinfo/ARGNet, with an online service provided at https://ARGNet.hku.hk.


Subjects
Bacteria; Neural Networks, Computer; Bacteria/genetics; Bacteria/drug effects; Bacteria/classification; Drug Resistance, Bacterial/genetics; Anti-Bacterial Agents/pharmacology; High-Throughput Nucleotide Sequencing/methods; Computational Biology/methods; Genes, Bacterial/genetics; Drug Resistance, Microbial/genetics; Humans; Deep Learning
9.
Front Cell Infect Microbiol ; 14: 1366908, 2024.
Article in English | MEDLINE | ID: mdl-38725449

ABSTRACT

Background: Metagenomic next-generation sequencing (mNGS) is a novel non-invasive and comprehensive technique for etiological diagnosis of infectious diseases. However, its practical significance has been seldom reported in the context of hematological patients with high-risk febrile neutropenia, a unique patient group characterized by neutropenia and compromised immune responses. Methods: This retrospective study evaluated the results of plasma cfDNA sequencing in 164 hematological patients with high-risk febrile neutropenia. We assessed the diagnostic efficacy and clinical impact of mNGS, comparing it with conventional microbiological tests. Results: mNGS identified 68 different pathogens in 111 patients, whereas conventional methods detected only 17 pathogen types in 36 patients. mNGS exhibited a significantly higher positive detection rate than conventional methods (67.7% vs. 22.0%, P < 0.001). This improvement was consistent across bacterial (30.5% vs. 9.1%), fungal (19.5% vs. 4.3%), and viral (37.2% vs. 9.1%) infections (P < 0.001 for all comparisons). The anti-infective treatment strategies were adjusted for 51.2% (84/164) of the patients based on the mNGS results. Conclusions: mNGS of plasma cfDNA offers substantial promise for the early detection of pathogens and the timely optimization of anti-infective therapies in hematological patients with high-risk febrile neutropenia.


Subjects
Febrile Neutropenia; High-Throughput Nucleotide Sequencing; Metagenomics; Humans; Metagenomics/methods; Male; Retrospective Studies; High-Throughput Nucleotide Sequencing/methods; Female; Middle Aged; Febrile Neutropenia/microbiology; Febrile Neutropenia/blood; Febrile Neutropenia/diagnosis; Adult; Aged; Young Adult; Adolescent; Aged, 80 and over; Bacterial Infections/diagnosis; Bacterial Infections/microbiology; Bacteria/genetics; Bacteria/isolation & purification; Bacteria/classification; Mycoses/diagnosis; Mycoses/microbiology; Virus Diseases/diagnosis; Virus Diseases/virology
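
The overall rates in the abstract above follow from the patient counts it reports. A small arithmetic sketch, assuming the denominator is the full cohort of 164 patients and using a hypothetical helper name:

```python
def rate_percent(hits: int, total: int) -> float:
    """Percentage of `hits` out of `total`, rounded to one decimal place."""
    if total <= 0 or not 0 <= hits <= total:
        raise ValueError("need total > 0 and 0 <= hits <= total")
    return round(100.0 * hits / total, 1)


COHORT = 164  # patients with high-risk febrile neutropenia, from the abstract

print(rate_percent(111, COHORT))  # patients positive by mNGS
print(rate_percent(36, COHORT))   # patients positive by conventional methods
print(rate_percent(84, COHORT))   # patients whose treatment was adjusted
```

The pathogen-class rates (bacterial, fungal, viral) cannot be recomputed this way because the abstract gives only percentages for them, not counts.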
14.
HLA ; 103(5): e15518, 2024 May.
Article in English | MEDLINE | ID: mdl-38733247

ABSTRACT

Donor-derived cell-free DNA (dd-cfDNA) has been widely studied as a biomarker for non-invasive allograft rejection monitoring. Earlier rejection detection enables more prompt diagnosis and intervention, ultimately improving patient treatment and outcomes. This multi-centre study aims to verify analytical performance of a next-generation sequencing-based dd-cfDNA assay in end-user environments. Three independent laboratories received the same experimental design and 16 blinded samples to perform cfDNA extraction and the dd-cfDNA assay workflow. dd-cfDNA results were compared between sites and against manufacturer validation to evaluate concordance, reproducibility, repeatability and verify analytical performance. A total of 247 sample libraries were generated across 18 runs, with a completion time of <24 h. A 96.0% first pass rate highlighted minimal failures. Overall observed versus expected dd-cfDNA results demonstrated good concordance and a strong positive correlation with linear least squares regression r² = 0.9989, and high repeatability and reproducibility within and between sites, respectively (p > 0.05). Manufacturer validation established limit of blank 0.18%, limit of detection 0.23% and limit of quantification 0.23%, and results from independent sites verified those limits. Parallel analyses illustrated no significant difference (p = 0.951) between dd-cfDNA results with or without recipient genotype. The dd-cfDNA assay evaluated here has been verified as a reliable method for efficient, reproducible dd-cfDNA quantification in plasma from solid organ transplant recipients without requiring genotyping. Implementation of onsite dd-cfDNA testing at clinical laboratories could facilitate earlier detection of allograft injury, bearing great potential for patient care.


Subjects
Cell-Free Nucleic Acids; Graft Rejection; High-Throughput Nucleotide Sequencing; Organ Transplantation; Tissue Donors; Transplant Recipients; Humans; Cell-Free Nucleic Acids/blood; High-Throughput Nucleotide Sequencing/methods; Reproducibility of Results; Graft Rejection/diagnosis; Graft Rejection/blood; Graft Rejection/genetics; Biomarkers/blood
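
The observed-versus-expected comparison in the abstract above rests on an ordinary least squares fit and its coefficient of determination. A minimal sketch of that calculation, with hypothetical function names and made-up dd-cfDNA percentages (the study's data are not reproduced here):

```python
def linear_fit_r2(expected: list[float], observed: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit of observed on expected.

    Returns (slope, intercept, r_squared).
    """
    if len(expected) != len(observed) or len(expected) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(expected, observed))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical expected and observed dd-cfDNA fractions (%), for illustration only.
expected = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
observed = [0.27, 0.49, 1.03, 1.96, 4.10, 7.92]
slope, intercept, r2 = linear_fit_r2(expected, observed)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.4f}")
```

A value of r² close to 1 indicates that observed values track expected values nearly linearly; it does not by itself show the absence of bias, which is why slope and intercept are returned as well.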
15.
Int J Mol Sci ; 25(9), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38732187

ABSTRACT

Dynamic changes in genomic DNA methylation patterns govern the epigenetic developmental programs and accompany the organism's aging. Epigenetic clock (eAge) algorithms utilize DNA methylation to estimate the age and risk factors for diseases as well as analyze the impact of various interventions. High-throughput bisulfite sequencing methods, such as reduced-representation bisulfite sequencing (RRBS) or whole genome bisulfite sequencing (WGBS), provide an opportunity to identify the genomic regions of disordered or heterogeneous DNA methylation, which might be associated with cell-type heterogeneity, DNA methylation erosion, and allele-specific methylation. We systematically evaluated the applicability of five scores assessing the variability of methylation patterns by evaluating within-sample heterogeneity (WSH) to construct human blood epigenetic clock models using RRBS data. The best performance was demonstrated by the model based on a metric designed to assess DNA methylation erosion with an MAE of 3.686 years. We also trained a prediction model that uses the average methylation level over genomic regions. Although this region-based model was relatively more efficient than the WSH-based model, the latter required the analysis of just a few short genomic regions and, therefore, could be a useful tool to design a reduced epigenetic clock that is analyzed by targeted next-generation sequencing.


Subjects
Aging; DNA Methylation; Epigenesis, Genetic; High-Throughput Nucleotide Sequencing; Humans; Aging/genetics; High-Throughput Nucleotide Sequencing/methods; Algorithms; CpG Islands; Female; Male; Epigenomics/methods; Aged; Adult; Middle Aged; Sequence Analysis, DNA/methods
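
The mean absolute error (MAE) used in the abstract above to rank the clock models is the average absolute gap between chronological and predicted age. A minimal sketch, with a hypothetical function name and made-up ages:

```python
def mean_absolute_error(true_ages: list[float], predicted_ages: list[float]) -> float:
    """Average of |true - predicted| over all samples, in years."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("need two non-empty series of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical chronological ages and epigenetic-clock predictions.
chronological = [25.0, 40.0, 58.0, 71.0]
predicted = [28.5, 37.0, 60.0, 66.5]
print(mean_absolute_error(chronological, predicted))
```

An MAE of 3.686 years, as reported for the best model, therefore means that predictions were off by about 3.7 years on average in either direction.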
17.
Nat Commun ; 15(1): 3972, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730241

ABSTRACT

The advancement of Long-Read Sequencing (LRS) techniques has significantly increased the length of sequencing to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expressions. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, there remains a deficiency in comparative studies that systematically evaluate the performance of these tools, which are implemented with different algorithms, under various simulations that encompass potential influencing factors. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performances using simulated data, which represented diverse sequencing platforms generated by an in-house simulator, RNA sequins (sequencing spike-ins) data, as well as experimental data. Our findings demonstrate IsoQuant as a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research on alternative splicing analysis and the ongoing improvement of tools for isoform detection using LRS data.


Subjects
Algorithms; Alternative Splicing; RNA, Messenger; Sequence Analysis, RNA; Humans; RNA, Messenger/genetics; RNA, Messenger/analysis; Sequence Analysis, RNA/methods; RNA Isoforms/genetics; Software; Computational Biology/methods; High-Throughput Nucleotide Sequencing/methods; Protein Isoforms/genetics
18.
BMC Bioinformatics ; 25(1): 186, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730374

ABSTRACT

BACKGROUND: Commonly used next-generation sequencing machines typically produce large amounts of short reads of a few hundred base-pairs in length. However, many downstream applications would generally benefit from longer reads. RESULTS: We present CAREx, an algorithm for the generation of pseudo-long reads from paired-end short-read Illumina data based on the concept of repeatedly computing multiple-sequence-alignments to extend a read until its partner is found. Our performance evaluation on both simulated data and real data shows that CAREx is able to connect significantly more read pairs (up to 99% for simulated data) and to produce more error-free pseudo-long reads than previous approaches. When used prior to assembly it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx exhibits the fastest execution times among all tested tools. CONCLUSION: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates of filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CAREx.


Subjects
Algorithms; High-Throughput Nucleotide Sequencing; Software; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Humans; Sequence Alignment/methods
19.
Hum Genomics ; 18(1): 46, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730490

ABSTRACT

BACKGROUND: The current clinical diagnostic pathway for lysosomal storage disorders (LSDs) involves sequential biochemical enzymatic tests followed by DNA sequencing, which is iterative, has low diagnostic yield and is costly due to overlapping clinical presentations. Here, we describe a novel low-cost and high-throughput sequencing assay using single-molecule molecular inversion probes (smMIPs) to screen for causative single nucleotide variants (SNVs) and copy number variants (CNVs) in genes associated with 29 common LSDs in India. RESULTS: 903 smMIPs were designed to target exon and exon-intron boundaries of targeted genes (n = 23; 53.7 kb of the human genome) and were equimolarly pooled to create a sequencing library. After extensive validation in a cohort of 50 patients, we screened 300 patients with either biochemical diagnosis (n = 187) or clinical suspicion (n = 113) of LSDs. A diagnostic yield of 83.4% was observed in patients with prior biochemical diagnosis of LSD. Furthermore, a diagnostic yield of 73.9% (n = 54/73) was observed in patients with high clinical suspicion of LSD in contrast with 2.4% (n = 1/40) in patients with low clinical suspicion of LSD. In addition to detecting SNVs, the assay could detect single and multi-exon copy number variants with high confidence. Critically, Niemann-Pick disease type C and neuronal ceroid lipofuscinosis-6, diseases for which biochemical testing is unavailable, could be diagnosed using our assay. Lastly, we observed a non-inferior performance of the assay in DNA extracted from dried blood spots in comparison with whole blood. CONCLUSION: We developed a flexible and scalable assay to reliably detect genetic causes of 29 common LSDs in India. The assay consolidates the detection of multiple variant types in multiple sample types while having improved diagnostic yield at the same or lower cost compared to the current clinical paradigm.


Subjects
DNA Copy Number Variations; Genetic Testing; High-Throughput Nucleotide Sequencing; Lysosomal Storage Diseases; Humans; Lysosomal Storage Diseases/genetics; Lysosomal Storage Diseases/diagnosis; India; DNA Copy Number Variations/genetics; Genetic Testing/methods; High-Throughput Nucleotide Sequencing/methods; Polymorphism, Single Nucleotide/genetics; Female; Male; Molecular Probes/genetics
20.
Comput Biol Med ; 175: 108542, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38714048

ABSTRACT

The genomics landscape has undergone a revolutionary transformation with the emergence of third-generation sequencing technologies. Fueled by the exponential surge in sequencing data, there is an urgent demand for accurate and rapid algorithms to effectively handle this burgeoning influx. Under such circumstances, we developed a parallelized, yet accuracy-lossless algorithm for maximal exact match (MEM) retrieval to strategically address the computational bottleneck of uLTRA, a leading spliced alignment algorithm known for its precision in handling long RNA sequencing (RNA-seq) reads. The design of the algorithm incorporates a multi-threaded strategy, enabling the concurrent processing of multiple reads. Additionally, we implemented the serialization of the index required for MEM retrieval to facilitate its reuse, resulting in accelerated startup for practical tasks. Extensive experiments demonstrate that our parallel algorithm achieves significant improvements in runtime, speedup, throughput, and memory usage. When applied to the largest human dataset, the algorithm achieves an impressive speedup of 10.78×, significantly improving throughput on a large scale. Moreover, the integration of the parallel MEM retrieval algorithm into the uLTRA pipeline introduces a dual-layered parallel capability, consistently yielding a speedup of 4.99× compared to the multi-process and single-threaded execution of uLTRA. The thorough analysis of experimental results underscores the adept utilization of parallel processing capabilities and its advantageous performance in handling large datasets. This study provides a showcase of parallelized strategies for MEM retrieval within the context of spliced alignment algorithms, effectively facilitating the process of RNA-seq data analysis. The code is available at https://github.com/RongxingWong/AcceleratingSplicedAlignment.


Subjects
Algorithms; Sequence Analysis, RNA; Humans; Sequence Analysis, RNA/methods; RNA Splicing; High-Throughput Nucleotide Sequencing/methods; Sequence Alignment/methods; Software
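
The per-read parallelism described in the abstract above can be illustrated with a toy maximal exact match (MEM) finder. This is a sketch under stated assumptions, not the authors' implementation: it uses a naive quadratic scan instead of an index, all function names are hypothetical, and Python threads are used only to show the structure (in CPython the global interpreter lock limits CPU-bound speedup, so a real implementation would use native threads or processes).

```python
from concurrent.futures import ThreadPoolExecutor


def find_mems(read: str, ref: str, min_len: int) -> list[tuple[int, int, int]]:
    """Return (read_start, ref_start, length) for every maximal exact match.

    A match is maximal when it cannot be extended to the left or right.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Left-extendable matches belong to a MEM that starts earlier.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            k = 0
            while i + k < len(read) and j + k < len(ref) and read[i + k] == ref[j + k]:
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


def find_mems_parallel(reads: list[str], ref: str, min_len: int, workers: int = 4):
    """Process reads concurrently; results keep the input order of `reads`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: find_mems(r, ref, min_len), reads))


def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return serial_seconds / parallel_seconds


print(find_mems_parallel(["ACGT", "GTTT"], "TTACGTTT", min_len=3))
```

Each read is independent of the others, which is why the work divides cleanly across workers and why the parallel result can be identical to the serial one.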