Results 1 - 20 of 52
1.
Methods Mol Biol ; 2812: 39-46, 2024.
Article in English | MEDLINE | ID: mdl-39068356

ABSTRACT

In this chapter, we outline an approach to analyzing metatranscriptomic data, focusing on the assessment of differential enzyme expression and metabolic pathway activities using a novel bioinformatics software tool, EMPathways2. The analysis pipeline commences with raw data originating from a sequencer and concludes with an output of enzyme expressions and an estimate of metabolic pathway activities. The initial step involves aligning specific transcriptomes assembled from RNA-Seq data using Bowtie2 and acquiring gene expression data with IsoEM2. Subsequently, the pipeline proceeds to quality assessment and preprocessing of the input data, ensuring accurate estimates of enzymes and their differential regulation. Upon completion of the preprocessing stage, EMPathways2 is employed to decipher the intricate relationships between genes, enzymes, and pathways. An online repository containing sample data has been made available, alongside custom Python scripts designed to modify the output of the programs within the pipeline for diverse downstream analyses. This chapter highlights the technical aspects and practical applications of using EMPathways2, which facilitates the advancement of transcriptome data analysis and contributes to a deeper understanding of the complex regulatory mechanisms underlying living systems.


Subject(s)
Computational Biology , Gene Expression Profiling , Metabolic Networks and Pathways , RNA-Seq , Software , RNA-Seq/methods , Metabolic Networks and Pathways/genetics , Computational Biology/methods , Gene Expression Profiling/methods , Transcriptome , Humans , Sequence Analysis, RNA/methods
2.
Nat Commun ; 15(1): 2838, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38565543

ABSTRACT

The emergence of viral variants with altered phenotypes is a public health challenge underscoring the need for advanced evolutionary forecasting methods. Given extensive epistatic interactions within viral genomes and known viral evolutionary history, efficient genomic surveillance necessitates early detection of emerging viral haplotypes rather than commonly targeted single mutations. Haplotype inference, however, is a significantly more challenging problem precluding the use of traditional approaches. Here, using SARS-CoV-2 evolutionary dynamics as a case study, we show that emerging haplotypes with altered transmissibility can be linked to dense communities in coordinated substitution networks, which become discernible significantly earlier than the haplotypes become prevalent. From these insights, we develop a computational framework for inference of viral variants and validate it by successful early detection of known SARS-CoV-2 strains. Our methodology offers greater scalability than phylogenetic lineage tracing and can be applied to any rapidly evolving pathogen with adequate genomic surveillance data.


Subject(s)
Biological Evolution , Genome, Viral , Phylogeny , Early Diagnosis , Genome, Viral/genetics , Genomics , SARS-CoV-2/genetics
3.
bioRxiv ; 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38293199

ABSTRACT

Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and drug sensitivity assessment. Recent advances in RNA-seq technology have made it possible to impute HLA types from sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability of clinicians and biomedical researchers to make informed choices regarding which tools to use. Here we report the study design of a comprehensive benchmarking of the performance of 12 HLA callers across 682 RNA-seq samples from 8 datasets with a molecularly defined gold standard at 5 loci: HLA-A, -B, -C, -DRB1, and -DQB1. For each HLA typing tool, we will comprehensively assess its accuracy, compare default with optimized parameters, and examine discrepancies in accuracy at the allele and locus levels. We will also evaluate the computational expense of each HLA caller, measured in terms of CPU time and RAM. We also plan to evaluate the influence of read length over the HLA region on accuracy for each tool. Most notably, we will examine the performance of HLA callers across European and African groups, to determine discrepancies in accuracy associated with ancestry. We hypothesize that RNA-Seq HLA callers are capable of returning high-quality results, but that tools offering a good balance between accuracy and computational expense for all ancestry groups are yet to be developed. We believe that our study will provide clinicians and researchers with clear guidance to inform their selection of an appropriate HLA caller.

4.
J Comput Biol ; 30(9): 1009-1018, 2023 09.
Article in English | MEDLINE | ID: mdl-37695837

ABSTRACT

Identifying viral variants through clustering is essential for understanding the composition and structure of viral populations within and between hosts, which play a crucial role in disease progression and epidemic spread. This article proposes and validates novel Monte Carlo (MC) methods for clustering aligned viral sequences by minimizing either entropy or Hamming distance from consensuses. We validate these methods on four benchmarks: two SARS-CoV-2 interhost data sets and two HIV intrahost data sets. A parallelized version of our tool is scalable to very large data sets. We show that both entropy and Hamming distance-based MC clusterings discern the meaningful information from sequencing data. The proposed clustering methods consistently converge to similar clusterings across different runs. Finally, we show that MC clustering improves reconstruction of intrahost viral population from sequencing data.


Subject(s)
COVID-19 , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Benchmarking , Cluster Analysis , Disease Progression
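
The abstract above describes Monte Carlo clustering of aligned sequences by minimizing Hamming distance from cluster consensuses. The following is a minimal sketch of that idea, not the authors' implementation: the greedy acceptance rule (keep a random reassignment only if the total cost does not increase), the function names, and the parameters `steps` and `seed` are all assumptions made for illustration.

```python
import random
from collections import Counter


def consensus(seqs):
    """Column-wise most frequent character of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def cost(seqs, labels, k):
    """Total Hamming distance from every sequence to its cluster consensus."""
    total = 0
    for c in range(k):
        members = [s for s, lab in zip(seqs, labels) if lab == c]
        if members:
            cons = consensus(members)
            total += sum(hamming(s, cons) for s in members)
    return total


def mc_cluster(seqs, k, steps=2000, seed=0):
    """Hypothetical greedy Monte Carlo search over cluster assignments."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in seqs]
    best = cost(seqs, labels, k)
    for _ in range(steps):
        i = rng.randrange(len(seqs))
        old, new = labels[i], rng.randrange(k)
        if new == old:
            continue
        labels[i] = new              # propose moving sequence i
        c = cost(seqs, labels, k)
        if c <= best:
            best = c                 # accept a non-worsening move
        else:
            labels[i] = old          # reject and restore the old label
    return labels, best
```

Calling `mc_cluster(seqs, k)` on a list of equal-length strings returns the final labels and their cost. Recomputing the full cost at every step keeps the sketch short; a scalable tool would update the cost incrementally and could swap in an entropy-based objective in place of `cost`.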
5.
Front Cell Infect Microbiol ; 13: 1115350, 2023.
Article in English | MEDLINE | ID: mdl-37113133

ABSTRACT

Lyme disease (LD), the most prevalent tick-borne disease of humans in the Northern Hemisphere, is caused by spirochetal bacteria of the Borreliella burgdorferi (Bb) sensu lato complex. In nature, Bb spirochetes are continuously transmitted between Ixodes ticks and mammalian or avian reservoir hosts. Peromyscus leucopus mice are considered the primary mammalian reservoir of Bb in the United States. Earlier studies demonstrated that experimentally infected P. leucopus mice do not develop disease. In contrast, C3H mice, a widely used laboratory strain of Mus musculus in the LD field, develop severe Lyme arthritis. To date, the exact mechanism by which P. leucopus mice tolerate Bb infection remains unknown. To address this knowledge gap, the present study compared spleen transcriptomes of P. leucopus and C3H/HeJ mice infected with Bb strain 297 with those of their respective uninfected controls. Overall, the data showed that the spleen transcriptome of Bb-infected P. leucopus mice was much more quiescent than that of the infected C3H mice. The current investigation is one of the few that have examined the transcriptome response of natural reservoir hosts to Borreliella infection. Although the experimental design of this study differed significantly from those of two previous investigations, the collective results of the current and published studies have consistently demonstrated very limited transcriptomic responses of different reservoir hosts to persistent infection with LD pathogens. Importance: The bacterium Borreliella burgdorferi (Bb) causes Lyme disease, which is one of the emerging and highly debilitating human diseases in countries of the Northern Hemisphere. In nature, Bb spirochetes are maintained between hard ticks of Ixodes spp. and mammals or birds. In the United States, the white-footed mouse, Peromyscus leucopus, is one of the main Bb reservoirs. In contrast to humans and laboratory mice (e.g., C3H mice), white-footed mice rarely develop clinical signs (disease) despite being persistently infected with Bb. How the white-footed mouse tolerates Bb infection is the question that the present study has attempted to address. Comparisons of genetic responses between Bb-infected and uninfected mice demonstrated that, during a long-term Bb infection, C3H mice reacted much more strongly, whereas P. leucopus mice were relatively unresponsive.


Subject(s)
Borrelia burgdorferi , Ixodes , Lyme Disease , Animals , Mice , Humans , Peromyscus/microbiology , Transcriptome , Mice, Inbred C3H , Disease Reservoirs , Lyme Disease/microbiology , Borrelia burgdorferi/genetics , Ixodes/microbiology , Gene Expression Profiling
6.
J Comput Biol ; 30(4): 502-517, 2023 04.
Article in English | MEDLINE | ID: mdl-36716280

ABSTRACT

Triple-negative breast cancer (TNBC) is an aggressive type of breast cancer with heterogeneous tumor biology and poor clinical outcomes. The lack of estrogen, progesterone, and human epidermal growth factor receptors in TNBC tumors leads to fewer treatment options in the clinic. The incidence of TNBC is higher, and clinical outcomes are worse, in African American (AA) women than in European American (EA) women. The significant factors responsible for the racial disparity in TNBC are socioeconomic lifestyle and tumor biology. The current study considered open-source gene expression data from TNBC samples annotated with racial information. We applied a Support Vector Machine (SVM) classifier with a recursive feature elimination approach to the gene expression data to identify significant biomarkers deregulated in AA women and EA women. We also included Spearman's rho and Ward's linkage method in our feature selection workflow. Our proposed method generates 24 features/genes that can classify the AA and EA samples with 98% accuracy. We also performed Kaplan-Meier analysis and the log-rank test on the 24 features/genes. Of these 24 genes, we discuss only 2, KLK10 and LRRC37A2, for which deregulated expression correlates with cancer progression and a poor survival rate. We believe that further improvement of our method with a larger volume of RNA-seq gene expression data will provide more accurate insight into racial disparity in TNBC.


Subject(s)
Health Status Disparities , Triple Negative Breast Neoplasms , Female , Humans , Biomarkers, Tumor/genetics , Black or African American/genetics , Support Vector Machine , Triple Negative Breast Neoplasms/ethnology , Triple Negative Breast Neoplasms/pathology , White/genetics
7.
Cancer Immunol Res ; 10(9): 1141-1154, 2022 09 01.
Article in English | MEDLINE | ID: mdl-35834791

ABSTRACT

Peripheral neurons comprise a critical component of the tumor microenvironment (TME). The role of the autonomic innervation in cancer has been firmly established. However, the effect of the afferent (sensory) neurons on tumor progression remains unclear. Utilizing surgical and chemical skin sensory denervation methods, we showed that afferent neurons supported the growth of melanoma tumors in vivo and demonstrated that sensory innervation limited the activation of effective antitumor immune responses. Specifically, sensory ablation led to improved leukocyte recruitment into tumors, with decreased presence of lymphoid and myeloid immunosuppressive cells and increased activation of T-effector cells within the TME. Cutaneous sensory nerves hindered the maturation of intratumoral high endothelial venules and limited the formation of mature tertiary lymphoid-like structures containing organized clusters of CD4+ T cells and B cells. Denervation further increased T-cell clonality and expanded the B-cell repertoire in the TME. Importantly, CD8a depletion prevented denervation-dependent antitumor effects. Finally, we observed that gene signatures of inflammation and the content of neuron-associated transcripts inversely correlated in human primary cutaneous melanomas, with the latter representing a negative prognostic marker of patient overall survival. Our results suggest that tumor-associated sensory neurons negatively regulate the development of protective antitumor immune responses within the TME, thereby defining a novel target for therapeutic intervention in the melanoma setting.


Subject(s)
Melanoma , Skin Neoplasms , Tertiary Lymphoid Structures , Humans , Immunity , Tumor Microenvironment
9.
Article in English | MEDLINE | ID: mdl-32149652

ABSTRACT

The opioid abuse epidemic represents a major public health threat to global populations. The role social media may play in facilitating illicit drug trade is largely unknown due to limited research. However, it is known that social media use among adults in the US is widespread and that there is vast capability for online promotion of illegal drugs with delayed or limited deterrence of such messaging. Further, general commercial sale applications provide safeguards for transactions but do not discriminate between legal and illegal sales. These characteristics of the social media environment present challenges to the surveillance needed for advancing knowledge of online drug markets and the role they play in drug abuse and overdose deaths. In this paper, we present a computational framework developed to automatically detect illicit drug ads and communities of vendors. The SVM- and CNN-based methods for detecting illicit drug ads, and a matrix factorization-based method for discovering overlapping communities, have been extensively validated on a large dataset collected from Google+, Flickr and Tumblr. Pilot test results demonstrate that our computational methods can effectively identify illicit drug ads and accurately detect vendor communities. These methods hold promise to advance scientific knowledge surrounding the role social media may play in perpetuating the drug abuse epidemic.


Subject(s)
Advertising , Illicit Drugs , Social Media , Humans , Research Design
10.
J Comput Biol ; 28(11): 1113-1129, 2021 11.
Article in English | MEDLINE | ID: mdl-34698508

ABSTRACT

The availability of millions of SARS-CoV-2 (Severe Acute Respiratory Syndrome-Coronavirus-2) sequences in public databases such as GISAID (Global Initiative on Sharing All Influenza Data) and EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute) (the United Kingdom) allows a detailed study of the evolution, genomic diversity, and dynamics of a virus as never before. Here, we identify novel variants and subtypes of SARS-CoV-2 by clustering sequences, adapting methods originally designed for haplotyping intrahost viral populations. We assess our results using clustering entropy, the first time it has been used in this context. Our clustering approach reaches lower entropies compared with other methods, and we are able to lower them even further through gap filling and Monte Carlo-based entropy minimization. Moreover, our method clearly identifies the well-known Alpha variant in the U.K. and GISAID data sets, and is also able to detect the much less represented (<1% of the sequences) Beta (South Africa), Epsilon (California), and Gamma and Zeta (Brazil) variants in the GISAID data set. Finally, we show that each variant identified has high selective fitness, based on the growth rate of its cluster over time. This demonstrates that our clustering approach is a viable alternative for detecting even rare subtypes in very large data sets.


Subject(s)
Cluster Analysis , Computational Biology/methods , Brazil , Databases, Genetic , Entropy , Humans , Monte Carlo Method , South Africa , United Kingdom , United States
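
The abstract above scores clusterings by clustering entropy without defining it. One plausible definition, assumed here purely for illustration, sums the Shannon entropy of each alignment column within a cluster and averages the clusters weighted by size. The function names are hypothetical and the paper's exact formula may differ.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def cluster_entropy(seqs):
    """Sum of column entropies over all columns of one cluster."""
    return sum(column_entropy(col) for col in zip(*seqs))


def clustering_entropy(clusters):
    """Size-weighted average of cluster entropies (assumed definition)."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters if c)
```

Under this definition a clustering whose clusters each contain identical sequences scores 0, and mixing distinct sequences within a cluster raises the score, which is why lower values indicate tighter clusters.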
11.
J Comput Biol ; 28(11): 1130-1141, 2021 11.
Article in English | MEDLINE | ID: mdl-34698524

ABSTRACT

This article presents a novel scalable character-based phylogeny algorithm for dense viral sequencing data called SPHERE (Scalable PHylogEny with REcurrent mutations). The algorithm is based on an evolutionary model where recurrent mutations are allowed, but backward mutations are prohibited. The algorithm creates rooted character-based phylogeny trees, wherein all leaves and internal nodes are labeled by observed taxa. We show that SPHERE phylogeny is more stable than Nextstrain's, and that it accurately infers known transmission links from the early pandemic. SPHERE is a fast algorithm that can process >200,000 sequences in <2 hours and offers a compact phylogenetic visualization of Global Initiative on Sharing All Influenza Data (GISAID) sequences.


Subject(s)
Mutation , Phylogeny , SARS-CoV-2/genetics , Algorithms , COVID-19/transmission , COVID-19/virology , Databases, Genetic , Humans
12.
Genome Biol ; 22(1): 249, 2021 08 26.
Article in English | MEDLINE | ID: mdl-34446078

ABSTRACT

Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.


Subject(s)
Algorithms , Computational Biology/methods , Sequence Alignment , Genome, Human , HIV/physiology , Humans , Metagenomics , Sulfites
13.
Nucleic Acids Res ; 49(17): e102, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34214168

ABSTRACT

Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient's treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that remains open. Here we propose CliqueSNV, which is based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.


Subject(s)
Algorithms , Computational Biology/methods , Haplotypes , High-Throughput Nucleotide Sequencing/methods , RNA Virus Infections/diagnosis , RNA Viruses/genetics , COVID-19/diagnosis , COVID-19/virology , Gene Frequency , HIV Infections/diagnosis , HIV Infections/virology , HIV-1/genetics , Humans , Mutation , Polymorphism, Single Nucleotide , RNA Virus Infections/virology , Reproducibility of Results , SARS-CoV-2/genetics , Sensitivity and Specificity
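
The abstract above rests on extracting pairs of statistically linked mutations from noisy reads, but does not state the test used. The sketch below illustrates the general idea with a deliberately simple heuristic that is an assumption, not CliqueSNV's actual statistic: a pair of minor variants is flagged when reads carry both far more often than independence would predict. The names `minor_variants` and `linked_pairs` and the thresholds `min_ratio` and `min_count` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def minor_variants(reads):
    """For each aligned read, the set of (position, base) differing from the consensus."""
    cons = [Counter(col).most_common(1)[0][0] for col in zip(*reads)]
    return [{(i, b) for i, b in enumerate(r) if b != cons[i]} for r in reads]


def linked_pairs(reads, min_ratio=5.0, min_count=2):
    """Flag variant pairs that co-occur more often than expected by chance."""
    variants = minor_variants(reads)
    n = len(reads)
    single = Counter(v for vs in variants for v in vs)
    pair = Counter(p for vs in variants for p in combinations(sorted(vs), 2))
    linked = []
    for (a, b), observed in pair.items():
        # Expected co-occurrences if the two variants arose independently.
        expected = single[a] * single[b] / n
        if observed >= min_count and observed / expected >= min_ratio:
            linked.append((a, b))
    return sorted(linked)
```

Random sequencing errors rarely land on the same read twice, so they produce few co-occurring pairs, whereas mutations belonging to a real minority haplotype travel together. That contrast is what lets linkage separate signal from noise even at low frequencies.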
14.
ArXiv ; 2021 Apr 28.
Article in English | MEDLINE | ID: mdl-33948451

ABSTRACT

More than any other infectious disease epidemic, the COVID-19 pandemic has been characterized by the generation of large volumes of viral genomic data at an incredible pace due to recent advances in high-throughput sequencing technologies, the rapid global spread of SARS-CoV-2, and its persistent threat to public health. However, distinguishing the most epidemiologically relevant information encoded in these vast amounts of data requires substantial effort across the research and public health communities. Studies of SARS-CoV-2 genomes have been critical in tracking the spread of variants and understanding its epidemic dynamics, and may prove crucial for controlling future epidemics and alleviating significant public health burdens. Together, genomic data and bioinformatics methods enable broad-scale investigations of the spread of SARS-CoV-2 at the local, national, and global scales and allow researchers to efficiently track the emergence of novel variants, reconstruct epidemic dynamics, and provide important insights into drug and vaccine development and disease control. Here, we discuss the tremendous opportunities that genomics offers to unlock the effective use of SARS-CoV-2 genomic data for efficient public health surveillance and guiding timely responses to COVID-19.

15.
Virus Evol ; 7(1): veaa103, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33505710

ABSTRACT

Detection of incident hepatitis C virus (HCV) infections is crucial for identification of outbreaks and development of public health interventions. However, there is no single diagnostic assay for distinguishing recent and persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to the detection of incident HCV infections and used to understand intra-host viral evolution. We studied intra-host HCV populations sampled using next-generation sequencing from 98 recently and 256 persistently infected individuals. Genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected features measuring their diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics, and physico-chemical properties. Distributions of the viral population features differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by decline in genomic complexity and increase in structuredness of the HCV population, likely reflecting a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a machine learning classifier for the infection staging, which yielded a detection accuracy of 95.22 per cent, thus providing a higher accuracy than other genomic-based models. The detection of a strong association between several HCV genetic factors and stages of infection suggests that intra-host HCV population develops in a complex but regular and predictable manner in the course of infection. The proposed models may serve as a foundation of cyber-molecular assays for staging infection, which could potentially complement and/or substitute standard laboratory assays.

16.
BMC Genomics ; 21(Suppl 6): 405, 2020 Dec 21.
Article in English | MEDLINE | ID: mdl-33349236

ABSTRACT

BACKGROUND: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges associated with technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures. RESULTS: We propose a novel preprocessing approach to transform irregular genomic data into normalized image data. Such a representation allows the problems of classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method has been applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering has been performed on the data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy. CONCLUSIONS: The sequence image normalization method allows for a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in the visualization of genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.


Subject(s)
Genomics , Machine Learning , Algorithms , Cluster Analysis , Computational Biology , Humans , Quasispecies
17.
BMC Genomics ; 21(Suppl 5): 582, 2020 Dec 16.
Article in English | MEDLINE | ID: mdl-33327932

ABSTRACT

BACKGROUND: RNA viruses mutate at extremely high rates, forming an intra-host viral population of closely related variants, which allows them to evade the host's immune system and makes them particularly dangerous. Viral outbreaks pose a significant threat to public health, and, in order to deal with them, it is critical to infer transmission clusters, i.e., decide whether two viral samples belong to the same outbreak. Next-generation sequencing (NGS) can significantly help in tackling outbreak-related problems. While NGS data are first obtained as short reads, existing methods rely on assembled sequences. This requires reconstruction of the entire viral population, which is complicated, error-prone and time-consuming. RESULTS: The experimental validation using sequencing data from HCV outbreaks shows that the proposed algorithm can successfully identify genetic relatedness between viral populations, infer transmission direction, transmission clusters and outbreak sources, as well as decide whether the source is present in the sequenced outbreak sample and identify it. CONCLUSIONS: The introduced algorithm can cluster genetically related samples, infer transmission directions and predict sources of outbreaks. Validation on experimental data demonstrated that the algorithm is able to reconstruct various transmission characteristics. An advantage of the method is that it works directly on raw NGS reads, bypassing cumbersome read assembly, which avoids introducing new errors and saves processing time.


Subject(s)
Hepacivirus , RNA Viruses , Algorithms , Disease Outbreaks , Hepacivirus/genetics , High-Throughput Nucleotide Sequencing
18.
PLoS Comput Biol ; 16(11): e1008454, 2020 11.
Article in English | MEDLINE | ID: mdl-33253159

ABSTRACT

One of the hallmarks of cancer is the extremely high mutability and genetic instability of tumor cells. Inherent heterogeneity of intra-tumor populations manifests itself in high variability of clone instability rates. Analogously to fitness landscapes, the instability rates of clonal populations form their mutability landscapes. Here, we present MULAN (MUtability LANdscape inference), a maximum-likelihood computational framework for inference of mutation rates of individual cancer subclones using single-cell sequencing data. It utilizes the partial information about the order of mutation events provided by cancer mutation trees and extends it by inferring the full evolutionary history and mutability landscape of a tumor. Evaluating mutation rates at the level of subclones rather than individual genes makes it possible to capture the effects of genomic interactions and epistasis. We estimate the accuracy of our approach and demonstrate that it can be used to study the evolution of genetic instability and infer tumor evolutionary history from experimental data. MULAN is available at https://github.com/compbel/MULAN.


Subject(s)
Mutation , Neoplasms/genetics , Neoplasms/pathology , Single-Cell Analysis/methods , Algorithms , Genomic Instability , Humans
20.
Nat Commun ; 11(1): 3126, 2020 06 19.
Article in English | MEDLINE | ID: mdl-32561710

ABSTRACT

Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming. Here we report ImReP, a computational method for rapid and accurate profiling of the Ig repertoire, including the complementarity-determining region 3 (CDR3), using regular RNA sequencing data such as those from 8,555 samples across 53 tissue types from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. Using ImReP and GTEx v6 data, we generate a collection of 3.6 million Ig sequences, termed the atlas of immunoglobulin repertoires (TAIR), across a broad range of tissue types that often do not have reported Ig repertoire information. Moreover, the flow of Ig clonotypes and inter-tissue repertoire similarities across immune-related tissues are also evaluated. In summary, TAIR is one of the largest collections of CDR3 sequences and tissue types, and should serve as an important resource for studying immunological diseases.


Subject(s)
Complementarity Determining Regions/genetics , Computational Biology/methods , RNA-Seq , Datasets as Topic , Feasibility Studies , Humans , Receptors, Antigen, B-Cell/genetics