Results 1 - 20 of 49,410
2.
BMC Bioinformatics ; 22(1): 5, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407064

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) enables many in-depth transcriptomic analyses at single-cell resolution. It is already widely used for exploring the dynamic development process of life, studying gene regulation mechanisms, and discovering new cell types. However, the low RNA capture rate, which causes highly sparse expression with dropout, makes downstream analyses difficult. RESULTS: We propose a new method, SCC, to impute the dropouts of scRNA-seq data. Experimental results show that SCC gives competitive results compared to two existing methods while showing superiority in reducing the intra-class distance of cells and improving the clustering accuracy in both simulated and real data. CONCLUSIONS: SCC is an effective tool to resolve the dropout noise in scRNA-seq data. The code is freely accessible at https://github.com/nwpuzhengyan/SCC .


Subject(s)
Gene Expression Profiling/methods , RNA, Small Cytoplasmic/genetics , Single-Cell Analysis/methods , Gene Expression Regulation/genetics , Genomics/methods , Models, Genetic
3.
BMC Bioinformatics ; 22(1): 2, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407065

ABSTRACT

BACKGROUND: The relentless emergence of new genomic sequencing protocols and the resulting generation of ever larger datasets continue to challenge the meaningful summarization and visualization of the underlying signal generated to answer important qualitative and quantitative biological questions. As a result, the need for novel software able to reliably produce quick, comprehensive, and easily repeatable genomic signal visualizations in a user-friendly manner is rapidly re-emerging. RESULTS: recoup is a Bioconductor package for quick, flexible, versatile, and accurate visualization of genomic coverage profiles generated from Next Generation Sequencing data. Coupled with a database of precalculated genomic regions for multiple organisms, recoup offers processing mechanisms for quick, efficient, and multi-level data interrogation with minimal effort, while at the same time creating publication-quality visualizations. Special focus is given to plot reusability, reproducibility, and real-time exploration and formatting options, operations rarely supported in depth by similar visualization tools. recoup was assessed using several qualitative user metrics and found to balance the tradeoff between important package features, including speed, visualization quality, overall friendliness, and the reusability of the results with minimal additional calculations. CONCLUSION: While some existing solutions for the comprehensive visualization of NGS data signal offer satisfying results, they are often compromised regarding issues such as effortless tracking of processing and preparation steps under a common computational environment, visualization quality, and user friendliness. recoup is a unique package presenting a balanced tradeoff for a combination of assessment criteria while remaining fast and friendly.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing , Image Processing, Computer-Assisted/methods , Software , Data Visualization , Signal Processing, Computer-Assisted
4.
BMC Bioinformatics ; 22(1): 12, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407074

ABSTRACT

BACKGROUND: Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage disequilibrium; however, the dependence within one panel might involve triplets, quartets, or larger sets of markers. Therefore, compatible and user-friendly software is needed for testing and assessing the global linkage disequilibrium among mixed genetic data. RESULTS: This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as no non-random associations among all subsets of the tested panel. The new R package "mixIndependR" calculates basic genetic parameters like allele frequency, genotype frequency, heterozygosity, Hardy-Weinberg equilibrium, and linkage disequilibrium (LD) by mutual independence from population data, regardless of the type of markers, such as simple nucleotide polymorphisms, short tandem repeats, insertions and deletions, and any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and functionally analyzed in the software package. By comparing the observed distribution of two common summary statistics (the number of heterozygous loci [K] and the number of shared alleles [X]) with their expected distributions under the assumption of mutual independence, the overall independence is tested. CONCLUSION: The package "mixIndependR" is compatible with all categories of genetic markers and detects the overall non-random associations. Compared to pairwise disequilibrium, the approach described herein tends to have higher power, especially when the number of markers is large. With this package, more multi-functional or stronger genetic panels can be developed, like mixed panels with different kinds of markers. In population genetics, the package "mixIndependR" makes it possible to discover more about admixture of populations, natural selection, genetic drift, and population demographics, as a more powerful method of detecting LD. Moreover, this new approach can optimize variant selection in disease studies and contribute to panel combination for treatments in multimorbidity. Application of this approach to real data is expected in the future, and this might bring a leap in the field of genetic technology. AVAILABILITY: The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at: https://cran.r-project.org/web/packages/mixIndependR/index.html .
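To make the idea of comparing an observed distribution of K with its expectation under mutual independence concrete, here is a minimal Python sketch. It is not the mixIndependR implementation (which is an R package). It assumes that each locus i has a known expected heterozygosity, so that under independence K is a sum of independent Bernoulli trials, and it uses a Pearson chi-square statistic as an illustrative goodness-of-fit measure. The function names and inputs are hypothetical.

```python
from typing import List, Sequence


def expected_k_distribution(het: Sequence[float]) -> List[float]:
    """PMF of K (number of heterozygous loci) if loci are mutually independent.

    het[i] is the expected heterozygosity of locus i (illustrative input).
    Under independence K is a sum of independent Bernoulli trials, so the
    PMF is built locus by locus with a small dynamic programme.
    """
    pmf = [1.0]  # with zero loci, K = 0 with probability 1
    for h in het:
        nxt = [0.0] * (len(pmf) + 1)
        for k, p in enumerate(pmf):
            nxt[k] += p * (1.0 - h)  # this locus is homozygous
            nxt[k + 1] += p * h      # this locus is heterozygous
        pmf = nxt
    return pmf


def chi_square_stat(observed_counts: Sequence[int], pmf: Sequence[float]) -> float:
    """Pearson chi-square statistic of observed K counts against the expected PMF.

    Assumption for illustration only: the source does not state which test
    statistic the package uses.
    """
    n = sum(observed_counts)
    return sum(
        (o - n * p) ** 2 / (n * p)
        for o, p in zip(observed_counts, pmf)
        if p > 0
    )


if __name__ == "__main__":
    pmf = expected_k_distribution([0.5, 0.5])
    print(pmf)                                  # distribution of K for two loci
    print(chi_square_stat([25, 50, 25], pmf))   # counts that match expectation
```

A large statistic relative to its reference distribution would suggest that the loci are not mutually independent; the same pattern would apply to the shared-allele statistic X with a different expected distribution.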


Subject(s)
Genetic Loci/genetics , Genomics/methods , Software , Databases, Genetic , Genotype , Linkage Disequilibrium/genetics
5.
BMC Bioinformatics ; 22(1): 11, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407081

ABSTRACT

BACKGROUND: High-throughput sequencing has increased the number of available microbial genomes recovered from isolates, single cells, and metagenomes. Accordingly, fast and comprehensive functional gene annotation pipelines are needed to analyze and compare these genomes. Although several approaches exist for genome annotation, these are typically not designed for easy incorporation into analysis pipelines, do not combine results from different annotation databases or offer easy-to-use summaries of metabolic reconstructions, and typically require large amounts of computing power for high-throughput analysis not available to the average user. RESULTS: Here, we introduce MicrobeAnnotator, a fully automated, easy-to-use pipeline for the comprehensive functional annotation of microbial genomes that combines results from several reference protein databases and returns the matching annotations together with key metadata such as the interlinked identifiers of matching reference proteins from multiple databases [KEGG Orthology (KO), Enzyme Commission (E.C.), Gene Ontology (GO), Pfam, and InterPro]. Further, the functional annotations are summarized into Kyoto Encyclopedia of Genes and Genomes (KEGG) modules as part of a graphical output (heatmap) that allows the user to quickly detect differences among (multiple) query genomes and cluster the genomes based on their metabolic similarity. MicrobeAnnotator is implemented in Python 3 and is freely available under an open-source Artistic License 2.0 from https://github.com/cruizperez/MicrobeAnnotator . CONCLUSIONS: We demonstrated the capabilities of MicrobeAnnotator by annotating 100 Escherichia coli and 78 environmental Candidate Phyla Radiation (CPR) bacterial genomes and comparing the results to those of other popular tools. We showed that the use of multiple annotation databases allows MicrobeAnnotator to recover more annotations per genome compared to faster tools that use reduced databases and is computationally efficient for use on personal computers. The output of MicrobeAnnotator can be easily incorporated into other analysis pipelines, while the results of other annotation tools can be seamlessly incorporated into MicrobeAnnotator to generate summary plots.


Subject(s)
Genome, Microbial/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Software , Escherichia coli/genetics
6.
BMC Bioinformatics ; 22(1): 9, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407090

ABSTRACT

BACKGROUND: Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analyses such as scaffolding. RESULTS: Here we illustrate a new method, which we call HapSolo, that identifies secondary contigs and defines a primary assembly based on multiple pairwise contig alignment metrics. HapSolo evaluates candidate primary assemblies using BUSCO scores and then distinguishes among candidate assemblies using a cost function. The cost function can be defined by the user but by default considers the number of missing, duplicated and single BUSCO genes within the assembly. HapSolo performs hill climbing to minimize cost over thousands of candidate assemblies. We illustrate the performance of HapSolo on genome data from three species: the Chardonnay grape (Vitis vinifera), with a genome of 490 Mb, a mosquito (Anopheles funestus; 200 Mb) and the thorny skate (Amblyraja radiata; 2650 Mb). CONCLUSIONS: HapSolo rapidly identified candidate assemblies that yield improvements in assembly metrics, including decreased genome size and improved N50 scores. Contig N50 scores improved by 35%, 9% and 9% for Chardonnay, mosquito and the thorny skate, respectively, relative to unreduced primary assemblies. The benefits of HapSolo were amplified by downstream analyses, which we illustrated by scaffolding with Hi-C data. We found, for example, that prior to the application of HapSolo, only 52% of the Chardonnay genome was captured in the largest 19 scaffolds, corresponding to the number of chromosomes. After the application of HapSolo, this value increased to ~84%. The improvements for the mosquito's largest three scaffolds, representing the number of chromosomes, were from 61 to 86%, and the improvement was even more pronounced for thorny skate. We compared the scaffolding results to assemblies that were based on PurgeDups for identifying secondary contigs, with generally superior results for HapSolo.
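The hill-climbing step described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not HapSolo's code: the cost formula, the single integer threshold being tuned, and the `toy_evaluate` function (a stand-in for running BUSCO on a candidate assembly, with invented numbers) are all assumptions made for the example.

```python
from typing import Callable, Tuple

# (missing, duplicated, single) BUSCO gene counts for a candidate assembly
Counts = Tuple[int, int, int]


def cost(counts: Counts) -> float:
    """Hypothetical cost: penalise missing and duplicated genes relative to single-copy ones."""
    missing, duplicated, single = counts
    return (missing + duplicated) / max(single, 1)


def hill_climb(
    evaluate: Callable[[int], Counts],
    start: int,
    lo: int,
    hi: int,
    max_steps: int = 1000,
) -> int:
    """Greedy hill climbing over one integer threshold in [lo, hi].

    At each step the neighbouring thresholds are scored and the search moves
    to the better neighbour, stopping at the first local minimum of the cost.
    """
    current = start
    for _ in range(max_steps):
        neighbours = [t for t in (current - 1, current + 1) if lo <= t <= hi]
        if not neighbours:
            break
        best = min(neighbours, key=lambda t: cost(evaluate(t)))
        if cost(evaluate(best)) >= cost(evaluate(current)):
            break  # no neighbour improves the cost
        current = best
    return current


def toy_evaluate(threshold: int) -> Counts:
    """Stand-in for scoring a candidate assembly (invented numbers)."""
    missing = max(0, threshold - 70)     # purging too aggressively loses genes
    duplicated = max(0, 70 - threshold)  # purging too little keeps duplicates
    single = 100 - missing - duplicated
    return missing, duplicated, single


if __name__ == "__main__":
    print(hill_climb(toy_evaluate, start=50, lo=0, hi=100))
```

A real search of this kind would typically tune several alignment-metric thresholds at once and restart from multiple random starting points, since greedy hill climbing can stop at a local minimum.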


Subject(s)
Chromosome Mapping/methods , Diploidy , Genome/genetics , Genomics/methods , Animals , Anopheles/genetics , Genes, Insect/genetics , Software
7.
Ecotoxicol Environ Saf ; 208: 111709, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33396040

ABSTRACT

A novel phenol-degrading strain was isolated and identified as Rhodococcus ruber C1. The degradation analysis shows that 1806 mg/L of phenol can be completely degraded by strain C1 within 38 h, and the maximum specific growth rate (µmax = 1.527 h⁻¹) and maximum specific phenol degradation rate (qmax = 3.674 h⁻¹) indicate its excellent phenol metabolism capability. More importantly, phenol can be degraded by strain C1 in the temperature range of 20-45 °C within 72 h, and with longer degradation time, phenol can be completely degraded even at 10, 15 and 50 °C. The whole genome of strain C1 was sequenced, and a comparative genome analysis of strain C1 with 36 other genomes of Rhodococcus was performed. A remarkable gene family expansion occurred during the evolution of Rhodococcus, and a comprehensive evolutionary picture of Rhodococcus at the genomic level was presented. Moreover, the copy number of genes involved in phenol metabolism was compared across the genus Rhodococcus, and the results demonstrate the high phenol degradation capability of strain C1 at the genomic level. These findings suggest that Rhodococcus ruber C1 is a bacterium capable of degrading phenol efficiently in the temperature range of 10-50 °C.


Subject(s)
Genome, Bacterial/genetics , Phenol/metabolism , Rhodococcus/genetics , Rhodococcus/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Biodegradation, Environmental , Gene Dosage , Genomics , Phenols/metabolism , Rhodococcus/classification , Temperature
8.
Gan To Kagaku Ryoho ; 48(1): 12-16, 2021 Jan.
Article in Japanese | MEDLINE | ID: mdl-33468715

ABSTRACT

Based on the results of comprehensive cancer genomic profiling (CGP), only about 10% of cancer patients can be candidates for anticancer drugs that match their genetic abnormalities. In Japan, a clinical trial (the BELIEVE trial) is currently ongoing to provide patients with treatment opportunities based on the results of CGP testing. In this study, patients who are not eligible for clinical trials, such as trials for drug approval, or for Advanced Medical Treatment B can try an off-label drug under the Patient-Proposed Healthcare Services. More than 40 patients have already participated in the BELIEVE trial. In this section, I will discuss the current status of the BELIEVE trial, challenges in multidisciplinary collaboration, and future prospects.


Subject(s)
Molecular Targeted Therapy , Off-Label Use , Delivery of Health Care , Genomics , Humans , Japan , Prospective Studies
9.
Braz J Med Biol Res ; 54(3): e9571, 2021.
Article in English | MEDLINE | ID: mdl-33470396

ABSTRACT

Cancer cell lines are widely used as in vitro models of tumorigenesis, facilitating fundamental discoveries in cancer biology and translational medicine. Currently, there are few options for glioblastoma (GBM) treatment and limited in vitro models with accurate genomic and transcriptomic characterization. Here, a detailed characterization of a new GBM cell line, namely AHOL1, was conducted in order to fully characterize its molecular composition based on its karyotype, copy number alteration (CNA), and transcriptome profiling, followed by the validation of key elements associated with GBM tumorigenesis. Large numbers of CNAs and differentially expressed genes (DEGs) were identified. CNAs were distributed throughout the genome, including gains at Xq11.1-q28, Xp22.33-p11.1, Xq21.1-q21.33, 4p15.1-p14, 8q23.2-q23.3 and losses at Yq11.21-q12, Yp11.31-p11.2, and 15q11.1-q11.2 positions. Nine druggable genes were identified, including HCRTR2, ETV1, PTPRD, PRKX, STS, RPS6KA6, ZFY, USP9Y, and KDM5D. By integrating DEGs and CNAs, we identified 57 overlapping genes enriched in fourteen pathways. Altered expression of several cancer-related candidates found in the DEGs-CNA dataset was confirmed by RT-qPCR. Taken together, this first comprehensive genomic and transcriptomic landscape of AHOL1 provides unique resources for further studies and identifies several druggable targets that may be useful for therapeutics and biologic and molecular investigation of GBM.


Subject(s)
Glioblastoma , Cell Line, Tumor , Gene Expression Regulation, Neoplastic , Genome , Genomics , Glioblastoma/genetics , Histone Demethylases , Humans , Minor Histocompatibility Antigens , Transcriptome
10.
Gan To Kagaku Ryoho ; 48(1): 7-11, 2021 Jan.
Article in Japanese | MEDLINE | ID: mdl-33468714

ABSTRACT

In June 2019, 2 comprehensive cancer genome profiling (CGP) tests were approved with reimbursement, and are now available at designated hospitals stratified into 3 tiers on the basis of their roles. The reimbursement-approved CGP tests were restricted to patients with solid tumors that have progressed on standard chemotherapy, rare tumors, or tumors of unknown primary, and perform primary structure analysis of the cancer genome on several hundred genes at a time using a next-generation sequencer. In the molecular tumor board, appropriate treatments were recommended based on the interpretation of the CGP results. Because the 2 CGP tests differ functionally in terms of the sample requirements, the target gene sets, and the items to be reported, results need to be evaluated carefully. Although the detection rate of genomic alterations in CGP tests is high, the number of cases that lead to treatments consistent with the genomic alterations is limited. Improving this ratio will be the key for Japanese precision oncology to realize the full potential of the CGP tests.


Subject(s)
Neoplasms , Genomics , Humans , Medical Oncology , Neoplasms/drug therapy , Neoplasms/genetics , Precision Medicine
11.
Gan To Kagaku Ryoho ; 48(1): 17-21, 2021 Jan.
Article in Japanese | MEDLINE | ID: mdl-33468716

ABSTRACT

"Counseling on Cancer Genomic Medicine" was added to the list of roles of cancer consultation centers located in Designated Cancer Hospitals under the Third Cancer Control Act established in 2018. Although cancer consultation centers do not make decisions on whether to conduct genetic testing, it has been revealed that a certain number of such consultations are taking place. Consultations on genetic panel testing are expected to further increase in the future. In order to accommodate this need, individual cancer consultation staff need to provide services based on the principles of consultation and support. For example, they must have adequate knowledge of the characteristics and limitations of genetic panel testing, understand the true needs of patients and their families, enable such individuals to understand the relevant information, and collaborate with patients and their families to consider the course forward. Furthermore, in order to ensure the quality of individual support, physicians and genetic counselors are expected to contribute by participating in organization-building between genetic counselors, genomic medicine coordinators, and other experts in and outside of hospitals. It is also anticipated that networks will be formed with nearby external institutions and organizations, such as Designated Core Hospitals for Cancer Genomic Medicine, Designated Core Hospitals, and Medical Cooperation Hospitals.


Subject(s)
Genetic Testing , Neoplasms , Cancer Care Facilities , Genomics , Humans , Neoplasms/diagnosis , Neoplasms/genetics , Neoplasms/therapy , Referral and Consultation
12.
Gan To Kagaku Ryoho ; 48(1): 22-25, 2021 Jan.
Article in Japanese | MEDLINE | ID: mdl-33468717

ABSTRACT

Genome information has been utilized to determine cancer therapy, and some hereditary cancers may be diagnosed through the analysis of germline variants. As genomic technology has advanced, the interpretation of genomic information has become more complicated. The medical network needs to be strengthened to provide strong support and to determine the appropriate medicine for patients. In this paper, we will discuss the essential roles of genomic counselors in the age of genomic medicine.


Subject(s)
Counselors , Neoplasms , Genetic Counseling , Genetic Testing , Genomics , Humans , Neoplasms/diagnosis , Neoplasms/genetics , Neoplasms/therapy
13.
Bioresour Technol ; 319: 124117, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32979594

ABSTRACT

Azo dyes pose hazards to ecosystems and human health and the cosubstrate strategy has become the focus for the bioremediation of azo dyes. Herein, Brilliant Crocein (BC), a model pollutant, was biodegraded by Providencia rettgeri domesticated from activated sludge. Additional ethanol, as a cosubstrate, could accelerate P. rettgeri growth and BC biodegradation, as reflected by the Gompertz models. This phenomenon was attributed to the smaller metabolites and greater number of potential pathways observed under the synergistic effect of ethanol. Genomic analysis of P. rettgeri showed that functional genes related to azo bond cleavage, redox reactions, ring opening and hydrolysis played crucial roles in azo dye biodegradation. Furthermore, the mechanism proposed was that ethanol might stimulate the production of additional reducing power via the expression of related genes, leading to the cleavage of azo bonds and aromatic rings. However, biodegradation without ethanol could only partly cleave the azo bonds.


Subject(s)
Ethanol , Providencia , Azo Compounds , Biodegradation, Environmental , Coloring Agents , Ecosystem , Genomics , Humans , Kinetics , Providencia/genetics
14.
Crit Rev Food Sci Nutr ; 61(1): 75-84, 2021.
Article in English | MEDLINE | ID: mdl-31997650

ABSTRACT

Rice bran is an invaluable by-product of the paddy processing industry. It is rich in minerals, protein, lipids, and crude fiber. In addition, it possesses compounds with anti-oxidant, anti-allergic, anti-diabetic, and anti-cancer properties. It forms a basis for the extraction of rice bran oil and the preparation of various functional foods with health benefits and the potential to prevent chronic health issues. Nevertheless, the rapid deterioration of bran upon storage is a major limitation in exploiting the full potential of rice bran. In this review, we discuss three strategies to address the rapid rancidity of rice bran and enhance its shelf life and storability, while emphasizing the importance of rice bran in terms of its nutritional composition. One strategy is to exploit null mutations in the genes governing lipases and lipoxygenases, leading to nonfunctional enzymes (enzyme-deficient approach); another is to reduce the content of PUFA, which is more prone to oxidation (substrate-deficient approach); and a third is to enhance the antioxidant content, which effectively terminates lipid peroxidation by donating a hydrogen atom.


Subject(s)
Oryza , Antioxidants , Genomics , Lipase , Oryza/genetics , Rice Bran Oil
15.
Gene ; 767: 145268, 2021 Jan 30.
Article in English | MEDLINE | ID: mdl-33157201

ABSTRACT

A key phenotypic characteristic of the Gram-positive bacterial pathogen Staphylococcus aureus is its ability to grow in environments with low water activity (aw). A homology transfer based approach, using the well characterised osmotic stress response systems of Bacillus subtilis and Escherichia coli, was used to identify putative osmotolerance loci in Staphylococcus aureus ST772-MRSA-V. A total of 17 distinct putative hyper- and hypo-osmotic stress response systems, comprising 78 genes, were identified. The ST772-MRSA-V genome exhibits significant degeneracy in terms of the osmotic stress response, with three copies of opuD, two copies each of nhaK and mrp/mnh, and five copies of opp. Furthermore, regulation of osmotolerance in ST772-MRSA-V appears to be mediated at the transcriptional, translational, and post-translational levels.


Subject(s)
Osmoregulation/genetics , Staphylococcus aureus/genetics , Bacterial Proteins/genetics , Computer Simulation , Genome, Bacterial/genetics , Genomics/methods , Methicillin-Resistant Staphylococcus aureus/genetics , Staphylococcal Infections/genetics
16.
Pest Manag Sci ; 77(1): 43-54, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32815250

ABSTRACT

Amaranthus tuberculatus is the major weed species in many midwestern US row-crop production fields, and it is among the most problematic weeds in the world in terms of its ability to evolve herbicide resistance. It has now evolved resistance to herbicides spanning seven unique sites of action, with populations and even individual plants often possessing resistance to several herbicides/herbicide groups. Historically, herbicide target-site changes accounted for most of the known resistance mechanisms in this weed; however, over the last few years, non-target-site mechanisms, particularly enhanced herbicide detoxification, have become extremely common in A. tuberculatus. Unravelling the genetics and molecular details of non-target-site resistance mechanisms, understanding the extent to which they confer cross resistance to other herbicides, and understanding how they evolve remain critical research endeavors. Transcriptomic and genomic approaches are already facilitating such studies, the results of which hopefully will inform better resistance-mitigation strategies. The largely unprecedented level of herbicide resistance in A. tuberculatus is not only a fascinating example of evolution in action, but also a serious and growing threat to the sustainability of midwestern US cropping systems. © 2020 Society of Chemical Industry.


Subject(s)
Amaranthus , Herbicides , Amaranthus/genetics , Genomics , Herbicide Resistance/genetics , Herbicides/pharmacology , Plant Weeds/genetics
17.
Crit Rev Oncol Hematol ; 157: 103191, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33309572

ABSTRACT

The development of cyclin-dependent kinase (CDK) 4 and 6 inhibitors represented a substantial breakthrough in the treatment of estrogen receptor positive (ER+), human epidermal growth factor receptor 2 (HER2) negative metastatic breast cancer. These drugs showed a significant clinical benefit in pivotal clinical trials. However, resistance eventually occurs, leading to disease progression. Next Generation Sequencing (NGS) methodologies have been employed to investigate predictive biomarkers of response or resistance to CDK4/6 inhibitors. Whole exome and targeted sequencing of solid and liquid biopsies have revealed several possible genomic alterations associated with resistance. Notably, genomic alterations identified by DNA sequencing did not fully recapitulate the entire landscape of resistance to CDK4/6 inhibitors. Gene expression analyses, such as RNA-Seq methodologies, have provided insights into transcriptional profiles and may need further application. Herein, we report the main findings derived from the use of NGS analysis in the context of resistance to CDK4/6 inhibitors in ER+ breast cancer.


Subject(s)
Breast Neoplasms , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Cyclin-Dependent Kinase 4/genetics , Cyclin-Dependent Kinase 6/genetics , Genomics , High-Throughput Nucleotide Sequencing , Humans , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/therapeutic use
18.
BMC Bioinformatics ; 21(Suppl 21): 562, 2020 Dec 28.
Article in English | MEDLINE | ID: mdl-33371881

ABSTRACT

BACKGROUND: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However, we rarely stop to question the validity of this assumption, or consider how broadly applicable it may be to all genes in the transcriptome. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from The Cancer Genome Atlas (TCGA). RESULTS: Surprisingly, the expression of fewer than 50% of all genes was normally distributed, with other distributions including Gamma, Bimodal, Cauchy, and Lognormal also represented. Most of the distribution categories contained genes that were significantly enriched for unique biological processes. Different assumptions based on the shape of the expression profile were used to identify genes that could discriminate between patients with good versus poor survival. The prognostic marker genes that were identified when the shape of the distribution was accounted for reflected functional insights into cancer biology that were not observed when standard assumptions were applied. We showed that when multiple types of distributions were permitted, i.e. the shape of the expression profile was used, the statistical classifiers had greater predictive accuracy for determining the prognosis of a patient versus those that assumed only one type of gene expression distribution. CONCLUSIONS: Our results highlight the value of studying a gene's distribution shape to model heterogeneity of transcriptomic data and the impact of using analyses that permit more than one type of gene expression distribution. These insights would have been overlooked when using standard approaches that assume all genes follow the same type of distribution in a patient cohort.
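As a rough illustration of what assigning a distribution shape to a gene can look like, the sketch below fits two candidate families (Normal and Lognormal) to one gene's expression values by maximum likelihood and picks the one with the lower AIC. This is an assumption-laden simplification: the abstract does not say how the study classified genes, it lists more families (Gamma, Bimodal, Cauchy) than are handled here, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def normal_loglik(x: Sequence[float]) -> float:
    """Maximised log-likelihood of a Normal fit (MLE mean and variance)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lognormal_loglik(x: Sequence[float]) -> float:
    """Maximised log-likelihood of a Lognormal fit; requires strictly positive values."""
    logs = [math.log(v) for v in x]
    # Change of variables: Normal likelihood of log(x) minus the Jacobian term.
    return normal_loglik(logs) - sum(logs)


def best_fit(x: Sequence[float]) -> str:
    """Return the candidate family with the lowest AIC (both have 2 parameters)."""
    aic: Dict[str, float] = {
        "normal": 2 * 2 - 2.0 * normal_loglik(x),
        "lognormal": 2 * 2 - 2.0 * lognormal_loglik(x),
    }
    return min(aic, key=lambda name: aic[name])


if __name__ == "__main__":
    symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
    right_skewed = [0.135, 0.368, 1.0, 2.718, 7.389]
    print(best_fit(symmetric), best_fit(right_skewed))
```

Running the same comparison per gene across a cohort would give a crude tally of how many genes are better described by each family; a fuller treatment would add the other families and a formal goodness-of-fit check.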


Subject(s)
Data Interpretation, Statistical , Gene Expression Profiling , Neoplasms/genetics , Biomarkers, Tumor/genetics , Genomics , Humans , Male , Middle Aged , Neoplasms/diagnosis , Prognosis
19.
BMC Bioinformatics ; 21(Suppl 21): 542, 2020 Dec 28.
Article in English | MEDLINE | ID: mdl-33371889

ABSTRACT

BACKGROUND: A short tandem repeat (STR), or "microsatellite", is a tract of DNA in which a specific motif (typically < 10 base pairs) is repeated multiple times. STRs are abundant throughout the human genome, and specific repeat expansions may be associated with human diseases. Long-read sequencing coupled with bioinformatics tools enables the estimation of repeat counts for STRs. However, with the exception of a few well-known disease-relevant STRs, normal ranges of repeat counts for most STRs in human populations are not well known, preventing the prioritization of STRs that may be associated with human diseases. RESULTS: In this study, we extend a computational tool, RepeatHMM, to infer normal ranges of 432,604 STRs using 21 long-read sequencing datasets on human genomes, and build a genomic-scale database called RepeatHMM-DB with normal repeat ranges for these STRs. Evaluation on 13 well-known repeats shows that the inferred repeat ranges provide good estimates of the repeat ranges reported in the literature from population-scale studies. This database, together with a repeat expansion estimation tool such as RepeatHMM, enables genomic-scale scanning of repeat regions in newly sequenced genomes to identify disease-relevant repeat expansions. As a case study of using RepeatHMM-DB, we evaluate the CAG repeats of ATXN3 for 20 patients with spinocerebellar ataxia type 3 (SCA3) and 5 unaffected individuals, and correctly classify each individual. CONCLUSIONS: In summary, RepeatHMM-DB can facilitate prioritization and identification of disease-relevant STRs from whole-genome long-read sequencing data on patients with undiagnosed diseases. RepeatHMM-DB is incorporated into RepeatHMM and is available at https://github.com/WGLab/RepeatHMM .
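The idea of flagging a repeat count that falls outside a normal range can be illustrated with a toy Python sketch. It is deliberately much simpler than RepeatHMM, which is designed for error-prone long reads: here the repeat count is just the longest run of exact, back-to-back motif copies in a sequence, and the normal range used in the example is an invented placeholder, not a value from RepeatHMM-DB.

```python
import re
from typing import Tuple


def longest_repeat_run(seq: str, motif: str) -> int:
    """Largest number of exact, back-to-back copies of `motif` found in `seq`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    # A lookahead lets every start position be tried, so runs that begin
    # inside an earlier partial match are not missed.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    return max(
        (len(m.group(1)) // len(motif) for m in pattern.finditer(seq)),
        default=0,
    )


def is_expanded(count: int, normal_range: Tuple[int, int]) -> bool:
    """True if the repeat count exceeds the upper bound of the normal range."""
    _, upper = normal_range
    return count > upper


if __name__ == "__main__":
    hypothetical_range = (10, 40)  # placeholder bounds, not real data
    read = "TT" + "CAG" * 60 + "AA"
    n = longest_repeat_run(read, "CAG")
    print(n, is_expanded(n, hypothetical_range))
```

With real long reads, sequencing errors interrupt exact runs, which is why a probabilistic model is used in practice instead of exact matching.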


Subject(s)
Genomics , Microsatellite Repeats/genetics , Whole Genome Sequencing , Humans , Male , Spinocerebellar Ataxias/genetics
20.
Genome Med ; 12(1): 115, 2020 Dec 28.
Article in English | MEDLINE | ID: mdl-33371892

ABSTRACT

The identification of genetic variation that directly impacts infection susceptibility to SARS-CoV-2 and disease severity of COVID-19 is an important step towards risk stratification, personalized treatment plans, and therapeutic and vaccine development and deployment. Given the importance of study design in infectious disease genetic epidemiology, we use simulation and draw on current estimates of exposure, infectivity, and test accuracy of COVID-19 to demonstrate the feasibility of detecting host genetic factors associated with susceptibility and severity in published COVID-19 study designs. We demonstrate that limited phenotypic data and exposure/infection information in the early stages of the pandemic significantly impact the ability to detect most genetic variants with moderate effect sizes, especially when studying susceptibility to SARS-CoV-2 infection. Our insights can aid in the interpretation of genetic findings emerging in the literature and guide the design of future host genetic studies.


Subject(s)
/epidemiology , Case-Control Studies , Genomics/methods , Pandemics , Research Design , /genetics , Computer Simulation , Confounding Factors, Epidemiologic , Exposome , False Negative Reactions , Genetic Predisposition to Disease , Genetic Variation , Host-Pathogen Interactions/genetics , Humans , Research Design/statistics & numerical data , Reverse Transcriptase Polymerase Chain Reaction , Risk , Sensitivity and Specificity