1.
Bioresour Technol ; 291: 121890, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31378447

ABSTRACT

Amid the recent surge in phycological research, microalgae have emerged as potential candidates for a range of application-driven research areas. Omics-based approaches are used to disentangle the regulation and network integration of the biosynthesis and degradation of metabolic precursors, intermediates, and end products, and to identify the networks that regulate metabolic flux. Multi-omics coupled with data analytics has facilitated understanding of biological processes and gives ample access to the diverse metabolic pathways used for genetic manipulation, making microalgal factories more efficient. The present review discusses state-of-the-art "Algomics" and the prospects of microalgae and their role in symbiotic association, using omics approaches including genomics, transcriptomics, proteomics and metabolomics. Microalgae-based uni- and multi-omics approaches are critically analyzed in wastewater treatment, metal toxicity and remediation, biofuel production, and therapeutics to provide an outlook on an array of environmentally sustainable and economically viable microalgal applications.


Subjects
Microalgae/metabolism; Animals; Genomics/methods; Humans; Metabolomics; Proteomics; Wastewater/chemistry
2.
Lancet ; 394(10198): 604-610, 2019 Aug 17.
Article in English | MEDLINE | ID: mdl-31395443

ABSTRACT

Human genomic sequencing has potential diagnostic, prognostic, and therapeutic value across a wide breadth of clinical disciplines. One barrier to widespread adoption is the paucity of evidence for improved outcomes in patients who do not already have an indication for more focused testing. In this Series paper, we review clinical outcome studies in genomic medicine and discuss the important features and key challenges to building evidence for next generation sequencing in the context of routine patient care.


Subjects
Genomics/methods; Precision Medicine/methods; Diagnostic Tests, Routine; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Patient Outcome Assessment; Standard of Care
4.
Lancet ; 394(10197): 511-520, 2019 Aug 10.
Article in English | MEDLINE | ID: mdl-31395439

ABSTRACT

Advances in technologies for assessing genomic variation and an increasing understanding of the effects of genomic variants on health and disease are driving the transition of genomics from the research laboratory into clinical care. Genomic medicine, or the use of an individual's genomic information as part of their clinical care, is increasingly gaining acceptance in routine practice, including in assessing disease risk in individuals and their families, diagnosing rare and undiagnosed diseases, and improving drug safety and efficacy. We describe the major types and measurement tools of genomic variation that are currently of clinical importance, review approaches to interpreting genomic sequence variants, identify publicly available tools and resources for genomic test interpretation, and discuss several key barriers in using genomic information in routine clinical practice.


Subjects
Genomics/methods; Precision Medicine/methods; Genetic Predisposition to Disease; Humans; Pharmacogenomic Variants
5.
Implement Sci ; 14(1): 79, 2019 Aug 13.
Article in English | MEDLINE | ID: mdl-31409417

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) is increasingly being translated into routine public health practice, affecting the surveillance and control of many pathogens. The purpose of this scoping review is to identify and characterize the recent literature concerning the application of bacterial pathogen genomics for public health practice and to assess the added value, challenges, and needs related to its implementation from an epidemiologist's perspective. METHODS: In this scoping review, a systematic PubMed search with forward and backward snowballing was performed to identify manuscripts in English published between January 2015 and September 2018. Included studies had to describe the application of NGS on bacterial isolates within a public health setting. The studied pathogen, year of publication, country, number of isolates, sampling fraction, setting, public health application, study aim, level of implementation, time orientation of the NGS analyses, and key findings were extracted from each study. Due to a large heterogeneity of settings, applications, pathogens, and study measurements, a descriptive narrative synthesis of the eligible studies was performed. RESULTS: Out of the 275 included articles, 164 were outbreak investigations, 70 focused on strategy-oriented surveillance, and 41 on control-oriented surveillance. Main applications included the use of whole-genome sequencing (WGS) data for (1) source tracing, (2) early outbreak detection, (3) unraveling transmission dynamics, (4) monitoring drug resistance, (5) detecting cross-border transmission events, (6) identifying the emergence of strains with enhanced virulence or zoonotic potential, and (7) assessing the impact of prevention and control programs. The superior resolution over conventional typing methods to infer transmission routes was reported as an added value, as well as the ability to simultaneously characterize the resistome and virulome of the studied pathogen. 
However, the full potential of pathogen genomics can only be reached through its integration with high-quality contextual data. CONCLUSIONS: For several pathogens, it is time for a shift from proof-of-concept studies to routine use of WGS during outbreak investigations and surveillance activities. However, some implementation challenges from the epidemiologist's perspective remain, such as data integration, quality of contextual data, sampling strategies, and meaningful interpretations. Interdisciplinary, inter-sectoral, and international collaborations are key for an appropriate genomics-informed surveillance.


Subjects
Genome, Bacterial; Genomics/methods; Public Health Practice; Humans; Whole Genome Sequencing
6.
Vet Clin North Am Small Anim Pract ; 49(5): 809-818, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31256903

ABSTRACT

We introduce a next phase in the evolution of medicine affecting human and veterinary patients. This evolution, genomic cancer medicine (Pmed), involves expansion of genomic and molecular biology into clinical medicine. The implementation of these new technologies has already begun and is a commercial reality. We introduce the underpinnings for this evolution, and focus on application in complex disease states. Pet owners have begun requesting Pmed technologies. To meet this demand, it is important to be aware of the opportunities and obstacles associated with available Pmed offerings as well as the current state of the field.


Subjects
Cat Diseases/genetics; Cat Diseases/therapy; Dog Diseases/genetics; Dog Diseases/therapy; Neoplasms/veterinary; Precision Medicine/veterinary; Animals; Cats; Dogs; Genetic Predisposition to Disease; Genomics/methods; Hemangiosarcoma/genetics; Hemangiosarcoma/therapy; Hemangiosarcoma/veterinary; Humans; Neoplasms/genetics; Neoplasms/therapy; Precision Medicine/methods; Sequence Analysis; Veterinary Medicine/methods
7.
BMC Bioinformatics ; 20(1): 371, 2019 Jul 02.
Article in English | MEDLINE | ID: mdl-31266441

ABSTRACT

BACKGROUND: The falling cost of next-generation sequencing technology has allowed deep sequencing across related species and of individuals within species. Whole genome assemblies from these data remain high time- and resource-consuming computational tasks, particularly if best solutions are sought using different assembly strategies and parameter sets. However, in many cases, the underlying research questions are not genome-wide but rather target specific genes or sets of genes. We describe a novel assembly tool, SRAssembler, that efficiently assembles only contigs containing potential homologs of a gene or protein query, thus enabling gene-specific genome studies over large numbers of short read samples. RESULTS: We demonstrate the functionality of SRAssembler with examples largely drawn from plant genomics. The workflow implements a recursive strategy by which relevant reads are successively pulled from the input sets based on overlapping significant matches, resulting in virtual chromosome walking. The typical workflow behavior is illustrated with assembly of simulated reads. Applications to real data show that SRAssembler produces homologous contigs of equivalent quality to whole genome assemblies. Settings can be chosen to not only assemble presumed orthologs but also paralogous gene loci in distinct contigs. A key application is assembly of the same locus in many individuals from population genome data, which provides assessment of structural variation beyond what can be inferred from read mapping to a reference genome alone. SRAssembler can be used on modest computing resources or used in parallel on high performance computing clusters (most easily by invoking a dedicated Singularity image). CONCLUSIONS: SRAssembler offers an efficient tool to complement whole genome assembly software. 
It can be used to solve gene-specific research questions based on large genomic read samples from multiple sources and would be an expedient choice when whole genome assembly from the reads is either not feasible, too costly, or unnecessary. The program can also aid decision making on the depth of sequencing in an ongoing novel genome sequencing project or with respect to ultimate whole genome assembly strategies.


Subjects
Genomics/methods; Software; Arabidopsis/genetics; Genetic Loci; Genome, Plant; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
8.
Nat Commun ; 10(1): 2907, 2019 Jul 02.
Article in English | MEDLINE | ID: mdl-31266958

ABSTRACT

Single-nucleus RNA-seq (snRNA-seq) enables the interrogation of cellular states in complex tissues that are challenging to dissociate or are frozen, and opens the way to human genetics studies, clinical trials, and precise cell atlases of large organs. However, such applications are currently limited by batch effects, processing, and costs. Here, we present an approach for multiplexing snRNA-seq, using sample-barcoded antibodies to uniquely label nuclei from distinct samples. Comparing human brain cortex samples profiled with or without hashing antibodies, we demonstrate that nucleus hashing does not significantly alter recovered profiles. We develop DemuxEM, a computational tool that detects inter-sample multiplets and assigns singlets to their sample of origin, and validate its accuracy using sex-specific gene expression, species-mixing and natural genetic variation. Our approach will facilitate tissue atlases of isogenic model organisms or from multiple biopsies or longitudinal samples of one donor, and large-scale perturbation screens.


Subjects
Antibodies/analysis; Cell Nucleus/genetics; Genomics/methods; Single-Cell Analysis/methods; Aged; Aged, 80 and over; Animals; Cell Nucleus/chemistry; Cell Nucleus/metabolism; DNA/genetics; Female; Humans; Male; Mice; Mice, Inbred C57BL; Neurons/chemistry; Neurons/cytology; Neurons/metabolism; Prefrontal Cortex/chemistry; Prefrontal Cortex/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism
9.
Gene ; 714: 143984, 2019 Sep 25.
Article in English | MEDLINE | ID: mdl-31330237

ABSTRACT

Intrinsically disordered proteins (IDPs) are highly abundant in eukaryotic proteomes and involved in key biological and cellular processes. Although some resources of disordered protein predictions are available from animal and plant proteomes, those related to cereals are largely unknown. Here, we present an overview of IDPomes from Oryza sativa, Zea mays, Sorghum bicolor and Brachypodium distachyon. The work includes a comparative analysis with the model plant Arabidopsis thaliana. The data show that the intrinsic disorder content increases with the proteome size. Gene Ontology analysis reveals that IDPs in the studied species are involved mainly in regulation of cellular and metabolic processes and responses to stimulus. Our findings strongly suggest that higher plants may use common cellular and regulatory mechanisms for adaptation to various environmental constraints.


Subjects
Edible Grain/genetics; Intrinsically Disordered Proteins/genetics; Adaptation, Biological/genetics; Arabidopsis/genetics; Brachypodium/genetics; Gene Ontology; Genomics/methods; Oryza/genetics; Plant Proteins/genetics; Proteome/genetics; Sorghum/genetics; Zea mays/genetics
10.
Genet Sel Evol ; 51(1): 38, 2019 Jul 08.
Article in English | MEDLINE | ID: mdl-31286857

ABSTRACT

BACKGROUND: Pig and poultry breeding programs aim at improving crossbred (CB) performance. Selection response may be suboptimal if only purebred (PB) performance is used to compute genomic estimated breeding values (GEBV) because the genetic correlation between PB and CB performance ([Formula: see text]) is often lower than 1. Thus, it may be beneficial to use information on both PB and CB performance. In addition, the accuracy of GEBV of PB animals for CB performance may improve when the breed-of-origin of alleles (BOA) is considered in the genomic relationship matrix (GRM). Thus, our aim was to compare scenarios where GEBV are computed and validated by using (1) either CB offspring averages or individual CB records for validation, (2) either a PB or CB reference population, and (3) a GRM that either accounts for or ignores BOA in the CB individuals. For this purpose, we used data on body weight measured at around 7 (BW7) or 35 (BW35) days in PB and CB broiler chickens and evaluated the accuracy of GEBV based on the correlation of GEBV with phenotypes in the validation population (validation correlation). RESULTS: With validation on CB offspring averages, the validation correlation of GEBV of PB animals for CB performance was lower with a CB reference population than with a PB reference population for BW35 (PB-CB genetic correlation of 0.96), and about equal for BW7 (PB-CB genetic correlation of 0.80) when BOA was ignored. However, with validation on individual CB records, the validation correlation was higher with a CB reference population for both traits. The use of a GRM that took BOA into account increased the validation correlation for BW7 but reduced it for BW35. CONCLUSIONS: We argue that the benefit of using a CB reference population for genomic prediction of PB animals for CB performance should be assessed either by validation on CB offspring averages, or by validation on individual CB records while using a GRM that accounts for BOA in the CB individuals. 
With this recommendation in mind, our results show that the accuracy of GEBV of PB animals for CB performance was equal to or higher with a CB reference population than with a PB reference population for a trait with a PB-CB genetic correlation of 0.8, but lower for a trait with a PB-CB genetic correlation of 0.96. In addition, taking BOA into account was beneficial for the trait with a PB-CB genetic correlation of 0.8 but not for the trait with a correlation of 0.96.


Subjects
Body Weight/genetics; Breeding; Chickens/genetics; Genomics/methods; Alleles; Animals; Female; Genotype; Male; Phenotype; Reference Values
11.
Nat Commun ; 10(1): 2756, 2019 Jun 21.
Article in English | MEDLINE | ID: mdl-31227702

ABSTRACT

Flight loss in birds is as characteristic of the class Aves as flight itself. Although morphological and physiological differences are recognized in flight-degenerate bird species, their contributions to recurrent flight degeneration events across modern birds and underlying genetic mechanisms remain unclear. Here, in an analysis of 295 million nucleotides from 48 bird genomes, we identify two convergent sites causing amino acid changes, Ser321Gly in ATGL and Ala197Val in ACOT7, in flight-degenerate birds, which to our knowledge have not previously been implicated in loss of flight. Functional assays suggest that Ser321Gly reduces lipid hydrolytic ability of ATGL, and Ala197Val enhances acyl-CoA hydrolytic activity of ACOT7. Modeling simulations suggest a switch of main energy sources from lipids to carbohydrates in flight-degenerate birds. Our results thus suggest that physiological convergence plays an important role in flight degeneration, and anatomical convergence often invoked may not.


Subjects
Biological Evolution; Birds/physiology; Energy Metabolism/genetics; Flight, Animal/physiology; Genome/genetics; Animals; Carbohydrate Metabolism/physiology; Genomics/methods; Lipase/genetics; Lipase/metabolism; Lipolysis/physiology; Palmitoyl-CoA Hydrolase/genetics; Palmitoyl-CoA Hydrolase/metabolism; Phylogeny
12.
Arch Virol ; 164(8): 2209-2213, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31161389

ABSTRACT

The complete genome of a double-stranded RNA (dsRNA) mycovirus, Phoma matteuccicola partitivirus 1 (PmPV1) was sequenced. It consists of two dsRNA segments, 1664 bp (dsRNA-1) and 1383 bp (dsRNA-2) in length, each containing a single open reading frame (ORF) potentially encoding a 46.78-kDa protein and a 40.92-kDa protein, respectively. dsRNA-1 encodes a putative polypeptide with a conserved RNA-dependent RNA polymerase (RdRp) domain that shows sequence similarity to the corresponding proteins of partitiviruses. The protein encoded by dsRNA-2 has no significant similarity to the typical coat proteins (CPs) of partitiviruses, but structure analysis nevertheless suggested that it might function as a coat protein. Purified viral particles of PmPV1 were isometric and approximately 29 nm in diameter. Phylogenetic analysis showed that PmPV1 is closely related to members of the genus Gammapartitivirus within the family Partitiviridae but forms a separate branch with Colletotrichum acutatum RNA virus 1 and Ustilaginoidea virens partitivirus 2. This is the first report of the full-length nucleotide sequence of a novel virus of the genus Gammapartitivirus infecting P. matteuccicola strain LG915, the causal agent of leaf blight of Curcuma wenyujin.


Subjects
Ascomycota/virology; Fungal Viruses/genetics; Genome, Viral/genetics; Amino Acid Sequence; Base Sequence; Capsid Proteins/genetics; Curcuma/virology; Genomics/methods; Open Reading Frames/genetics; Phylogeny; Plant Diseases/virology; RNA Replicase/genetics; RNA Viruses/genetics; RNA, Double-Stranded/genetics; RNA, Viral/genetics; Sequence Analysis, DNA/methods
13.
Hum Genet ; 138(7): 739-748, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31154530

ABSTRACT

Metabolic syndrome is a complex human disorder characterized by a cluster of conditions (increased blood pressure, hyperglycemia, excessive body fat around the waist, and abnormal cholesterol or triglyceride levels). Any of these conditions increases the risk of serious disorders such as diabetes or cardiovascular disease. Currently, the degree of genetic regulation of this syndrome is under debate and partially unknown. The principal aim of this study was to estimate the genetic component and the common environmental effects in different populations using full pedigree and genomic information. We used three large populations (Gubbio, ARIC, and Ogliastra cohorts) to estimate the heritability of metabolic syndrome. Because both pedigree and genotype data were available, different approaches were applied to summarize relatedness. Linear mixed models (LMM) using the average information restricted maximum likelihood (AIREML) algorithm were applied to partition the variances and estimate heritability (h²) and the common sib-household effect (c²). Globally, results obtained from pedigree information showed a significant heritability (h²: 0.286 and 0.271 in Gubbio and Ogliastra, respectively), whereas a lower, but still significant, heritability was found using SNP data ([Formula: see text]: 0.167 and 0.254 in ARIC and Ogliastra). The remaining heritability between h² and [Formula: see text] ranged between 0.031 and 0.237. Finally, the common environmental effects (c²) in Gubbio and Ogliastra were also significant, accounting for about 11% of the phenotypic variance. The availability of different kinds of populations and data helped us to better understand what happens when the heritability of metabolic syndrome is estimated and to account for possible confounding. Furthermore, the opportunity to compare different results provided a more precise and less biased estimation of heritability.
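
The variance partition mentioned in this abstract can be written out as a short worked formula. This is the textbook decomposition and is an assumption here, not taken from the paper; the symbols for the additive genetic, common sib-household, and residual variances ($\sigma_A^2$, $\sigma_C^2$, $\sigma_E^2$) are introduced only for illustration.

```latex
\sigma_P^2 = \sigma_A^2 + \sigma_C^2 + \sigma_E^2, \qquad
h^2 = \frac{\sigma_A^2}{\sigma_P^2}, \qquad
c^2 = \frac{\sigma_C^2}{\sigma_P^2}
```

Under this reading, the Gubbio pedigree estimates (heritability of 0.286, common environmental effect of about 0.11) would leave roughly 1 - 0.286 - 0.11 ≈ 0.60 of the phenotypic variance to the residual term.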


Subjects
Genetic Predisposition to Disease; Genetics, Population/methods; Genome, Human; Genome-Wide Association Study; Genomics/methods; Metabolic Syndrome/genetics; Polymorphism, Single Nucleotide; Cohort Studies; Female; Genotype; Humans; Male; Models, Genetic; Pedigree
14.
Hum Genet ; 138(7): 691-701, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31161416

ABSTRACT

Most genotype-phenotype studies have historically lacked population diversity, impacting the generalizability of findings and thereby limiting the ability to equitably implement precision medicine. This well-documented problem has generated much interest in the ascertainment of new cohorts with an emphasis on multiple dimensions of diversity, including race/ethnicity, gender, age, socioeconomic status, disability, and geography. The most well known of these new cohort efforts is arguably All of Us, formerly known as the Precision Medicine Cohort Initiative Program. All of Us intends to ascertain at least one million participants in the United States representative of the multiple dimensions of diversity. As an incentive to participate, All of Us is offering the return of research results, including whole genome sequencing data, as well as the opportunity to contribute to the scientific process as non-scientists. The scale and scope of the proposed return of research results are unprecedented. Here, we briefly review possible return of genetic data models, including the likely data file formats and modes of data transfer or access. We also review the resources required to access and interpret the genetic or genomic data once received by the average participant, highlighting the nuanced anticipated barriers that will challenge both the digitally, computationally literate and illiterate participant alike. This inventory of resources required to receive, process, and interpret return of research results exposes the potential for access disparities and warns the scientific community to mind the gap so that all participants have equal access and understanding of the benefits of human genetic research.


Subjects
Data Interpretation, Statistical; Data Mining/standards; Genetic Research; Genome, Human; Genomics/methods; Precision Medicine; Humans; United States
15.
BMC Bioinformatics ; 20(1): 338, 2019 Jun 17.
Article in English | MEDLINE | ID: mdl-31208327

ABSTRACT

BACKGROUND: The advent of high-throughput experimental techniques paved the way to genome-wide computational analysis and predictive annotation studies. When considering the joint annotation of a large set of related entities, like all proteins of a certain genome, many candidate annotations could be inconsistent, or very unlikely, given the existing knowledge. A sound predictive framework capable of accounting for this type of constraints in making predictions could substantially contribute to the quality of machine-generated annotations at a genomic scale. RESULTS: We present OCELOT, a predictive pipeline which simultaneously addresses functional and interaction annotation of all proteins of a given genome. The system combines sequence-based predictors for functional and protein-protein interaction (PPI) prediction with a consistency layer enforcing (soft) constraints as fuzzy logic rules. The enforced rules represent the available prior knowledge about the classification task, including taxonomic constraints over each GO hierarchy (e.g. a protein labeled with a GO term should also be labeled with all ancestor terms) as well as rules combining interaction and function prediction. An extensive experimental evaluation on the Yeast genome shows that the integration of prior knowledge via rules substantially improves the quality of the predictions. The system largely outperforms GoFDR, the only high-ranking system at the last CAFA challenge with a readily available implementation, when GoFDR is given access to intra-genome information only (as OCELOT), and has comparable or better results (depending on the hierarchy and performance measure) when GoFDR is allowed to use information from other genomes. Our system also compares favorably to recent methods based on deep learning.
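
The ancestor rule quoted in this abstract (a protein labeled with a GO term should also carry all ancestor terms) can be illustrated with a small fuzzy-logic sketch. Everything here is an assumption for illustration: the abstract does not say which fuzzy implication OCELOT uses, so the Łukasiewicz implication is swapped in, and the term identifiers and scores are hypothetical.

```python
# Minimal sketch: score how well predicted GO-term confidences satisfy
# the soft rule "child term implies parent term".
# Assumption: Lukasiewicz implication; the paper's actual choice is not stated.

def lukasiewicz_implies(a: float, b: float) -> float:
    """Truth value of (a -> b) for a, b in [0, 1]."""
    return min(1.0, 1.0 - a + b)

# Hypothetical toy hierarchy: each term maps to its direct parents.
parents = {
    "GO:child": ["GO:parent"],
    "GO:parent": ["GO:root"],
    "GO:root": [],
}

# Hypothetical predicted confidences for one protein.
scores = {"GO:child": 0.9, "GO:parent": 0.6, "GO:root": 1.0}

def rule_satisfaction(scores, parents):
    """Return the truth value of every child -> parent rule."""
    return {
        (term, parent): lukasiewicz_implies(scores[term], scores[parent])
        for term, term_parents in parents.items()
        for parent in term_parents
    }

satisfaction = rule_satisfaction(scores, parents)
```

A rule value below 1 flags an inconsistency (here the child term is predicted more confidently than its parent); a learning system could penalize such values softly rather than forbid them outright.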


Subjects
Genome, Fungal; Genomics/methods; Molecular Sequence Annotation; Proteins/genetics; Saccharomyces cerevisiae/genetics; Algorithms; Decision Making; Gene Ontology
16.
Medicine (Baltimore) ; 98(23): e15871, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31169691

ABSTRACT

To evaluate the ability of a radiomics signature based on 3T dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) to distinguish between low and non-low Oncotype DX (OD) risk groups in estrogen receptor (ER)-positive invasive breast cancers. Between May 2011 and March 2016, 67 women with ER-positive invasive breast cancer who underwent preoperative 3T MRI and OD assay were included. We divided the patients into low (OD recurrence score [RS] <18) and non-low risk (RS ≥18) groups. Extracted radiomics features included 8 morphological, 76 histogram-based, and 72 higher-order texture features. A radiomics signature (Rad-score) was generated using the least absolute shrinkage and selection operator (LASSO). Univariate and multivariate logistic regression analyses were performed to investigate the association between clinicopathologic factors, MRI findings, and the Rad-score with OD risk groups, and the areas under the receiver operating characteristic curves (AUC) were used to assess classification performance of the Rad-score. The Rad-score was constructed for each tumor by extracting 10 (6.3%) of 158 radiomics features. A higher Rad-score (odds ratio [OR], 65.209; P < .001), Ki-67 expression (OR, 17.462; P = .007), and high p53 (OR, 8.449; P = .077) were associated with non-low OD risk. The Rad-score classified low and non-low OD risk with an AUC of 0.759. The Rad-score showed the potential for discrimination between low and non-low OD risk groups in patients with ER-positive invasive breast cancers.


Subjects
Breast Neoplasms/diagnostic imaging; Breast Neoplasms/genetics; Genomics/methods; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Receptors, Estrogen/biosynthesis; Adult; Breast Neoplasms/pathology; Contrast Media/administration & dosage; Female; Gene Expression Profiling/methods; Gene Expression Profiling/standards; Humans; Middle Aged; Neoplasm Recurrence, Local/genetics; Neoplasm Recurrence, Local/pathology; Organometallic Compounds/administration & dosage; ROC Curve; Reproducibility of Results; Risk Assessment; Sensitivity and Specificity
17.
BMC Bioinformatics ; 20(1): 324, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31195961

ABSTRACT

BACKGROUND: As DNA sequencing technologies improve and become cheaper, genomic data can be used for the diagnosis of many diseases such as cancer. Raw human genome data is very large for computational systems. Therefore, there is a need for a compact and accurate representation of the valuable information in DNA. Complex genetic disorders often result from multiple gene mutations, and the mutations do not contribute equally to the development of a disease. Inspired by the field of information retrieval, we propose using the term frequency (tf) and BM25 term weighting measures with the inverse document frequency (idf) and relevance frequency (rf) measures to weight genes based on their mutations. The underlying assumption is that the more mutations a gene has in patients with a certain disease and the fewer mutations it has in other patients, the more discriminative that gene is. RESULTS: We evaluated the proposed representations on the task of cancer type classification. We applied various machine learning techniques using the tf-idf and tf-rf schemes and their BM25 versions. Our results show that the BM25-tf-rf representation leads to improved classification accuracy and f-score values compared to the other representations. The highest accuracy (76.44%) and f-score (76.95%) are achieved with the BM25-tf-rf based data representation. CONCLUSIONS: In our experiments, the BM25-tf-rf scheme combined with the proposed neural network model was the best-performing classification system for our case study of cancer type classification. This system is further utilized for causal gene analysis. Examples of the most effective genes used for decision making are reported in the literature as target or causal genes.
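
The weighting idea in this abstract can be sketched with the standard information-retrieval formulas, treating each patient as a "document" and each gene as a "term" whose frequency is its mutation count. This is a sketch under stated assumptions, not the authors' implementation: the Okapi BM25 saturation with `k1` and `b` defaults and the relevance-frequency form `log2(2 + a / max(1, c))` are the usual textbook definitions, and the patient data below is invented.

```python
import math

def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 saturated term frequency (standard form, assumed here)."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def rf(pos_with, neg_with):
    """Relevance frequency: a = positive-class docs containing the term,
    c = negative-class docs containing it."""
    return math.log2(2 + pos_with / max(1, neg_with))

# Hypothetical toy data: (gene -> mutation count, cancer type) per patient.
patients = [
    ({"TP53": 3, "KRAS": 1}, "lung"),
    ({"TP53": 1, "KRAS": 2}, "lung"),
    ({"TP53": 2, "BRCA1": 4}, "breast"),
]

def bm25_tf_rf(patients, target):
    """Weight every mutated gene of every patient for one target class."""
    avg_len = sum(sum(m.values()) for m, _ in patients) / len(patients)
    rows = []
    for muts, _ in patients:
        doc_len = sum(muts.values())
        row = {}
        for gene, tf in muts.items():
            a = sum(1 for m, lab in patients if gene in m and lab == target)
            c = sum(1 for m, lab in patients if gene in m and lab != target)
            row[gene] = bm25_tf(tf, doc_len, avg_len) * rf(a, c)
        rows.append(row)
    return rows

weights = bm25_tf_rf(patients, target="lung")
```

The resulting per-patient dictionaries could be turned into fixed-length feature vectors for a classifier; the rf factor grows when a gene is mutated mostly in the target class.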


Subjects
Genomics/methods; Models, Genetic; Models, Statistical; Mutation/genetics; Databases, Genetic; Exons/genetics; Humans; Introns/genetics; Machine Learning; Neoplasms/genetics; Neural Networks, Computer
18.
BMC Bioinformatics ; 20(1): 325, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31196002

ABSTRACT

BACKGROUND: Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. However, the design of precision medicine tests for unmet clinical needs from this information in the small cohorts available for test discovery remains a challenging task. Obtaining reliable performance assessments at the earliest stages of test development can also be problematic. We describe a novel approach to classifier development designed to create clinically useful tests together with reliable estimates of their performance. The method incorporates elements of traditional and modern machine learning to facilitate the use of cohorts where the number of samples is less than the number of measured patient attributes. It is based on a hierarchy of classification and information abstraction and combines boosting, bagging, and strong dropout regularization. RESULTS: We apply this dropout-regularized combination approach to two clinical problems in oncology using mRNA expression and associated clinical data and compare performance with other methods of classifier generation, including Random Forest. Performance of the new method is similar to or better than the Random Forest in the two classification tasks used for comparison. The dropout-regularized combination method also generates an effective classifier in a classification task with a known confounding variable. Most importantly, it provides a reliable estimate of test performance from a relatively small development set of samples. CONCLUSIONS: The flexible dropout-regularized combination approach is able to produce tests tailored to particular clinical questions and mitigate known confounding effects. 
It allows the design of molecular diagnostic tests addressing particular clinical questions together with reliable assessment of whether test performance is likely to be fit-for-purpose in independent validation at the earliest stages of development.


Subjects
Algorithms; Genomics/methods; Precision Medicine; Area Under Curve; Carcinoma, Non-Small-Cell Lung/genetics; Databases, Genetic; Humans; Lung Neoplasms/genetics; Machine Learning; Male; Prostatic Neoplasms/genetics; Survival Analysis
19.
Nat Genet ; 51(6): 1052-1059, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31152161

ABSTRACT

Maize is one of the most important crops globally, and it shows remarkable genetic diversity. Knowledge of this diversity could help in crop improvement; however, gold-standard genomes have been elucidated only for modern temperate varieties. Here, we present a high-quality reference genome (contig N50 of 15.78 megabases) of the maize small-kernel (SK) inbred line, which is derived from a tropical landrace. Using haplotype maps derived from B73, Mo17 and SK, we identified 80,614 polymorphic structural variants across 521 diverse lines. Approximately 22% of these variants could not be detected by traditional single-nucleotide-polymorphism-based approaches, and some of them could affect gene expression and trait performance. To illustrate the utility of the diverse SK line, we used it to perform map-based cloning of a major effect quantitative trait locus controlling kernel weight, a key trait selected during maize improvement. The underlying candidate gene ZmBARELY ANY MERISTEM1d provides a target for increasing crop yields.


Subjects
Genetic Association Studies; Genome, Plant; Genomics; Phenotype; Zea mays/genetics; Computational Biology/methods; Genomics/methods; High-Throughput Nucleotide Sequencing; Inbreeding; Molecular Sequence Annotation; Plant Breeding; Plants, Genetically Modified; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable
20.
Genome Biol ; 20(1): 117, 2019 Jun 03.
Article in English | MEDLINE | ID: mdl-31159850

ABSTRACT

BACKGROUND: Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SV with high precision and high recall. RESULTS: We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potentially good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham perform better in the deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms. CONCLUSION: These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
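
The notion of "overlapping calls" between two algorithms can be made concrete with a small sketch. The matching criterion used here, at least 50% reciprocal overlap between intervals on the same chromosome, is a common convention and is an assumption; the abstract does not state the criterion used in the study. All coordinates and call sets are invented.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two (chrom, start, end) intervals, in [0, 1]."""
    if a[0] != b[0]:
        return 0.0
    inter = min(a[2], b[2]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return min(inter / (a[2] - a[1]), inter / (b[2] - b[1]))

def overlapping_calls(calls_a, calls_b, min_ro=0.5):
    """Keep calls from A that are supported by at least one call from B."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)]

def precision_recall(calls, truth, min_ro=0.5):
    """Precision and recall of a call set against a truth set."""
    tp_calls = sum(1 for c in calls
                   if any(reciprocal_overlap(c, t) >= min_ro for t in truth))
    tp_truth = sum(1 for t in truth
                   if any(reciprocal_overlap(c, t) >= min_ro for c in calls))
    precision = tp_calls / len(calls) if calls else 0.0
    recall = tp_truth / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical deletion calls from two tools and a truth set.
truth = [("chr1", 100, 200), ("chr1", 500, 700)]
tool_a = [("chr1", 105, 205), ("chr1", 1000, 1100), ("chr1", 510, 690)]
tool_b = [("chr1", 95, 198), ("chr2", 1000, 1100)]

consensus = overlapping_calls(tool_a, tool_b)
```

In this toy case the consensus set has higher precision than tool A alone but lower recall, which is the trade-off that makes the choice of algorithm pair matter.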


Subjects
Genomic Structural Variation; Genomics/methods; Whole Genome Sequencing; Algorithms; Humans