Results 1 - 20 of 34
1.
J Exp Bot ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38954539

ABSTRACT

Linear mixed models (LMMs) are a commonly used method for genome-wide association studies (GWAS) that aim to detect associations between genetic markers and phenotypic measurements in a population of individuals while accounting for population structure and cryptic relatedness. In a standard GWAS, hundreds of thousands to millions of statistical tests are performed, requiring control for multiple hypothesis testing. Typically, static corrections that penalize the number of tests performed are used to control for the family-wise error rate, which is the probability of making at least one false positive. However, it has been shown that in practice this threshold is too conservative for normally distributed phenotypes and not stringent enough for non-normally distributed phenotypes. Therefore, permutation-based LMM approaches have recently been proposed to provide a more realistic threshold that takes phenotypic distributions into account. In this work, we will discuss the advantages of permutation-based GWAS approaches, including new simulations and results from a re-analysis of all publicly available Arabidopsis thaliana phenotypes from the AraPheno database.
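The static correction described in this abstract can be illustrated with a minimal sketch. It computes a Bonferroni per-test threshold and the family-wise error rate (FWER) with and without correction. It assumes independent tests, which real genetic markers are not, and the function names and the marker count are illustrative choices, not taken from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the FWER at or below alpha."""
    return alpha / n_tests


def fwer_independent(per_test_alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


if __name__ == "__main__":
    m = 1_000_000  # hypothetical number of tested markers
    t = bonferroni_threshold(0.05, m)
    print(f"Bonferroni per-test threshold: {t:.1e}")
    print(f"FWER at the Bonferroni threshold: {fwer_independent(t, m):.4f}")
    print(f"FWER with uncorrected alpha = 0.05: {fwer_independent(0.05, m):.4f}")
```

Because the threshold depends only on the number of tests, it cannot adapt to the phenotype distribution, which is the limitation the abstract attributes to static corrections.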

2.
Brief Bioinform ; 22(1): 178-193, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31848574

ABSTRACT

Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and by computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols has led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).


Subjects
Metagenomics/methods, Microbiota/genetics, Practice Guidelines as Topic, Animals, Taxonomic DNA Barcoding/methods, Taxonomic DNA Barcoding/standards, Humans, Metagenomics/standards, 16S Ribosomal RNA/genetics, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards
3.
Bioinformatics ; 38(Suppl_2): ii5-ii12, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124808

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) are an integral tool for studying the architecture of complex genotype and phenotype relationships. Linear mixed models (LMMs) are commonly used to detect associations between genetic markers and a trait of interest, while also accounting for population structure and cryptic relatedness. Assumptions of LMMs include a normal distribution of the residuals and that the genetic markers are independent and identically distributed; both assumptions are often violated in real data. Permutation-based methods can help to overcome some of these limitations and provide more realistic thresholds for the discovery of true associations. Still, in practice, they are rarely implemented due to the high computational complexity. RESULTS: We propose permGWAS, an efficient LMM reformulation based on 4D tensors that can provide permutation-based significance thresholds. We show that our method outperforms current state-of-the-art LMMs with respect to runtime and that permutation-based thresholds have lower false discovery rates for skewed phenotypes compared to the commonly used Bonferroni threshold. Furthermore, using permGWAS we re-analyzed more than 500 Arabidopsis thaliana phenotypes with 100 permutations each in less than 8 days on a single GPU. Our re-analyses suggest that applying a permutation-based threshold can improve and refine the interpretation of GWAS results. AVAILABILITY AND IMPLEMENTATION: permGWAS is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study, Genetic Markers, Genome-Wide Association Study/methods, Genotype, Linear Models, Phenotype
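The idea of a permutation-based significance threshold from the abstract above can be sketched in a few lines. This is a deliberately simplified illustration, not the permGWAS implementation: it uses the absolute Pearson correlation as the test statistic instead of a linear mixed model, ignores population structure and relatedness, and runs on the CPU in plain Python. All function names and the simulated data are assumptions made for the example.

```python
import random
from statistics import mean


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(markers, phenotype, n_perm=100, alpha=0.05, seed=0):
    """Max-statistic permutation threshold on |r|.

    markers: list of per-marker genotype lists (one value per individual).
    Returns roughly the (1 - alpha) empirical quantile of the largest
    statistic observed in each permutation of the phenotype.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        max_stats.append(max(abs_correlation(g, shuffled) for g in markers))
    max_stats.sort()
    k = min(n_perm - 1, int((1 - alpha) * n_perm))  # approximate quantile index
    return max_stats[k]


if __name__ == "__main__":
    rng = random.Random(1)
    n, m = 80, 50  # hypothetical numbers of individuals and markers
    markers = [[rng.choice([0, 1, 2]) for _ in range(n)] for _ in range(m)]
    # Skewed phenotype driven by marker 0 plus exponential noise
    phenotype = [2.0 * markers[0][i] + rng.expovariate(1.0) for i in range(n)]
    thr = permutation_threshold(markers, phenotype)
    hits = [j for j, g in enumerate(markers) if abs_correlation(g, phenotype) > thr]
    print(f"threshold on |r|: {thr:.3f}; markers above it: {hits}")
```

Because the null distribution is built from the observed phenotype values, the threshold reflects their skewness, which a fixed Bonferroni cut-off cannot do.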
4.
Health Care Manag Sci ; 26(4): 785-806, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38015289

ABSTRACT

Assigning inpatients to hospital beds impacts patient satisfaction and the workload of nurses and doctors. The assignment is subject to unknown inpatient arrivals, in particular for emergency patients. Hospitals, therefore, need to deal with uncertainty on actual bed requirements and potential shortage situations as bed capacities are limited. This paper develops a model and solution approach for solving the patient bed-assignment problem that is based on a machine learning (ML) approach to forecasting emergency patients. First, it contributes by improving the anticipation of emergency patients using ML approaches, incorporating weather data, time and dates, important local and regional events, as well as current and historical occupancy levels. Drawing on real-life data from a large case hospital, we were able to improve forecasting accuracy for emergency inpatient arrivals. We achieved up to 17% better root mean square error (RMSE) when using ML methods compared to a baseline approach relying on averages for historical arrival rates. We further show that the ML methods outperform time series forecasts. Second, we develop a new hyper-heuristic for solving real-life problem instances based on the pilot method and a specialized greedy look-ahead (GLA) heuristic. When applying the hyper-heuristic in test sets, we were able to increase the objective function value by up to 5.3% in comparison to the benchmark approach in [40]. A benchmark with a Genetic Algorithm also shows the superiority of the hyper-heuristic. Third, the combination of ML for emergency patient admission forecasting with advanced optimization through the hyper-heuristic allowed us to obtain an improvement of up to 3.3% on a real-life problem.


Subjects
Hospital Emergency Service, Hospitalization, Humans, Hospitals, Patient Admission, Machine Learning
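The forecasting comparison reported in the abstract above rests on the root mean square error (RMSE) relative to a historical-average baseline. The following minimal sketch shows how such a relative improvement could be computed. The arrival counts and the "model" forecasts are made-up numbers for illustration only, and the function names are assumptions rather than anything from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction of the model relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


if __name__ == "__main__":
    history = [12, 15, 11, 14, 13]  # made-up past daily emergency arrivals
    actual = [16, 10, 14, 12]       # made-up arrivals to be forecast
    baseline = [sum(history) / len(history)] * len(actual)  # historical average
    model = [15, 11, 13, 13]        # made-up output of some ML forecaster
    gain = relative_improvement(rmse(actual, baseline), rmse(actual, model))
    print(f"RMSE reduction vs. baseline: {gain:.1%}")
```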
5.
Bioinformatics ; 37(1): 57-65, 2021 04 09.
Article in English | MEDLINE | ID: mdl-32573681

ABSTRACT

MOTIVATION: Correlating genetic loci with a disease phenotype is a common approach to improve our understanding of the genetics underlying complex diseases. Standard analyses mostly ignore two aspects, namely genetic heterogeneity and interactions between loci. Genetic heterogeneity, the phenomenon that genetic variants at different loci lead to the same phenotype, promises to increase statistical power by aggregating low-signal variants. Incorporating interactions between loci results in a computational and statistical bottleneck due to the vast amount of candidate interactions. RESULTS: We propose a novel method SiNIMin that addresses these two aspects by finding pairs of interacting genes that are, upon combination, associated with a phenotype of interest under a model of genetic heterogeneity. We guide the interaction search using biological prior knowledge in the form of protein-protein interaction networks. Our method controls type I error and outperforms state-of-the-art methods with respect to statistical power. Additionally, we find novel associations for multiple Arabidopsis thaliana phenotypes, and, with an adapted variant of SiNIMin, for a study of rare variants in migraine patients. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/BorgwardtLab/SiNIMin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Heterogeneity, Protein Interaction Maps, Genetic Loci, Humans, Phenotype, Software
6.
Nucleic Acids Res ; 48(D1): D1063-D1068, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31642487

ABSTRACT

Genome-wide association studies (GWAS) are integral for studying genotype-phenotype relationships and gaining a deeper understanding of the genetic architecture underlying trait variation. A plethora of genetic associations between distinct loci and various traits have been successfully discovered and published for the model plant Arabidopsis thaliana. This success and the free availability of full genomes and phenotypic data for more than 1,000 different natural inbred lines led to the development of several data repositories. AraPheno (https://arapheno.1001genomes.org) serves as a central repository of population-scale phenotypes in A. thaliana, while the AraGWAS Catalog (https://aragwas.1001genomes.org) provides a publicly available, manually curated and standardized collection of marker-trait associations for all available phenotypes from AraPheno. In this major update, we introduce the next generation of both platforms, including new data, features and tools. We included novel results on associations between knockout-mutations and all AraPheno traits. Furthermore, AraPheno has been extended to display RNA-Seq data for hundreds of accessions, providing expression information for over 28 000 genes for these accessions. All data, including the imputed genotype matrix used for GWAS, are easily downloadable via the respective databases.


Subjects
Arabidopsis/genetics, Computational Biology, Genetic Databases, Plant Genome, Genome-Wide Association Study, Phenotype, Computational Biology/methods, Gene Knockout Techniques, Genome-Wide Association Study/methods, Genotype, Mutation, Quantitative Trait Loci, Heritable Quantitative Trait, RNA Sequence Analysis, Web Browser
7.
PLoS Genet ; 14(2): e1007155, 2018 02.
Article in English | MEDLINE | ID: mdl-29432421

ABSTRACT

By following the evolution of populations that are initially genetically homogeneous, much can be learned about core biological principles. For example, it allows for detailed studies of the rate of emergence of de novo mutations and their change in frequency due to drift and selection. Unfortunately, in multicellular organisms with generation times of months or years, it is difficult to set up and carry out such experiments over many generations. An alternative is provided by "natural evolution experiments" that started from colonizations or invasions of new habitats by selfing lineages. With limited or missing gene flow from other lineages, new mutations and their effects can be easily detected. North America has been colonized in historic times by the plant Arabidopsis thaliana, and although multiple intercrossing lineages are found today, many of the individuals belong to a single lineage, HPG1. To determine in this lineage the rate of substitutions (the subset of mutations that survived natural selection and drift), we have sequenced genomes from plants collected between 1863 and 2006. We identified 73 modern and 27 herbarium specimens that belonged to HPG1. Using the estimated substitution rate, we infer that the last common HPG1 ancestor lived in the early 17th century, when it was most likely introduced by chance from Europe. Mutations in coding regions are depleted in frequency compared to those in other portions of the genome, consistent with purifying selection. Nevertheless, a handful of mutations is found at high frequency in present-day populations. We link these to detectable phenotypic variance in traits of known ecological importance, life history and growth, which could reflect their adaptive value. Our work showcases how, by applying genomics methods to a combination of modern and historic samples from colonizing lineages, we can directly study new mutations and their potential evolutionary relevance.


Subjects
Plant Genome, Mutation Rate, Mutation/physiology, Plant Development/genetics, Arabidopsis/genetics, Arabidopsis/growth & development, Genetic Crosses, Directed Molecular Evolution, Molecular Evolution, Gene Flow/physiology, Introduced Species, Phenotype, Phylogeny, Plant Weeds/genetics, Plant Weeds/growth & development, Genetic Selection, DNA Sequence Analysis
8.
Plant Cell ; 29(1): 5-19, 2017 01.
Article in English | MEDLINE | ID: mdl-27986896

ABSTRACT

The ever-growing availability of high-quality genotypes for a multitude of species has enabled researchers to explore the underlying genetic architecture of complex phenotypes at an unprecedented level of detail using genome-wide association studies (GWAS). The systematic comparison of results obtained from GWAS of different traits opens up new possibilities, including the analysis of pleiotropic effects. Other advantages that result from the integration of multiple GWAS are the ability to replicate GWAS signals and to increase statistical power to detect such signals through meta-analyses. In order to facilitate the simple comparison of GWAS results, we present easyGWAS, a powerful, species-independent online resource for computing, storing, sharing, annotating, and comparing GWAS. The easyGWAS tool supports multiple species, the uploading of private genotype data and summary statistics of existing GWAS, as well as advanced methods for comparing GWAS results across different experiments and data sets in an interactive and user-friendly interface. easyGWAS is also a public data repository for GWAS data and summary statistics and already includes published data and results from several major GWAS. We demonstrate the potential of easyGWAS with a case study of the model organism Arabidopsis thaliana, using flowering and growth-related traits.


Subjects
Computational Biology/methods, Plant Genome/genetics, Genome-Wide Association Study/methods, Single Nucleotide Polymorphism, Arabidopsis/genetics, Arabidopsis/growth & development, Flowers/genetics, Flowers/growth & development, Genotype, Humans, Phenotype, Reproducibility of Results, Software, User-Computer Interface
9.
Nucleic Acids Res ; 46(D1): D1150-D1156, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29059333

ABSTRACT

The abundance of high-quality genotype and phenotype data for the model organism Arabidopsis thaliana enables scientists to study the genetic architecture of many complex traits at an unprecedented level of detail using genome-wide association studies (GWAS). GWAS have been a great success in A. thaliana and many SNP-trait associations have been published. With the AraGWAS Catalog (https://aragwas.1001genomes.org) we provide a publicly available, manually curated and standardized GWAS catalog for all publicly available phenotypes from the central A. thaliana phenotype repository, AraPheno. All GWAS have been recomputed on the latest imputed genotype release of the 1001 Genomes Consortium using a standardized GWAS pipeline to ensure comparability between results. The catalog currently includes 167 phenotypes and more than 222 000 SNP-trait associations with P < 10⁻⁴, of which 3887 are significantly associated using permutation-based thresholds. The AraGWAS Catalog can be accessed via a modern web-interface and provides various features to easily access, download and visualize the results and summary statistics across GWAS.


Subjects
Arabidopsis/genetics, Genetic Databases, Genome-Wide Association Study, Single Nucleotide Polymorphism, User-Computer Interface
11.
Nucleic Acids Res ; 45(D1): D1054-D1059, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27924043

ABSTRACT

Natural genetic variation makes it possible to discover evolutionary changes that have been maintained in a population because they are advantageous. To understand genotype-phenotype relationships and to investigate trait architecture, the existence of both high-resolution genotypic and phenotypic data is necessary. Arabidopsis thaliana is a prime model for these purposes. This herb naturally occurs across much of the Eurasian continent and North America. Thus, it is exposed to a wide range of environmental factors and has been subject to natural selection under distinct conditions. Full genome sequencing data for more than 1000 different natural inbred lines are available, and this has encouraged the distributed generation of many types of phenotypic data. To leverage these data for meta-analyses, AraPheno (https://arapheno.1001genomes.org) provides a central repository of population-scale phenotypes for A. thaliana inbred lines. AraPheno includes various features to easily access, download and visualize the phenotypic data. This will facilitate a comparative analysis of the many different types of phenotypic data, which is the basis for further enhancing our understanding of the genotype-phenotype map.


Subjects
Arabidopsis/genetics, Arabidopsis/metabolism, Genetic Databases, Genetic Association Studies/methods, Genotype, Phenotype, Search Engine, Database Management Systems, Software, Web Browser
12.
Proc Natl Acad Sci U S A ; 113(46): E7317-E7326, 2016 Nov 15.
Article in English | MEDLINE | ID: mdl-27803326

ABSTRACT

The ubiquity of nonparental hybrid phenotypes, such as hybrid vigor and hybrid inferiority, has interested biologists for over a century and is of considerable agricultural importance. Although examples of both phenomena have been subject to intense investigation, no general model for the molecular basis of nonadditive genetic variance has emerged, and prediction of hybrid phenotypes from parental information continues to be a challenge. Here we explore the genetics of hybrid phenotype in 435 Arabidopsis thaliana individuals derived from intercrosses of 30 parents in a half diallel mating scheme. We find that nonadditive genetic effects are a major component of genetic variation in this population and that the genetic basis of hybrid phenotype can be mapped using genome-wide association (GWA) techniques. Significant loci together can explain as much as 20% of phenotypic variation in the surveyed population and include examples that have both classical dominant and overdominant effects. One candidate region inherited dominantly in the half diallel contains the gene for the MADS-box transcription factor AGAMOUS-LIKE 50 (AGL50), which we show directly to alter flowering time in the predicted manner. Our study not only illustrates the promise of GWA approaches to dissect the genetic architecture underpinning hybrid performance but also demonstrates the contribution of classical dominance to genetic variance.


Subjects
Arabidopsis/genetics, Hybrid Vigor/genetics, Genetic Crosses, Genetic Variation, Genetic Hybridization, Phenotype
13.
Genome Res ; 25(2): 246-56, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25367294

ABSTRACT

The spatial arrangement of interphase chromosomes in the nucleus is important for gene expression and genome function in animals and in plants. The recently developed Hi-C technology is an efficacious method to investigate genome packing. Here we present a detailed Hi-C map of the three-dimensional genome organization of the plant Arabidopsis thaliana. We find that local chromatin packing differs from the patterns seen in animals, with kilobasepair-sized segments that have much higher intrachromosome interaction rates than neighboring regions, representing a dominant local structural feature of genome conformation in A. thaliana. These regions, which appear as positive strips on two-dimensional representations of chromatin interaction, are enriched in epigenetic marks H3K27me3, H3.1, and H3.3. We also identify more than 400 insulator-like regions. Furthermore, although topologically associating domains (TADs), which are prominent in animals, are not an obvious feature of A. thaliana genome packing, we found more than 1000 regions that have properties of TAD boundaries, and a similar number of regions analogous to the interior of TADs. The insulator-like, TAD-boundary-like, and TAD-interior-like regions are each enriched for distinct epigenetic marks and are each correlated with different gene expression levels. We conclude that epigenetic modifications, gene density, and transcriptional activity combine to shape the local packing of the A. thaliana nuclear genome.


Subjects
Arabidopsis/genetics, Arabidopsis/metabolism, Chromatin Assembly and Disassembly, Chromatin/metabolism, Genomics, Cluster Analysis, Computational Biology/methods, Genetic Epigenesis, Plant Genome, Genomics/methods, Histones/metabolism, Insulator Elements
14.
Mol Biol Evol ; 33(9): 2257-72, 2016 09.
Article in English | MEDLINE | ID: mdl-27189551

ABSTRACT

Understanding how new species form requires investigation of evolutionary forces that cause phenotypic and genotypic changes among populations. However, the mechanisms underlying speciation vary and little is known about whether genomes diversify in the same ways in parallel at the incipient scale. We address this using the nematode, Pristionchus pacificus, which resides at an interesting point on the speciation continuum (distinct evolutionary lineages without reproductive isolation), and inhabits heterogeneous environments subject to divergent environmental pressures. Using whole genome re-sequencing of 264 strains, we estimate FST to identify outlier regions of extraordinary differentiation (∼1.725 Mb of the 172.5 Mb genome). We find evidence for shared divergent genomic regions occurring at a higher frequency than expected by chance among populations of the same evolutionary lineage. We use allele frequency spectra to find that, among lineages, 53% of divergent regions are consistent with adaptive selection, whereas 24% and 23% of such regions suggest background selection and restricted gene flow, respectively. In contrast, among populations from the same lineage, similar proportions (34-48%) of divergent regions correspond to adaptive selection and restricted gene flow, whereas 13-22% suggest background selection. Because speciation often involves phenotypic and genomic divergence, we also evaluate phenotypic variation, focusing on pH tolerance, which we find is diverging in a manner corresponding to environmental differences among populations. Taking a genome-wide association approach, we functionally validate a significant genotype-phenotype association for this trait. Our results are consistent with P. pacificus undergoing heterogeneous genotypic and phenotypic diversification related to both evolutionary and environmental processes.


Subjects
Rhabditida/genetics, Animals, Biological Evolution, Molecular Evolution, Gene Flow, Gene Frequency, Genetic Association Studies, Genetic Speciation, Genetic Variation, Reproductive Isolation, Genetic Selection, Transcriptome
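The abstract above uses FST to flag outlier regions amounting to about 1% of the genome (1.725 Mb of 172.5 Mb). The sketch below illustrates that kind of scan under strong simplifications: it uses Wright's heterozygosity-based FST for a single biallelic site in two equally weighted populations and treats each window as one such site. The estimator actually used in the paper is not stated in the abstract, so the formula, the function names and the example frequencies are assumptions.

```python
def fst_two_populations(p1, p2):
    """Wright's F_ST for one biallelic site from two allele frequencies."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # same allele fixed in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def outlier_windows(window_fst, top_fraction=0.01):
    """Indices of the windows with the highest F_ST values."""
    n_keep = max(1, round(len(window_fst) * top_fraction))
    ranked = sorted(range(len(window_fst)), key=lambda i: window_fst[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Made-up allele frequencies for four windows in two populations
    freqs = [(0.50, 0.55), (0.10, 0.90), (0.30, 0.35), (0.80, 0.75)]
    values = [fst_two_populations(a, b) for a, b in freqs]
    print(outlier_windows(values, top_fraction=0.25))
```

With `top_fraction=0.01`, the default mirrors the roughly 1% of the genome reported as extraordinarily differentiated.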
15.
Bioinformatics ; 31(12): i240-9, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072488

ABSTRACT

MOTIVATION: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or (ii) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals. RESULTS: Here, we present an approach that overcomes both problems: it allows one to automatically find all contiguous sequences of single nucleotide polymorphisms in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana genome-wide association study data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping. CONCLUSIONS: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes. AVAILABILITY AND IMPLEMENTATION: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html.


Subjects
Genetic Heterogeneity, Genome-Wide Association Study/methods, Single Nucleotide Polymorphism, Algorithms, Arabidopsis/genetics, Phenotype
16.
Hum Mutat ; 36(5): 513-23, 2015 May.
Article in English | MEDLINE | ID: mdl-25684150

ABSTRACT

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Due to the wealth of these methods, an important practical question to answer is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. We here demonstrate in a study of 10 tools on five datasets that such a comparative evaluation of these tools is hindered by two types of circularity: they arise due to (1) the same variants or (2) different variants from the same protein occurring both in the datasets used for training and for evaluation of these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate among all tools, and may even outperform optimized combinations of tools.


Subjects
Computational Biology/methods, Missense Mutation, Software, Datasets as Topic, Humans, Internet, Reproducibility of Results, Web Browser
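One common way to guard against the two types of circularity named in the abstract above is to hold out whole proteins, so that neither the same variant nor a different variant of the same protein appears in both training and evaluation data. The sketch below illustrates this generic precaution; it is not the evaluation protocol of the paper, and the function name and the identifier format are hypothetical.

```python
import random
from collections import defaultdict


def split_by_protein(variants, test_fraction=0.2, seed=0):
    """Split (variant_id, protein_id) pairs so that no protein spans both sets.

    Returns (train, test) as lists of (variant_id, protein_id) pairs.
    """
    by_protein = defaultdict(list)
    for variant_id, protein_id in variants:
        by_protein[protein_id].append((variant_id, protein_id))
    proteins = sorted(by_protein)  # fixed order before the seeded shuffle
    random.Random(seed).shuffle(proteins)
    n_test = max(1, round(len(proteins) * test_fraction))
    test = [pair for p in proteins[:n_test] for pair in by_protein[p]]
    train = [pair for p in proteins[n_test:] for pair in by_protein[p]]
    return train, test


if __name__ == "__main__":
    # Hypothetical variant and protein identifiers
    data = [(f"var{i}", f"prot{i % 5}") for i in range(20)]
    train, test = split_by_protein(data)
    shared = {p for _, p in train} & {p for _, p in test}
    print(len(train), len(test), shared)
```

Grouping by protein before shuffling is the design choice that matters here: a plain random split over variants would scatter variants of one protein across both sets.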
17.
Bioinformatics ; 29(13): i171-9, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812981

ABSTRACT

MOTIVATION: As an increasing number of genome-wide association studies reveal the limitations of the attempt to explain phenotypic heritability by single genetic loci, there is a recent focus on associating complex phenotypes with sets of genetic loci. Although several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pathways and networks. The few methods that take biological pathways or networks into account are either restricted to investigating a limited number of predetermined sets of loci or do not scale to genome-wide settings. RESULTS: We present SConES, a new efficient method to discover sets of genetic loci that are maximally associated with a phenotype while being connected in an underlying network. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly. SConES outperforms state-of-the-art competitors in terms of runtime, scales to hundreds of thousands of genetic loci and exhibits higher power in detecting causal SNPs in simulation studies than other methods. On flowering time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci that enable accurate phenotype prediction and that are supported by the literature. AVAILABILITY: Code is available at http://webdav.tuebingen.mpg.de/u/karsten/Forschung/scones/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Loci, Genome-Wide Association Study/methods, Phenotype, Single Nucleotide Polymorphism, Arabidopsis/genetics, Arabidopsis/growth & development, Flowers, Genotype, Humans
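The minimum cut reformulation mentioned in the abstract above can be illustrated on a toy network. The sketch assumes an objective of the form "sum of association scores of the selected loci, minus a sparsity penalty `eta` per selected locus, minus a connectivity penalty `lam` for each network edge that separates a selected from an unselected locus". That objective, the parameter names, the graph construction and the toy scores are assumptions for illustration; this is not the SConES code. The maximum flow is found with a plain Edmonds-Karp routine.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns the nodes on the source side of a min cut.

    capacity: dict node -> dict neighbor -> residual capacity (modified in place).
    """
    while True:
        parents = {source: None}
        queue = deque([source])
        while queue:  # breadth-first search in the residual graph
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if sink not in parents:
            return set(parents)  # everything still reachable from the source
        flow, v = float("inf"), sink
        while parents[v] is not None:  # bottleneck along the augmenting path
            flow = min(flow, capacity[parents[v]][v])
            v = parents[v]
        v = sink
        while parents[v] is not None:  # push flow and update residuals
            u = parents[v]
            capacity[u][v] -= flow
            capacity[v][u] = capacity[v].get(u, 0.0) + flow
            v = u


def select_features(scores, edges, eta, lam):
    """scores: association score per locus; edges: (i, j, weight) network links."""
    n = len(scores)
    s, t = "source", "sink"
    cap = {node: {} for node in [s, t, *range(n)]}
    for i, c in enumerate(scores):
        if c > eta:
            cap[s][i] = c - eta  # cost of leaving out a high-scoring locus
        elif c < eta:
            cap[i][t] = eta - c  # cost of selecting a low-scoring locus
    for i, j, w in edges:
        cap[i][j] = cap[i].get(j, 0.0) + lam * w  # cost of separating i and j
        cap[j][i] = cap[j].get(i, 0.0) + lam * w
    reachable = min_cut_source_side(cap, s, t)
    return sorted(i for i in range(n) if i in reachable)


if __name__ == "__main__":
    scores = [3.0, 2.5, 0.2, 0.1, 2.8]  # made-up per-locus association scores
    chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
    print(select_features(scores, chain, eta=1.0, lam=0.5))
```

Larger values of `lam` favor selections that follow the network, and larger values of `eta` favor fewer selected loci.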
18.
Sci Data ; 11(1): 109, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263173

ABSTRACT

Sustainable weed management strategies are critical to feeding the world's population while preserving ecosystems and biodiversity. Therefore, site-specific weed control strategies based on automation are needed to reduce the additional time and effort required for weeding. Machine vision-based methods appear to be a promising approach for weed detection, but require high quality data on the species in a specific agricultural area. Here we present a dataset, the Moving Fields Weed Dataset (MFWD), which captures the growth of 28 weed species commonly found in sorghum and maize fields in Germany. A total of 94,321 images were acquired in a fully automated, high-throughput phenotyping facility to track over 5,000 individual plants at high spatial and temporal resolution. A rich set of manually curated ground truth information is also provided, which can be used not only for plant species classification, object detection and instance segmentation tasks, but also for multiple object tracking.

19.
BMC Genomics ; 14: 132, 2013 Feb 27.
Article in English | MEDLINE | ID: mdl-23442375

ABSTRACT

BACKGROUND: One of the major open challenges in next generation sequencing (NGS) is the accurate identification of structural variants such as insertions and deletions (indels). Current methods for indel calling assign scores to different types of evidence or counter-evidence for the presence of an indel, such as the number of split read alignments spanning the boundaries of a deletion candidate or reads that map within a putative deletion. Candidates with a score above a manually defined threshold are then predicted to be true indels. As a consequence, structural variants detected in this manner contain many false positives. RESULTS: Here, we present a machine learning-based method which is able to discover and distinguish true from false indel candidates in order to reduce the false positive rate. Our method identifies indel candidates using a discriminative classifier based on features of split read alignment profiles and trained on true and false indel candidates that were validated by Sanger sequencing. We demonstrate the usefulness of our method with paired-end Illumina reads from 80 genomes of the first phase of the 1001 Genomes Project (http://www.1001genomes.org) in Arabidopsis thaliana. CONCLUSION: In this work we show that indel classification is a necessary step to reduce the number of false positive candidates. We demonstrate that missing classification may lead to spurious biological interpretations. The software is available at: http://agkb.is.tuebingen.mpg.de/Forschung/SV-M/.


Subjects
High-Throughput Nucleotide Sequencing, INDEL Mutation/genetics, Software, Algorithms, Arabidopsis/genetics, Artificial Intelligence, Computational Biology
20.
Methods Mol Biol ; 2698: 301-322, 2023.
Article in English | MEDLINE | ID: mdl-37682482

ABSTRACT

Genome-wide association studies (GWAS) are a powerful tool to elucidate the genotype-phenotype map. Although GWAS are usually used to assess simple univariate associations between genetic markers and traits of interest, it is also possible to infer the underlying genetic architecture and to predict gene regulatory interactions. In this chapter, we describe the latest methods and tools to perform GWAS by calculating permutation-based significance thresholds. For this purpose, we first provide guidelines on univariate GWAS analyses that are extended in the second part of this chapter to more complex models that enable the inference of gene regulatory networks and how these networks vary.


Subjects
Genetic Epistasis, Genome-Wide Association Study, Gene Regulatory Networks, Phenotype, Genetic Variation