1.
Proc Natl Acad Sci U S A ; 121(8): e2314228121, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38363866

ABSTRACT

In problems such as variable selection and graph estimation, models are characterized by Boolean logical structure such as the presence or absence of a variable or an edge. Consequently, false-positive error or false-negative error can be specified as the number of variables/edges that are incorrectly included or excluded in an estimated model. However, there are several other problems such as ranking, clustering, and causal inference in which the associated model classes do not admit transparent notions of false-positive and false-negative errors due to the lack of an underlying Boolean logical structure. In this paper, we present a generic approach to endow a collection of models with partial order structure, which leads to a hierarchical organization of model classes as well as natural analogs of false-positive and false-negative errors. We describe model selection procedures that provide false-positive error control in our general setting, and we illustrate their utility with numerical experiments.
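
The Boolean case described at the start of this abstract can be sketched in a few lines. This is only an illustration of counting false positives and false negatives for variable selection, not the partial-order framework the paper proposes; the function name `selection_errors` and the example variable labels are assumptions made for this sketch.

```python
def selection_errors(true_support, estimated_support):
    """Count variables wrongly included and wrongly excluded.

    Both arguments are iterables of variable (or edge) labels.
    Returns (false_positives, false_negatives).
    """
    true_set = set(true_support)
    estimated_set = set(estimated_support)
    # Included in the estimate but absent from the true model.
    false_positives = len(estimated_set - true_set)
    # Present in the true model but missing from the estimate.
    false_negatives = len(true_set - estimated_set)
    return false_positives, false_negatives


if __name__ == "__main__":
    # Hypothetical labels: the true model uses x1..x3, the estimate x2..x5.
    print(selection_errors({"x1", "x2", "x3"}, {"x2", "x3", "x4", "x5"}))
```

Set differences work here because each variable is either present or absent; the abstract's point is that rankings, clusterings, and causal models lack this structure.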

2.
Biostatistics ; 25(2): 541-558, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-37037190

ABSTRACT

Whole-brain connectome data characterize the connections among distributed neural populations as a set of edges in a large network, and neuroscience research aims to systematically investigate associations between brain connectome and clinical or experimental conditions as covariates. A covariate is often related to a number of edges connecting multiple brain areas in an organized structure. However, in practice, neither the covariate-related edges nor the structure is known. Therefore, the understanding of underlying neural mechanisms relies on statistical methods that are capable of simultaneously identifying covariate-related connections and recognizing their network topological structures. The task can be challenging because of false-positive noise and almost infinite possibilities of edges combining into subnetworks. To address these challenges, we propose a new statistical approach to handle multivariate edge variables as outcomes and output covariate-related subnetworks. We first study the graph properties of covariate-related subnetworks from a graph and combinatorics perspective and accordingly bridge the inference for individual connectome edges and covariate-related subnetworks. Next, we develop efficient algorithms to extract covariate-related subnetworks from the whole-brain connectome data with an $\ell_0$ norm penalty. We validate the proposed methods based on an extensive simulation study, and we benchmark our performance against existing methods. Using our proposed method, we analyze two separate resting-state functional magnetic resonance imaging data sets for schizophrenia research and obtain highly replicable disease-related subnetworks.


Subjects
Connectome , Schizophrenia , Humans , Connectome/methods , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Schizophrenia/diagnostic imaging , Computer Simulation
3.
Mediterr J Math ; 21(1): 39, 2024.
Article in English | MEDLINE | ID: mdl-38333636

ABSTRACT

In this paper, we give a simple criterion to verify that functions of the form $e^g$ are in the Hayman class when g is a power series with nonnegative coefficients. Thus, using the Hayman and Báez-Duarte formulas, we obtain asymptotics for the coefficients of generating functions that arise in many examples of set construction in analytic combinatorics. This new criterion greatly simplifies the one obtained previously by the authors.

4.
Stud Hist Philos Sci ; 106: 60-69, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38901328

ABSTRACT

Leibniz's famous proposition that God has created the best of all possible worlds holds a significant place in his philosophical system. However, the precise manner in which God determines which world is the best remains somewhat ambiguous. Leibniz suggests that a form of "Divine mathematics" is employed to construct and evaluate possible worlds. In this paper, I uncover the underlying mechanics of Divine mathematics by formally reconstructing it. I argue that Divine mathematics is a one-player combinatorial game, in which God's goal is to find the best combination among many possibilities. Drawing on the combinatorial theory, I provide new solutions to some puzzles of compossibility.


Subjects
Mathematics , Mathematics/history , Philosophy/history
5.
Bull Math Biol ; 85(11): 107, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37749280

ABSTRACT

Early literature on genome rearrangement modelling views the problem of computing evolutionary distances as an inherently combinatorial one. In particular, attention is given to estimating distances using the minimum number of events required to transform one genome into another. In hindsight, this approach is analogous to early methods for inferring phylogenetic trees from DNA sequences such as maximum parsimony; both are motivated by the principle that the true distance minimises evolutionary change, and both are effective if this principle is a true reflection of reality. Recent literature considers genome rearrangement under statistical models, continuing this parallel with DNA-based methods, with the goal of using model-based methods (for example maximum likelihood techniques) to compute distance estimates that incorporate the large number of rearrangement paths that can transform one genome into another. Crucially, this approach requires one to decide upon a set of feasible rearrangement events and, in this paper, we focus on characterising well-motivated models for signed, uni-chromosomal circular genomes, where the number of regions remains fixed. Since rearrangements are often mathematically described using permutations, we isolate the sets of permutations representing rearrangements that are biologically reasonable in this context, for example inversions and transpositions. We provide precise mathematical expressions for these rearrangements, and then describe them in terms of the set of cuts made in the genome when they are applied. We directly compare cuts to breakpoints, and use this concept to count the distinct rearrangement actions which apply a given number of cuts. Finally, we provide some examples of rearrangement models, and include a discussion of some questions that arise when defining plausible models.


Subjects
Gene Rearrangement , Mathematical Concepts , Phylogeny , Biological Models , Genome , Algorithms , Genetic Models
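
As a rough illustration of one rearrangement named in this abstract, the sketch below applies an inversion to a signed circular genome stored as a list of signed integers. The list representation, the function name `invert`, and its `start`/`length` parameters are assumptions made for this example and are not taken from the paper, which works with permutations and cuts.

```python
def invert(genome, start, length):
    """Invert `length` consecutive regions of a signed circular genome.

    `genome` is a list of nonzero integers; the sign encodes orientation.
    Positions wrap around the end of the list because the genome is
    circular. The inverted segment is reversed and every sign is flipped.
    """
    n = len(genome)
    positions = [(start + k) % n for k in range(length)]
    # Read the segment backwards and flip each orientation.
    flipped = [-genome[i] for i in reversed(positions)]
    result = list(genome)
    for position, region in zip(positions, flipped):
        result[position] = region
    return result


if __name__ == "__main__":
    print(invert([1, 2, 3, 4, 5], 1, 3))   # segment inside the list
    print(invert([1, 2, 3, 4, 5], 4, 2))   # segment wrapping past the end
```

Applying the same inversion twice restores the original genome, which is a quick sanity check on any implementation of this operation.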
6.
Ann Inst Stat Math ; 75(4): 683-704, 2023.
Article in English | MEDLINE | ID: mdl-36590375

ABSTRACT

After a rich history in medicine, randomized control trials (RCTs), both simple and complex, are in increasing use in other areas, such as web-based A/B testing and planning and design of decisions. A main objective of RCTs is to be able to measure parameters, and contrasts in particular, while guarding against biases from hidden confounders. After careful definitions of classical entities such as contrasts, an algebraic method based on circuits is introduced which gives a wide choice of randomization schemes.

7.
Philos Trans A Math Phys Eng Sci ; 380(2214): 20210121, 2022 Jan 10.
Article in English | MEDLINE | ID: mdl-34802274

ABSTRACT

We develop a statistical model for the testing of disease prevalence in a population. The model assumes a binary test result, positive or negative, but allows for biases in sample selection and both type I (false positive) and type II (false negative) testing errors. Our model also incorporates multiple test types and is able to distinguish between retesting and exclusion after testing. Our quantitative framework allows us to directly interpret testing results as a function of errors and biases. By applying our testing model to COVID-19 testing data and actual case data from specific jurisdictions, we are able to estimate and provide uncertainty quantification of indices that are crucial in a pandemic, such as disease prevalence and fatality ratios. This article is part of the theme issue 'Data science approach to infectious disease surveillance'.


Subjects
COVID-19 Testing , COVID-19 , Bias , False Positive Reactions , Humans , Statistical Models , SARS-CoV-2 , Selection Bias , Sensitivity and Specificity
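
The effect of type I and type II testing errors on an observed positive rate can be sketched with the textbook misclassification relation. This is a simplified illustration assuming a single test type and no selection bias, so it is not the model developed in the paper; the function and parameter names are hypothetical.

```python
def apparent_positive_rate(prevalence, false_positive_rate, false_negative_rate):
    """Expected fraction of positive results for a binary test."""
    true_positives = prevalence * (1.0 - false_negative_rate)
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives + false_positives


def corrected_prevalence(observed_rate, false_positive_rate, false_negative_rate):
    """Invert the relation above to estimate prevalence from test results.

    The estimate is clipped to [0, 1] because sampling noise can push the
    raw value outside that range.
    """
    denominator = 1.0 - false_positive_rate - false_negative_rate
    if denominator <= 0.0:
        raise ValueError("the test must perform better than chance")
    estimate = (observed_rate - false_positive_rate) / denominator
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    observed = apparent_positive_rate(0.10, 0.05, 0.20)
    print(observed, corrected_prevalence(observed, 0.05, 0.20))
```

Even modest error rates shift the observed rate noticeably when prevalence is low, which is why the abstract treats errors and biases explicitly.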
8.
Entropy (Basel) ; 24(11)2022 Oct 30.
Article in English | MEDLINE | ID: mdl-36359652

ABSTRACT

The main goal of group testing is to identify a small number of specific items among a large population of items. In this paper, we consider specific items as positives and inhibitors and non-specific items as negatives. In particular, we consider a novel model called group testing with blocks of positives and inhibitors. A test on a subset of items is positive if the subset contains at least one positive and does not contain any inhibitors, and it is negative otherwise. In this model, the input items are linearly ordered, and the positives and inhibitors are subsets of small blocks (at unknown locations) of consecutive items over that order. We also consider two specific instantiations of this model. The first instantiation is the model that contains a single block of consecutive items consisting of exactly known numbers of positives and inhibitors. The second instantiation is the model that contains a single block of consecutive items containing known numbers of positives and inhibitors. Our contribution is to propose efficient encoding and decoding schemes such that the numbers of tests used to identify only positives or both positives and inhibitors are less than the ones in the state-of-the-art schemes. Moreover, the decoding times mostly scale with the number of tests and are significantly smaller than those of state-of-the-art schemes, which scale with both the number of tests and the number of items.
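
The test rule stated in this abstract (positive if the pool contains at least one positive and no inhibitor) can be written directly. This is a minimal sketch of the outcome rule only, not of the paper's encoding or decoding schemes; the function name `pool_outcome` and the item labels are assumptions made for illustration.

```python
def pool_outcome(pool, positives, inhibitors):
    """Return True if a test on `pool` is positive.

    A test is positive when the pool holds at least one positive item
    and no inhibitor; otherwise it is negative.
    """
    pool = set(pool)
    has_positive = bool(pool & set(positives))
    has_inhibitor = bool(pool & set(inhibitors))
    return has_positive and not has_inhibitor


if __name__ == "__main__":
    positives, inhibitors = {2}, {5}
    for pool in ({1, 2}, {2, 5}, {3, 4}):
        print(sorted(pool), pool_outcome(pool, positives, inhibitors))
```

A single inhibitor masks every positive in its pool, which is what makes identifying inhibitors harder than in classical group testing.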

9.
Entropy (Basel) ; 24(5)2022 Apr 25.
Article in English | MEDLINE | ID: mdl-35626483

ABSTRACT

The present paper offers, in its first part, a unified approach for the derivation of families of inequalities for set functions which satisfy sub/supermodularity properties. It applies this approach for the derivation of information inequalities with Shannon information measures. Connections of the considered approach to a generalized version of Shearer's lemma, and other related results in the literature are considered. Some of the derived information inequalities are new, and also known results (such as a generalized version of Han's inequality) are reproduced in a simple and unified way. In its second part, this paper applies the generalized Han's inequality to analyze a problem in extremal graph theory. This problem is motivated and analyzed from the perspective of information theory, and the analysis leads to generalized and refined bounds. The two parts of this paper are meant to be independently accessible to the reader.

10.
Ecol Lett ; 24(9): 2025-2039, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34142760

ABSTRACT

Exploring and accounting for the emergent properties of ecosystems as complex systems is a promising horizon in the search for general processes to explain common ecological patterns. For example, the ubiquitous hollow-curve form of the species abundance distribution (SAD) is frequently assumed to reflect ecological processes structuring communities, but can also emerge as a statistical phenomenon from the mathematical definition of an abundance distribution. Although the hollow curve may be a statistical artefact, ecological processes may induce subtle deviations between empirical species abundance distributions and their statistically most probable forms. These deviations may reflect biological processes operating on top of mathematical constraints and provide new avenues for advancing ecological theory. Examining ~22,000 communities, we found that empirical SADs are highly uneven and dominated by rare species compared to their statistical baselines. Efforts to detect deviations may be less informative in small communities (those with few species or individuals) because these communities have poorly resolved statistical baselines. The uneven nature of many empirical SADs demonstrates a path forward for leveraging complexity to understand ecological processes governing the distribution of abundance, while the issues posed by small communities illustrate the limitations of using this approach to study ecological patterns in small samples.


Subjects
Biodiversity , Ecosystem , Humans , Biological Models
11.
J Math Biol ; 82(6): 47, 2021 04 05.
Article in English | MEDLINE | ID: mdl-33818665

ABSTRACT

Two errors in the article Best Match Graphs (Geiß et al. in JMB 78: 2015-2057, 2019) are corrected. One concerns the tacit assumption that digraphs are sink-free, which has to be added as an additional precondition in Lemma 9, Lemma 11, and Theorem 4. Correspondingly, Algorithm 2 requires that its input is sink-free. The second correction concerns an additional necessary condition in Theorem 9 required to characterize best match graphs. The amended results simplify the construction of least resolved trees for n-cBMGs, i.e., Algorithm 1. All other results remain unchanged and are correct as stated.

12.
Theor Comput Sci ; 859: 134-148, 2021 Mar 06.
Article in English | MEDLINE | ID: mdl-34163096

ABSTRACT

Prefix normal words are binary words with the property that no factor has more 1s than the prefix of the same length. Finite prefix normal words were introduced in [Fici and Lipták, DLT 2011]. In this paper, we study infinite prefix normal words and explore their relationship to some known classes of infinite binary words. In particular, we establish a connection between prefix normal words and Sturmian words, between prefix normal words and abelian complexity, and between prefix normality and lexicographic order.
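
The definition in the first sentence of this abstract can be checked directly for finite words. The sketch below is a straightforward quadratic-time check built on prefix sums; it is an illustration of the definition and not an algorithm from the paper, and the function name `is_prefix_normal` is an assumption.

```python
def is_prefix_normal(word):
    """Check whether a finite binary word (a string of '0'/'1') is prefix normal.

    A word is prefix normal when no factor (contiguous substring) contains
    more 1s than the prefix of the same length.
    """
    n = len(word)
    # ones[i] holds the number of 1s in word[:i].
    ones = [0]
    for symbol in word:
        ones.append(ones[-1] + (1 if symbol == "1" else 0))
    for length in range(1, n + 1):
        prefix_count = ones[length]
        for start in range(n - length + 1):
            if ones[start + length] - ones[start] > prefix_count:
                return False
    return True


if __name__ == "__main__":
    for candidate in ("110100", "1011", "01"):
        print(candidate, is_prefix_normal(candidate))
```

For example, `1011` fails because the factor `11` has two 1s while the prefix `10` of the same length has only one.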

13.
Theor Popul Biol ; 134: 92-105, 2020 08.
Article in English | MEDLINE | ID: mdl-32485202

ABSTRACT

The Kingman coalescent process is a classical model of gene genealogies in population genetics. It generates Yule-distributed, binary ranked tree topologies (also called histories) with a finite number of n leaves, together with n-1 exponentially distributed time lengths: one for each layer of the history. Using a discrete approach, we study the lengths of the external branches of Yule-distributed histories, where the length of an external branch is defined as the rank of its parent node. We study the multiplicity of external branches of given length in a random history of n leaves. A correspondence between the external branches of the ordered histories of size n and the non-peak entries of the permutations of size n-1 provides easy access to the length distributions of the first and second longest external branches in a random Yule history and coalescent tree of size n. The length of the longest external branch is also studied as a function of the root balance of a random tree. As a practical application, we compare the observed and expected number of mutations on the longest external branches in samples from natural populations.


Subjects
Genetic Models , Trees , Population Genetics , Mutation , Phylogeny , Trees/genetics
14.
J Theor Biol ; 501: 110335, 2020 09 21.
Article in English | MEDLINE | ID: mdl-32450075

ABSTRACT

Rearrangements are discrete processes whereby discrete segments of DNA are deleted, replicated and inserted into novel positions. A sequence of such configurations, termed a rearrangement evolution, results in jumbled DNA arrangements, frequently observed in cancer genomes. We introduce a method that allows us to precisely count these different evolutions for a range of processes including breakage-fusion-bridge-cycles, tandem-duplications, inverted-duplications, reversals, transpositions and deletions, showing that the space of rearrangement evolution is super-exponential in size. These counts assume the infinite sites model of unique breakpoint usage.


Subjects
DNA , Genome , Gene Rearrangement/genetics , Genome/genetics
15.
J Math Biol ; 80(5): 1353-1388, 2020 04.
Article in English | MEDLINE | ID: mdl-32060618

ABSTRACT

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation ([Formula: see text]), gene duplication ([Formula: see text]), gene loss ([Formula: see text]), and horizontal gene transfer ([Formula: see text]). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model that considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the [Formula: see text]-model). In this work, we introduce the notion of evolutionary histories defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the [Formula: see text]-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the [Formula: see text]-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.


Subjects
Molecular Evolution , Genetic Models , Algorithms , Computational Biology , Computer Simulation , Gene Deletion , Gene Duplication , Horizontal Gene Transfer , Genetic Speciation , Mathematical Concepts , Multigene Family , Phylogeny
16.
J Math Biol ; 80(5): 1459-1495, 2020 04.
Article in English | MEDLINE | ID: mdl-32002659

ABSTRACT

A wide variety of problems in computational biology, most notably the assessment of orthology, are solved with the help of reciprocal best matches. Using an evolutionary definition of best matches that captures the intuition behind the concept, we clarify rigorously the relationships between reciprocal best matches, orthology, and evolutionary events under the assumption of duplication/loss scenarios. We show that the orthology graph is a subgraph of the reciprocal best match graph (RBMG). We furthermore give conditions under which an RBMG that is a cograph identifies the correct orthology relation. Using computer simulations, we find that most false-positive orthology assignments can be identified as so-called good quartets, and thus corrected, in the absence of horizontal transfer. Horizontal transfer, however, may also introduce false-negative orthology assignments.


Subjects
Molecular Evolution , Genetic Speciation , Genetic Models , Phylogeny , Algorithms , Computational Biology , Computer Graphics , Computer Simulation , Gene Deletion , Gene Duplication , Horizontal Gene Transfer , Mathematical Concepts
17.
Bull Math Biol ; 81(2): 384-407, 2019 02.
Article in English | MEDLINE | ID: mdl-28913585

ABSTRACT

An ancestral configuration is one of the combinatorially distinct sets of gene lineages that, for a given gene tree, can reach a given node of a specified species tree. Ancestral configurations have appeared in recursive algebraic computations of the conditional probability that a gene tree topology is produced under the multispecies coalescent model for a given species tree. For matching gene trees and species trees, we study the number of ancestral configurations, considered up to an equivalence relation introduced by Wu (Evolution 66:763-775, 2012) to reduce the complexity of the recursive probability computation. We examine the largest number of non-equivalent ancestral configurations possible for a given tree size n. Whereas the smallest number of non-equivalent ancestral configurations increases polynomially with n, we show that the largest number increases with [Formula: see text], where k is a constant that satisfies [Formula: see text]. Under a uniform distribution on the set of binary labeled trees with a given size n, the mean number of non-equivalent ancestral configurations grows exponentially with n. The results refine an earlier analysis of the number of ancestral configurations considered without applying the equivalence relation, showing that use of the equivalence relation does not alter the exponential nature of the increase with tree size.


Subjects
Genetic Models , Phylogeny , Algorithms , Computational Biology , Molecular Evolution , Genetic Speciation , Mathematical Concepts , Statistical Models , Probability
18.
J Math Biol ; 78(7): 2015-2057, 2019 06.
Article in English | MEDLINE | ID: mdl-30968198

ABSTRACT

Best match graphs arise naturally as the first processing intermediate in algorithms for orthology detection. Let T be a phylogenetic (gene) tree and [Formula: see text] an assignment of leaves of T to species. The best match graph [Formula: see text] is a digraph that contains an arc from x to y if the genes x and y reside in different species and y is one of possibly many (evolutionary) closest relatives of x compared to all other genes contained in the species [Formula: see text]. Here, we characterize best match graphs and show that it can be decided in cubic time and quadratic space whether [Formula: see text] is derived from a tree in this manner. If the answer is affirmative, there is a unique least resolved tree that explains [Formula: see text], which can also be constructed in cubic time.


Subjects
Algorithms , Biological Evolution , Computer Graphics , Genes/genetics , Genetic Models , Humans , Phylogeny
19.
BMC Biol ; 16(1): 138, 2018 11 15.
Article in English | MEDLINE | ID: mdl-30442124

ABSTRACT

BACKGROUND: Characterizing recurring sequence patterns in human promoters remains a challenging undertaking, even now that a near-complete overview of promoters exists. However, with the more recent availability of genomic location (ChIP-seq) data, one can approach that question through the identification of characteristic patterns of transcription factor occupancy and histone modifications. RESULTS: Based on the ENCODE annotation and integration of sequence motifs as well as three-dimensional chromatin data, we have undertaken a re-analysis of occupancy and sequence patterns in human promoters. We identify clear groups of CAAT-box and E-box sequence motif containing promoters, as well as a group of promoters whose interaction with an enhancer appears to be mediated by CCCTC-binding factor (CTCF) binding on the promoter. We also extend our analysis to inactive promoters, showing that only a surprisingly small number of inactive promoters is repressed by the polycomb complex. We also identify combinatorial patterns of transcription factor interactions indicated by the ChIP-seq signals. CONCLUSION: Our analysis defines subgroups of promoters characterized by stereotypic patterns of transcription factor occupancy, and combinations of specific sequence patterns which are required for their binding. This grouping provides new hypotheses concerning the assembly and dynamics of transcription factor complexes at their respective promoter groups, as well as questions on the evolutionary origin of these groups.


Subjects
Genetic Promoter Regions/genetics , Transcription Factors/genetics , Chromatin/metabolism , Humans , Protein Binding , DNA Sequence Analysis , Transcription Factors/metabolism
20.
J Neurosci ; 37(50): 12153-12166, 2017 12 13.
Article in English | MEDLINE | ID: mdl-29118107

ABSTRACT

Combinatorial expansion by the cerebellar granule cell layer (GCL) is fundamental to theories of cerebellar contributions to motor control and learning. Granule cells (GrCs) sample approximately four mossy fiber inputs and are thought to form a combinatorial code useful for pattern separation and learning. We constructed a spatially realistic model of the cerebellar GCL and examined how GCL architecture contributes to GrC combinatorial diversity. We found that GrC combinatorial diversity saturates quickly as mossy fiber input diversity increases, and that this saturation is in part a consequence of short dendrites, which limit access to diverse inputs and favor dense sampling of local inputs. This local sampling also produced GrCs that were combinatorially redundant, even when input diversity was extremely high. In addition, we found that mossy fiber clustering, which is a common anatomical pattern, also led to increased redundancy of GrC input combinations. We related this redundancy to hypothesized roles of temporal expansion of GrC information encoding in service of learned timing, and we show that GCL architecture produces GrC populations that support both temporal and combinatorial expansion. Finally, we used novel anatomical measurements from mice of either sex to inform modeling of sparse and filopodia-bearing mossy fibers, finding that these circuit features uniquely contribute to enhancing GrC diversification and redundancy. Our results complement information theoretic studies of granule layer structure and provide insight into the contributions of granule layer anatomical features to afferent mixing. SIGNIFICANCE STATEMENT: Cerebellar granule cells are among the simplest neurons, with tiny somata and, on average, just four dendrites. These characteristics, along with their dense organization, inspired influential theoretical work on the granule cell layer as a combinatorial expander, where each granule cell represents a unique combination of inputs. Despite the centrality of these theories to cerebellar physiology, the degree of expansion supported by anatomically realistic patterns of inputs is unknown. Using modeling and anatomy, we show that realistic input patterns constrain combinatorial diversity by producing redundant combinations, which nevertheless could support temporal diversification of like combinations, suitable for learned timing. Our study suggests a neural substrate for producing high levels of both combinatorial and temporal diversity in the granule cell layer.


Subjects
Cerebellar Cortex/cytology , Connectome , Dendrites/physiology , Neurological Models , Nerve Fibers/physiology , Pseudopodia/physiology , Afferent Pathways/physiology , Afferent Pathways/ultrastructure , Animals , Bacterial Proteins/analysis , Computer Simulation , Connectome/methods , Dendrites/ultrastructure , Dependovirus , Female , Reporter Genes , Genetic Vectors , Luminescent Proteins/analysis , Male , Mice , Inbred C57BL Mice , Nerve Fibers/ultrastructure , Pseudopodia/ultrastructure , Synapses/physiology
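
The notions of combinatorial diversity and redundancy used in this abstract can be made concrete with a small counting sketch. It assumes each granule cell is described by the set of mossy fiber labels it samples; this representation, the function names, and the toy data are assumptions for illustration and do not reproduce the spatial model in the paper.

```python
from collections import Counter
from math import comb


def combination_stats(granule_inputs):
    """Summarise input combinations across granule cells.

    `granule_inputs` is a list with one entry per granule cell, each an
    iterable of mossy fiber labels. Returns the number of distinct input
    combinations and the number of cells whose combination is shared
    with at least one other cell.
    """
    counts = Counter(frozenset(inputs) for inputs in granule_inputs)
    unique_combinations = len(counts)
    redundant_cells = sum(c for c in counts.values() if c > 1)
    return unique_combinations, redundant_cells


def max_combinations(n_fibers, inputs_per_cell=4):
    """Upper bound on distinct combinations if sampling were unconstrained."""
    return comb(n_fibers, inputs_per_cell)


if __name__ == "__main__":
    cells = [(1, 2, 3, 4), (4, 3, 2, 1), (1, 2, 3, 5)]
    print(combination_stats(cells), max_combinations(10))
```

The gap between the observed number of distinct combinations and the unconstrained upper bound is one simple way to express the saturation the abstract describes.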