1.
Stat Comput ; 33(1): 34, 2023.
Article in English | MEDLINE | ID: mdl-36691583

ABSTRACT

There is an increasing body of work exploring the integration of random projection into algorithms for numerical linear algebra. The primary motivation is to reduce the overall computational cost of processing large datasets. A suitably chosen random projection can be used to embed the original dataset in a lower-dimensional space such that key properties of the original dataset are retained. These algorithms are often referred to as sketching algorithms, as the projected dataset can be used as a compressed representation of the full dataset. We show that random matrix theory, in particular the Tracy-Widom law, is useful for describing the operating characteristics of sketching algorithms in the tall-data regime when the sample size n is much greater than the number of variables d. Asymptotic large sample results are of particular interest as this is the regime where sketching is most useful for data compression. In particular, we develop asymptotic approximations for the success rate in generating random subspace embeddings and the convergence probability of iterative sketching algorithms. We test a number of sketching algorithms on real large high-dimensional datasets and find that the asymptotic expressions give accurate predictions of the empirical performance. Supplementary Information: The online version contains supplementary material available at 10.1007/s11222-022-10148-5.
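The "success rate in generating random subspace embeddings" can be estimated empirically by simulation. The following is a minimal sketch, not the authors' code. It assumes a Gaussian sketching matrix and assumes that a sketch S counts as an ε-subspace embedding when every squared singular value of SU lies in [1 - ε, 1 + ε], where U is an orthonormal basis for the column space of the data matrix. The function names, the sketch size k, and the tolerance eps are illustrative choices.

```python
import numpy as np

def embedding_distortion(X, k, rng):
    """Largest deviation from 1 among the squared singular values of S @ U.

    U is an orthonormal basis for the column space of X (n x d, n >> d) and
    S is a k x n Gaussian sketching matrix with N(0, 1/k) entries (assumed).
    """
    n, d = X.shape
    U, _ = np.linalg.qr(X)  # reduced QR: U is n x d with orthonormal columns
    S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, n))
    sv = np.linalg.svd(S @ U, compute_uv=False)
    return float(np.max(np.abs(sv**2 - 1.0)))

def empirical_success_rate(X, k, eps, trials, seed=0):
    """Fraction of independent sketches whose distortion is at most eps."""
    rng = np.random.default_rng(seed)
    hits = sum(embedding_distortion(X, k, rng) <= eps for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    # Synthetic tall data: n = 2000 observations, d = 5 variables.
    X = np.random.default_rng(1).normal(size=(2000, 5))
    for eps in (0.1, 0.2, 0.3):
        print(eps, empirical_success_rate(X, k=500, eps=eps, trials=50))
```

A Monte Carlo estimate of this kind is the empirical quantity that the asymptotic approximations described in the abstract aim to predict without repeated sketching.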

2.
Comput Stat Data Anal ; 104: 79-90, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28496285

ABSTRACT

The statistical matching problem involves the integration of multiple datasets where some variables are not observed jointly. This missing data pattern leaves most statistical models unidentifiable. Statistical inference is still possible when operating under the framework of partially identified models, where the goal is to bound the parameters rather than to estimate them precisely. In many matching problems, developing feasible bounds on the parameters is equivalent to finding the set of positive-definite completions of a partially specified covariance matrix. Existing methods for characterising the set of possible completions do not extend to high-dimensional problems. A Gibbs sampler to draw from the set of possible completions is proposed. The variation in the observed samples gives an estimate of the feasible region of the parameters. The Gibbs sampler extends easily to high-dimensional statistical matching problems.
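To illustrate the idea of sampling positive-definite completions, the following is a minimal sketch under stated assumptions; it is not the sampler from the paper. It assumes a correlation matrix with a known list of unobserved off-diagonal entries, a positive-definite starting completion, and a uniform target distribution over the feasible set. With all other entries held fixed, the determinant is a concave quadratic in a single free entry, so the values that keep the matrix positive definite form the interval between its two roots, and each Gibbs update draws uniformly from that interval. The function names are hypothetical.

```python
import numpy as np

def feasible_interval(R, i, j):
    """Interval of values for R[i, j] = R[j, i] that keeps R positive definite,
    with every other entry held fixed (R itself must currently be PD)."""
    def det_at(x):
        M = R.copy()
        M[i, j] = M[j, i] = x
        return np.linalg.det(M)

    # det is a quadratic a*x^2 + b*x + c in the free entry; recover the
    # coefficients from three evaluations.
    f_m, f_0, f_p = det_at(-1.0), det_at(0.0), det_at(1.0)
    a = 0.5 * (f_p + f_m) - f_0
    b = 0.5 * (f_p - f_m)
    c = f_0
    disc = np.sqrt(max(b * b - 4.0 * a * c, 0.0))
    lo, hi = sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))
    return lo, hi

def gibbs_completions(R0, free, n_draws, seed=0):
    """Draw completions by cycling over the free entries (list of (i, j) pairs).

    R0 is a positive-definite starting completion. Returns an array of shape
    (n_draws, len(free)) holding the sampled values of the free entries.
    """
    rng = np.random.default_rng(seed)
    R = np.array(R0, dtype=float)
    draws = np.empty((n_draws, len(free)))
    for t in range(n_draws):
        for k, (i, j) in enumerate(free):
            lo, hi = feasible_interval(R, i, j)
            R[i, j] = R[j, i] = rng.uniform(lo, hi)
            draws[t, k] = R[i, j]
    return draws

if __name__ == "__main__":
    # Three variables: corr(1,2) = 0.6 and corr(2,3) = 0.8 are observed,
    # corr(1,3) is never observed jointly and starts at a feasible value.
    R0 = np.array([[1.0, 0.6, 0.5],
                   [0.6, 1.0, 0.8],
                   [0.5, 0.8, 1.0]])
    draws = gibbs_completions(R0, free=[(0, 2)], n_draws=1000)
    print(draws.min(), draws.max())  # spread of draws estimates the bounds
```

In the three-variable case the interval has the closed form r12·r23 ± sqrt((1 - r12²)(1 - r23²)), which gives a check on the sampler; with several free entries no such closed form is generally available, and the range of the draws serves as the estimate of the feasible region.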

3.
Genetics ; 198(1): 117-28, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25236453

ABSTRACT

Multiparental populations are of considerable interest in high-density genetic mapping due to their increased levels of polymorphism and recombination relative to biparental populations. However, errors in map construction can have significant impact on QTL discovery in later stages of analysis, and few methods have been developed to quantify the uncertainty attached to the reported order of markers or intermarker distances. Current methods are computationally intensive or limited to assessing uncertainty only for order or distance, but not both simultaneously. We derive the asymptotic joint distribution of maximum composite likelihood estimators for intermarker distances. This approach allows us to construct hypothesis tests and confidence intervals for simultaneously assessing marker-order instability and distance uncertainty. We investigate the effects of marker density, population size, and founder distribution patterns on map confidence in multiparental populations through simulations. Using these data, we provide guidelines on sample sizes necessary to map markers at sub-centimorgan densities with high certainty. We apply these approaches to data from a bread wheat Multiparent Advanced Generation Inter-Cross (MAGIC) population genotyped using the Illumina 9K SNP chip to assess regions of uncertainty and validate them against the recently released pseudomolecule for the wheat chromosome 3B.


Subject(s)
Genetic Linkage; Models, Genetic; Chromosome Mapping/methods; Chromosomes, Plant/genetics; Polymorphism, Single Nucleotide; Triticum/genetics; Uncertainty