Results 1 - 20 of 82
1.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38189540

ABSTRACT

Nanopore sequencers can enrich or deplete targeted DNA molecules in a library by reversing the voltage across individual nanopores. However, substantial computational resources are required to make these decisions rapidly and in parallel during real-time sequencing. We present a deep learning framework, NanoDeep, that overcomes these limitations by combining a convolutional neural network with squeeze-and-excitation modules. We first showed that the raw squiggle derived from native DNA sequences can determine whether a read originates from a microbial or a human genome. Then, we demonstrated that NanoDeep successfully classified bacterial reads from a library pooled with human sequence and showed enrichment of bacterial sequence compared with a routine nanopore sequencing setting. Further, we showed that NanoDeep improves sequencing efficiency and preserves the fidelity of bacterial genomes in a mock sample. In addition, NanoDeep performs well in enriching metagenomic sequences from gut samples, showing its potential application in the enrichment of unknown microbiota. Our toolkit is available at https://github.com/lysovosyl/NanoDeep.
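The channel-recalibration step above can be pictured with a toy squeeze-and-excitation computation; the weights, shapes, and activations below are illustrative stand-ins, not NanoDeep's actual architecture:

```python
import numpy as np

def squeeze_excite(features, w1, w2):
    """Recalibrate per-channel features in the spirit of a
    squeeze-and-excitation block (all weights here are illustrative).
    features: (channels, length) array of conv activations."""
    squeeze = features.mean(axis=1)               # global average pool per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)        # bottleneck + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid channel gates in (0, 1)
    return features * scale[:, None]              # reweight each channel

# toy example: 4 channels of squiggle features, bottleneck of 2 hidden units
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
w1 = rng.normal(size=(2, 4))
w2 = rng.normal(size=(4, 2))
y = squeeze_excite(x, w1, w2)
```

Since the gates lie strictly between 0 and 1, the block can only attenuate channels, never amplify them, which is what makes it a recalibration rather than a transformation.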


Subject(s)
Deep Learning , Nanopore Sequencing , Nanopores , Humans , Gene Library , Genome, Bacterial
2.
Am J Hum Genet ; 108(8): 1436-1449, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34216551

ABSTRACT

Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations (single-nucleotide variants, copy number changes, repeat expansions, and methylation differences) identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.


Subject(s)
Chromosome Aberrations , Cytogenetic Analysis/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Mutation , DNA Copy Number Variations , Female , Genetic Testing , High-Throughput Nucleotide Sequencing , Humans , Karyotyping , Male , Sequence Analysis, DNA
3.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36259361

ABSTRACT

Metagenomic next-generation sequencing (mNGS) has been implemented in recent years as an alternative approach for pathogen diagnosis that is independent of cultivation and can identify all potential antibiotic resistance genes (ARGs). However, current mNGS methods must contend with low amounts of prokaryotic DNA and high amounts of host DNA in clinical samples, which significantly decrease the overall resolution of microbial detection. The recently released nanopore adaptive sampling (NAS) technology facilitates immediate mapping of individual nucleotides to a given reference as each molecule is sequenced. User-defined thresholds allow specific molecules to be retained or rejected, informed by the real-time reference mapping results, as they physically pass through a given sequencing nanopore. We developed a metagenomics workflow for ultra-sensitive diagnosis of bacterial pathogens and ARGs from clinical samples, based on efficient selective 'human host depletion' NAS sequencing, real-time species identification, and species-specific resistance gene prediction. Our method increased the microbial sequence yield at least 8-fold in all 21 sequenced clinical bronchoalveolar lavage fluid (BALF) samples (4.5 h from sample to result) and accurately detected ARGs at the species level. The species-level positive percent agreement between metagenomic sequencing and laboratory culture was 100% (16/16), and the negative percent agreement was 100% (5/5). Further work is required for a more robust validation of our approach with a larger sample size so that it can be applied to other infection types.
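The retain-or-reject logic of NAS can be sketched as a simple decision rule; the function name, the "human" reference label, and the identity threshold below are hypothetical illustrations, not the paper's workflow:

```python
def adaptive_sampling_decision(read_prefix_mapping, min_identity=0.9):
    """Toy accept/reject rule for nanopore adaptive sampling with 'human host
    depletion': eject a molecule whose first few hundred bases map well to the
    human reference; keep everything else for full sequencing.
    read_prefix_mapping: (reference_name, identity) tuple, or None if unmapped."""
    if read_prefix_mapping is None:
        return "sequence"            # unmapped prefix: keep (possible microbe)
    ref, identity = read_prefix_mapping
    if ref == "human" and identity >= min_identity:
        return "unblock"             # reverse voltage, eject the molecule
    return "sequence"

decisions = [adaptive_sampling_decision(m) for m in
             [("human", 0.97), ("E_coli", 0.95), None, ("human", 0.6)]]
```

Only confident human hits are ejected; ambiguous low-identity mappings are kept, since discarding them could lose microbial reads.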


Subject(s)
Anti-Bacterial Agents , Nanopores , Humans , Workflow , Drug Resistance, Bacterial/genetics , Metagenomics/methods , High-Throughput Nucleotide Sequencing/methods , Bacteria/genetics , DNA
4.
Entropy (Basel) ; 26(6)2024 May 26.
Article in English | MEDLINE | ID: mdl-38920460

ABSTRACT

Physics-informed neural networks (PINNs) have garnered widespread use for solving a variety of complex partial differential equations (PDEs). Nevertheless, when addressing certain specific problem types, traditional sampling algorithms still reveal deficiencies in efficiency and precision. In response, this paper builds upon the progress of adaptive sampling techniques, addressing the inadequacy of existing algorithms to fully leverage the spatial location information of sample points, and introduces an innovative adaptive sampling method. This approach incorporates the Dual Inverse Distance Weighting (DIDW) algorithm, embedding the spatial characteristics of sampling points within the probability sampling process. Furthermore, it introduces reward factors derived from reinforcement learning principles to dynamically refine the probability sampling formula. This strategy more effectively captures the essential characteristics of PDEs with each iteration. We utilize sparsely connected networks and have adjusted the sampling process, which has proven to effectively reduce the training time. In numerical experiments on fluid mechanics problems, such as the two-dimensional Burgers' equation with sharp solutions, pipe flow, flow around a circular cylinder, lid-driven cavity flow, and Kovasznay flow, our proposed adaptive sampling algorithm markedly enhances accuracy over conventional PINN methods, validating the algorithm's efficacy.
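A simplified flavor of the residual-plus-spatial sampling idea (not the paper's exact DIDW formula, and omitting the reinforcement-learning reward factors) might look like:

```python
import numpy as np

def residual_idw_probs(candidates, residuals, selected, eps=1e-6, power=2.0):
    """Sketch of residual-driven adaptive sampling with a spatial distance
    term: candidate points with a large PDE residual that also lie far from
    already-selected points receive higher sampling probability."""
    # distance from each candidate to its nearest already-selected point
    d = np.min(np.linalg.norm(candidates[:, None, :] - selected[None, :, :],
                              axis=2), axis=1)
    # reward large residuals and large distance to the existing sample set
    score = (np.abs(residuals) ** power) * (d + eps)
    return score / score.sum()

rng = np.random.default_rng(1)
cand = rng.uniform(size=(50, 2))      # candidate collocation points in 2D
res = rng.uniform(size=50)            # PDE residual magnitude at each candidate
sel = rng.uniform(size=(5, 2))        # points already in the training set
p = residual_idw_probs(cand, res, sel)
new_points = cand[rng.choice(50, size=10, replace=False, p=p)]
```

The spatial term keeps new collocation points from clustering on top of old ones, which is the gap in residual-only sampling that the paper targets.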

5.
J Transl Med ; 21(1): 378, 2023 06 10.
Article in English | MEDLINE | ID: mdl-37301971

ABSTRACT

BACKGROUND: Diagnosis of rare genetic diseases can be a long, expensive and complex process, involving an array of tests in the hope of obtaining an actionable result. Long-read sequencing platforms offer the opportunity to make definitive molecular diagnoses using a single assay capable of detecting variants, characterizing methylation patterns, resolving complex rearrangements, and assigning findings to long-range haplotypes. Here, we demonstrate the clinical utility of Nanopore long-read sequencing by validating a confirmatory test for copy number variants (CNVs) in neurodevelopmental disorders and illustrate the broader applications of this platform to assess genomic features with significant clinical implications. METHODS: We used adaptive sampling on the Oxford Nanopore platform to sequence 25 genomic DNA samples and 5 blood samples collected from patients with known or false-positive copy number changes originally detected using short-read sequencing. Across the 30 samples (a total of 50 with replicates), we assayed 35 known unique CNVs (a total of 55 with replicates) and one false-positive CNV, ranging in size from 40 kb to 155 Mb, and assessed the presence or absence of suspected CNVs using normalized read depth. RESULTS: Across 50 samples (including replicates) sequenced on individual MinION flow cells, we achieved an average on-target mean depth of 9.5X and an average on-target read length of 4805 bp. Using a custom read depth-based analysis, we successfully confirmed the presence of all 55 known CNVs (including replicates) and the absence of one false-positive CNV. Using the same CNV-targeted data, we compared genotypes of single nucleotide variant loci to verify that no sample mix-ups occurred between assays. For one case, we also used methylation detection and phasing to investigate the parental origin of a 15q11.2-q13 duplication with implications for clinical prognosis. 
CONCLUSIONS: We present an assay that efficiently targets genomic regions to confirm clinically relevant CNVs with a concordance rate of 100%. Furthermore, we demonstrate how integration of genotype, methylation, and phasing data from the Nanopore sequencing platform can potentially simplify and shorten the diagnostic odyssey.
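A minimal sketch of the normalized-read-depth check described above, assuming illustrative copy-ratio cutoffs rather than the authors' actual analysis:

```python
def call_cnv(target_depth, genome_median_depth, ploidy=2,
             dup_cut=1.3, del_cut=0.7):
    """Illustrative read-depth CNV check: normalize the mean depth over the
    targeted region by the sample's genome-wide median, then compare the
    implied copy ratio to simple thresholds (cutoffs are assumptions)."""
    ratio = target_depth / genome_median_depth
    copies = ratio * ploidy
    if ratio >= dup_cut:
        return "duplication", copies
    if ratio <= del_cut:
        return "deletion", copies
    return "normal", copies

# e.g. ~14x over the target vs the ~9.5x average on-target depth reported above
status, copies = call_cnv(target_depth=14.1, genome_median_depth=9.5)
```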


Subject(s)
Nanopore Sequencing , Humans , DNA Copy Number Variations/genetics , Workflow , Genomics , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing
6.
Stat Med ; 42(7): 917-935, 2023 03 30.
Article in English | MEDLINE | ID: mdl-36650619

ABSTRACT

Cluster-based outcome-dependent sampling (ODS) has the potential to yield efficiency gains when the outcome of interest is relatively rare, and resource constraints allow only a certain number of clusters to be visited for data collection. Previous research has shown that when the intended analysis is inverse-probability weighted generalized estimating equations, and the number of clusters that can be sampled is fixed, optimal allocation of the (cluster-level) sample size across strata defined by auxiliary variables readily available at the design stage has the potential to increase efficiency in the estimation of the parameter(s) of interest. In such a setting, the optimal allocation formulae depend on quantities that are unknown in practice, currently making such designs difficult to implement. In this paper, we consider a two-wave adaptive sampling approach, in which data is collected from a first wave sample, and subsequently used to compute the optimal second wave stratum-specific sample sizes. We consider two strategies for estimating the necessary components using the first wave data: an inverse-probability weighting (IPW) approach and a multiple imputation (MI) approach. In a comprehensive simulation study, we show that the adaptive sampling approach performs well, and that the MI approach yields designs that are very near-optimal, regardless of the covariate type. The IPW approach, on the other hand, has mixed results. Finally, we illustrate the proposed adaptive sampling procedures with data on maternal characteristics and birth outcomes among women enrolled in the Safer Deliveries program in Zanzibar, Tanzania.
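The optimal-allocation step can be illustrated with classical Neyman allocation, using stratum standard deviations as stand-ins for the quantities the paper estimates from first-wave data via IPW or MI:

```python
import numpy as np

def second_wave_allocation(n_total, stratum_sizes, stratum_sds):
    """Neyman-style allocation of the second-wave cluster sample: sample each
    stratum in proportion to (stratum size) x (estimated stratum SD).
    The SDs would come from the first-wave data in a two-wave design."""
    weights = np.asarray(stratum_sizes, float) * np.asarray(stratum_sds, float)
    alloc = n_total * weights / weights.sum()
    return np.round(alloc).astype(int)

# 100 clusters to allocate across three strata defined by auxiliary variables
alloc = second_wave_allocation(100, stratum_sizes=[40, 40, 20],
                               stratum_sds=[1.0, 2.0, 0.5])
```

The high-variance stratum absorbs most of the budget, which is exactly the efficiency gain over proportional allocation that motivates estimating these quantities adaptively.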


Subject(s)
Research Design , Humans , Female , Sample Size , Computer Simulation , Probability , Data Collection
7.
Proc Natl Acad Sci U S A ; 117(48): 30610-30618, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33184174

ABSTRACT

Peptide binding to major histocompatibility complexes (MHCs) is a central component of the immune system, and understanding the mechanism behind stable peptide-MHC binding will aid the development of immunotherapies. While MHC binding is mostly influenced by the identity of the so-called anchor positions of the peptide, secondary interactions from nonanchor positions are known to play a role in complex stability. However, current MHC-binding prediction methods lack an analysis of the major conformational states and might underestimate the impact of secondary interactions. In this work, we present an atomically detailed analysis of peptide-MHC binding that can reveal the contributions of any interaction toward stability. We propose a simulation framework that uses both umbrella sampling and adaptive sampling to generate a Markov state model (MSM) for a coronavirus-derived peptide (QFKDNVILL), bound to one of the most prevalent MHC receptors in humans (HLA-A24:02). While our model reaffirms the importance of the anchor positions of the peptide in establishing stable interactions, our model also reveals the underestimated importance of position 4 (p4), a nonanchor position. We confirmed our results by simulating the impact of specific peptide mutations and validated these predictions through competitive binding assays. By comparing the MSM of the wild-type system with those of the D4A and D4P mutations, our modeling reveals stark differences in unbinding pathways. The analysis presented here can be applied to any peptide-MHC complex of interest with a structural model as input, representing an important step toward comprehensive modeling of the MHC class I pathway.
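At its core, MSM construction reduces to counting lagged transitions between discretized conformational states; a minimal sketch on a toy trajectory (not the peptide-MHC data):

```python
import numpy as np

def msm_transition_matrix(state_traj, n_states, lag=1):
    """Build a row-stochastic Markov state model transition matrix from a
    discretized trajectory by counting transitions at a fixed lag time."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(state_traj[:-lag], state_traj[lag:]):
        counts[i, j] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return counts / np.where(rows == 0, 1, rows)  # guard empty rows

# toy trajectory over 3 states (e.g. bound, partially bound, unbound)
traj = [0, 0, 1, 1, 0, 2, 2, 2, 1, 0]
T = msm_transition_matrix(traj, n_states=3)
```

In practice many short adaptively-seeded trajectories are pooled into the counts, which is what lets adaptive sampling substitute for one long simulation.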


Subject(s)
Major Histocompatibility Complex , Markov Chains , Models, Molecular , Peptides/metabolism , Alanine/genetics , Binding, Competitive , Computer Simulation , DNA Mutational Analysis , Mutation/genetics , Proline/metabolism , Protein Binding
8.
Sensors (Basel) ; 23(2)2023 Jan 14.
Article in English | MEDLINE | ID: mdl-36679762

ABSTRACT

Data redundancy and data loss are relevant issues in condition monitoring. Sampling strategies for segment intervals can address these at the source, but they do not receive the attention they deserve. Current sampling methods in the literature lack sufficient adaptability to the monitored condition. In this paper, an adaptive sampling framework for segment intervals is proposed, based on a summary and improvement of existing approaches. The framework is applied to monitoring mechanical degradation, with experiments on both simulated data and real datasets. The distributions of the samples collected by different sampling strategies are presented visually through a color map, and five metrics are designed to assess the sampling results. Both the visual and numerical results show the superiority of the proposed method over existing methods, and the results are closely related to the data status and the degradation indicators: the smaller the data fluctuation and the more stable the degradation trend, the better the result. Furthermore, results using objective physical indicators are clearly better than those using feature indicators. By addressing these problems, the proposed framework opens up a new perspective of predictive sampling that significantly improves degradation monitoring.
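One toy realization of condition-adaptive segment intervals, with made-up thresholds, samples faster when the degradation indicator is changing quickly and slower when it is stable:

```python
def next_interval(recent_values, base_interval, min_interval, max_interval):
    """Toy adaptive segment-interval rule for condition monitoring: choose
    the next sampling interval from the recent rate of change of the
    degradation indicator (thresholds here are illustrative, not the
    paper's framework)."""
    steps = max(len(recent_values) - 1, 1)
    change = abs(recent_values[-1] - recent_values[0]) / steps
    if change > 0.5:
        return min_interval      # fast degradation: densest sampling
    if change < 0.05:
        return max_interval      # stable condition: sparsest sampling
    return base_interval

fast = next_interval([1.0, 2.0, 3.5], base_interval=60,
                     min_interval=10, max_interval=600)
stable = next_interval([1.00, 1.01, 1.02], base_interval=60,
                       min_interval=10, max_interval=600)
```

This is the "address it at the source" idea: redundancy is avoided by not sampling a stable signal densely, and loss is avoided by tightening the interval once degradation accelerates.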


Subject(s)
Computer Simulation
9.
Sensors (Basel) ; 23(6)2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36991936

ABSTRACT

High-precision geometric measurement of free-form surfaces has become key to high-performance manufacturing. By designing a reasonable sampling plan, economical measurement of free-form surfaces can be realized. This paper proposes an adaptive hybrid sampling method for free-form surfaces based on geodesic distance. The free-form surface is divided into segments, and the sum of the geodesic distances within each segment is taken as the global fluctuation index of the surface. The number and locations of the sampling points for each segment are then reasonably distributed. Compared with common methods, this method significantly reduces reconstruction error for the same number of sampling points. It overcomes the shortcomings of the commonly used approach of taking curvature as the local fluctuation index of free-form surfaces, and provides a new perspective on adaptive sampling of free-form surfaces.
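The allocation step can be sketched as distributing a point budget across segments in proportion to each segment's summed geodesic distance; the floor of two points per segment is an assumption for illustration:

```python
import numpy as np

def allocate_points(n_total, segment_geodesic_sums, n_min=2):
    """Distribute sampling points over free-form surface segments in
    proportion to each segment's summed geodesic distance (the fluctuation
    index), with a small floor per segment. A sketch of the allocation
    step only; locating points within a segment is a separate problem."""
    g = np.asarray(segment_geodesic_sums, dtype=float)
    spare = n_total - n_min * len(g)
    alloc = n_min + np.floor(spare * g / g.sum()).astype(int)
    alloc[np.argmax(g)] += n_total - alloc.sum()  # remainder to waviest segment
    return alloc

# 30 points over three segments; the middle segment undulates the most
alloc = allocate_points(30, [5.0, 15.0, 10.0])
```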

10.
Sensors (Basel) ; 23(23)2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38067973

ABSTRACT

Adaptive information-sampling approaches enable efficient selection of mobile robots' waypoints, through which accurate sensing and mapping of a physical process, such as radiation or field intensity, can be obtained. A key parameter in the informative sampling objective function can be tuned to balance the need to explore new information where uncertainty is very high against the need to exploit the data sampled so far, from which much of the underlying spatial field, such as the source locations or modalities of the physical process, can be inferred. However, works in the literature have either assumed the robot's energy is unconstrained or used a homogeneous energy capacity across different robots. Therefore, this paper analyzes the impact of the adaptive information-sampling algorithm's information function, used in exploration and exploitation, on the tradeoff between the mapping, localization, and energy-efficiency objectives. We use Gaussian process regression (GPR) to predict the field and estimate confidence bounds, thereby determining each point's informativeness. Through extensive experimental data, we provide a deeper and holistic perspective on the effect of information function parameters on the prediction map's accuracy (RMSE), confidence bound (variance), energy consumption (distance), and time spent (sample count) in both single- and multi-robot scenarios. The results provide meaningful insights into choosing appropriate energy-aware information function parameters based on the sensing objective (e.g., source localization or mapping). Based on our analysis, we conclude that it would be detrimental to give importance only to the uncertainty of the information function (which would explode the energy needs) or only to the predictive mean of the information (which would jeopardize the mapping accuracy). By assigning more importance to the information uncertainty, with some non-zero importance given to the information value (e.g., a 75:25 ratio), it is possible to achieve an optimal tradeoff between the exploration and exploitation objectives while keeping the energy requirements manageable.
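The 75:25 weighting discussed above can be written as a simple acquisition function over GPR predictions (the posterior mean and standard deviation are assumed to come from a fitted GPR model elsewhere):

```python
import numpy as np

def informativeness(pred_mean, pred_std, w_uncertainty=0.75):
    """Weighted information function over candidate waypoints: a convex
    combination of predictive uncertainty (exploration) and predictive
    mean (exploitation), e.g. the 75:25 ratio discussed in the text."""
    w_value = 1.0 - w_uncertainty
    return w_uncertainty * pred_std + w_value * pred_mean

# GPR posterior at three candidate waypoints (illustrative numbers)
mean = np.array([0.2, 0.9, 0.5])   # predicted field intensity
std = np.array([0.8, 0.1, 0.4])    # predictive uncertainty
best = int(np.argmax(informativeness(mean, std)))
```

With the 75:25 weighting the robot heads for the most uncertain waypoint (index 0) despite its low predicted intensity; setting `w_uncertainty=0` would instead send it to the highest-mean point, at the cost of mapping accuracy.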

11.
Sensors (Basel) ; 23(21)2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37960376

ABSTRACT

An attention-aware patch-based deep-learning model for blind 360-degree image quality assessment (360-IQA) is introduced in this paper. It employs spatial attention mechanisms to focus on spatially significant features, along with short skip connections to align them. A long skip connection allows features from the earliest layers to be used at the final level. Patches are sampled on the sphere so that they correspond to the viewports displayed to the user through head-mounted displays. The sampling incorporates the relevance of patches by considering (i) the exploration behavior and (ii) a latitude-based selection. An adaptive strategy is applied to improve the pooling of local patch qualities into a global image quality score. This includes an outlier rejection step that relies on the standard deviation of the obtained scores to measure their agreement, followed by a saliency-based weighting of the scores according to their visual significance. Experiments on available 360-IQA databases show that our model outperforms the state of the art in terms of accuracy and generalization ability. This holds against general deep-learning-based models, multichannel models, and natural-scene-statistics-based models. Furthermore, compared to multichannel models, the computational complexity is significantly reduced. Finally, an extensive ablation study gives insights into the efficacy of each component of the proposed model.
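The adaptive pooling step might be sketched as outlier rejection followed by saliency weighting; the one-standard-deviation cutoff below is an assumption, not the paper's setting:

```python
import numpy as np

def pool_patch_scores(scores, saliency, k=1.0):
    """Adaptive pooling of local patch qualities into a global score:
    drop outlier patches whose score deviates from the mean by more than
    k standard deviations, then take a saliency-weighted average of the
    remaining scores (k is illustrative)."""
    s = np.asarray(scores, dtype=float)
    w = np.asarray(saliency, dtype=float)
    keep = np.abs(s - s.mean()) <= k * s.std()
    return float(np.average(s[keep], weights=w[keep]))

# three agreeing patch scores plus one outlier, equal saliency
q = pool_patch_scores([3.0, 3.2, 3.1, 0.5], [1.0, 1.0, 1.0, 1.0], k=1.0)
```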

12.
J Biol Chem ; 297(4): 101092, 2021 10.
Article in English | MEDLINE | ID: mdl-34437903

ABSTRACT

Witchweed, or Striga hermonthica, is a parasitic weed that destroys billions of dollars' worth of crops globally every year. Its germination is stimulated by strigolactones exuded by its host plants. Despite high sequence, structure, and ligand-binding site conservation across different plant species, one strigolactone receptor in witchweed, ShHTL7, uniquely exhibits a picomolar EC50 for downstream signaling. Previous biochemical and structural analyses have hypothesized that this unique ligand sensitivity can be attributed to a large binding pocket volume in ShHTL7 resulting in enhanced ability to bind substrates, but additional structural details of the substrate-binding process would help explain its role in modulating the ligand selectivity. Using long-timescale molecular dynamics simulations, we demonstrate that mutations at the entrance of the binding pocket facilitate a more direct ligand-binding pathway to ShHTL7, whereas hydrophobicity at the binding pocket entrance results in a stable "anchored" state. We also demonstrate that several residues on the D-loop of AtD14 stabilize catalytically inactive conformations. Finally, we show that strigolactone selectivity is not modulated by binding pocket volume. Our results indicate that while ligand binding is not the sole modulator of strigolactone receptor selectivity, it is a significant contributing factor. These results can be used to inform the design of selective antagonists for strigolactone receptors in witchweed.


Subject(s)
Heterocyclic Compounds, 3-Ring/chemistry , Lactones/chemistry , Molecular Dynamics Simulation , Plant Proteins/chemistry , Striga/chemistry , Binding Sites , Heterocyclic Compounds, 3-Ring/metabolism , Lactones/metabolism , Plant Proteins/genetics , Plant Proteins/metabolism , Striga/genetics , Striga/metabolism
13.
Stat Med ; 41(17): 3336-3348, 2022 07 30.
Article in English | MEDLINE | ID: mdl-35527474

ABSTRACT

Outbreaks of an endemic infectious disease can occur when the disease is introduced into a highly susceptible subpopulation or when the disease enters a network of connected individuals. For example, significant HIV outbreaks among people who inject drugs have occurred in at least half a dozen US states in recent years. This motivates the current study: how can limited testing resources be allocated across geographic regions to rapidly detect outbreaks of an endemic infectious disease? We develop an adaptive sampling algorithm that uses profile likelihood to estimate the distribution of the number of positive tests that would occur for each location in a future time period if that location were sampled. Sampling is performed in the location with the highest estimated probability of triggering an outbreak alarm in the next time period. The alarm function is determined by a semiparametric likelihood ratio test. We compare the profile likelihood sampling (PLS) method numerically to uniform random sampling (URS) and Thompson sampling (TS). TS was worse than URS when the outbreak occurred in a location with lower initial prevalence than other locations. PLS had a lower time to outbreak detection than TS in some but not all scenarios, and was always better than URS, even when the outbreak occurred in a location with lower initial prevalence than other locations. PLS thus provides an effective and reliable method for rapidly detecting endemic disease outbreaks that is robust to uncertainty about where an outbreak will occur.
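The TS comparator can be sketched with Beta posteriors per location; the prior and the counts below are illustrative, not the paper's simulation setup:

```python
import numpy as np

def thompson_pick(positives, tests, rng):
    """Thompson-sampling baseline from the comparison above: draw one
    positivity-rate sample per location from a Beta posterior and test
    the location with the largest draw (uniform Beta(1, 1) prior)."""
    draws = [rng.beta(1 + p, 1 + t - p) for p, t in zip(positives, tests)]
    return int(np.argmax(draws))

rng = np.random.default_rng(42)
# location 1 has a much higher observed positive rate than the others
picks = [thompson_pick(positives=[1, 30, 2], tests=[100, 100, 100], rng=rng)
         for _ in range(200)]
```

The randomness in the draws is what gives TS its exploration, but, as the abstract notes, it can be slow to redirect tests when an outbreak starts in a location whose initial prevalence looks low.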


Subject(s)
Disease Outbreaks , Humans , Likelihood Functions , Prevalence
14.
Sensors (Basel) ; 22(13)2022 Jun 25.
Article in English | MEDLINE | ID: mdl-35808310

ABSTRACT

Block compressed sensing (BCS) is suitable for image sampling and compression in resource-constrained applications. Adaptive sampling methods can effectively improve the rate-distortion performance of BCS. However, adaptive sampling methods add high computational complexity to the encoder, which undermines the main advantage of BCS. In this paper, we focus on improving adaptive sampling performance at low computational cost. Firstly, we analyze the additional computational complexity of existing adaptive sampling methods for BCS. Secondly, the adaptive sampling problem of BCS is modeled as a distortion-minimization problem. We present three distortion models to capture the relationship between block sampling rate and block distortion, and use a simple neural network to predict the model parameters from a few measurements. Finally, a fast estimation method is proposed to allocate block sampling rates based on distortion minimization. The results demonstrate that the proposed estimation method for block sampling rates is effective. Two of the three proposed distortion models enable the proposed estimation method to outperform existing adaptive sampling methods for BCS. Compared with the baseline computation of BCS at a sampling rate of 0.1, the additional computation of the proposed adaptive sampling method is less than 1.9%.
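A sketch of rate allocation under a total measurement budget, using plain block variance as a crude distortion proxy in place of the paper's fitted distortion models:

```python
import numpy as np

def allocate_block_rates(block_vars, target_rate, r_min=0.02, r_max=0.5):
    """Sketch of adaptive rate allocation for block compressed sensing:
    give high-variance (hard-to-reconstruct) blocks more measurements while
    keeping the average rate near the target budget. Variance stands in for
    a fitted rate-distortion model; bounds are illustrative."""
    v = np.asarray(block_vars, dtype=float)
    rates = np.clip(target_rate * len(v) * v / v.sum(), r_min, r_max)
    # renormalize after clipping so the mean rate matches the budget
    return rates * target_rate / rates.mean()

# four image blocks: two smooth (low variance), two textured (high variance)
rates = allocate_block_rates([0.1, 0.4, 0.4, 0.1], target_rate=0.1)
```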


Subject(s)
Data Compression , Neural Networks, Computer , Image Processing, Computer-Assisted
15.
Molecules ; 27(2)2022 Jan 06.
Article in English | MEDLINE | ID: mdl-35056671

ABSTRACT

Catalytic properties of noble-metal nanoparticles (NPs) are largely determined by their surface morphology, which is probed by surface-sensitive spectroscopic techniques in different spectral regions. A fast and precise computational approach for predicting surface-adsorbate interactions would aid the reliable description and interpretation of experimental data. In this work, we applied machine learning (ML) algorithms to the task of approximating the adsorption energy of CO on Pd nanoclusters. Because the binding energy depends strongly on the nature of the adsorption site and its local coordination, we tested several structural descriptors for the ML algorithm, including mean Pd-C distances, coordination numbers (CN) and generalized coordination numbers (GCN), radial distribution functions (RDF), and angular distribution functions (ADF). To avoid overtraining and to probe the most relevant positions above the metal surface, we utilized an adaptive sampling methodology to guide the ab initio Density Functional Theory (DFT) calculations. The support vector machine (SVM) and Extra Trees algorithms provided the best approximation quality, with a mean absolute error in energy prediction as low as 0.12 eV. Based on the developed potential, we constructed a 3D energy-surface map for the whole Pd55 nanocluster and extended it to new geometries, Pd79 and Pd85, not included in the training sample. The methodology can easily be extended to adsorption energies on mono- and bimetallic NPs at an affordable computational cost and accuracy.
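The GCN descriptor mentioned above is straightforward to compute from a bond adjacency matrix: each atom's neighbors' coordination numbers are summed and divided by the bulk maximum. The toy cluster below is illustrative, not a Pd55 geometry:

```python
import numpy as np

def generalized_cn(adjacency, cn_max=12):
    """Generalized coordination number (GCN): for each atom, sum the
    coordination numbers of its bonded neighbors and divide by the bulk
    maximum coordination (12 for fcc metals such as Pd)."""
    cn = adjacency.sum(axis=1)          # plain coordination number per atom
    return (adjacency @ cn) / cn_max    # neighbor-CN sum, normalized

# toy 4-atom cluster: atom 0 bonded to all others, the rest only to atom 0
adj = np.array([[0, 1, 1, 1],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0]], dtype=float)
gcn = generalized_cn(adj)
```

Unlike the plain CN, the GCN distinguishes sites whose neighbors are themselves under-coordinated, which is why it correlates better with adsorption energies on nanoparticle surfaces.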

16.
J Struct Biol ; 213(4): 107800, 2021 12.
Article in English | MEDLINE | ID: mdl-34600140

ABSTRACT

The flux of ions and molecules in and out of the cell is vital for maintaining the basis of various biological processes. The permeation of substrates across the cellular membrane is mediated through the function of specialized integral membrane proteins commonly known as membrane transporters. These proteins undergo a series of structural rearrangements that allow a primary substrate binding site to be accessed from either side of the membrane at a given time. Structural insights provided by experimentally resolved structures of membrane transporters have aided in the biophysical characterization of these important molecular drug targets. However, characterizing the transitions between conformational states remains challenging to achieve both experimentally and computationally. Though molecular dynamics simulations are a powerful approach to provide atomistic resolution of protein dynamics, a recurring challenge is its ability to efficiently obtain relevant timescales of large conformational transitions as exhibited in transporters. One approach to overcome this difficulty is to adaptively guide the simulation to favor exploration of the conformational landscape, otherwise known as adaptive sampling. Furthermore, such sampling is greatly benefited by the statistical analysis of Markov state models. Historically, the use of Markov state models has been effective in quantifying slow dynamics or long timescale behaviors such as protein folding. Here, we review recent implementations of adaptive sampling and Markov state models to not only address current limitations of molecular dynamics simulations, but to also highlight how Markov state modeling can be applied to investigate the structure-function mechanisms of large, complex membrane transporters.


Subject(s)
Markov Chains , Membrane Transport Proteins/chemistry , Molecular Dynamics Simulation , Protein Conformation , Animals , Binding Sites , Cell Membrane/metabolism , Humans , Membrane Transport Proteins/metabolism , Protein Binding , Thermodynamics
17.
Proc Natl Acad Sci U S A ; 115(44): 11138-11143, 2018 10 30.
Article in English | MEDLINE | ID: mdl-30327341

ABSTRACT

We develop a method for the evaluation of extreme event statistics associated with nonlinear dynamical systems from a small number of samples. From an initial dataset of design points, we formulate a sequential strategy that provides the "next-best" data point (set of parameters) that when evaluated results in improved estimates of the probability density function (pdf) for a scalar quantity of interest. The approach uses Gaussian process regression to perform Bayesian inference on the parameter-to-observation map describing the quantity of interest. We then approximate the desired pdf along with uncertainty bounds using the posterior distribution of the inferred map. The next-best design point is sequentially determined through an optimization procedure that selects the point in parameter space that maximally reduces uncertainty between the estimated bounds of the pdf prediction. Since the optimization process uses only information from the inferred map, it has minimal computational cost. Moreover, the special form of the metric emphasizes the tails of the pdf. The method is practical for systems where the dimensionality of the parameter space is of moderate size and for problems where each sample is very expensive to obtain. We apply the method to estimate the extreme event statistics for a very high-dimensional system with millions of degrees of freedom: an offshore platform subjected to 3D irregular waves. It is demonstrated that the developed approach can accurately determine the extreme event statistics using a limited number of samples.

18.
Sensors (Basel) ; 21(15)2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34372361

ABSTRACT

Machine learning models often converge slowly and are unstable when using sample-estimated gradients in SGD, owing to the significant variance of random data. To increase convergence speed and improve stability, a distributed SGD algorithm based on variance reduction, named DisSAGD, is proposed in this study. DisSAGD corrects the gradient estimate at each iteration using the gradients of historical iterations, without full gradient computation or additional storage; that is, it reduces the mean variance of historical gradients in order to reduce the error in updating parameters. We implemented DisSAGD on distributed clusters to train a machine learning model, sharing parameters among nodes via an asynchronous communication protocol. We also propose an adaptive learning rate strategy, as well as a sampling strategy, to address the update lag of the overall parameter distribution, which helps to improve the convergence speed when the parameters deviate from the optimal value: when one working node is faster than another, the faster node has more time to compute its local gradient and to sample more data for the next iteration. Our experiments demonstrate that DisSAGD significantly reduces waiting times during loop iterations and improves convergence speed compared to traditional methods, and that our method achieves speedups on distributed clusters.
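A single-node sketch of the variance-reduction idea (a SAGA-style update; DisSAGD's distributed, asynchronous machinery and adaptive learning rate are omitted):

```python
import numpy as np

def saga_least_squares(A, b, lr=0.1, epochs=300, seed=0):
    """Minimal SAGA-style variance-reduced SGD on a least-squares problem:
    each step corrects the fresh stochastic gradient with the stored
    historical gradient of the chosen sample plus the running average of
    all stored gradients, shrinking the update variance."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.array([a * (a @ x - bi) for a, bi in zip(A, b)])  # stored grads
    avg = table.mean(axis=0)
    for _ in range(epochs * n):
        i = rng.integers(n)
        g = A[i] * (A[i] @ x - b[i])        # fresh gradient of sample i
        x -= lr * (g - table[i] + avg)      # variance-reduced update
        avg += (g - table[i]) / n           # keep the running average current
        table[i] = g
    return x

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = A @ np.array([2.0, -1.0])               # consistent system, solution (2, -1)
x = saga_least_squares(A, b)
```

Near the optimum the correction term `g - table[i] + avg` goes to zero, so the update variance vanishes and a constant learning rate still converges, which plain SGD cannot do.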

19.
Biometrics ; 76(2): 496-507, 2020 06.
Article in English | MEDLINE | ID: mdl-31598956

ABSTRACT

Modeling correlated or highly stratified multiple-response data is a common data analysis task in many applications, such as those in large epidemiological studies or multisite cohort studies. The generalized estimating equations method is a popular statistical method used to analyze these kinds of data, because it can manage many types of unmeasured dependence among outcomes. Collecting large amounts of highly stratified or correlated response data is time-consuming; thus, the use of a more aggressive sampling strategy that can accelerate this process-such as the active-learning methods found in the machine-learning literature-will always be beneficial. In this study, we integrate adaptive sampling and variable selection features into a sequential procedure for modeling correlated response data. Besides reporting the statistical properties of the proposed procedure, we also use both synthesized and real data sets to demonstrate the usefulness of our method.


Subject(s)
Biometry/methods , Models, Statistical , Algorithms , Antibodies, Neutralizing/therapeutic use , Computer Simulation , Data Interpretation, Statistical , Databases, Factual/statistics & numerical data , Humans , Interferon beta-1b/therapeutic use , Logistic Models , Machine Learning , Multiple Sclerosis, Relapsing-Remitting/immunology , Multiple Sclerosis, Relapsing-Remitting/therapy , Multivariate Analysis , Probability , Randomized Controlled Trials as Topic/statistics & numerical data , Sample Size
20.
Sensors (Basel) ; 20(12)2020 Jun 17.
Article in English | MEDLINE | ID: mdl-32560453

ABSTRACT

To allow mobile robots to visually observe the temperature of equipment in complex industrial environments and respond to temperature anomalies in time, it is necessary to accurately locate temperature anomalies and obtain information about the surrounding obstacles. This paper proposes a visual saliency detection method for hypertemperature in three-dimensional space using dual-source images. The key novelty of this method is that it achieves accurate salient object detection without relying on high-performance hardware. First, redundant point clouds are removed through adaptive sampling to reduce memory use. Second, the original images are fused with infrared images, and the dense point clouds are surface-mapped to visually display the temperature of the reconstructed surface; infrared imaging characteristics are used to detect the plane coordinates of temperature anomalies. Finally, a coordinate transformation is applied according to the pose relationship to obtain the spatial position. Experimental results show that this method not only displays the device temperature directly but also accurately obtains the spatial coordinates of the heat source without relying on a high-performance computing platform.
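The redundancy-removal step can be approximated with voxel-grid downsampling, one common choice for thinning dense point clouds; the paper's exact adaptive scheme may differ:

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Reduce point-cloud redundancy by keeping one centroid per occupied
    voxel: bin points into a regular 3D grid and average within each cell."""
    keys = np.floor(points / voxel_size).astype(int)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, points.shape[1]))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)     # accumulate points per voxel
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]        # centroid of each occupied voxel

pts = np.array([[0.01, 0.02, 0.0], [0.03, 0.01, 0.0],   # same voxel
                [1.00, 1.00, 1.00]])                    # separate voxel
down = voxel_downsample(pts, voxel_size=0.1)
```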
