ABSTRACT
Improving the technical performance of related industrial products is an efficient strategy for reducing the application quantities and environmental burden of toxic chemicals. A novel polyfluoroalkyl surfactant, potassium 1,1,2,2,3,3,4,4-octafluoro-4-(perfluorobutoxy)butane-1-sulfonate (F404), was synthesized by a commercializable route. It had a surface tension (γ) of 18.2 mN/m at the critical micelle concentration (CMC, 1.04 g/L), significantly lower than that of perfluorooctane sulfonate (PFOS, ca. 33.0 mN/m, 0.72 g/L), and exhibited remarkable suppression of chromium fog at a dose half that of PFOS. The half-maximal inhibitory concentration (IC50) values in HepG2 cells and the 50% lethal concentration (LC50) in zebrafish embryos after 72 hpf indicated a lower toxicity for F404 than for PFOS. In a UV/sulphite system, 89.3% of F404 was decomposed after 3 h, representing a defluorination efficiency of 43%. Because the ether C-O bond in the F404 fluorocarbon chain is located at C4-O5, cleavage of this bond during decomposition would be expected to form a short-chain ·C4F9 radical. The ether unit is introduced in the perfluoroalkyl chain to improve water solubility, biocompatibility and degradation, thereby minimizing the environmental burden. Electronic Supplementary Material: Supplementary material is available in the online version of this article at 10.1007/s40242-023-3030-4.
ABSTRACT
Conversion of solar and mechanical vibration energies for catalytic water splitting into H2 has gained substantial attention recently. However, sluggish charge separation and inefficient energy utilization in photocatalytic and piezocatalytic processes severely restrict the catalytic activity. In this paper, efficient piezo-photocatalytic H2 evolution from water splitting is realized by simultaneously converting solar and vibration energy over one-dimensional (1D) nanorod-structured CdxZn1-xS (x = 0, 0.2, 0.4, 0.6, 0.8, 1) solid solutions. Under combined visible light and ultrasound irradiation, Cd0.4Zn0.6S 1D nanorods deliver a pronounced synergistic piezo-photocatalytic H2 yield rate of 4.45 mmol g-1 h-1, far exceeding that under ultrasound or illumination alone. The greatly promoted catalytic activity of Cd0.4Zn0.6S is attributed to strengthened charge separation by the piezo-potential, as disclosed by light-assisted scanning Kelvin probe force microscopy (SKPFM), to increased strain sensitivity, and to a desirable balance between piezoelectricity and visible-light response arising from the 1D configuration and the solid solution. Metal and metal oxide depositions disclose that reduction and oxidation reactions occur separately at the tips and lateral edges of the Cd0.4Zn0.6S nanorods, and these spatially separated reactive sites also contribute to the superior catalytic activity. This work is expected to inspire a new design strategy of coupled catalysis reactions for efficient renewable fuel production.
ABSTRACT
Tunnels play an essential role in the transportation network. Tunnel entrances are usually buried at a shallow depth. In the event of an internal explosion, the blast pressure will cause severe damage or even collapse of the tunnel entrance, paralyzing the traffic system. Therefore, an accurate assessment of the damage level of tunnel entrances under internal blast loading can provide effective assistance for the anti-blast design of tunnels, post-disaster emergency response, and economic damage assessment. In this paper, four tunnel entrance specimens were designed and fabricated with a scale ratio of 1/5.5, and a series of field blast tests were carried out to examine the damage pattern of the tunnel entrances under internal explosion. Subsequently, static loading tests were conducted to obtain the maximum bearing capacity of the intact specimen and residual bearing capacities of the post-blast specimens. After that, an explicit non-linear analysis was carried out and a numerical finite element (FE) model of the tunnel entrance under internal blast loading was established by adopting the arbitrary Lagrangian-Eulerian (ALE) method and validated based on the data obtained from the field blast and static loading tests. A probabilistic vulnerability analysis of a typical tunnel entrance subjected to stochastic internal explosions (assuming various charge weights and detonation points) was then carried out with the validated FE model. For the purpose of damage assessment, the residual bearing capacity of the tunnel entrance was taken as the damage criterion. The vulnerability curves corresponding to various damage levels were further developed based on the stochastic data from the probabilistic vulnerability analysis. When the charge weight was 200 kg, the tunnel entrance exhibited slight or moderate damage, while the tunnel entrance suffered severe or even complete damage as the charge weight increased to 1000 kg. However, the tunnel entrance's probability of complete damage was less than 10% when the TNT charge weight did not exceed 1000 kg.
ABSTRACT
Diversity analysis has been performed routinely on microbiomes, including human viromes. Shared species analysis has been conducted only rarely, but it can be a powerful supplement to diversity analysis. In the present study, we conducted integrated diversity and shared species analyses of human viromes by reanalyzing three published datasets of human viromes with more than 250 samples from healthy vs. diseased individuals and/or rural vs. urban individuals. We found significant differences in virome diversity, measured by Hill numbers, between the healthy and diseased individuals, with diseased individuals exhibiting higher virome diversity than healthy individuals, and rural individuals exhibiting higher virome diversity than urban individuals. We applied both "read randomization" and "sample randomization" algorithms to perform shared species analysis. With the more conservative sample randomization algorithm, the observed number of shared species was significantly smaller than the expected number of shared species in 50% (8 of 16) of the comparisons. These results suggest that integrated diversity and shared species analysis can offer more comprehensive insights in comparing human virome samples than standard diversity analysis alone, with potentially powerful applications in differentiating the effects of diseases or other meta-factors.
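The Hill numbers used above as the diversity measure can be illustrated with a minimal sketch. It uses the standard definition (the effective number of species of order q) and is not the study's own pipeline; the read counts in `sample` are hypothetical.

```python
import numpy as np

def hill_number(abundances, q):
    """Hill number (effective number of species) of diversity order q."""
    counts = np.asarray(abundances, dtype=float)
    p = counts[counts > 0] / counts.sum()  # relative abundances of present species
    if np.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

# Hypothetical read counts per viral species in one sample
sample = [120, 30, 30, 15, 5]
for q in (0, 1, 2):
    print(f"q={q}: {hill_number(sample, q):.3f}")
```

Order q = 0 counts species richness, while higher orders weight abundant species more heavily, so the value never increases with q.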
Subject(s)
Biodiversity , Virome , Genetic Databases , Viral Genome/genetics , Health Status , Humans , Rural Population , Urban Population , Virome/genetics
ABSTRACT
Methyl jasmonate (MeJA) is one of the most effective inducers of taxol biosynthetic genes, particularly the tasy gene. However, the mechanism underlying the regulation of tasy by MeJA is still unknown. In this study, a 550-bp 5'-flanking sequence was obtained and confirmed as the promoter of the tasy gene. Deletion analysis revealed that the fragment containing a GCC-box from -150 to -131 was the crucial jasmonate (JA)-responsive element, designated as JRE. Using JRE as bait, two binding proteins, namely TcERF12 and TcERF15, were discovered. Sequence alignment and phylogenetic analysis showed that TcERF12 was related to the repressor AtERF3, while TcERF15 was more closely related to the activator ORA59; these are typical GCC-box-binding ethylene-responsive factors. Both responded significantly to MeJA, by 10- and 4.5-fold, respectively, within 0.5 h. When the two TcERFs were overexpressed in Taxus cells, tasy gene expression decreased 2.1-fold in TcERF12-overexpressing cells, but increased 2.5-fold in TcERF15-overexpressing cells. The results indicated that TcERF12 and TcERF15 were negative and positive regulators, respectively, in the JA signal transduction to the tasy gene by binding the GCC-box in the JRE of the tasy promoter. Our results promote further research on regulatory mechanisms of taxol biosynthesis.
Subject(s)
Cyclopentanes/metabolism , Plant Genes , Oxylipins/metabolism , Paclitaxel/biosynthesis , Taxus/genetics , Taxus/metabolism , 5' Flanking Region , Amino Acid Sequence , Base Sequence , Plant DNA/genetics , Plant Gene Expression Regulation , Molecular Sequence Data , Phylogeny , Plant Proteins/genetics , Plant Proteins/metabolism , Genetically Modified Plants , Genetic Promoter Regions , Amino Acid Sequence Homology , Signal Transduction , Transcription Factors/genetics , Transcription Factors/metabolism , Two-Hybrid System Techniques
ABSTRACT
KEY MESSAGE: Using Illumina sequencing technology, we have generated large-scale transcriptome sequencing data and identified many putative genes involved in isoflavone biosynthesis in Pueraria lobata. Pueraria lobata, a member of the Leguminosae family, is a traditional Chinese herb which has been used since ancient times. P. lobata root has extensive clinical uses because it is a rich source of isoflavones, including daidzin and puerarin. However, isoflavone metabolism and the corresponding genes in this pathway remain largely uncharacterized. In this study, the de novo transcriptome of P. lobata root and leaf was sequenced using the Solexa sequencing platform. Over 140 million high-quality reads were assembled into 163,625 unigenes, of which about 43.1% were aligned to the Nr protein database. Using the RPKM (reads per kilobase per million reads) method, 3,148 unigenes were found to be upregulated, and 2,011 genes were downregulated in the leaf as compared to those in the root. Towards a further understanding of these differentially expressed genes, Gene Ontology enrichment and metabolic pathway enrichment analyses were performed. Based on these results, 47 novel structural genes were identified in the biosynthesis of isoflavones. Also, 22 putative UDP glycosyltransferase and 45 O-methyltransferase unigenes were identified as the candidates most likely to be involved in the tailoring processes of the downstream isoflavonoid pathway. Moreover, MYB transcription factors were analyzed, and 133 of them were found to have higher expression levels in the roots than in the leaves. In conclusion, the de novo transcriptome investigation of these unique transcripts provided an invaluable resource for the global discovery of functional genes related to isoflavone biosynthesis in P. lobata.
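The RPKM normalization named above can be sketched as follows. This is the standard formula, not the study's own pipeline; the read counts, unigene length, and library sizes are hypothetical, as is the pseudocount used in the fold-change helper.

```python
import math

def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def log2_fold_change(rpkm_a, rpkm_b, pseudocount=1e-3):
    """log2 ratio of two expression values; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((rpkm_a + pseudocount) / (rpkm_b + pseudocount))

# Hypothetical unigene of 2,000 bp in two libraries of different depth
leaf = rpkm(read_count=1000, length_bp=2000, total_mapped_reads=10_000_000)
root = rpkm(read_count=400, length_bp=2000, total_mapped_reads=8_000_000)
print(leaf, root, log2_fold_change(leaf, root))
```

Dividing by both transcript length and library size is what makes values comparable between the leaf and root libraries.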
Subject(s)
Isoflavones/metabolism , Plant Proteins/genetics , Pueraria/genetics , Transcriptome , Biosynthetic Pathways , Complementary DNA/chemistry , Complementary DNA/genetics , Plant DNA/chemistry , Plant DNA/genetics , Gene Expression Profiling , Gene Ontology , Glycosyltransferases/classification , Glycosyltransferases/genetics , High-Throughput Nucleotide Sequencing , Isoflavones/chemistry , Methyltransferases/classification , Methyltransferases/genetics , Molecular Sequence Data , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Proteins/classification , Plant Roots/genetics , Plant Roots/metabolism , Pueraria/metabolism , DNA Sequence Analysis
ABSTRACT
Recently, low-rank tensor regularization has received increasing attention in hyperspectral and multispectral fusion (HMF). However, these methods often suffer from an inflexible low-rank tensor definition and are highly sensitive to the permutation of tensor modes, which hinders their performance. To tackle this problem, we propose a novel generalized tensor nuclear norm (GTNN)-based approach for HMF. First, we define a novel GTNN by extending the existing third-mode-based tensor nuclear norm (TNN) to an arbitrary mode, which conducts the Fourier transform on an arbitrary single mode and then computes the TNN for each mode. In this way, we can not only capture more extensive correlations among the three modes of a tensor, but also avoid the adverse effect of the permutation of tensor modes. To utilize the correlations among spectral bands, the high-resolution hyperspectral image (HSI) is approximated as a low-rank spectral basis multiplied by coefficients, and we estimate the spectral basis by conducting singular-value decomposition (SVD) on the HSI. Then, the coefficients are estimated by solving the proposed GTNN-regularized optimization. Specifically, to exploit the non-local similarities of the HSI, we first cluster the patches of the coefficients into a 3-D tensor, which contains spatial, spectral, and non-local modes. Since the collected tensors contain the strong non-local spatial-spectral similarities of the HSI, the proposed low-rank tensor regularization is imposed on these collected tensors, which fully models the non-local self-similarities. Fusion experiments on both simulated and real datasets demonstrate the advantages of this approach. The code is available at https://github.com/renweidian/GTNN.
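The idea of taking the Fourier transform along an arbitrary mode and then computing a TNN for each mode can be sketched as below. This is a minimal illustration, not the released GTNN code: it assumes the common t-SVD-style TNN (sum of the nuclear norms of the Fourier-domain slices, divided by the length of the transformed mode), and the equal weights over the three modes are an assumption.

```python
import numpy as np

def tnn_along_mode(tensor, mode):
    """Tensor nuclear norm with the Fourier transform taken along `mode`."""
    n = tensor.shape[mode]
    # FFT along the chosen mode, then move that mode to the front so that
    # each leading index selects one 2-D slice over the other two modes.
    slices = np.moveaxis(np.fft.fft(tensor, axis=mode), mode, 0)
    singular_values = np.linalg.svd(slices, compute_uv=False)
    return float(singular_values.sum() / n)

def generalized_tnn(tensor, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the mode-wise TNNs (equal weights are an assumption)."""
    return sum(w * tnn_along_mode(tensor, k) for k, w in enumerate(weights))

rng = np.random.default_rng(0)
cube = rng.standard_normal((6, 7, 8))          # hypothetical 3-D tensor
permuted = np.transpose(cube, (2, 0, 1))       # same data, modes permuted
print(tnn_along_mode(cube, 2), tnn_along_mode(permuted, 2))
print(generalized_tnn(cube), generalized_tnn(permuted))
```

With equal weights the combined value does not change when the modes are permuted, whereas a TNN tied to one fixed mode generally does.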
ABSTRACT
Spectral super-resolution has attracted growing attention as a simpler and cheaper way of obtaining hyperspectral images (HSIs). Although many convolutional neural network (CNN)-based approaches have yielded impressive results, most of them ignore the low-rank prior of HSIs, resulting in huge computational and storage costs. In addition, the ability of CNN-based methods to capture the correlation of global information is limited by the receptive field. To address these problems, we design a novel low-rank tensor reconstruction network (LTRN) for spectral super-resolution. Specifically, we treat the features of HSIs as 3-D tensors with low-rank properties due to their spectral similarity and spatial sparsity. Then, we combine canonical-polyadic (CP) decomposition with neural networks to design an adaptive low-rank prior learning (ALPL) module that enables feature learning in a 1-D space. This module has two core components: the adaptive vector learning (AVL) module and the multidimensionwise multihead self-attention (MMSA) module. The AVL module is designed to compress an HSI into a 1-D space by using a vector to represent its information. The MMSA module is introduced to improve the ability to capture long-range dependencies in the row, column, and spectral dimensions, respectively. Finally, our LTRN, built mainly by cascading several ALPL modules and feedforward networks (FFNs), achieves high-quality spectral super-resolution with fewer parameters. To test the effect of our method, we conduct experiments on two datasets: the CAVE dataset and the Harvard dataset. Experimental results show that our LTRN not only is as effective as state-of-the-art methods but also has fewer parameters. The code is available at https://github.com/renweidian/LTRN.
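The CP decomposition that underlies learning in a 1-D space can be sketched as follows. This shows only the generic rank-R CP reconstruction from 1-D vectors, not the ALPL module itself, which learns such vectors with a network; the feature size and rank are hypothetical.

```python
import numpy as np

def cp_reconstruct(row_vecs, col_vecs, band_vecs):
    """Rebuild an (H, W, C) tensor from R rank-1 terms.

    Each factor has shape (R, dim); term r is the outer product of
    row_vecs[r], col_vecs[r], and band_vecs[r].
    """
    return np.einsum('rh,rw,rc->hwc', row_vecs, col_vecs, band_vecs)

def cp_parameter_count(shape, rank):
    """Number of values stored by the rank-`rank` CP factors."""
    return rank * sum(shape)

# Hypothetical feature size and rank
H, W, C, R = 64, 64, 31, 8
rng = np.random.default_rng(0)
feature = cp_reconstruct(rng.standard_normal((R, H)),
                         rng.standard_normal((R, W)),
                         rng.standard_normal((R, C)))
print(feature.shape, cp_parameter_count((H, W, C), R), H * W * C)
```

The factors hold R(H + W + C) values rather than H·W·C, which is the kind of saving a low-rank prior offers.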
ABSTRACT
Spectral super-resolution (SSR) aims to restore a hyperspectral image (HSI) from a single RGB image, a task in which deep learning has shown impressive performance. However, the majority of existing deep-learning-based SSR methods inadequately address the modeling of spatial-spectral features in HSIs. That is, they adequately capture either the spatial correlations or the spectral self-similarity, but not both, which results in a loss of discriminative spatial-spectral features and hence limits the fidelity of the reconstructed HSI. To solve this issue, we propose a novel SSR network dubbed the multistage spatial-spectral fusion network (MSFN). From the perspective of network design, we build a multistage Unet-like architecture that separately captures the multiscale features of the HSI along both the spatial and spectral dimensions. It consists of two types of self-attention mechanisms, which enable the proposed network to model the HSI globally and comprehensively. From the perspective of feature alignment, we design the spatial fusion module (SpatialFM) and spectral fusion module (SpectralFM), aiming to preserve the comprehensively captured spatial correlations and spectral self-similarity. In this manner, the multiscale features can be better fused and the accuracy of the reconstructed HSI can be significantly enhanced. Quantitative and qualitative experiments on the two largest SSR datasets (i.e., NTIRE2022 and NTIRE2020) demonstrate that our MSFN outperforms the state-of-the-art SSR methods. The code implementation will be uploaded at https://github.com/Matsuri247/MSFN-for-Spectral-Super-Resolution.
ABSTRACT
Temporal answer grounding in instructional video (TAGV) is a new task naturally derived from temporal sentence grounding in general video (TSGV). Given an untrimmed instructional video and a text question, this task aims at locating the frame span from the video that can semantically answer the question, i.e., visual answer. Existing methods tend to solve the TAGV problem with a visual span-based predictor, taking visual information to predict the start and end frames in the video. However, due to the weak correlations between the semantic features of the textual question and visual answer, current methods using the visual span-based predictor do not work well in the TAGV task. In this paper, we propose a visual-prompt text span localization (VPTSL) method, which introduces the timestamped subtitles for a text span-based predictor. Specifically, the visual prompt is a learnable feature embedding, which brings visual knowledge to the pre-trained language model. Meanwhile, the text span-based predictor learns joint semantic representations from the input text question, video subtitles, and visual prompt feature with the pre-trained language model. Thus, the TAGV is reformulated as the task of the visual-prompt subtitle span localization for the visual answer. Extensive experiments on five instructional video datasets, namely MedVidQA, TutorialVQA, VehicleVQA, CrossTalk and Coin, show that the proposed method outperforms several state-of-the-art (SOTA) methods by a large margin in terms of mIoU score, which demonstrates the effectiveness of the proposed visual prompt and text span-based predictor. Besides, all the experimental codes and datasets are open-sourced on the website https://github.com/wengsyx/VPTSL.
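The mIoU score used for evaluation can be illustrated with a minimal sketch, assuming the usual temporal intersection-over-union between a predicted and a ground-truth span averaged over all questions. The spans below, in seconds, are hypothetical.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) time spans."""
    (ps, pe), (ts, te) = pred, truth
    inter = max(0.0, min(pe, te) - max(ps, ts))
    union = (pe - ps) + (te - ts) - inter
    return inter / union if union > 0 else 0.0

def mean_iou(preds, truths):
    """Average temporal IoU over paired predictions and ground truths."""
    ious = [temporal_iou(p, t) for p, t in zip(preds, truths)]
    return sum(ious) / len(ious)

# Hypothetical predicted and annotated answer spans (seconds)
predictions = [(10.0, 20.0), (42.0, 60.0), (5.0, 8.0)]
annotations = [(15.0, 25.0), (42.0, 60.0), (30.0, 40.0)]
print(mean_iou(predictions, annotations))
```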
ABSTRACT
Infrared image (IR) and visible image (VI) fusion creates fusion images that contain richer information and gain improved visual effects. Existing methods generally use the operators of manual design, such as intensity and gradient operators, to mine the image information. However, it is hard for them to achieve a complete and accurate description of information, which limits the image fusion performance. To this end, a novel information measurement method is proposed to achieve IR and VI fusion. Its core idea is to guide a generator in achieving image fusion by learning the denoisers. Specifically, by using denoisers to restore fusion images with different noise interference to source images, a mutual competition relationship is formed between denoisers, which helps the generator thoroughly explore the data specificity of the source images and guide it to achieve more accurate feature representation. In addition, a semantic adaptive measurement loss function is proposed to constrain the generator, which fuses semantic information adaptively by considering the semantic information density of different source images. The results of quantitative and qualitative experiments have shown that the proposed method can achieve a higher quality information fusion and has a faster fusion speed on three public datasets when compared with advanced methods.
ABSTRACT
Image geo-localization aims to locate a query image from a source platform (e.g., drones, street vehicles) by matching it with geo-tagged reference images from the target platforms (e.g., different satellites). Achieving cross-modal or cross-view real-time (>30 fps) image localization with guaranteed accuracy in a unified framework remains a challenge due to the huge differences in modalities and views between the two platforms. To solve this problem, a novel fine-grained overlap estimation based image geo-localization method, FOENet, is proposed in this paper. Its core is to estimate the salient and subtle overlapping regions in image pairs to ensure correct matching. Specifically, the high-level semantic features of input images are extracted by a deep convolutional neural network. Then, a novel overlap scanning module (OSM) is presented to mine the long-range spatial and channel dependencies of semantic features in various subspaces, thereby identifying fine-grained overlapping regions. Finally, we adopt the triplet ranking loss to guide the optimization of the proposed network so that the matching regions are as close as possible and the most mismatched regions are as far apart as possible. To demonstrate the effectiveness of our FOENet, comprehensive experiments are conducted on three cross-view benchmarks and one cross-modal benchmark. Our FOENet yields better performance in various metrics, and the recall accuracy at top 1 (R@1) is significantly improved, with a maximum improvement of 70.6%. In addition, the proposed model runs fast on a single RTX 6000, reaching real-time inference speed on all datasets, with the fastest being 82.3 FPS.
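The triplet ranking loss mentioned above can be sketched in its basic margin form. The margin value, the Euclidean distance, and the toy embeddings are assumptions, and the sketch does not reproduce whatever negative-selection strategy FOENet uses.

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=0.3):
    """Mean hinge loss pushing d(anchor, positive) + margin below d(anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.maximum(0.0, margin + d_pos - d_neg).mean())

# Hypothetical 2-D embeddings: query, matching reference, non-matching reference
query = np.array([[0.0, 0.0]])
match = np.array([[0.1, 0.0]])
mismatch = np.array([[1.0, 0.0]])
print(triplet_ranking_loss(query, match, mismatch))   # well separated triplet
print(triplet_ranking_loss(query, mismatch, match))   # violated triplet
```

The loss is zero once the matching pair is closer than the mismatched pair by at least the margin, so only violated triplets contribute gradients.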
ABSTRACT
Interactive image segmentation (IIS) has emerged as a promising technique for decreasing annotation time. Substantial progress has been made in pre- and post-processing for IIS, but the critical issue of interaction ambiguity, which notably hinders segmentation quality, has been under-researched. To address this, we introduce AdaptiveClick, a click-aware transformer incorporating an adaptive focal loss (AFL) that tackles annotation inconsistencies with tools for mask- and pixel-level ambiguity resolution. To the best of our knowledge, AdaptiveClick is the first transformer-based, mask-adaptive segmentation framework for IIS. The key ingredient of our method is the click-aware mask-adaptive transformer decoder (CAMD), which enhances the interaction between click and image features. Additionally, AdaptiveClick enables pixel-adaptive differentiation of hard and easy samples in the decision space, independent of their varying distributions. This is primarily achieved by optimizing a generalized AFL with a theoretical guarantee, where two adaptive coefficients control the ratio of gradient values for hard and easy pixels. Our analysis reveals that the commonly used Focal and BCE losses can be considered special cases of the proposed AFL. With a plain ViT backbone, extensive experimental results on nine datasets demonstrate the superiority of AdaptiveClick compared to state-of-the-art methods. The source code is publicly available at https://github.com/lab206/AdaptiveClick.
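The relationship between the Focal and BCE losses referred to above can be illustrated with the standard focal loss, which reduces to BCE when its focusing parameter is zero. This sketch is not the proposed AFL, whose adaptive coefficients are not specified here; the pixel probabilities are hypothetical.

```python
import numpy as np

def bce_loss(prob, target):
    """Per-pixel binary cross-entropy for predicted foreground probabilities."""
    p_t = np.where(target == 1, prob, 1.0 - prob)  # probability of the true class
    return -np.log(p_t)

def focal_loss(prob, target, gamma=2.0):
    """Standard focal loss: BCE scaled by (1 - p_t) ** gamma."""
    p_t = np.where(target == 1, prob, 1.0 - prob)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

# Hypothetical pixels: two easy (confident, correct) and two hard ones
prob = np.array([0.95, 0.05, 0.40, 0.70])
target = np.array([1, 0, 1, 0])
print(bce_loss(prob, target))
print(focal_loss(prob, target, gamma=2.0))
```

A larger gamma shrinks the loss of easy pixels much more than that of hard pixels, which is the hard/easy balance that the adaptive coefficients are described as controlling.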
ABSTRACT
Interactive image segmentation (IIS) has been widely used in various fields, such as medicine and industry. However, some core issues, such as pixel imbalance, remain unresolved. Unlike existing methods based on pre-processing or post-processing, we analyze the cause of pixel imbalance in depth from the two perspectives of pixel number and pixel difficulty. Based on this, a novel and unified Click-pixel Cognition Fusion network with Balanced Cut (CCF-BC) is proposed in this paper. On the one hand, the Click-pixel Cognition Fusion (CCF) module, inspired by the human cognition mechanism, is designed to increase the number of click-related pixels (namely, positive pixels) that are correctly segmented, where the click and visual information are fully fused by using a progressive three-tier interaction strategy. On the other hand, a general loss, Balanced Normalized Focal Loss (BNFL), is proposed. Its core is a group of control coefficients related to sample gradients that force the network to pay more attention to positive and hard-to-segment pixels during training. As a result, BNFL always tends to obtain a balanced cut of positive and negative samples in the decision space. Theoretical analysis shows that the commonly used Focal and BCE losses can be regarded as special cases of BNFL. Experimental results on five well-recognized datasets show the superiority of the proposed CCF-BC method compared to other state-of-the-art methods. The source code is publicly available at https://github.com/lab206/CCF-BC.
ABSTRACT
To improve the thermal and combustion properties of nanothermites, a design theory of changing the state of matter and structural state of the reactants during reaction was proposed. The Al/MoO3/KClO4 (Kp) nanothermite was prepared and the Al/MoO3 nanothermite was used as a control. SEM and XRD were used to characterize the nanothermites; DSC was used to test thermal properties; and constant volume and open combustion tests were performed to examine their combustion performance. Phase and morphology characterization of the combustion products was performed to reveal the mechanism of the aluminothermic reaction. The results show that the Al/MoO3/Kp nanothermite exhibited excellent thermal properties, with a total heat release of 1976 J·g-1, approximately 33% higher than the 1486 J·g-1 of the Al/MoO3 nanothermite, and an activation energy of 269.66 kJ·mol-1, indicating higher stability than the Al/MoO3 nanothermite (205.64 kJ·mol-1). During the combustion test, the peak pressure of the Al/MoO3/Kp nanothermite was 0.751 MPa, and the average pressure rise rate was 25.03 MPa·s-1, much higher than the 0.188 MPa and 6.27 MPa·s-1 of the Al/MoO3 nanothermite. The combustion products of the Al/MoO3 nanothermite were Al2O3, MoO, and Mo, indicating insufficient combustion and incomplete reaction, whereas the combustion products of the Al/MoO3/Kp nanothermite were Al2O3, MoO, and KCl, indicating complete reaction. Their "coral-like" morphology was the effect of reactants solidifying after melting during the combustion process. The characterization of reactants and the pressure test during combustion reveal the three stages of the aluminothermic reaction in thermites. The excellent thermal and combustion performance of the Al/MoO3/Kp nanothermite is attributed to the melting and decomposition of Kp into O2 in the third stage. This study provides new ideas and guidance for the design of high-performance nanothermites.
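The relative improvements quoted above follow directly from the reported values, as this small arithmetic check shows. Only numbers stated in the abstract are used; the helper name is illustrative.

```python
def percent_increase(new, baseline):
    """Relative increase of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

heat_gain = percent_increase(1976, 1486)   # total heat release, J/g
peak_pressure_ratio = 0.751 / 0.188        # peak pressure, MPa
rise_rate_ratio = 25.03 / 6.27             # average pressure rise rate, MPa/s

print(f"heat release: +{heat_gain:.1f}%")
print(f"peak pressure: x{peak_pressure_ratio:.2f}")
print(f"pressure rise rate: x{rise_rate_ratio:.2f}")
```

Both pressure figures come out at roughly four times the control values, which is the sense in which they are "much higher".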
ABSTRACT
Two bis(tridentate) Schiff base ligands H2L(x) were used to construct three 2×2 grid-type tetranuclear Fe(II) complexes 1-3 to obtain polynuclear spin-crossover materials. Magnetic susceptibility measurements show that the spin states of the complexes are related to the substituents of H2L(x), and that spin transition occurs only in complexes 1 and 2, which are derived from a bulky ligand, whereas complex 3 is diamagnetic. The transition temperatures of complexes 1 and 2 are close to room temperature and are dependent on counteranions. The spin transition of complex 1 can be reversibly tuned by the dehydration and hydration process.
ABSTRACT
Graphs are essential to improve the performance of graph-based machine learning methods, such as spectral clustering. Various well-designed methods have been proposed to learn graphs that depict specific properties of real-world data. Joint learning of knowledge in different graphs is an effective means to uncover the intrinsic structure of samples. However, the existing methods fail to simultaneously mine the global and local information related to sample structure and distribution when multiple graphs are available, and further research is needed. Hence, we propose a novel intrinsic graph learning (IGL) with discrete constrained diffusion-fusion to solve the above problem in this article. In detail, given a set of the predefined graphs, IGL first obtains the graph encoding the global high-order manifold structure via the diffusion-fusion mechanism based on the tensor product graph. Then, two discrete operators are integrated to fine-prune the obtained graph. One of them limits the maximum number of neighbors connected to each sample, thereby removing redundant and erroneous edges. The other one forces the rank of the Laplacian matrix of the obtained graph to be equal to the number of sample clusters, which guarantees that samples from the same subgraph belong to the same cluster and vice versa. Moreover, a new strategy of weight learning is designed to accurately quantify the contribution of pairwise predefined graphs in the optimization process. Extensive experiments on six single-view and two multiview datasets have demonstrated that our proposed method outperforms the previous state-of-the-art methods on the clustering task.
ABSTRACT
Recently, fusing a low-resolution hyperspectral image (LR-HSI) with a high-resolution multispectral image (HR-MSI) from a different satellite has become an effective way to improve the resolution of an HSI. However, due to different imaging satellites, different illumination, and adjacent imaging times, the LR-HSI and HR-MSI may not satisfy the observation models established by existing works, and they are difficult to register. To solve these problems, we establish new observation models for LR-HSIs and HR-MSIs from different satellites, and then propose a deep-learning-based framework to solve the key steps in multi-satellite HSI fusion, including image registration, blur kernel learning, and image fusion. Specifically, we first construct a convolutional neural network (CNN), called RegNet, to produce pixel-wise offsets between the LR-HSI and HR-MSI, which are utilized to register the LR-HSI. Next, according to the new observation models, a tiny network, called BKLNet, is built to learn the spectral and spatial blur kernels, where the BKLNet and RegNet can be trained jointly. In the fusion part, we further train a FusNet by downsampling the registered data with the learned spatial blur kernel. Extensive experiments demonstrate the superiority of the proposed framework in HSI registration and fusion accuracy.
ABSTRACT
Fusing hyperspectral images (HSIs) with multispectral images (MSIs) of higher spatial resolution has become an effective way to sharpen HSIs. Recently, deep convolutional neural networks (CNNs) have achieved promising fusion performance. However, these methods often suffer from a lack of training data and limited generalization ability. To address these problems, we present a zero-shot learning (ZSL) method for HSI sharpening. Specifically, we first propose a novel method to quantitatively estimate the spectral and spatial responses of imaging sensors with high accuracy. In the training procedure, we spatially subsample the MSI and HSI based on the estimated spatial response and use the downsampled HSI and MSI to infer the original HSI. In this way, we can not only exploit the inherent information in the HSI and MSI, but also ensure that the trained CNN generalizes well to the test data. In addition, we apply dimension reduction to the HSI, which reduces the model size and storage usage without sacrificing fusion accuracy. Furthermore, we design an imaging model-based loss function for the CNN, which further boosts the fusion performance. The experimental results show the high efficiency and accuracy of our approach.
ABSTRACT
Automatic speech recognition (ASR) is the major human-machine interface in many intelligent systems, such as intelligent homes, autonomous driving, and service robots. However, its performance usually deteriorates significantly in the presence of external noise, which limits its application scenarios. Audio-visual speech recognition (AVSR) takes visual information as a complementary modality to enhance the performance of audio speech recognition effectively, particularly in noisy conditions. Recently, transformer-based architectures have been used to model the audio and video sequences for AVSR, achieving superior performance. However, the performance of these architectures may be degraded by the extraction of irrelevant information while modeling long-term dependencies. In addition, the motion feature is essential for capturing the spatio-temporal information within the lip region to best utilize visual sequences, but it has not been considered in AVSR tasks. Therefore, we propose a multimodal sparse transformer network (MMST) in this article. The sparse self-attention mechanism can improve the concentration of attention on global information by selecting only the most relevant parts. Moreover, the motion features are seamlessly introduced into the MMST model. We allow motion-modality information to flow into the visual modality through the cross-modal attention module to enhance visual features, thereby further improving recognition performance. Extensive experiments conducted on different datasets validate that our proposed method outperforms several state-of-the-art methods in terms of the word error rate (WER).
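One common way to make self-attention select only the most relevant parts is to keep the top-k scores per query and mask the rest before the softmax, as sketched below. The abstract does not state the selection rule used by MMST, so the top-k rule, the single-head form, and all shapes are assumptions.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention that keeps only the top_k keys per query."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # The k-th largest score in each row is the keep threshold for that query
    thresh = np.sort(scores, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    # Softmax over the surviving scores; masked entries get weight exactly 0
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
queries = rng.standard_normal((4, 8))   # hypothetical: 4 query frames
keys = rng.standard_normal((6, 8))      # hypothetical: 6 key frames
values = rng.standard_normal((6, 8))
output, attn = topk_sparse_attention(queries, keys, values, top_k=3)
print(output.shape, (attn > 0).sum(axis=-1))
```

Setting `top_k` to the number of keys recovers ordinary dense attention, so the sparsity level is a single tunable parameter in this sketch.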