Search | Biblioteca Virtual em Saúde Fiocruz

1.

Identification of the continuum field structure at multiple scale levels.

Wang, Lipo; Mei, Xinyu.

Chaos ; 34(5)2024 May 01.

Article in English | MEDLINE | ID: mdl-38717414

ABSTRACT

For continuum fields such as turbulence, analyses of the field structure offer insights into their kinematic and dynamic properties. To ensure the analyses are quantitative rather than merely illustrative, two conditions are essential: space-filling and structure quantification. A pertinent example is the dissipation element (DE) structure, which is, however, susceptible to noisy interference, rendering it inefficient for extracting the large-scale features of the field. In this study, the multi-level DE structure is proposed based on the multi-level extremal point concept. At a given scale level, the entire field can be decomposed into the corresponding space-filling and non-overlapping DEs, each characterized by its length scale l and the scalar difference Δφ between its two extremal points. We will first elaborate on the fundamental principles of this method. Results from an artificially constructed two-scale field indicate that the decomposed units adequately represent the geometry of the original field. In examining the fractal Brownian motion, a structure function equivalent ⟨Δφ|l⟩ and an energy spectrum equivalent are introduced. The scaling relation derived from ⟨Δφ|l⟩ corresponds with the Hurst number. Furthermore, the multi-level DE structure distinctly reveals the two different inertial ranges in two-dimensional turbulence. Overall, this novel structure identification approach holds significant potential for complex analyses concerning the field geometry.
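As a rough illustration of the decomposition idea in the abstract above, the sketch below handles only the simplest case: a one-dimensional sampled signal at a single scale level. It splits the signal at its local extrema into non-overlapping segments and reports each segment's length l and scalar difference Δφ. The function name `decompose_1d`, the unit grid spacing, and the assumption that no two neighbouring samples are equal are illustrative choices, not details from the paper, and the multi-level extremal point construction itself is not reproduced.

```python
import numpy as np

def decompose_1d(phi):
    """Split a 1D signal at its local extrema (hypothetical helper).

    Returns a list of (l, dphi) pairs, one per segment between two
    adjacent extremal points. Assumes no two neighbouring samples are equal.
    """
    phi = np.asarray(phi, dtype=float)
    slope = np.sign(np.diff(phi))
    # An interior sample is extremal when the slope changes sign across it.
    interior = [i for i in range(1, len(phi) - 1) if slope[i - 1] * slope[i] < 0]
    # The two end points close the first and last segments.
    extrema = [0] + interior + [len(phi) - 1]
    return [
        (b - a, abs(phi[b] - phi[a]))
        for a, b in zip(extrema[:-1], extrema[1:])
    ]
```

In this one-dimensional setting the segment lengths add up to the number of grid intervals, which is the 1D counterpart of the space-filling, non-overlapping property mentioned in the abstract.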

2.

Subject matching for cross-subject EEG-based recognition of driver states related to situation awareness.

Li, Ruilin; Wang, Lipo; Sourina, Olga.

Methods ; 202: 136-143, 2022 06.

Article in English | MEDLINE | ID: mdl-33845126

ABSTRACT

Situation awareness (SA) has received much attention in recent years because of its importance for operators of dynamic systems. Electroencephalography (EEG) can be used to measure mental states of operators related to SA. However, cross-subject EEG-based SA recognition is a critical challenge, as data distributions of different subjects vary significantly. Subject variability is treated as a domain shift problem. Several attempts have been made to find domain-invariant features among subjects, where subject-specific information is neglected. In this work, we propose a simple but efficient subject matching framework by finding a connection between a target (test) subject and source (training) subjects. Specifically, the framework includes two stages: (1) we train the model with multi-source domain alignment layers to collect source domain statistics; (2) during testing, a distance is computed to perform subject matching in the latent representation space. We use a reciprocal exponential function as a similarity measure to dynamically select similar source subjects. Experimental results show that our framework achieves a state-of-the-art accuracy of 74.32% on the Taiwan driving dataset.
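The matching step described above can be sketched roughly as follows. This is a minimal illustration, assuming each subject is summarised by one latent feature vector and that the distance is Euclidean; the abstract only says "a distance", so both choices, along with the function name `subject_matching_weights` and the normalisation into weights, are assumptions rather than the authors' implementation.

```python
import numpy as np

def subject_matching_weights(target_feature, source_features):
    """Weight each source subject by its similarity to the target subject.

    target_feature: shape (d,), latent summary of the target subject.
    source_features: shape (n_sources, d), one latent summary per source.
    """
    target_feature = np.asarray(target_feature, dtype=float)
    source_features = np.asarray(source_features, dtype=float)
    # Euclidean distance is an assumption; the abstract does not name the metric.
    distances = np.linalg.norm(source_features - target_feature, axis=1)
    # Reciprocal exponential similarity: 1 / exp(d), which equals 1 when d = 0.
    similarity = 1.0 / np.exp(distances)
    # Normalising to weights that sum to 1 is an illustrative choice.
    return similarity / similarity.sum()
```

Source subjects that lie far from the target in latent space receive exponentially smaller weights, which is one plausible way to "dynamically select" similar subjects.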

Subjects

Awareness; Electroencephalography; Algorithms; Electroencephalography/methods; Humans

3.

Glaucoma screening using an attention-guided stereo ensemble network.

Liu, Yuan; Yip, Leonard Wei Leon; Zheng, Yuanjin; Wang, Lipo.

Methods ; 202: 14-21, 2022 06.

Article in English | MEDLINE | ID: mdl-34153436

ABSTRACT

Glaucoma is a chronic eye disease that causes gradual vision loss and eventually blindness. Accurate glaucoma screening at an early stage is critical to mitigate its aggravation. Extracting high-quality features is critical in training classification models. In this paper, we propose a deep ensemble network with an attention mechanism that detects glaucoma using optic nerve head stereo images. The network consists of two main sub-components: a deep Convolutional Neural Network that obtains global information and an Attention-Guided Network that localizes the optic disc while maintaining beneficial information from other image regions. Both images in a stereo pair are fed into these sub-components, and the outputs are fused together to generate the final prediction result. Abundant image features from different views and regions are extracted, providing compensation when one of the stereo images is of poor quality. The attention-based localization method is trained in a weakly-supervised manner and only image-level annotation is required, which avoids expensive segmentation labelling. Results from real patient images show that our approach increases recall (sensitivity) from the state-of-the-art 88.89% to 95.48%, while maintaining precision and performance stability. The marked reduction in false-negative rate can significantly enhance the chance of successful early diagnosis of glaucoma.

Subjects

Glaucoma; Optic Disk; Diagnostic Techniques, Ophthalmological; Glaucoma/diagnostic imaging; Humans; Mass Screening; Neural Networks, Computer; Optic Disk/diagnostic imaging

4.

BCAS2, a protein enriched in advanced prostate cancer, interacts with NBS1 to enhance DNA double-strand break repair.

Wang, Li-Po; Chen, Tzu-Yu; Kang, Chun-Kai; Huang, Hsiang-Po; Chen, Show-Li.

Br J Cancer ; 123(12): 1796-1807, 2020 12.

Article in English | MEDLINE | ID: mdl-32963349

ABSTRACT

BACKGROUND: Breast cancer amplified sequence 2 (BCAS2) plays crucial roles in pre-mRNA splicing and androgen receptor transcription. Previous studies suggested that BCAS2 is involved in double-strand breaks (DSB); therefore, we aimed to characterise its mechanism and role in prostate cancer (PCa). METHODS: Western blotting and immunofluorescence microscopy were used to assay the roles of BCAS2 in the DSBs of PCa cells and apoptosis in Drosophila, respectively. The effects of BCAS2 dosage on non-homologous end joining (NHEJ) and homologous recombination (HR) were assayed by precise end-joining assay and flow cytometry, respectively. Glutathione-S-transferase pulldown and co-immunoprecipitation assays were used to determine whether and how BCAS2 interacts with NBS1. The expression of BCAS2 and other proteins in human PCa was determined by immunohistochemistry. RESULTS: BCAS2 helped repair radiation-induced DSBs efficiently in both human PCa cells and Drosophila. BCAS2 enhanced both NHEJ and HR, possibly by interacting with NBS1, which involved the BCAS2 N-terminus as well as both the NBS1 N- and C-termini. The overexpression of BCAS2 was significantly associated with higher Gleason and pathology grades and shorter survival in patients with PCa. CONCLUSION: BCAS2 promotes two DSB repair pathways by interacting with NBS1, and it may affect PCa progression.

Subjects

Cell Cycle Proteins/metabolism; DNA Breaks, Double-Stranded; DNA End-Joining Repair/physiology; Neoplasm Proteins/metabolism; Nuclear Proteins/metabolism; Prostatic Neoplasms/metabolism; Animals; Apoptosis/genetics; DNA/radiation effects; DNA Repair Enzymes/metabolism; Drosophila/genetics; Humans; Male; Neoplasm Grading; Prostatic Neoplasms/genetics; Prostatic Neoplasms/pathology

5.

Optimization of Submodularity and BBO-Based Routing Protocol for Wireless Sensor Deployment.

Wang, Yaoli; Duan, Yujun; Di, Wenxia; Chang, Qing; Wang, Lipo.

Sensors (Basel) ; 20(5)2020 Feb 27.

Article in English | MEDLINE | ID: mdl-32120900

ABSTRACT

Wireless sensors are limited by node costs, communication efficiency, and energy consumption when wireless sensors are deployed on a large scale. The use of submodular optimization can reduce the deployment cost. This paper proposes a sensor deployment method based on the Improved Heuristic Ant Colony Algorithm-Chaos Optimization of Padded Sensor Placements at Informative and cost-Effective Locations (IHACA-COpSPIEL) algorithm and a routing protocol based on an improved Biogeography-Based Optimization (BBO) algorithm. First, a mathematical model with submodularity is established. Second, the IHACA is combined with pSPIEL-based on chaos optimization to determine the shortest path. Finally, the selected sensors are used in the biogeography of the improved BBO routing protocols to transmit data. The experimental results show that the IHACA-COpSPIEL algorithm can go beyond the local optimal solutions, and the communication cost of IHACA-COpSPIEL is 38.42%, 24.19% and 8.31%, respectively, lower than that of the greedy algorithm, the pSPIEL algorithm and the IHACA algorithm. It uses fewer sensors and has a longer life cycle. Compared with the LEACH protocol, the routing protocol based on the improved BBO extends the life cycle by 30.74% and has lower energy consumption.

6.

3D Deep Learning on Medical Images: A Review.

Singh, Satya P; Wang, Lipo; Gupta, Sukrit; Goli, Haveesh; Padmanabhan, Parasuraman; Gulyás, Balázs.

Sensors (Basel) ; 20(18)2020 Sep 07.

Article in English | MEDLINE | ID: mdl-32906819

ABSTRACT

The rapid advancements in machine learning, graphics processing technologies and the availability of medical imaging data have led to a rapid increase in the use of deep learning models in the medical domain. This was exacerbated by the rapid advancements in convolutional neural network (CNN) based architectures, which were adopted by the medical imaging community to assist clinicians in disease diagnosis. Since the grand success of AlexNet in 2012, CNNs have been increasingly used in medical image analysis to improve the efficiency of human clinicians. In recent years, three-dimensional (3D) CNNs have been employed for the analysis of medical images. In this paper, we trace the history of how the 3D CNN was developed from its machine learning roots, we provide a brief mathematical description of 3D CNN and provide the preprocessing steps required for medical images before feeding them to 3D CNNs. We review the significant research in the field of 3D medical imaging analysis using 3D CNNs (and its variants) in different medical areas such as classification, segmentation, detection and localization. We conclude by discussing the challenges associated with the use of 3D CNNs in the medical imaging domain (and the use of deep learning models in general) and possible future trends in the field.

Subjects

Deep Learning; Imaging, Three-Dimensional; Humans; Machine Learning; Neural Networks, Computer

7.

Image Thresholding Improves 3-Dimensional Convolutional Neural Network Diagnosis of Different Acute Brain Hemorrhages on Computed Tomography Scans.

Ker, Justin; Singh, Satya P; Bai, Yeqi; Rao, Jai; Lim, Tchoyoson; Wang, Lipo.

Sensors (Basel) ; 19(9)2019 May 10.

Article in English | MEDLINE | ID: mdl-31083289

ABSTRACT

Intracranial hemorrhage is a medical emergency that requires urgent diagnosis and immediate treatment to improve patient outcome. Machine learning algorithms can be used to perform medical image classification and assist clinicians in diagnosing radiological scans. In this paper, we apply 3-dimensional convolutional neural networks (3D CNN) to classify computed tomography (CT) brain scans into normal scans (N) and abnormal scans containing subarachnoid hemorrhage (SAH), intraparenchymal hemorrhage (IPH), acute subdural hemorrhage (ASDH) and brain polytrauma hemorrhage (BPH). The dataset used consists of 399 volumetric CT brain images representing approximately 12,000 images from the National Neuroscience Institute, Singapore. We used a 3D CNN to perform both 2-class (normal versus a specific abnormal class) and 4-class classification (among normal, SAH, IPH and ASDH). We apply image thresholding at the image pre-processing step, which improves 3D CNN classification accuracy and performance by accentuating the pixel intensities that contribute most to feature discrimination. For 2-class classification, the F1 scores for various pairs of medical diagnoses ranged from 0.706 to 0.902 without thresholding. With thresholding implemented, the F1 scores improved and ranged from 0.919 to 0.952. Our results are comparable to, and in some cases exceed, the results published in other work applying 3D CNN to CT or magnetic resonance imaging (MRI) brain scan classification. This work represents a direct application of a 3D CNN to a real hospital scenario involving a medically emergent CT brain diagnosis.
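One common way to threshold CT intensities before feeding a network is to clip them to a window and rescale, sketched below. The abstract does not state the exact thresholding scheme or bounds, so the function name `threshold_window` and the `low` and `high` parameters are assumptions for illustration only.

```python
import numpy as np

def threshold_window(volume, low, high):
    """Clip intensities to [low, high] and rescale them to [0, 1].

    `low` and `high` are hypothetical window bounds chosen by the user;
    they are not values taken from the paper.
    """
    volume = np.asarray(volume, dtype=float)
    # Values outside the window saturate at the window edges.
    clipped = np.clip(volume, low, high)
    # Linear rescaling so the window spans the full [0, 1] range.
    return (clipped - low) / (high - low)
```

Intensities outside the window collapse to 0 or 1, so the network's input range is spent entirely on the intensity band of interest.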

Subjects

Neural Networks, Computer; Algorithms; Brain/diagnostic imaging; Humans; Imaging, Three-Dimensional/methods; Machine Learning; Magnetic Resonance Imaging/methods

8.

Machine learning methods for bio-medical image and signal processing: Recent advances.

Wang, Lipo; Sourina, Olga; Erdt, Marius; Wang, Yaoli; Chang, Qing.

Methods ; 202: 1-2, 2022 06.

Article in English | MEDLINE | ID: mdl-35314099

Subjects

Image Processing, Computer-Assisted; Machine Learning; Image Processing, Computer-Assisted/methods

9.

Feature selection methods for big data bioinformatics: A survey from the search perspective.

Wang, Lipo; Wang, Yaoli; Chang, Qing.

Methods ; 111: 21-31, 2016 12 01.

Article in English | MEDLINE | ID: mdl-27592382

ABSTRACT

This paper surveys main principles of feature selection and their recent applications in big data bioinformatics. Instead of the commonly used categorization into filter, wrapper, and embedded approaches to feature selection, we formulate feature selection as a combinatorial optimization or search problem and categorize feature selection methods into exhaustive search, heuristic search, and hybrid methods, where heuristic search methods may further be categorized into those with or without data-distilled feature ranking measures.
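To make the search view of feature selection concrete, the sketch below contrasts an exhaustive search over all subsets of size k with a greedy forward (heuristic) search. The function names and the user-supplied `score` callable, which maps a tuple of features to a number, are hypothetical; in practice `score` would be something like cross-validated accuracy.

```python
from itertools import combinations

def exhaustive_search(features, k, score):
    """Evaluate every subset of size k and return the best one."""
    return max(combinations(features, k), key=score)

def greedy_forward_search(features, k, score):
    """Heuristic search: add one feature at a time, keeping the best so far."""
    selected = []
    remaining = list(features)
    while len(selected) < k:
        # Pick the feature that most improves the score of the current subset.
        best = max(remaining, key=lambda f: score(tuple(selected) + (f,)))
        selected.append(best)
        remaining.remove(best)
    return tuple(selected)
```

Exhaustive search is guaranteed to find the best subset but its cost grows combinatorially with the number of features, whereas the greedy search is cheap and can miss subsets whose features are only useful together.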

Subjects

Computational Biology/methods; Data Mining/methods; Software; Algorithms; Artificial Intelligence; Computational Biology/trends; Data Mining/trends; Humans

10.

Learning ECOC Code Matrix for Multiclass Classification with Application to Glaucoma Diagnosis.

Bai, Xiaolong; Niwas, Swamidoss Issac; Lin, Weisi; Ju, Bing-Feng; Kwoh, Chee Keong; Wang, Lipo; Sng, Chelvin C; Aquino, Maria C; Chew, Paul T K.

J Med Syst ; 40(4): 78, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26798075

ABSTRACT

Classification of different mechanisms of angle closure glaucoma (ACG) is important for medical diagnosis. Error-correcting output code (ECOC) is an effective approach for multiclass classification. In this study, we propose a new ensemble learning method based on ECOC with application to classification of four ACG mechanisms. The dichotomizers in ECOC are first optimized individually to increase their accuracy and diversity (or interdependence) which is beneficial to the ECOC framework. Specifically, the best feature set is determined for each possible dichotomizer and a wrapper approach is applied to evaluate the classification accuracy of each dichotomizer on the training dataset using cross-validation. The separability of the ECOC codes is maximized by selecting a set of competitive dichotomizers according to a new criterion, in which a regularization term is introduced in consideration of the binary classification performance of each selected dichotomizer. The proposed method is experimentally applied for classifying four ACG mechanisms. The eye images of 152 glaucoma patients are collected by using anterior segment optical coherence tomography (AS-OCT) and then segmented, from which 84 features are extracted. The weighted average classification accuracy of the proposed method is 87.65 % based on the results of leave-one-out cross-validation (LOOCV), which is much better than that of the other existing ECOC methods. The proposed method achieves accurate classification of four ACG mechanisms which is promising to be applied in diagnosis of glaucoma.
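For readers unfamiliar with ECOC, the decoding step can be sketched as below. The paper learns its code matrix; here a fixed exhaustive 7-bit code for four classes is used as a stand-in, and the names `CODE_MATRIX` and `ecoc_decode` are illustrative. Each column corresponds to one binary dichotomizer, and a sample is assigned to the class whose codeword is closest in Hamming distance to the dichotomizer outputs.

```python
import numpy as np

# Exhaustive 7-bit code for 4 classes (a stand-in, not the learned matrix).
# Rows are class codewords; columns are binary dichotomizers.
CODE_MATRIX = np.array([
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
])

def ecoc_decode(bits, code_matrix=CODE_MATRIX):
    """Return the index of the class whose codeword is nearest to `bits`."""
    bits = np.asarray(bits)
    # Hamming distance between the outputs and every class codeword.
    hamming = (code_matrix != bits).sum(axis=1)
    return int(np.argmin(hamming))
```

Every pair of codewords in this matrix differs in four positions, so a single wrong dichotomizer output still decodes to the correct class.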

Subjects

Diagnosis, Computer-Assisted/methods; Glaucoma, Angle-Closure/diagnosis; Machine Learning; Humans; Sensitivity and Specificity; Tomography, Optical Coherence

11.

Entropy-driven Adversarial Training For Source-free Medical Image Segmentation.

Liqiang, Yuan; Erdt, Marius; Wang, Lipo; Siyal, Mohammed Yakoob; Cui, Jian.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-7, 2023 07.

Article in English | MEDLINE | ID: mdl-38083377

ABSTRACT

Although traditional unsupervised domain adaptation (UDA) methods have proven effective in reducing domain gaps, their reliance on source domain data during adaptation often proves unfeasible in real-world applications. For instance, data access in a hospital setting is typically constrained due to patient privacy regulations. To address both the need for privacy protection and the mitigation of domain shifts between source and target domain data, we propose a novel two-step adversarial Source-Free Unsupervised Domain Adaptation (SFUDA) framework in this study. Our approach involves dividing the target domain data into confident and unconfident samples based on prediction entropy, using the Gumbel softmax technique. Confident samples are then treated as source domain data. In order to emulate adversarial training from traditional UDA methods, we employ a min-max loss in the first step, followed by a consistency loss in the second step. Additionally, we introduce a weight to penalize the L2-SP regularizer, which prevents excessive loss of source domain knowledge during optimization. Through extensive experiments on two distinct domain transfer challenges, our proposed SFUDA framework consistently outperforms other SFUDA methods. Remarkably, our approach even achieves competitive results when compared to state-of-the-art UDA methods, which benefit from direct access to source domain data. This demonstrates the potential of our novel SFUDA framework in addressing the limitations of traditional UDA methods while preserving patient privacy in sensitive applications.
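The entropy-based split into confident and unconfident samples can be illustrated with a plain threshold, as in the sketch below. The paper performs this split with the Gumbel softmax technique, which is not reproduced here; the hard threshold, the function name `split_by_entropy`, and the `threshold` parameter are simplifying assumptions.

```python
import numpy as np

def split_by_entropy(probs, threshold):
    """Mark samples as confident when their prediction entropy is low.

    probs: shape (n_samples, n_classes), rows summing to 1.
    threshold: hypothetical entropy cut-off (in nats).
    Returns (confident_mask, entropy).
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # avoids log(0) for one-hot predictions
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return entropy < threshold, entropy
```

Samples flagged as confident would then play the role of source domain data in the adversarial step described in the abstract.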

Subjects

Hospitals; Privacy; Humans; Entropy

12.

Transfer-recursive-ensemble learning for multi-day COVID-19 prediction in India using recurrent neural networks.

Chakraborty, Debasrita; Goswami, Debayan; Ghosh, Susmita; Ghosh, Ashish; Chan, Jonathan H; Wang, Lipo.

Sci Rep ; 13(1): 6795, 2023 04 26.

Article in English | MEDLINE | ID: mdl-37100806

ABSTRACT

The COVID-19 pandemic has put a huge challenge on the Indian health infrastructure. With a larger number of people getting affected during the second wave, hospitals were overburdened, running out of supplies and oxygen. Hence, predicting new COVID-19 cases, new deaths, and total active cases multiple days in advance can aid better utilization of scarce medical resources and prudent pandemic-related decision-making. The proposed method uses gated recurrent unit networks as the main predicting model. A study is conducted by building four models pre-trained on COVID-19 data from four different countries (United States of America, Brazil, Spain, and Bangladesh) and fine-tuned on India's data. Since the four countries chosen have experienced different types of infection curves, the pre-training provides a transfer learning to the models incorporating diverse situations into account. Each of the four models then gives 7-day ahead predictions using the recursive learning method for the Indian test data. The final prediction comes from an ensemble of the predictions of the different models. This method with two countries, Spain and Bangladesh, is seen to achieve the best performance amongst all the combinations as well as compared to other traditional regression models.
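The recursive multi-day prediction and ensembling described above can be sketched independently of the network architecture. In the sketch below each model is any callable that maps a window of past values to the next value; in the paper these are fine-tuned gated recurrent unit networks, which are not reproduced here. The function names and the simple averaging of model outputs are assumptions for illustration.

```python
def recursive_forecast(one_step_model, history, horizon):
    """Predict `horizon` steps ahead by feeding predictions back as inputs."""
    window = list(history)
    predictions = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return predictions

def ensemble_forecast(models, history, horizon):
    """Average the recursive forecasts of several models, day by day."""
    runs = [recursive_forecast(m, history, horizon) for m in models]
    return [sum(day) / len(day) for day in zip(*runs)]
```

Calling `ensemble_forecast(models, history, 7)` would give a 7-day-ahead forecast of the kind mentioned in the abstract; because each day's prediction is built on earlier predictions, errors can accumulate over the horizon.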

Subjects

COVID-19; Pandemics; Humans; COVID-19/epidemiology; India/epidemiology; Neural Networks, Computer; Machine Learning

13.

Sample-Based Data Augmentation Based on Electroencephalogram Intrinsic Characteristics.

Li, Ruilin; Wang, Lipo; Suganthan, P N; Sourina, Olga.

IEEE J Biomed Health Inform ; 26(10): 4996-5003, 2022 10.

Article in English | MEDLINE | ID: mdl-35737622

ABSTRACT

Deep learning for electroencephalogram-based classification is confronted with data scarcity, due to the time-consuming and expensive data collection procedure. Data augmentation has been shown to be an effective way to improve data efficiency. In addition, contrastive learning has recently been shown to hold great promise in learning effective representations without human supervision, which has the potential to improve the electroencephalogram-based recognition performance with limited labeled data. However, heavy data augmentation is a key ingredient of contrastive learning. In view of the limited number of sample-based data augmentation methods in electroencephalogram processing, three methods, performance-measure-based time warp, frequency noise addition and frequency masking, are proposed based on the characteristics of the electroencephalogram signal. These methods require no parameter learning, are easy to implement, and can be applied to individual samples. In the experiment, the proposed data augmentation methods are evaluated on three electroencephalogram-based classification tasks, including situation awareness recognition, motor imagery classification and a brain-computer interface steady-state visually evoked potentials speller system. Results demonstrated that the convolutional models trained with the proposed data augmentation methods yielded significantly improved performance over baselines. Overall, this work provides more potential methods to cope with the problem of limited data and boost the classification performance in electroencephalogram processing.
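Of the three augmentations named above, frequency masking is the simplest to sketch. The version below zeroes one frequency band of a single-channel signal through the real FFT; the abstract does not give the authors' exact procedure, so the function name `frequency_mask`, its parameters, and the choice of a hard zero mask are assumptions.

```python
import numpy as np

def frequency_mask(signal, fs, f_low, f_high):
    """Remove the band [f_low, f_high) Hz from a 1D signal sampled at fs Hz."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Zero every frequency bin that falls inside the masked band.
    spectrum[(freqs >= f_low) & (freqs < f_high)] = 0
    # Transform back to the time domain with the original length.
    return np.fft.irfft(spectrum, n=len(signal))
```

Because the function works on one sample at a time and has no trainable parameters, it fits the abstract's description of a method that needs no parameter learning and can be applied to individual samples.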

Subjects

Brain-Computer Interfaces; Algorithms; Electroencephalography/methods; Evoked Potentials; Humans; Imagination/physiology

14.

Deep and Domain Transfer Learning Aided Photoacoustic Microscopy: Acoustic Resolution to Optical Resolution.

Zhang, Zhengyuan; Jin, Haoran; Zheng, Zesheng; Sharma, Arunima; Wang, Lipo; Pramanik, Manojit; Zheng, Yuanjin.

IEEE Trans Med Imaging ; 41(12): 3636-3648, 2022 12.

Article in English | MEDLINE | ID: mdl-35849667

ABSTRACT

Acoustic resolution photoacoustic microscopy (AR-PAM) can achieve deeper imaging depth in biological tissue, at the cost of imaging resolution compared with optical resolution photoacoustic microscopy (OR-PAM). Here we aim to enhance AR-PAM image quality towards that of OR-PAM images, which specifically includes the enhancement of imaging resolution, restoration of micro-vasculatures, and reduction of artifacts. To address this issue, a network (MultiResU-Net) is first trained as a generative model with simulated AR-OR image pairs, which are synthesized with a physical transducer model. Moderate enhancement results can already be obtained when applying this model to in vivo AR imaging data. Nevertheless, the perceptual quality is unsatisfactory due to domain shift. Further, a domain transfer learning technique under a generative adversarial network (GAN) framework is proposed to drive the enhanced image's manifold towards that of real OR images. In this way, a perceptually convincing AR to OR enhancement result is obtained, which is also supported by quantitative analysis. Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index (SSIM) values are significantly increased from 14.74 dB to 19.01 dB and from 0.1974 to 0.2937, respectively, validating the improvement of reconstruction correctness and overall perceptual quality. The proposed algorithm has also been validated across different imaging depths with experiments conducted in both shallow and deep tissue. The above AR to OR domain transfer learning with GAN (AODTL-GAN) framework has enabled the enhancement target with a limited amount of matched in vivo AR-OR imaging data.
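For reference, the PSNR figure quoted above follows the standard definition, 10·log10(MAX² / MSE), which can be computed as in the sketch below. The function name `psnr` and the default `max_value` of 1.0 (images scaled to [0, 1]) are assumptions; the abstract does not state how the images were scaled.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Since PSNR is logarithmic in the mean squared error, the reported gain from 14.74 dB to 19.01 dB corresponds to the mean squared error falling to a little over a third of its original value.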

Subjects

Microscopy; Photoacoustic Techniques; Microscopy/methods; Photoacoustic Techniques/methods; Signal-To-Noise Ratio; Acoustics; Machine Learning

15.

Scaling of the two-point velocity difference along scalar gradient trajectories in fluid turbulence.

Wang, Lipo.

Phys Rev E Stat Nonlin Soft Matter Phys ; 79(4 Pt 2): 046325, 2009 Apr.

Article in English | MEDLINE | ID: mdl-19518351

ABSTRACT

In the context of dissipation element analysis of scalar fields in turbulence [L. Wang and N. Peters, J. Fluid Mech. 608, 113 (2008)], the elongation of elements by the velocity difference at the minimum and maximum points was found to increase linearly with the length of an element. This paper attempts to provide a theoretical basis for this finding by analyzing two-point properties along the gradient trajectories, of which dissipation elements consist. An equation of the two-point correlation function for the product of the scalar gradient along the same trajectory can be obtained. As in the derivation of Kolmogorov's 4/5 law, there exists a scaling in the inertial range for the velocity difference; however, it is not the same as Kolmogorov's 1/3 scaling. Specifically, by conditioning on gradient trajectories we obtain a linear relation between the velocity difference and the arclength between two points on the same trajectory. Results from direct numerical simulation (DNS) show satisfactory agreement with the theoretical prediction. This result and the derivation thereof may generally be helpful for a broad stream of similar statistical and scaling studies of turbulent flows.

16.

Automated brain histology classification using machine learning.

Ker, Justin; Bai, Yeqi; Lee, Hwei Yee; Rao, Jai; Wang, Lipo.

J Clin Neurosci ; 66: 239-245, 2019 Aug.

Article in English | MEDLINE | ID: mdl-31155342

ABSTRACT

Brain and breast tumors cause significant morbidity and mortality worldwide. Accurate and expedient histological diagnosis of patients' tumor specimens is required for subsequent treatment and prognostication. Currently, histology slides are visually inspected by trained pathologists, but this process is both time and labor-intensive. In this paper, we propose an automated process to classify histology slides of both brain and breast tissues using the Google Inception V3 convolutional neural network (CNN). We report successful automated classification of brain histology specimens into normal, low grade glioma (LGG) or high grade glioma (HGG). We also report for the first time the benefit of transfer learning across different tissue types. Pre-training on a brain tumor classification task improved CNN performance accuracy in a separate breast tumor classification task, with the F1 score improving from 0.547 to 0.913. We constructed a dataset using brain histology images from our own hospital and a public breast histology image dataset. Our proposed method can assist human pathologists in the triage and inspection of histology slides to expedite medical care. It can also improve CNN performance in cases where the training data is limited, for example in rare tumors, by applying the learned model weights from a more common tissue type.

Subjects

Brain Neoplasms/classification; Brain; Glioma/classification; Machine Learning; Neural Networks, Computer; Brain/pathology; Brain Neoplasms/pathology; Glioma/pathology; Humans

17.

Effective selection of informative SNPs and classification on the HapMap genotype data.

Zhou, Nina; Wang, Lipo.

BMC Bioinformatics ; 8: 484, 2007 Dec 20.

Article in English | MEDLINE | ID: mdl-18093342

ABSTRACT

BACKGROUND: Since single nucleotide polymorphisms (SNPs) are genetic variations which determine the difference between any two unrelated individuals, SNPs can be used to identify the correct source population of an individual. For efficient population identification with the HapMap genotype data, as few informative SNPs as possible are required from the original 4 million SNPs. Recently, Park et al. (2006) adopted the nearest shrunken centroid method to classify the three populations, i.e., Utah residents with ancestry from Northern and Western Europe (CEU), Yoruba in Ibadan, Nigeria in West Africa (YRI), and Han Chinese in Beijing together with Japanese in Tokyo (CHB+JPT), from which 100,736 SNPs were obtained and the top 82 SNPs could completely classify the three populations. RESULTS: In this paper, we propose to first rank each feature (SNP) using a ranking measure, i.e., a modified t-test or F-statistics. Then from the ranking list, we form different feature subsets by sequentially choosing different numbers of features (e.g., 1, 2, 3, ..., 100) with top ranking values, train and test them by a classifier, e.g., the support vector machine (SVM), thereby finding one subset which has the highest classification accuracy. Compared to the classification method of Park et al., we obtain a better result, i.e., good classification of the 3 populations using on average 64 SNPs. CONCLUSION: Experimental results show that both the modified t-test and the F-statistics method are very effective in ranking SNPs by their classification capabilities. Combined with the SVM classifier, a desirable feature subset (with the minimum size and the highest informativeness) can be quickly found in a greedy manner after ranking all SNPs. Our method is able to identify a very small number of important SNPs that can determine the populations of individuals.
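The rank-then-grow procedure in the RESULTS section can be sketched as follows. Two substitutions are made to keep the example short and dependency-free, and neither is the authors' method: a standard two-class Welch t-statistic stands in for the modified t-test, and a nearest-centroid classifier stands in for the SVM. All function names are hypothetical.

```python
import numpy as np

def welch_t_scores(X, y):
    """Absolute Welch t-statistic per feature for binary labels 0/1."""
    a, b = X[y == 0], X[y == 1]
    diff = np.abs(a.mean(axis=0) - b.mean(axis=0))
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / (se + 1e-12)

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in classifier: assign each test sample to the closer class mean."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return float(((d1 < d0).astype(int) == y_test).mean())

def best_top_k(X_train, y_train, X_test, y_test, max_k):
    """Try the top 1, 2, ..., max_k ranked features; keep the smallest best subset."""
    order = np.argsort(-welch_t_scores(X_train, y_train))
    best = (-1.0, 0, None)
    for k in range(1, max_k + 1):
        cols = order[:k]
        acc = nearest_centroid_accuracy(
            X_train[:, cols], y_train, X_test[:, cols], y_test
        )
        if acc > best[0]:  # strict '>' prefers the smaller subset on ties
            best = (acc, k, cols)
    return best
```

Ranking is done once on the training data, so the cost of the search grows only linearly with `max_k` rather than with the number of possible subsets.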

Subjects

Chromosome Mapping/methods; DNA Mutational Analysis/methods; Databases, Genetic; Haplotypes/genetics; Polymorphism, Single Nucleotide/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Base Sequence; Genetic Variation/genetics; Genotype; Molecular Sequence Data

18.

Accurate cancer classification using expressions of very few genes.

Wang, Lipo; Chu, Feng; Xie, Wei.

IEEE/ACM Trans Comput Biol Bioinform ; 4(1): 40-53, 2007.

Article in English | MEDLINE | ID: mdl-17277412

ABSTRACT

We aim at finding the smallest set of genes that can ensure highly accurate classification of cancers from microarray data by using supervised machine learning algorithms. The significance of finding the minimum gene subsets is three-fold: 1) It greatly reduces the computational burden and "noise" arising from irrelevant genes. In the examples studied in this paper, finding the minimum gene subsets even allows for extraction of simple diagnostic rules which lead to accurate diagnosis without the need for any classifiers. 2) It simplifies gene expression tests to include only a very small number of genes rather than thousands of genes, which can bring down the cost for cancer testing significantly. 3) It calls for further investigation into the possible biological relationship between these small numbers of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we choose some important genes using a feature importance ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes by using a good classifier. For three "small" and "simple" data sets with two, three, and four cancer (sub)types, our approach obtained very high accuracy with only two or three genes. For a "large" and "complex" data set with 14 cancer types, we divided the whole problem into a group of binary classification problems and applied the 2-step approach to each of these binary classification problems. Through this "divide-and-conquer" approach, we obtained accuracy comparable to previously reported results but with only 28 genes rather than 16,063 genes. In general, our method can significantly reduce the number of genes required for highly reliable diagnosis.

Subjects

Computational Biology/methods; Gene Expression Regulation, Neoplastic; Neoplasms/classification; Algorithms; Artificial Intelligence; Cluster Analysis; Fuzzy Logic; Gene Expression Profiling; Humans; Liver Neoplasms/classification; Liver Neoplasms/diagnosis; Liver Neoplasms/genetics; Lymphoma/classification; Lymphoma/diagnosis; Lymphoma/genetics; Neoplasms/diagnosis; Neoplasms/genetics; Neural Networks, Computer; Oligonucleotide Array Sequence Analysis

19.

A modified T-test feature selection method and its application on the HapMap genotype data.

Zhou, Nina; Wang, Lipo.

Genomics Proteomics Bioinformatics ; 5(3-4): 242-9, 2007 Dec.

Article in English | MEDLINE | ID: mdl-18267305

ABSTRACT

Single nucleotide polymorphisms (SNPs) are genetic variations that determine the differences between any two unrelated individuals. Various population groups can be distinguished from each other using SNPs. For instance, the HapMap dataset has four population groups with about ten million SNPs. For more insights on human evolution, ethnic variation, and population assignment, we propose to find out which SNPs are significant in determining the population groups and then to classify different populations using these relevant SNPs as input features. In this study, we developed a modified t-test ranking measure and applied it to the HapMap genotype data. Firstly, we rank all SNPs in comparison with other feature importance measures including F-statistics and the informativeness for assignment. Secondly, we select different numbers of the most highly ranked SNPs as the input to a classifier, such as the support vector machine, so as to find the best feature subset corresponding to the best classification accuracy. Experimental results showed that the proposed method is very effective in finding SNPs that are significant in determining the population groups, with reduced computational burden and better classification accuracy.

Subjects

Databases, Nucleic Acid; Genomics/statistics & numerical data; Polymorphism, Single Nucleotide; Algorithms; Computational Biology; Genetics, Medical/statistics & numerical data; Genetics, Population; Genotype; Humans

20.

Intrinsic flow structure and multifractality in two-dimensional bacterial turbulence.

Wang, Lipo; Huang, Yongxiang.

Phys Rev E ; 95(5-1): 052215, 2017 May.

Article in English | MEDLINE | ID: mdl-28618644

ABSTRACT

The active interaction between the bacteria and fluid generates turbulent structures even at zero Reynolds number. The velocity of such a flow obtained experimentally has been quantitatively investigated based on streamline segment analysis. There is a clear transition at about 16 times the organism body length separating two different scale regimes, which may be attributed to the different influence of the viscous effect. Surprisingly the scaling extracted from the streamline segment indicates the existence of scale similarity even at the zero Reynolds number limit. Moreover, the multifractal feature can be quantitatively described via a lognormal formula with the Hurst number H=0.76 and the intermittency parameter µ=0.20, which is coincidentally in agreement with the three-dimensional hydrodynamic turbulence result. The direction of cascade is measured via the filter-space technique. An inverse energy cascade is confirmed. For the enstrophy, a forward cascade is observed when r/R≤3, and an inverse one is observed when r/R>3, where r and R are the separation distance and the bacteria body size, respectively. Additionally, the lognormal statistics is verified for the coarse-grained energy dissipation and enstrophy, which supports the lognormal formula to fit the measured scaling exponent.
