Showing 1 to 10 of 1248 matching Articles
By
Tallón-Ballesteros, Antonio J.; Correia, Luís; Cho, Sung-Bae
3 Citations
Feature selection has been applied in several areas of science and engineering for a long time. This kind of pre-processing is almost mandatory in problems with a huge number of features, which entail a very high computational cost and are frequently further complicated by having more than two classes and many instances. The general taxonomy divides the approaches into two groups: filters and wrappers. This paper introduces a methodology that refines the feature subset with an additional feature selection approach. It reviews the possibilities and delves into a new class of algorithms based on refining an initial search with another method. We sequentially apply an approximate procedure and then an exact procedure. The research is supported by empirical results, and some guidelines are drawn as conclusions.
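The abstract above names only "an approximate procedure" followed by "an exact procedure" without saying which ones. The following is a minimal sketch under stated assumptions, not the authors' method: the approximate stage is assumed to be a correlation filter that keeps the top `k` features, and the exact stage is assumed to be an exhaustive search over the survivors, scored by leave-one-out 1-nearest-neighbour accuracy with a small per-feature penalty. All function names and the `penalty` parameter are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return abs(cov / (sx * sy))


def approximate_stage(X, y, k):
    """Assumed cheap filter: keep the k features most correlated with the label."""
    scores = [(abs_pearson([row[j] for row in X], y), j) for j in range(len(X[0]))]
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [j for _, j in scores[:k]]


def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out 1-nearest-neighbour accuracy using only the `subset` features."""
    correct = 0
    for i, row in enumerate(X):
        best_d, best_label = float("inf"), None
        for j, other in enumerate(X):
            if i == j:
                continue
            d = sum((row[f] - other[f]) ** 2 for f in subset)
            if d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def exact_stage(X, y, candidates, penalty=0.01):
    """Assumed exact step: score every non-empty subset of the surviving candidates."""
    best_subset, best_score = None, float("-inf")
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            score = loo_1nn_accuracy(X, y, subset) - penalty * len(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```

On a toy dataset `X = [[0, 5, 1], [0, 3, 0], [1, 4, 1], [1, 2, 0]]`, `y = [0, 0, 1, 1]`, the filter keeps features `[0, 1]` and the exhaustive stage returns `(0,)`. The design point is that the exponential exact search only runs over the `k` features the cheap stage lets through.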
By
Nguyen, Hoai Bach; Xue, Bing; Liu, Ivy; Andreae, Peter; Zhang, Mengjie
10 Citations
In classification, feature selection is an important but challenging task that requires a powerful search technique. Particle swarm optimisation (PSO) has recently gained much attention for solving feature selection problems, but the current representation typically forms a high-dimensional search space. A new representation based on feature clusters was recently proposed to reduce the dimensionality and improve the performance, but it does not form a smooth fitness landscape, which may limit the performance of PSO. This paper proposes a new Gaussian-based transformation rule for interpreting a particle as a feature subset, which is combined with the feature-cluster-based representation to develop a new PSO-based feature selection algorithm. The proposed algorithm is examined and compared with two recent PSO-based algorithms: the first uses a Gaussian-based updating mechanism and the conventional representation, and the second uses the feature cluster representation without a Gaussian distribution. Experiments on commonly used datasets of varying difficulty show that the proposed algorithm outperforms the other two algorithms in terms of classification performance and the number of features, on both the training sets and the test sets. Further analyses show that the Gaussian transformation rule improves stability, i.e. it selects similar features in different independent runs, and that it almost always selects the most important features.
By
Bouguila, Nizar; Ziou, Djemel
24 Citations
Mixture modeling is one of the most useful tools in machine learning and data mining applications. An important challenge when applying finite mixture models is selecting the number of clusters that best describes the data. Recent developments have shown that this problem can be handled by applying non-parametric Bayesian techniques to mixture modeling. Another crucial preprocessing step in mixture learning is the selection of the most relevant features. The main approach of this paper to tackle these problems consists of storing the knowledge in a generalized Dirichlet mixture model by applying non-parametric Bayesian estimation and inference techniques. Specifically, we extend finite generalized Dirichlet mixture models to the infinite case, in which the number of components and the relevant features do not need to be known a priori. This extension provides a natural representation of uncertainty regarding the challenging problem of model selection. We propose a Markov Chain Monte Carlo algorithm to learn the resulting infinite mixture. Through applications involving text and image categorization, we show that infinite mixture models offer more powerful and robust performance than classic finite mixtures for both clustering and feature selection.
By
Soguero-Ruiz, Cristina; Alberca Díaz-Plaza, Ana; Bohoyo, Pablo de Miguel; Ramos-López, Javier; Rubio-Sánchez, Manuel; Sánchez, Alberto; Mora-Jiménez, Inmaculada
1 Citation
Diabetes mellitus (DM) and essential hypertension (EH) are chronic diseases that become more prevalent every year, both independently and jointly. To gain insights into the particularities of these chronic conditions, we study the use of decision trees as a tool for selecting discriminative features and making predictive analyses of the health status of these chronic patients. We considered gender, age, ICD9 codes for diagnoses and ATC codes for drugs associated with the diabetic and/or hypertensive population linked to the University Hospital of Fuenlabrada (Madrid, Spain) during 2012. Results show a relationship between DM/EH and diseases/drugs related to the respiratory system, mental disorders, or the musculoskeletal system. We conclude that drugs are quite informative, capturing information about the disease when the diagnosis code is not registered. Regarding predictive analyses, when discriminating patients with both EH and DM from those with just one of these chronic conditions, better accuracy is obtained for EH (85.4%) than for DM (80.1%).
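Decision trees such as ID3 choose splits by information gain, and the same score can rank candidate features (for example, diagnosis or drug indicators) by how well they separate patient groups. The sketch below shows only that entropy-based criterion, not the authors' pipeline; the feature names `insulin` and `resp` and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of the target minus its weighted entropy after splitting on `feature`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_features(rows, features, target):
    """Features sorted by decreasing information gain (stable for ties)."""
    return sorted(features, key=lambda f: information_gain(rows, f, target), reverse=True)


# Hypothetical records: `insulin` separates the groups, `resp` does not.
rows = [
    {"insulin": 1, "resp": 0, "group": "DM"},
    {"insulin": 1, "resp": 1, "group": "DM"},
    {"insulin": 0, "resp": 0, "group": "EH"},
    {"insulin": 0, "resp": 1, "group": "EH"},
]
print(rank_features(rows, ["resp", "insulin"], "group"))
```

Here `insulin` has a gain of 1 bit and `resp` a gain of 0, so the ranking puts `insulin` first. A full tree applies this criterion recursively, so features that never appear in any split are effectively deselected.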
By
Pernkopf, Franz
4 Citations
This paper proposes an approach that detects surface defects with three-dimensional characteristics on scale-covered steel blocks. The reflection properties of the flawless surface change strongly. Light sectioning is used to acquire the surface range data of the steel block. These sections are arbitrarily located within a range of a few millimeters due to vibrations of the steel block on the conveyor. After the depth map is recovered, segments of the surface are classified according to a set of extracted features by means of Bayesian network classifiers. To establish the structure of the Bayesian network, a floating search algorithm is applied, which achieves a good tradeoff between classification performance and computational efficiency for structure learning. This search algorithm enables conditional exclusion of previously added attributes and/or arcs from the network. The experiments show that the selective unrestricted Bayesian network classifier outperforms the naïve Bayes and the tree-augmented naïve Bayes decision rules in terms of classification rate. More than 98% of the surface segments were classified correctly.
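The floating search mentioned above alternates forward inclusion with conditional exclusion. The sketch below is a generic sequential floating forward selection (SFFS) over attributes only, assuming an arbitrary subset criterion `J` to maximize; it does not model the arcs of a Bayesian network, and the function name and toy criterion are hypothetical.

```python
def sffs(features, J, target_size):
    """Sequential floating forward selection; returns {size: (score, subset)}."""
    selected = []
    best_by_size = {}  # best score and subset seen so far for each subset size
    while len(selected) < target_size:
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        # Inclusion: add the feature whose addition maximizes J.
        f_add = max(remaining, key=lambda f: J(selected + [f]))
        selected = selected + [f_add]
        score, k = J(selected), len(selected)
        if k not in best_by_size or score > best_by_size[k][0]:
            best_by_size[k] = (score, list(selected))
        # Conditional exclusion: drop the least useful feature only while the
        # reduced subset beats the best subset previously recorded at that size.
        while len(selected) > 2:
            f_rem = max(selected, key=lambda f: J([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rem]
            r_score, k = J(reduced), len(reduced)
            if k in best_by_size and r_score > best_by_size[k][0]:
                selected = reduced
                best_by_size[k] = (r_score, list(reduced))
            else:
                break
    return best_by_size


# Hypothetical criterion: reward features "a" and "c", penalize subset size.
def J(subset):
    return len(set(subset) & {"a", "c"}) - 0.1 * len(subset)


print(sffs(["a", "b", "c"], J, 2)[2][1])
```

Because an exclusion is accepted only when it strictly improves the record for the smaller size, the search cannot cycle, which is the property that lets floating methods revisit earlier additions at modest extra cost.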
By
Wang, Youwei; Feng, Lizhou; Zhu, Jianming
7 Citations
Feature selection, which can reduce the dimensionality of the feature space without sacrificing classifier performance, is an effective technique for text classification. Because many classifiers cannot handle high-dimensional features, filtering redundant information from the original feature space has become one of the core goals of the feature selection field. In this paper, the concept of an equivalence word set is introduced, and a set of equivalence word sets (denoted EWS1) is constructed using the rich semantic information of the Open Directory Project (ODP). On this basis, an artificial bee colony based feature selection method is proposed for filtering redundant information, and a feature subset FS is obtained by using an optimal feature selection (OFS) method and two predetermined thresholds. To obtain the best predetermined thresholds, an improved memory-based artificial bee colony method (IABCM) is proposed. In the experiments, fuzzy support vector machine (FSVM) and Naïve Bayesian (NB) classifiers are used on six datasets: LingSpam, WebKB, SpamAssassin, 20-Newsgroups, Reuters21578 and TREC2007. Experimental results verify that, with both FSVM and NB, the proposed method is efficient and achieves better accuracy than several representative feature selection methods.
By
Ferone, Alessio; Georgiev, Tsvetozar; Maratea, Antonio
In real-world applications, the data gathering process is necessarily bounded by costs in terms of the money, time or resources that must be spent to sample a sufficient amount of good-quality data. From this point of view, Feature Selection (FS) is essential to reduce the total sampling cost while keeping the information content of the sampled data unaltered, and Rough Sets (RS) offer a natural representation of FS in terms of the so-called reducts. In this paper, a modified version of the Quick Reduct (QR) algorithm is proposed in which the criterion for adding features to the reduct also accounts for the costs of the features. Exploiting granular computing and the indiscernibility principle, the proposed Test-Cost-Sensitive Quick Reduct (TCSQR) efficiently derives a close-to-optimal subset of informative and inexpensive features. Promising experimental results have been obtained in three different cost scenarios.
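Quick Reduct greedily adds the attribute that most increases the rough-set dependency degree, the fraction of objects whose indiscernibility class has a single decision value, until it matches that of the full attribute set. The abstract does not give the TCSQR criterion, so the sketch below assumes a hypothetical one: dependency gain divided by the attribute's test cost. The function names, dict-based records and costs are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: |positive region| / |U| for attributes `attrs`."""
    blocks = defaultdict(list)  # indiscernibility classes keyed by attribute values
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    positive = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return Fraction(positive, len(rows))


def cost_sensitive_quick_reduct(rows, conditions, decision, cost):
    """Greedy reduct; hypothetical criterion = dependency gain per unit test cost."""
    full = dependency(rows, conditions, decision)
    reduct, current = [], dependency(rows, [], decision)
    while current < full:  # adding every attribute reaches `full`, so this terminates
        remaining = [a for a in conditions if a not in reduct]
        best = max(
            remaining,
            key=lambda a: (dependency(rows, reduct + [a], decision) - current) / cost[a],
        )
        reduct.append(best)
        current = dependency(rows, reduct, decision)
    return reduct


# Toy decision table: "c" alone determines "d", as do "a" and "b" together.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
]
print(cost_sensitive_quick_reduct(rows, ["a", "b", "c"], "d", {"a": 1, "b": 1, "c": 10}))
```

With a cost of 10 on `c` the sketch returns `["a", "b"]` (total cost 2), whereas with uniform costs it behaves like plain Quick Reduct and returns `["c"]`. Using `Fraction` keeps the dependency comparisons exact.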
By
Ahmad, Aliyu Usman; Starkey, Andrew
The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in machine learning. One of the key challenges is implementing effective methods for selecting the relevant features, which are buried in high-dimensional data along with irrelevant noisy features, by choosing a subset of the input features that predicts the output with accuracy comparable to, or higher than, that of the complete input set. Kohonen's Self-Organising Map (SOM) neural network has been used in various ways for this task. In this work, the application of multiple methods for this task is reviewed. A feature selection approach based on analysing the trained Self-Organising network is presented, together with a performance comparison of two methods.
By
Chang, Chuan-Yu; Chang, Shu-Han; Chen, Shao-Jer
1 Citation
Pathological changes in lymph nodes (LN) can be diagnosed using biopsy, which is a time-consuming process. Compared to biopsy, sonography is a better tool for detecting pathology in the LN. However, there is a lack of consistency between different ultrasound systems, which tend to produce images with different properties. To overcome this problem, this paper proposes a method to identify and select universal imaging features that standardize the classification of LN across different ultrasound imaging systems. This will help in the diagnosis of various pathological conditions. A support vector machine (SVM), combined with correlation and performance analysis for selecting suitable imaging features, was adopted for this classification system. Experimental results demonstrated that each selected feature set could be used to classify the respective pathological conditions in the LN for images acquired from different ultrasound imaging machines.
By
Afzali, Shima; Al-Sahaf, Harith; Xue, Bing; Hollitt, Christopher; Zhang, Mengjie
Salient Object Detection (SOD) aims to model the human visual attention system to cope with complex natural scenes that contain various objects at different scales. Over the past two decades, a wide range of saliency features have been introduced in the SOD field; however, feature selection has not been widely investigated for selecting informative, non-redundant, and complementary features from the existing ones. In SOD, multi-level feature extraction and feature combination are the two fundamental stages for computing the final saliency map. However, designing a good feature combination framework is challenging and requires domain-expert intervention. In this paper, we propose a genetic programming (GP) based method that automatically selects complementary saliency features and generates a mathematical function to combine them. The performance of the proposed method is evaluated on four benchmark datasets and compared with nine state-of-the-art methods. The qualitative and quantitative results show that the proposed method significantly outperforms, or achieves performance comparable to, the competitor methods.