Showing 1 to 23 of 23 matching Articles
By
Picek, Stjepan; Coello Coello, Carlos A.; Jakobovic, Domagoj; Mentens, Nele
2 Citations
The problem of finding the shortest addition chain for a given exponent is of great relevance in cryptography, but it is also very difficult to solve since it is an NP-hard problem. In this paper, we propose a genetic algorithm with a novel representation of solutions and new crossover and mutation operators to minimize the length of the addition chains corresponding to a given exponent. We also develop a repair strategy that significantly enhances the performance of our approach. The results are compared with respect to those generated by other metaheuristics for instances of moderate size, but we also investigate values up to $2^{127} - 3$. For those instances, we were unable to find any results produced by other metaheuristics for comparison, so three additional strategies were adopted to serve as benchmarks. Our results indicate that the proposed approach is a very promising alternative for dealing with this problem.
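The validity condition that any repair strategy for this problem must enforce can be sketched in a few lines of Python. This is an illustrative check only, not the authors' representation or operators:

```python
def is_addition_chain(chain, exponent):
    """Check that `chain` is a valid addition chain ending at `exponent`.

    An addition chain starts at 1, and every later element is the sum
    of two (not necessarily distinct) earlier elements. The GA seeks
    the shortest such chain for the target exponent.
    """
    if not chain or chain[0] != 1 or chain[-1] != exponent:
        return False
    for i in range(1, len(chain)):
        earlier = chain[:i]
        # element i must be a sum of two earlier elements
        if not any(chain[i] - a in earlier for a in earlier):
            return False
    return True

# The chain 1, 2, 3, 6, 12, 24, 30, 31 computes x^31 with 7 multiplications.
print(is_addition_chain([1, 2, 3, 6, 12, 24, 30, 31], 31))  # True
```

A fitness function would then be the chain length among valid chains, with the repair step turning invalid offspring into valid ones.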
By
Tecuanhuehue-Vera, P.; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
Multidimensional scaling maps a set of n-dimensional objects into a lower-dimensional space, usually the Euclidean plane, preserving the distances among objects in the original space. Most algorithms for multidimensional scaling have been designed to work on numerical data, but in the soft sciences it is common for objects to be described by both quantitative and qualitative attributes, sometimes with missing values. For this reason, in this paper we propose a genetic algorithm especially designed for multidimensional scaling over mixed and incomplete data. Experiments using datasets from the UCI repository, together with a comparison against a common algorithm for multidimensional scaling, show the behavior of our proposal.
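A dissimilarity over mixed, incomplete records is the ingredient such a GA must preserve in the plane. The Gower-style measure below is one common choice, not necessarily the one the paper uses:

```python
def gower_dissimilarity(a, b, numeric):
    """Gower-style dissimilarity for mixed, possibly incomplete records.

    `numeric` marks which attributes are numeric (assumed scaled to [0, 1]);
    None encodes a missing value, and such attributes are simply skipped.
    Categorical attributes contribute 0 on a match and 1 on a mismatch.
    """
    total, used = 0.0, 0
    for x, y, is_num in zip(a, b, numeric):
        if x is None or y is None:
            continue  # missing values do not contribute
        total += abs(x - y) if is_num else (0.0 if x == y else 1.0)
        used += 1
    return total / used if used else 0.0

a = (0.2, "red", None)
b = (0.5, "blue", 0.9)
print(gower_dissimilarity(a, b, (True, False, True)))  # 0.65
```

The GA's fitness would then compare these dissimilarities against Euclidean distances between the evolved 2D coordinates.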
By
Kuri-Morales, Angel
A method which allows us to find a polynomial model of an unknown function from a set of tuples with n independent variables and m tuples is presented. A plausible model of an explicit approximation is found. The number of monomials of the model is determined by a previously trained neural network (NNt). The degrees of the monomials are bounded from the Universal Approximation Theorem, yielding a reduced polynomial form. The coefficients of the model are found by exploring the space of possible monomials with a genetic algorithm (GA). The polynomial defined by every training pair is approximated by the Ascent Algorithm, which yields the coefficients of best fit under the $L_\infty$ norm. The $L_2$ error corresponding to these coefficients is calculated and becomes the fitness function of the GA. A detailed example for a well-known experimental data set is presented. It is shown that the model derived from the method yields a classification error below 7% for the cross-validation data set. A detailed description of the NNt is included. The approximation to 46 datasets tackled with our method is discussed. In all these cases the approximation accuracy was better than 94%. The tool described herein yields general models with explanatory characteristics not found in other methods.
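The two building blocks of this setup, evaluating a monomial-based polynomial and scoring it with an $L_2$ error, can be sketched as follows. The term encoding and data here are hypothetical, for illustration only:

```python
def poly_eval(terms, x):
    """Evaluate a multivariate polynomial given as (coefficient, exponents)
    terms, where `exponents` has one entry per independent variable."""
    total = 0.0
    for coeff, exps in terms:
        prod = coeff
        for xi, e in zip(x, exps):
            prod *= xi ** e
        total += prod
    return total

def l2_error(terms, data):
    """L2 error over (x, y) pairs -- the GA fitness in this kind of setup."""
    return sum((poly_eval(terms, x) - y) ** 2 for x, y in data) ** 0.5

terms = [(1.0, (2, 0)), (-2.0, (0, 1))]        # x1^2 - 2*x2
data = [((1.0, 1.0), -1.0), ((2.0, 0.5), 3.0)]
print(l2_error(terms, data))  # 0.0 -- this model fits the data exactly
```

A GA individual would encode which monomials appear and their degrees; the coefficients of best $L_\infty$ fit come from a separate solver, as the abstract describes.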
By
Rojas-Simón, Jonathan; Ledeneva, Yulia; García-Hernández, René Arnulfo
Over the last years, Automatic Text Summarization (ATS) has been considered one of the main tasks in Natural Language Processing (NLP), generating summaries in several languages (e.g., English, Portuguese, Spanish, etc.). Some of the most significant advances in ATS have been made for Portuguese, reflected in the proposal of various state-of-the-art methods. It is essential to know the performance of different state-of-the-art methods with respect to the upper bounds (Topline), lower bounds (Baseline-random), and other heuristics (Baseline-first). In recent works, the significance and upper bounds for Single-Document Summarization (SDS) and Multi-Document Summarization (MDS) were calculated using corpora from the Document Understanding Conferences (DUC). In this paper, a calculation of upper bounds for SDS in Portuguese using Genetic Algorithms (GA) is performed. Moreover, we present a comparison of some state-of-the-art methods with respect to the upper bounds, lower bounds, and heuristics to determine their level of significance.
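A GA searching for the Topline maximizes overlap between a candidate extract and the reference summary. A ROUGE-1-style recall, sketched below, is one plausible fitness; the paper's exact evaluation measure may differ:

```python
def unigram_recall(candidate, reference):
    """ROUGE-1-style recall sketch: fraction of reference-summary words
    covered by the candidate extract (each candidate word usable once)."""
    remaining = candidate.lower().split()
    ref = reference.lower().split()
    hits = 0
    for w in ref:
        if w in remaining:
            remaining.remove(w)
            hits += 1
    return hits / len(ref) if ref else 0.0

print(unigram_recall("the cat sat", "the cat slept"))  # 2 of 3 words covered
```

The upper bound for a document is then the best such score over sentence subsets respecting the length limit, which is exactly the hard search a GA approximates.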
By
Kuri-Morales, Angel; Aldana-Bobadilla, Edwin
9 Citations
Genetic Algorithms (GAs) have long been recognized as powerful tools for the optimization of complex problems where traditional techniques do not apply. However, although the convergence of elitist GAs to a global optimum has been mathematically proven, the number of iterations remains a case-by-case parameter. We address the problem of determining the best GA out of a family of structurally different evolutionary algorithms by solving a large set of unconstrained functions. We selected 4 structurally different genetic algorithms and a non-evolutionary algorithm (NEA). A schemata analysis was conducted, further supporting our claims. As the problems become more demanding, the GAs significantly and consistently outperform the NEA. A particular breed of GA (the Eclectic GA) is superior to all others, in all cases.
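The elitism whose convergence guarantee the abstract cites simply means the best individual always survives to the next generation. A toy sketch of an elitist GA follows; the Eclectic GA studied in the paper uses different selection and operators:

```python
import random

def elitist_ga(fitness, n_bits=16, pop_size=30, generations=100, seed=1):
    """Minimal elitist GA: bitstring individuals, truncation selection,
    one-point crossover, bit-flip mutation; the best individual is always
    copied into the next generation (elitism), so fitness never worsens."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = [pop[0][:]]                                # elitism
        while len(nxt) < pop_size:
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)  # truncation selection
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]                  # one-point crossover
            if rng.random() < 0.1:
                child[rng.randrange(n_bits)] ^= 1        # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)

# Minimize the number of ones (a toy 'ZeroMax' problem).
best = elitist_ga(sum)
print(sum(best))  # elitism drives this toward 0
```

Elitism gives the monotone improvement that the convergence proof relies on; without it, the best solution can be lost to crossover and mutation.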
By
Kuri-Morales, Angel
At present, very large volumes of information are regularly being produced in the world. Most of this information is unstructured, lacking the properties usually expected from, for instance, relational databases. One of the more interesting issues in computer science is how, if at all, we may achieve data mining on such unstructured data. Its analysis has typically been attempted by devising schemes to identify patterns and trends through means such as statistical pattern learning. The basic problem of this approach is that the user has to decide, a priori, the model of the patterns and, furthermore, the way in which they are to be found in the data. This is true regardless of the kind of data, be it textual, musical, financial or otherwise. In this paper we explore an alternative paradigm in which raw data is categorized by analyzing a large corpus from which a set of categories and the different instances in each category are determined, resulting in a structured database. Each instance is then mapped into a numerical value which preserves the underlying patterns. This is done using a genetic algorithm and a set of multilayer perceptron networks. Every categorical instance is then replaced by the adequate numerical code. The resulting numerical database may be tackled with the usual clustering algorithms. We hypothesize that any unstructured data set may be approached in this fashion. In this work we exemplify with a textual database: we apply our method to characterize texts by different authors and present experimental evidence that the resulting databases yield clustering results which permit authorship identification from raw textual data.
By
Kuri-Morales, Angel; Cartas-Ayala, Alejandro
One of the most interesting goals in engineering and the sciences is the mathematical representation of physical, social and other kinds of complex phenomena. This goal has been attempted and, lately, achieved with different machine learning (ML) tools. ML owes much of its present appeal to the fact that it allows one to model complex phenomena without an explicit definition of the form of the model. Neural networks and support vector machines exemplify such methods. However, in most cases these methods yield “black box” models, i.e. input and output correspond to the phenomena under scrutiny but it is very difficult (or outright impossible) to discern the interrelation of the input variables involved. In this paper we address this problem with the explicit aim of targeting models which are closed in form, i.e. the aforementioned relation between variables is explicit. In order to do this, in general, the only assumption regarding the data is that they be approximately continuous. In such cases it is possible to represent the system with polynomial expressions. To be able to do so, one must define the number of monomials, the degree of every variable in every monomial and the associated coefficients. We model sparse data systems with an algorithm minimizing the min-max norm. From mathematical and experimental evidence we are able to set a bound on the number of terms and degrees of the approximating polynomials. Thereafter, a genetic algorithm (GA) identifies the coefficients which correspond to the terms and degrees defined above.
By
Bonilla Huerta, Edmundo; Hernández Hernández, J. Crispín; Hernández Montiel, L. Alberto
3 Citations
This paper introduces a new combined filter-wrapper gene subset selection approach where a Genetic Algorithm (GA) is combined with Linear Discriminant Analysis (LDA). This LDA-based GA has the major characteristic that the GA uses not only an LDA classifier in its fitness function, but also LDA's discriminant coefficients in its dedicated crossover and mutation operators. This paper studies the effect of these informed operators on the evolutionary process. The proposed algorithm is assessed on several well-known datasets from the literature and compared with recent state-of-the-art algorithms. The results obtained show that our filter-wrapper approach achieves globally high classification accuracies with a very small number of genes, compared to those obtained by other methods.
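One way an operator can be "informed" by discriminant coefficients is to bias which genes are switched on or off. The sketch below is a hypothetical such mutation; the paper's actual operator may combine the weights differently:

```python
def informed_mutation(mask, weights):
    """Sketch of an LDA-informed mutation: drop the selected gene with the
    smallest absolute discriminant coefficient and add the unselected gene
    with the largest one. `weights` are hypothetical per-gene coefficients."""
    selected = [i for i, m in enumerate(mask) if m]
    unselected = [i for i, m in enumerate(mask) if not m]
    child = list(mask)
    if selected:
        child[min(selected, key=lambda i: abs(weights[i]))] = 0  # weakest out
    if unselected:
        child[max(unselected, key=lambda i: abs(weights[i]))] = 1  # strongest in
    return child

mask = [1, 1, 0, 0]
weights = [0.05, 0.90, 0.70, 0.10]
print(informed_mutation(mask, weights))  # [0, 1, 1, 0]
```

Compared with blind bit-flips, this steers the search toward genes the classifier itself considers discriminative, which is the point of the "informed operators" studied in the paper.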
By
Aldana-Bobadilla, Edwin; Alfaro-Pérez, Carlos
1 Citation
A common task in data analysis is to find the appropriate data sample whose properties allow us to infer the parameters of the data population. The most frequently dilemma related to sampling is how to determine the optimal size of the sample. To solve it, there are typical methods based on asymptotical results from the Central Limit Theorem. However, the effectiveness of such methods is bounded by several considerations as the sampling strategy (simple, stratified, clusterbased, etc.), the size of the population or even the dimensionality of the space of the data. In order to avoid such constraints, we propose a method based on a measure of information of the data in terms of Shannon’s Entropy. Our idea is to find the optimal sample of size N whose information is as similar as possible to the information of the population, subject to several constraints. Finding such sample represents a hard optimization problem whose feasible space disallows the use of traditional optimization techniques. To solve it, we resort to Genetic Algorithms. We test our method with synthetic datasets; the results show that our method is suitable. For completeness, we used a dataset from a real problem; the results confirm the effectiveness of our proposal and allow us to visualize different applications.
more …
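The core fitness idea, matching the sample's entropy to the population's, can be sketched directly for discrete data. This is an illustrative simplification of the paper's constrained formulation:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of a discrete sample."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_gap(population, sample):
    """Fitness sketch: distance between sample and population entropies.
    A GA would minimize this over candidate samples of fixed size N."""
    return abs(shannon_entropy(population) - shannon_entropy(sample))

pop = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
good = ["a"] * 5 + ["b"] * 3 + ["c"] * 2   # mirrors population proportions
bad = ["a"] * 10                            # collapses to one category
print(entropy_gap(pop, good) < entropy_gap(pop, bad))  # True
```

Because the feasible space is the set of all size-N subsets, gradient-based methods do not apply, which is why the abstract turns to Genetic Algorithms.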
By
Martínez-Villaseñor, Lourdes; Ponce, Hiram; Marmolejo, José Antonio; Ramírez, Juan Manuel; Hernández, Agustina
In this paper, a deterministic dynamic mixed-integer programming model for solving the generation and transmission expansion-planning problem is addressed. The proposed model integrates conventional generation with renewable energy sources and is based on centrally planned transmission expansion. Due to a growing demand over time, it is necessary to generate expansion plans that can meet the future requirements of energy systems. Nowadays, in most systems a public entity develops both the short- and long-term electricity-grid expansion planning, and mainly deterministic methods are employed. In this study, a heuristic optimization approach based on genetic algorithms is presented. Numerical results show the performance of the proposed algorithm.
By
Kuri-Morales, Angel Fernando; Aldana-Bobadilla, Edwin; López-Peña, Ignacio
8 Citations
Genetic Algorithms (GAs) have long been recognized as powerful tools for the optimization of complex problems where traditional techniques do not apply. In [1] we reported the superior behavior, out of 4 evolutionary algorithms and a hill climber, of a particular breed: the so-called Eclectic Genetic Algorithm (EGA). EGA was tested against a set (TS) consisting of a large number of selected problems, most of which have been used in previous works as an experimental testbed. However, the conclusions of the said benchmark are restricted to the functions in TS. In this work we extend the previous results to a much larger set (U) consisting of $\xi \approx \sum_{i=1}^{31} (2^{64})^{i} \approx 1.8 \times 10^{597}$ unconstrained functions. Randomly selected functions in U were minimized for 800 generations each; the minima were averaged in batches of 36 each, yielding $\overline{X}_i$ for the i-th batch. This process was repeated until the $\overline{X}_i$'s displayed a Gaussian distribution with parameters $\mu_{\overline{X}}$ and $\sigma_{\overline{X}}$. From these, the parameters μ and σ describing the probabilistic behavior of each of the algorithms over U were calculated with 95% reliability. We give a sketch of the proof of the convergence of an elitist GA to the global optimum of any given function. We describe the methods to: a) generate the functions; b) calculate μ and σ for U; and c) evaluate the relative efficiency of all algorithms in our study. EGA's behavior was the best of all the algorithms.
By
Kuri-Morales, Angel; Sagastuy-Breña, Javier
2 Citations
Structured databases may include both numerical and non-numerical attributes (categorical, or CA). Databases which include CAs are called “mixed” databases (MDs). Metric clustering algorithms are ineffectual when presented with MDs because, in such algorithms, the similarity between objects is determined by measuring the differences between them in accordance with some predefined metric. Nevertheless, the information contained in the CAs of MDs is fundamental to understanding and identifying the patterns therein. A practical alternative is to encode the instances of the CAs numerically. To do this we must consider the fact that there is a limited subset of codes which will preserve the patterns in the MD. To identify such pattern-preserving codes (PPCs) we appeal to neural networks (NNs) and genetic algorithms (GAs). It is possible to identify a set of PPCs by trying out a bounded number of codes (the individuals of a GA's population) and demanding that the GA identify the best individual. Such an individual is the best practical PPC for the MD. The computational complexity of this task is considerable. To decrease processing time we appeal to multi-core architectures and the implementation of multiple threads in an algorithm called ParCENG. In this paper we discuss the method and establish experimental bounds on its parameters. This will allow us to tackle larger databases in much shorter execution times.
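The parallelization pattern described, evaluating many candidate codes concurrently, can be sketched with a thread pool. This is a generic illustration of the speed-up idea, not ParCENG's actual implementation (whose fitness involves neural networks and is far more costly):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_population(fitness, population, workers=4):
    """Evaluate GA individuals in parallel threads; results keep the
    population's order, so selection code downstream is unchanged."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

pop = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]
print(evaluate_population(sum, pop))  # [2, 0, 3]
```

For CPU-bound fitness functions in CPython, process pools (or a compiled fitness kernel) would parallelize better than threads; threads suffice here to show the structure.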
By
Romero-Montiel, Flor Alejandra; Rodríguez-Vázquez, Katya
DNA microarrays are used for the massive quantification of gene expression. This analysis allows the diagnosis, identification and classification of different diseases. It is a computationally challenging task due to the large number of genes and the relatively small number of samples.
Previous papers have applied the generalized neuron (GN) to function approximation, density estimation, prediction and classification problems [1, 2].
In this work we show how a GN can be used in the task of microarray classification. The proposed methodology is as follows: first, the dimensionality of the genes is reduced using a genetic algorithm; then the generalized neuron is trained using one of three bio-inspired algorithms: Particle Swarm Optimization, a Genetic Algorithm or Differential Evolution. Finally, the precision of the methodology is tested by classifying three DNA microarray databases: the Leukemia (ALL-AML) benchmark, Colon Tumor, and Prostate Cancer.
By
Gomez, Juan Carlos; Terashima-Marín, Hugo
5 Citations
In this article, a multi-objective evolutionary framework to build selection hyper-heuristics for solving instances of the 2D bin packing problem is presented. The approach consists of a multi-objective evolutionary learning process, using specifically tailored genetic operators, to produce sets of variable-length rules representing hyper-heuristics. Each hyper-heuristic builds a solution to a given problem instance by sensing the state of the instance and deciding which single heuristic to apply at each decision point. The hyper-heuristics consider the minimization of two conflicting objectives when building a solution: the number of bins used to accommodate the pieces and the total time required to do the job. The proposed framework integrates three well-studied multi-objective evolutionary algorithms to produce sets of Pareto-approximated hyper-heuristics: the Non-dominated Sorting Genetic Algorithm II, the Strength Pareto Evolutionary Algorithm 2, and the Generalized Differential Evolution Algorithm 3. We conduct an extensive experimental analysis using a large set of 2D bin packing problem instances containing convex and non-convex irregular pieces, under many conditions and settings and using several performance metrics. The analysis assesses the robustness and flexibility of the proposed approach, providing encouraging results when compared against a set of well-known baseline single heuristics.
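The Pareto-approximated sets all three algorithms produce rest on the same dominance relation, which is easy to state in code. The (bins, time) values below are hypothetical:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a dominates b if it is no worse
    in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated objective vectors, e.g. (bins, seconds)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (bins used, seconds) outcomes of four hyper-heuristics.
points = [(10, 5.0), (12, 3.0), (10, 4.0), (13, 6.0)]
print(pareto_front(points))  # [(12, 3.0), (10, 4.0)]
```

NSGA-II, SPEA2 and GDE3 differ in how they rank and preserve diversity around this front, but the front itself is defined exactly as above.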
By
Cervantes-Sanchez, Fernando; Cruz-Aceves, Ivan; Hernandez-Aguirre, Arturo
This paper presents a novel method based on single-scale Gabor filters (SSG), consisting of three steps, for vessel segmentation and vessel width estimation in X-ray coronary angiograms. In the first stage, a comparative analysis of genetic algorithms and two estimation of distribution algorithms is performed in order to improve the vessel detection rate of the SSG while reducing the computational time of the training step. The detection results of the SSG are compared with those obtained by four state-of-the-art detection methods via the area ($A_z$) under the receiver operating characteristic (ROC) curve. In the second stage, a comparative analysis of five automatic thresholding methods is performed in order to discriminate vessel and non-vessel pixels in the Gabor filter response. In the last step, a procedure to estimate the vessel width of the segmented coronary tree structure is presented. In the experimental results, the SSG obtained the highest vessel detection performance, with $A_z = 0.9584$ on a training set of 40 angiograms. In addition, the segmentation results using the inter-class variance thresholding method provided a segmentation accuracy of 0.941 on a test set of 40 angiograms. The performance of the proposed method, consisting of the steps of vessel detection, segmentation, and vessel width estimation, shows promising results according to the evaluation measures, making it suitable for clinical decision support in cardiology.
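The inter-class variance thresholding that performed best here is Otsu's method: pick the threshold that maximizes the between-class variance of the grayscale histogram. A minimal sketch on a toy pixel list:

```python
def otsu_threshold(pixels, levels=256):
    """Inter-class variance (Otsu) thresholding: return the gray level that
    maximizes the variance between the two classes (e.g. vessel/background)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]                 # pixels in class 0 (<= t)
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated intensity clusters: threshold falls between them.
pixels = [10] * 50 + [200] * 50
print(10 <= otsu_threshold(pixels) < 200)  # True
```

In the paper's pipeline this threshold is applied to the Gabor filter response, after the evolutionary algorithms have tuned the filter parameters.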
By
Palacios-Alonso, Miguel A.; Brizuela, Carlos A.; Sucar, L. Enrique
14 Citations
Many problems, such as voice recognition and speech recognition, among other tasks, have been tackled with Hidden Markov Models (HMMs). These problems can also be dealt with using an extension of the Naive Bayesian Classifier (NBC) known as the Dynamic NBC (DNBC). From a dynamic Bayesian network (DBN) perspective, in a DNBC there is an NBC at each time step. NBCs work well on data sets with independent attributes. However, they perform poorly when the attributes are dependent or when there are one or more irrelevant attributes which are dependent on some relevant ones. Therefore, to increase this classifier's accuracy, we need a method to design network structures that can capture the dependencies and get rid of irrelevant attributes. Furthermore, when we deal with dynamical processes there are temporal relations that should be considered in the network design. In order to learn these models automatically from data and increase the classifier's accuracy, we propose an evolutionary optimization algorithm to solve this design problem. We introduce a new encoding scheme and new genetic operators, which are natural extensions of previously proposed encodings and operators for grouping problems. The design methodology is applied to solve the recognition problem for nine hand gestures. Experimental results show that the evolved network has a higher average classification accuracy than the basic DNBC and an HMM.
By
Ortiz-Bayliss, José Carlos; Terashima-Marín, Hugo; Conant-Pablos, Santiago Enrique
5 Citations
Selection hyper-heuristics are a technology for optimization in which a high-level mechanism controls low-level heuristics, so as to be capable of solving a wide range of problem instances efficiently. Hyper-heuristics are used to generate a solution process rather than producing an immediate solution to a given problem. This process is a reusable mechanism that can be applied both to seen and unseen problem instances. In this paper, we propose a selection hyper-heuristic process with the intention of raising the level of generality and solving consistently well a wide range of constraint satisfaction problems. The hyper-heuristic technique is based on a messy genetic algorithm that generates high-level heuristics formed by rules (condition $\rightarrow$ heuristic). The high-level heuristics produced are seen to be good at solving instances from certain parts of the parameterized space of problems, producing results using effort comparable to that of the best single heuristic per instance. This is beneficial, as the choice of best heuristic varies from instance to instance, so the high-level heuristics are definitely preferable to selecting any one low-level heuristic for all instances. The results confirm the robustness of the proposed approach and show how high-level heuristics trained for some specific classes of instances can also be applied to unseen classes without significant loss of efficiency. This paper contributes to the understanding of heuristics and the way they can be used in a collaborative way to benefit from their combined strengths.
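One common way to apply such condition → heuristic rules is nearest-condition matching: fire the rule whose condition vector lies closest to the current problem state. The rules, feature vectors and heuristic names below are hypothetical:

```python
def select_heuristic(rules, state):
    """Selection hyper-heuristic sketch: each rule is (condition, heuristic);
    return the heuristic of the rule whose condition vector is nearest
    (squared Euclidean distance) to the current problem-state features."""
    def dist(cond):
        return sum((c - s) ** 2 for c, s in zip(cond, state))
    _, heuristic = min(rules, key=lambda r: dist(r[0]))
    return heuristic

# Hypothetical rules over two normalized state features.
rules = [((0.9, 0.1), "min-conflicts"), ((0.2, 0.8), "max-degree")]
print(select_heuristic(rules, (0.3, 0.7)))  # max-degree
```

The messy GA's role is to evolve the rule set itself (how many rules, their condition vectors, and which low-level heuristic each one names), so that this simple dispatch performs well across the instance space.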
By
Dorado, Hugo; Cobos, Carlos; Torres-Jimenez, Jose; Jimenez, Daniel; Mendoza, Martha
Methods for variable importance measurement and feature selection in classification/regression tasks in data mining and Big Data enable the removal of noise caused by irrelevant or redundant variables, reduce the computational cost of model construction and facilitate the understanding of the resulting models. This paper presents a proposal to measure the importance of the input variables in a classification/regression problem, taking as input the solutions evaluated by a wrapper and the performance information (quality of classification expressed, for example, as accuracy, precision, recall, or F-measure, among others) associated with each of these solutions. The proposed method quantifies the effect on classification/regression performance produced by the presence or absence of each input variable in the subsets evaluated by the wrapper. This measure has the advantage of being specific to each classifier, which makes it possible to differentiate the effects each input variable can generate depending on the model built. The proposed method was evaluated using the results of three wrappers - one based on genetic algorithms (GA), another on particle swarm optimization (PSO), and a new proposal based on covering arrays (CA) - and compared with two filters and the variable importance measure in Random Forest. The experiments were performed on three classifiers (Naive Bayes, Random Forest and Multi-Layer Perceptron) and seven data sets from the UCI repository. The comparisons were made using Friedman's Aligned Ranks test, and the results indicate that the proposed measure stands out by maintaining higher classification quality among the top-ranked input variables, better approximating the variables found by the feature selection methods.
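A presence/absence effect over wrapper-evaluated subsets can be sketched as the mean score of subsets containing a variable minus the mean score of subsets lacking it. This is one plausible reading of the measure, with hypothetical data; the paper's exact formula may differ:

```python
def variable_importance(solutions):
    """Per-variable importance sketch: mean wrapper performance of subsets
    containing the variable minus the mean performance of subsets lacking it.
    `solutions` is a list of (mask, score) pairs evaluated by a wrapper."""
    n = len(solutions[0][0])
    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0
    importance = []
    for j in range(n):
        with_v = [s for mask, s in solutions if mask[j]]
        without = [s for mask, s in solutions if not mask[j]]
        importance.append(mean(with_v) - mean(without))
    return importance

# Hypothetical wrapper output over two input variables.
solutions = [([1, 0], 0.9), ([0, 1], 0.6), ([1, 1], 0.8), ([0, 0], 0.5)]
print(variable_importance(solutions))  # variable 0 helps; variable 1 is neutral
```

Because the scores come from one specific classifier's wrapper runs, the measure is classifier-specific, which is the property the abstract emphasizes.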
By
Coello Coello, C. A.; Christiansen, A. D.; Hernández, F. Santos
52 Citations
We present an optimization model for the design of rectangular reinforced concrete beams subject to a specified set of constraints. Our model is more realistic than previously published models because it minimizes the cost of the beam based on strength design procedures, while also considering the costs of concrete, steel and shuttering. Thus our method leads to very practical designs. As there is an infinite number of possible beam dimensions and reinforcement ratios that yield the same moment of resistance, an efficient search technique is preferable to the more traditional iterative methods. We employ a simple genetic algorithm as the search engine, and we compare our results with those obtained via geometric programming. Since the adjustment of parameters in a genetic algorithm (e.g., population size, crossover and mutation rates, and maximum number of generations) is a significant problem for any application, we present our own methodology to deal with this problem. A prototype of this system is currently being tested in México, in order to evaluate its potential as a reliable design tool for real-world applications.
By
Parra-Dominguez, Gemma S.; Sanchez-Yanez, Raul E.; Valdez, S. Ivvan
Image analysis is a relevant tool for improving healthcare services. Fundus retinal image analysis allows the early detection of ophthalmic diseases such as diabetes and glaucoma. Thus, growing interest is observed in the development of segmentation algorithms for blood vessels in retinal images. For this purpose, kernel-based approaches with Gaussian matched filters have been successfully used. Nowadays, improved image sensors and computers deliver high-resolution images, and different parameter values are required for the efficient operation of such filters. In this work, an optimization system using genetic algorithms is designed to calculate those values. To evaluate our methodology, a segmentation algorithm is proposed and the outcomes are evaluated on the HRF image database. Performance measures are obtained and compared to those obtained using state-of-the-art methods. This analysis represents a first step in the detection and classification of normal and abnormal eye conditions.
By
Jaen-Cuellar, Arturo Yosimar; Morales-Velazquez, Luis; Romero-Troncoso, Rene de Jesus; Moriñigo-Sotelo, Daniel; Osornio-Rios, Roque Alfredo
1 Citation
Power quality analysis represents an important aspect of overall societal welfare. The analysis of power disturbances in electrical systems is typically performed in two steps: disturbance detection and disturbance classification. Disturbance detection is usually carried out through space transform techniques, and classification is usually performed through artificial intelligence methods. The problem with those approaches is the adequate selection of parameters for these techniques. Owing to the advantages of a variant scheme known as micro-genetic algorithms, in this investigation a new methodology to directly detect and classify electrical disturbances in one step is developed. The proposed approach is validated through synthetic signals and experimental tests on real data, and the obtained results are compared with the particle swarm optimization method in order to show the effectiveness of this methodology.
By
Fraire Huacuja, Héctor J.; Vargas, David Romero; Valdez, Guadalupe Castilla; Camacho Andrade, Carlos A.; Valdez, Georgina Castillo; Martínez Flores, José A.
In this paper, the problem of determining the atomic cluster configurations that minimize the Lennard-Jones potential energy is approached. Traditional studies are oriented toward improving the quality of the solution and present practically no statistical information to support the efficiency of the reported solution methods. Without this type of evidence, the effectiveness of these methods might depend largely on the capacity of the available computing resources. In this work we propose to incorporate statistical information on the performance of the solution methods. An advantage of this approach is that when the performance tests are standardized and statistically supported, we can take advantage of efficient solution methods that have been tested only under conditions of modest computing resources. An experimental study of the problem is presented in which the generated statistical information is used to identify two potential areas for improving the performance of the evaluated method.
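The objective function itself is standard and compact: the Lennard-Jones energy summed over all atom pairs. A minimal evaluation in reduced units (a sketch of the fitness, not of any particular solver):

```python
def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones potential energy of a cluster, summed over all
    atom pairs: V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6). In reduced
    units (eps = sigma = 1) the pair minimum is -1 at r = 2^(1/6)."""
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            sr6 = (sigma * sigma / r2) ** 3  # (sigma/r)^6 via r^2
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

# Two atoms at the optimal separation r = 2**(1/6): energy is exactly -1.
dimer = [(0.0, 0.0, 0.0), (2 ** (1 / 6), 0.0, 0.0)]
print(round(lennard_jones_energy(dimer), 6))  # -1.0
```

The hardness the abstract refers to comes not from evaluating this function but from the number of local minima, which grows very rapidly with cluster size.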
By
Ramon, Soto C.; Julio, Waissman V.
A widely accepted idea in the pattern recognition field is that a multiple classifier system tends to show superior performance to individual classifiers when dealing with complex problems. Most multiple classifier systems are built from classifiers developed completely independently of each other and combined in a final step, generating possible decision conflicts among the individual classifiers. In this paper, a non-standard genetic algorithm for tuning multiple classifier systems is presented. This algorithm is based on a set of concepts that extends the genetic metaphor: coevolutionary diversity, collective fitness, suitable behavior, phylogenetic evolution and ontogenetic evolution.
