By
Jiménez-Guarneros, Magdiel; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
Graph embedding has attracted great interest in structural pattern recognition, especially techniques based on dissimilarity representation. However, one of the main problems of these techniques is selecting a suitable set of prototype graphs that best describes the whole set of graphs. In this paper, we evaluate the use of a clustering-based instance selection method for graph embedding, which selects border prototypes and some non-border prototypes. An experimental evaluation shows that the selected method achieves competitive accuracy and better runtimes than other state-of-the-art methods.
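As a rough illustration of dissimilarity-based embedding, the sketch below maps each object to the vector of its dissimilarities to a set of prototypes. It is not the method evaluated in the paper: the function names are hypothetical, graphs are reduced to edge sets, and the toy dissimilarity (size of the symmetric difference of the edge sets) stands in for whatever graph dissimilarity is actually used.

```python
from typing import Callable, List, Sequence, TypeVar

G = TypeVar("G")


def dissimilarity_embedding(
    objects: Sequence[G],
    prototypes: Sequence[G],
    dissimilarity: Callable[[G, G], float],
) -> List[List[float]]:
    """Represent each object by its dissimilarities to the prototypes."""
    return [[dissimilarity(obj, p) for p in prototypes] for obj in objects]


def edge_set_dissimilarity(g1: frozenset, g2: frozenset) -> float:
    """Toy stand-in dissimilarity: number of edges not shared by both graphs."""
    return float(len(g1 ^ g2))


# Hypothetical data: three small graphs given as sets of edges.
graphs = [
    frozenset({(1, 2), (2, 3)}),
    frozenset({(1, 2), (2, 3), (1, 3)}),
    frozenset({(1, 2)}),
]
prototypes = [graphs[0], graphs[1]]  # assumed to be already selected

vectors = dissimilarity_embedding(graphs, prototypes, edge_set_dissimilarity)
print(vectors)
```

The dimension of the embedding equals the number of prototypes, which is why prototype selection affects both accuracy and runtime.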
By
Lazo-Cortés, Manuel S.; Martínez-Trinidad, José Francisco; Carrasco-Ochoa, Jesús Ariel
1 Citation
In this article, we revisit the concept of Goldman’s fuzzy testor and restudy it from the perspective of the conceptual approach to attribute reducts introduced by Y.Y. Yao in the framework of Rough Set Theory. We reformulate the original concept of Goldman’s fuzzy testor and introduce Goldman’s fuzzy reducts. Additionally, in order to show the usefulness of Goldman’s fuzzy reducts, we build a rule-based classifier and evaluate its performance in a case study.
By
Tecuanhuehue-Vera, P.; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
Multidimensional scaling maps a set of n-dimensional objects into a lower-dimensional space, usually the Euclidean plane, preserving the distances among objects in the original space. Most algorithms for multidimensional scaling have been designed to work on numerical data, but in soft sciences objects are commonly described using both quantitative and qualitative attributes, sometimes with missing values. For this reason, in this paper we propose a genetic algorithm especially designed for multidimensional scaling over mixed and incomplete data. Experiments using datasets from the UCI repository, and a comparison against a common algorithm for multidimensional scaling, show the behavior of our proposal.
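To make "preserving the distances" concrete, the sketch below computes raw stress, a standard measure of how far a low-dimensional layout deviates from a given dissimilarity matrix. Whether the paper's genetic algorithm uses this exact quantity as its fitness is not stated in the abstract; the function name and the data are assumptions for illustration only.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def raw_stress(
    dissimilarities: Sequence[Sequence[float]],
    points: List[Tuple[float, float]],
) -> float:
    """Sum of squared differences between layout distances and target dissimilarities."""
    total = 0.0
    for i, j in combinations(range(len(points)), 2):
        layout_distance = math.dist(points[i], points[j])
        total += (layout_distance - dissimilarities[i][j]) ** 2
    return total


# Hypothetical target dissimilarities among three objects (a 3-4-5 triangle).
target = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]

good_layout = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]  # reproduces the triangle
bad_layout = [(0.0, 0.0), (3.0, 0.0), (0.0, 0.0)]   # collapses two objects

print(raw_stress(target, good_layout))
print(raw_stress(target, bad_layout))
```

Because the function only needs a dissimilarity matrix, the original objects may be mixed or incomplete as long as some dissimilarity can be computed between them.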
By
Loyola-González, Octavio; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel; García-Borroto, Milton
Applying resampling methods is an important approach for working with class imbalance problems. The main reason is that many classifiers are sensitive to class distribution, biasing their predictions towards the majority class. Contrast pattern-based classifiers are sensitive to imbalanced databases because they commonly find several patterns of the majority class and only a few patterns (or none) of the minority class. In this paper, we present a correlation study among resampling methods for contrast pattern-based classifiers. Our experiments over several imbalanced databases show that there is a high correlation among different resampling methods. The correlation results reveal nine groups of methods with very high inner correlation and very low outer correlation. We show that most resampling methods improve the accuracy of contrast pattern-based classifiers.
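For readers unfamiliar with resampling, the sketch below shows random oversampling, one of the simplest resampling methods: minority-class objects are duplicated at random until every class has as many objects as the largest one. The abstract does not list which methods were studied, so this is only a generic example; the function name and data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(
    X: Sequence[list], y: Sequence[str], seed: int = 0
) -> Tuple[List[list], List[str]]:
    """Duplicate randomly chosen objects of smaller classes until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(indices)  # sample with replacement from the original objects
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced data: four majority objects, one minority object.
X = [[0], [1], [2], [3], [10]]
y = ["maj", "maj", "maj", "maj", "min"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```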
By
Aroche-Villarruel, Argenis A.; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel; Pérez-Suárez, Airel
1 Citation
DenStream is a data stream clustering algorithm that has been widely studied due to its ability to find clusters with arbitrary shapes and to deal with noisy objects. In this paper, we propose a different approach for pruning micro-clusters in DenStream. Unlike previously reported pruning strategies, our proposal introduces a different way of computing the micro-cluster radii and provides new options for the pruning stage of DenStream. From our experiments over public standard datasets, we conclude that our approach improves the results obtained by DenStream.
By
Rodríguez-Diez, Vladímir; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel; Lazo-Cortés, Manuel S.
Within Testor Theory, typical testors are irreducible subsets of attributes preserving the object discernibility ability of the original set of attributes. Computing all typical testors of a dataset has exponential complexity with respect to its number of attributes; however, other properties of a dataset also influence the performance of different algorithms. Previous studies have determined that a significant runtime reduction can be obtained by selecting the appropriate algorithm for a given dataset. In this work, we present an experimental study evaluating the effect of basic matrix dimensionality on the performance of algorithms for typical testor computation. Our experiments are carried out over synthetic and real-world datasets. Finally, some guidelines obtained from the experiments, to help select the best algorithm for a given dataset, are summarised.
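The exponential cost mentioned above is easy to see in a naive enumeration. The sketch below is a brute-force search, not any of the algorithms compared in the paper: it assumes a binary basic matrix in which each row records the attributes that distinguish a pair of objects from different classes, treats a column subset as a testor when no row is all zeros on those columns, and keeps only the irreducible ones. Function names and the example matrix are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def is_testor(basic_matrix: Sequence[Sequence[int]], cols: Tuple[int, ...]) -> bool:
    """A column subset is a testor if every row has a 1 in at least one chosen column."""
    return all(any(row[c] for c in cols) for row in basic_matrix)


def typical_testors(basic_matrix: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Enumerate all subsets by increasing size; exponential in the number of columns."""
    n = len(basic_matrix[0])
    found: List[Tuple[int, ...]] = []
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            # Any superset of a testor already found is reducible, so skip it.
            if any(set(t) <= set(cols) for t in found):
                continue
            if is_testor(basic_matrix, cols):
                found.append(cols)
    return found


# Hypothetical basic matrix with three attributes.
bm = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
print(typical_testors(bm))
```

Because subsets are visited in order of increasing size, every testor that survives the superset check is irreducible.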
By
López-Espinoza, Erika; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
In this paper, two strategies to compute the support sets system for the supervised classifier ALVOT (voting algorithms) using sequential floating selection are presented. ALVOT is a supervised classification model based on the partial precedence principle; therefore, as the result of feature selection, it needs a set of feature subsets, which is called the support sets system. Sequential floating selection methods for feature selection find only one relevant feature subset. The introduced strategies search for a set of feature subsets to generate a support sets system. Both strategies are compared with each other and against the feature selection method based on testor theory, which is commonly used to compute this system. Results obtained with both strategies on different databases from UCI and on the faces database from Olivetti Research Laboratory (ORL) in Cambridge are presented.
By
Carrasco-Ochoa, Jesús Ariel; Ruiz-Shulcloper, José; De-la-Vega-Doría, Lucía Angélica
Typical ε:testors are useful for feature selection in supervised classification problems with mixed incomplete data, where the similarity function is not total coincidence but a one-threshold function. In this kind of problem, modifications of the training matrix can occur very frequently. Any modification of the training matrix can change the set of all typical ε:testors, so this set must be recomputed after each modification. However, the complexity of algorithms for calculating all typical ε:testors of a training matrix is too high. In this paper we analyze how the set of all typical ε:testors changes after modifications. An alternative method to calculate all typical ε:testors of the modified training matrix is presented. The new method’s complexity is analyzed and some experimental results are shown.
By
Lazo-Cortés, Manuel S.; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel; Sanchez-Diaz, Guillermo
1 Citation
This paper deals with the relation between rough set reducts and typical testors from the logical combinatorial approach to pattern recognition. The main objective is to clarify once and for all that, although the two concepts coincide in many cases, strictly speaking they are not the same. Definitions, comments and observations are formally introduced and supported by illustrative examples. Furthermore, some theorems expressing theoretical relations between reducts and typical testors are stated and proved.
By
Pineda-Bautista, Bárbara B.; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
1 Citation
In this work, a new method for class-specific feature selection, which selects a possibly different feature subset for each class of a supervised classification problem, is proposed. Since conventional classifiers do not allow using a different feature subset for each class, the use of a classifier ensemble and a new decision rule for classifying new instances are also proposed. Experimental results over different databases show that the proposed method achieves better accuracies than traditional feature selection methods.
By
García-Hernández, René A.; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel
4 Citations
One of the sequential pattern mining problems is to find the maximal frequent sequences in a database with a β support. In this paper, we propose a new algorithm to find all the maximal frequent sequences in a text instead of a database. In comparison with typical sequential pattern mining algorithms, our algorithm avoids the joining, pruning and text scanning steps. Experiments have shown that it is possible to get all the maximal frequent sequences in a few seconds for medium-sized texts.
By
Lazo-Cortés, Manuel S.; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.; Sanchez-Diaz, Guillermo
In their classic form, reducts as well as typical testors are minimal subsets of attributes that retain the discernibility condition. Constructs are a special type of reducts and represent a kind of generalization of the reduct concept. A construct reliably provides a sufficient amount of discrimination between objects belonging to different classes as well as a sufficient amount of resemblance between objects belonging to the same class. Based on the relation between constructs, reducts and typical testors, this paper focuses on a practical use of this relation. Specifically, we propose a method that allows applying typical testor algorithms for computing constructs. The proposed method involves modifying the classic definition of the pairwise object comparison matrix, adapting it to the requirements of certain algorithms originally designed to compute typical testors. The usefulness of our method is shown through some examples.
By
Flores-Garrido, Marisol; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
2 Citations
Graph pattern mining is an important task in Data Mining and several algorithms have been proposed to solve this problem. Most of them require that a pattern and its occurrences be identical; thus, they rely on solving the graph isomorphism problem. In recent years, however, some algorithms have focused on the case in which label and edge structure differences between a pattern and its occurrences are allowed while maintaining a bijection among vertices, using inexact matching during the mining process. Recently, an algorithm that allows structural differences in vertices was proposed. This feature allows it to find patterns missed by other algorithms, but do these extra patterns actually contain useful information? We explore the answer to this question by performing an experiment in the context of unsupervised mining tasks. Our results suggest that by allowing structural differences in both vertices and edges, it is possible to obtain new useful information.
By
García-Borroto, Milton; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel
14 Citations
Obtaining an accurate class prediction of a query object is an important component of supervised classification. However, it could also be important to understand the classification in terms of the application domain, especially if the prediction disagrees with the expected results. Many accurate classifiers are unable to explain their classification results in terms understandable by an application expert. Classifiers based on emerging patterns, on the other hand, are accurate and easy to understand. The goal of this article is to review the state-of-the-art methods for mining emerging patterns, classify them by different taxonomies, and identify new trends. In this survey, we present the most important emerging pattern miners, categorizing them on the basis of the mining paradigm, the use of discretization, and the stage where the mining occurs. We provide detailed descriptions of the mining paradigms with their pros and cons, which helps researchers and users to select the appropriate algorithm for a given application.
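As background, an emerging pattern is a pattern whose support is much higher in one class than in the others, commonly quantified by the growth rate (the ratio of the two supports). The sketch below computes both quantities for patterns expressed as attribute-value conditions; it does not implement any of the surveyed miners, and the function names and data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def support(pattern: Row, rows: List[Row]) -> float:
    """Fraction of rows that satisfy every attribute=value condition of the pattern."""
    if not rows:
        return 0.0
    hits = sum(all(r.get(k) == v for k, v in pattern.items()) for r in rows)
    return hits / len(rows)


def growth_rate(pattern: Row, target_rows: List[Row], other_rows: List[Row]) -> float:
    """Support in the target class divided by support in the remaining classes."""
    s_target = support(pattern, target_rows)
    s_other = support(pattern, other_rows)
    if s_other == 0:
        return float("inf") if s_target > 0 else 0.0
    return s_target / s_other


# Hypothetical two-class data.
target = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "big"},
    {"color": "red", "size": "big"},
]
other = [
    {"color": "blue", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "big"},
]

print(growth_rate({"color": "red"}, target, other))
print(growth_rate({"color": "red", "size": "big"}, target, other))
```

A pattern such as `color = red AND size = big` is directly readable by a domain expert, which is the interpretability advantage the abstract refers to.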
By
López-Monroy, Adrián Pastor; Montes-y-Gómez, Manuel; Villaseñor-Pineda, Luis; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
1 Citation
This paper proposes a novel representation for Authorship Attribution (AA), based on Concise Semantic Analysis (CSA), which has been successfully used in Text Categorization (TC). Our approach for AA, called Document Author Representation (DAR), builds document vectors in a space of authors, calculating the relationship between textual features and authors. In order to evaluate our approach, we compare the proposed representation with conventional approaches and previous works using the c50 corpus. We found that DAR can be very useful in AA tasks, because it provides good performance on imbalanced data, getting comparable or better accuracy results.
By
Loyola-González, Octavio; García-Borroto, Milton; Medina-Pérez, Miguel Angel; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel; Ita, Guillermo
4 Citations
Classifiers based on emerging patterns are usually more understandable for humans than those based on more complex mathematical models. However, most classifiers based on emerging patterns achieve low accuracy on problems with imbalanced databases. This problem has been tackled through oversampling or undersampling methods; nevertheless, to the best of our knowledge, these methods have not been tested for classifiers based on emerging patterns. Therefore, in this paper, we present an empirical study of the use of oversampling and undersampling methods to improve the accuracy of a classifier based on emerging patterns. We apply the most popular oversampling and undersampling methods over 30 databases from the UCI Repository of Machine Learning. Our experimental results show that using oversampling and undersampling methods significantly improves the accuracy of the classifier for the minority class.
By
Rodríguez-González, Ansel Yoan; Martínez-Trinidad, José Francisco; Carrasco-Ochoa, Jesús Ariel; Ruiz-Shulcloper, José
9 Citations
Most current algorithms for mining frequent patterns assume that two object subdescriptions are similar if they are equal, but in many real-world problems other ways of evaluating similarity are used. Recently, three algorithms (ObjectMiner, STreeDCMiner and STreeNDCMiner) for mining frequent patterns allowing similarity functions different from equality have been proposed. For searching frequent patterns, ObjectMiner and STreeDCMiner use a pruning property called the Downward Closure property, which must be satisfied by the similarity function. For similarity functions that do not meet this property, the STreeNDCMiner algorithm was proposed. However, for searching frequent patterns, this algorithm explores all subsets of features, which could be very expensive. In this work, we propose a frequent similar pattern mining algorithm for similarity functions that do not meet the Downward Closure property, which is faster than STreeNDCMiner and loses fewer frequent similar patterns than ObjectMiner and STreeDCMiner. We also compare, in a supervised classification context, the quality of the set of frequent similar patterns computed by our algorithm with the quality of the sets computed by the other algorithms.
By
Acosta-Mendoza, Niusvel; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.; Gago-Alonso, Andrés; Medina-Pagola, José E.
1 Citation
Frequent approximate subgraph (FAS) mining and graph clustering are important techniques in Data Mining with great practical relevance. In FAS mining, some approximations in data are allowed for identifying graph patterns, which could be used for solving other pattern recognition tasks like supervised classification and clustering. In this paper, we explore the use of the patterns identified by a FAS mining algorithm on a graph collection for image clustering. Experiments on image databases show that by using the FASs mined from a graph collection under the bag-of-features image approach, it is possible to improve the clustering results reported by other state-of-the-art methods.
By
Gago Alonso, Andrés; Medina Pagola, José Eladio; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
5 Citations
In this paper, a new algorithm for mining frequent connected subgraphs called gRed (graph Candidate Reduction Miner) is presented. This algorithm is based on the gSpan algorithm proposed by Yan and Han. In this method, the mining process is optimized by introducing new heuristics to reduce the number of candidates. The performance of gRed is compared against two of the most popular and efficient algorithms available in the literature (gSpan and Gaston). The experimentation on real-world databases shows that our proposal outperforms gSpan and achieves better performance than Gaston for low minimal support when databases are large.
By
García-Borroto, Milton; Martínez-Trinidad, José Francisco; Carrasco-Ochoa, Jesús Ariel
6 Citations
Obtaining an accurate class prediction of a query object is an important component of supervised classification. However, it could be important to understand the classification in terms of the application domain, mostly if the prediction disagrees with the expected results. Many accurate classifiers are unable to explain their classification results in terms understandable by an application expert. Emerging Pattern classifiers, on the other hand, are accurate and easy to understand. However, they have two characteristics that could degrade their accuracy: global discretization of numerical attributes and high sensitivity to the support threshold value. In this paper, we introduce a novel algorithm to find emerging patterns without global discretization, which uses an accurate estimation of the support threshold. Experimental results show that our classifier attains higher accuracy than other understandable classifiers, while being competitive with Nearest Neighbors and Support Vector Machines classifiers.
By
Gutiérrez-Rodríguez, Andrés Eduardo; Medina-Pérez, Miguel Angel; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel; García-Borroto, Milton
4 Citations
Ultraviolet Spectra (UVS) analysis is a frequent tool in tasks like disease diagnosis, drug detection and hyperspectral remote sensing. A key point in these applications is the UVS comparison function. Although there are several UVS comparison functions, creating good dissimilarity functions is still a challenge because there are different substances with very similar spectra and the same substance may produce different spectra. In this paper, we introduce a new spectral dissimilarity measure for substance identification, based on the way experts visually match the shapes of spectra. We also combine the new measure with the Spectral Correlation Measure. A set of experiments conducted with a database of real substances reveals superior results of the combined dissimilarity with respect to state-of-the-art measures. We use Receiver Operating Characteristic curve analysis to show that our proposal gets the best trade-off between false positive rates and true positive rates.
By
Duarte-Villaseñor, Miriam Mónica; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Francisco; Flores-Garrido, Marisol
4 Citations
Multiclass problems, i.e., classification problems involving more than two classes, are a common scenario in supervised classification. An important approach to solving this type of problem consists of applying binary classifiers repeatedly; within this category we find nested dichotomies. However, most of the methods for building nested dichotomies use a random strategy, which does not guarantee finding a good one. In this work, we propose new non-random methods for building nested dichotomies, using the idea of reducing misclassification errors by separating in the higher levels those classes that are easier to separate and, in the lower levels, those classes that are more difficult to separate. In order to evaluate the performance of the proposed methods, we compare them against methods that randomly build nested dichotomies, using some datasets (with mixed data) taken from the UCI repository.
By
García-Borroto, Milton; Villuendas-Rey, Yenny; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.
4 Citations
The Nearest Neighbor classifier is a simple but powerful nonparametric technique for supervised classification. However, it is very sensitive to noise and outliers, which could decrease the classifier accuracy. To overcome this problem, we propose two new editing methods based on maximum similarity graphs. Numerical experiments on several databases show that our methods perform well in terms of classifier accuracy.
By
Acosta-Mendoza, Niusvel; Gago-Alonso, Andrés; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Francisco; Medina-Pagola, José E.
1 Citation
Feature selection is an essential preprocessing step for classifiers with high-dimensional training sets. In pattern recognition, feature selection improves the performance of classification by reducing the feature space while preserving the classification capabilities of the original feature space. Image classification using frequent approximate subgraph mining (FASM) is an example where the benefits of feature selection are needed, because using frequent approximate subgraphs (FASs) leads to high-dimensional representations. In this paper, we explore the use of feature selection algorithms in order to reduce the representation of an image collection represented through FASs. In our results we report a dimensionality reduction of over 50% of the original features, with classification results similar to those reported using all the features.
By
García-Borroto, Milton; Loyola-Gonzalez, Octavio; Martínez-Trinidad, José Francisco; Carrasco-Ochoa, Jesús Ariel
Contrast pattern miners and contrast pattern classifiers typically use a quality measure to evaluate the discriminative power of a pattern. Since many quality measures exist, it is important to perform comparative studies among them. Nevertheless, previous studies mostly compare measures based on how they impact the classification accuracy. In this paper, we introduce a comparative study of quality measures over different aspects: accuracy using the whole training set, accuracy using pattern subsets, and accuracy and compression for filtering patterns. Experiments over 10 quality measures in 25 repository databases show that there is a huge correlation among different quality measures and that the most accurate quality measures are not appropriate in contexts like pattern filtering.
By
Rodríguez-González, Ansel Y.; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel; Ruiz-Shulcloper, José
In this paper, we focus on frequent pattern mining using non-Boolean similarity functions. Several properties and propositions that allow pruning the search space of frequent similar patterns are proposed. Based on these properties, an algorithm for mining frequent similar patterns using non-Boolean similarity functions is also introduced. We evaluate the quality of the frequent similar patterns computed by our algorithm by means of a supervised classifier based on frequent patterns.
By
Hernández-León, Raudel; Hernández-Palancar, José; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Francisco
In this paper, we propose two improvements to the CAR-NF classifier, which is a classifier based on Class Association Rules (CARs). The first one is a theoretical proof that allows selecting the minimum Netconf threshold, independently of the dataset, that avoids ambiguity at the classification stage. The second one is a new coverage criterion, which aims to reduce the number of non-covered unseen transactions during the classification stage. Experiments over several datasets show that the improved classifier, called CAR-NF+, beats the best reported classifiers based on CARs, including the original CAR-NF classifier.
By
Hernandez, Julio; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Francisco
5 Citations
Instance selection methods achieve low accuracy on problems with imbalanced databases. In the literature, the problem of imbalanced databases has been tackled by applying oversampling or undersampling methods. Therefore, in this paper, we present an empirical study of the use of oversampling and undersampling methods to improve the accuracy of instance selection methods on imbalanced databases. We apply different oversampling and undersampling methods jointly with instance selectors over several public imbalanced databases. Our experimental results show that using oversampling and undersampling methods significantly improves the accuracy for the minority class.
By
García-Borroto, Milton; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel
3 Citations
Emerging Pattern classifiers are accurate and easy-to-understand classifiers. However, they have two characteristics that can degrade their accuracy: global discretization of numerical attributes and high sensitivity to the support threshold value. In this paper, we introduce a novel algorithm to find emerging patterns without global discretization. Additionally, we propose a new method for building cascades of emerging pattern classifiers, which combines the higher accuracy of classifying with higher support thresholds with the lower levels of abstention of classifying with lower thresholds. Experimental results show that our cascade attains higher accuracy than other state-of-the-art classifiers, including one of the most accurate emerging pattern-based classifiers.
By
Tenorio-González, Ana Cecilia; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel
The performance and accuracy of a neural network are strongly related to its design. Designing a neural network involves topology (number of neurons, number of layers, number of synapses between layers, etc.), training synapse weights, and parameter selection. Radial basis function neural networks (RBFNNs) may additionally require other parameters, for example, the means and standard deviations if the activation function of neurons in the hidden layer is a Gaussian function. Commonly, Genetic Algorithms and Evolution Strategies have been used for automatically designing RBFNNs. In this work, the use of prototype selection methods for designing an RBFNN is proposed and studied. Experimental results show the viability of designing RBFNNs using prototype selection.
By
Acosta-Mendoza, Niusvel; Carrasco-Ochoa, Jesús Ariel; Gago-Alonso, Andrés; Martínez-Trinidad, José Francisco; Medina-Pagola, José Eladio
In data mining, frequent approximate subgraph (FAS) mining techniques have attracted attention in several applications where some approximations are allowed between graphs for identifying important patterns. In the last four years, the application of FAS mining algorithms over multi-graphs has reported relevant results in different pattern recognition tasks like supervised classification and object identification. However, to the best of our knowledge, there is no reported work where the patterns identified by a FAS mining algorithm over multi-graph collections are used for image clustering. Thus, in this paper, we explore the use of multi-graph FASs for image clustering. Experiments over image collections show that by using multi-graph FASs under the bag-of-features image approach, the image clustering results reported by using simple-graph FASs can be improved.
By
Acosta-Mendoza, Niusvel; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.; Gago-Alonso, Andrés; Medina-Pagola, José E.
2 Citations
Recently, there has been an increase in the use of frequent approximate subgraph (FAS) mining for different applications like graph classification. In graph classification tasks, FAS mining algorithms over graph collections have achieved good results, especially those algorithms that allow distortions between labels while keeping the graph topology. However, there are some applications where multi-graphs are used for data representation, but FAS miners have been designed to work only with simple graphs. Therefore, in this paper, in order to deal with multi-graph structures, we propose a method based on graph transformations for FAS mining in multi-graph collections.
By
Pinilla-Buitrago, Laura Alejandra; Martínez-Trinidad, José Francisco; Carrasco-Ochoa, Jesús Ariel
4 Citations
Optimal Subsequence Bijection (OSB) is a method for comparing two sequences of end nodes of two skeleton graphs that represent articulated shapes in 2D images. The OSB dissimilarity function uses a constant penalty cost for all end nodes not matched between two skeleton graphs; this can be a problem, especially in those cases where there is a large number of unmatched end nodes. In this paper, a new penalty scheme for OSB, assigning variable penalties to end nodes not matched between two skeleton graphs, is proposed. The experimental results show that the new penalty scheme improves the results on supervised classification, compared with the original OSB.
more …
By
LazoCortés, Manuel S.; MartínezTrinidad, José Fco.; CarrascoOchoa, Jesús Ariel
In Rough Set Theory, reducts are minimal subsets of attributes that retain the ability of the whole set of attributes to discern objects belonging to different classes. Class-specific reducts, on the other hand, allow discerning objects belonging to a specific class from those of all other classes. This latter type of reduct has been little studied. Here we show, through a case study, some advantages of using class-specific reducts instead of classic ones in a rule-based classifier. Our results show that this issue is worth studying in more depth.
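The two definitions in this abstract can be made concrete with a small check. The following is only a sketch, not the authors' method: it assumes a consistent decision table (no two objects with identical attribute values and different classes), and the function names `discerns`, `is_reduct`, and `one_vs_rest` are hypothetical.

```python
def discerns(rows, labels, attrs):
    """True if objects with different labels always differ on some attribute in attrs."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        # setdefault stores the first label seen for this projection
        if seen.setdefault(key, label) != label:
            return False
    return True


def is_reduct(rows, labels, attrs):
    """True if attrs discerns the classes and no single attribute can be dropped.

    Checking single removals is enough because discernibility is monotone:
    if a subset discerns the classes, so does every superset of it.
    """
    if not discerns(rows, labels, attrs):
        return False
    return all(
        not discerns(rows, labels, [b for b in attrs if b != a]) for a in attrs
    )


def one_vs_rest(labels, target):
    """Relabel for a class-specific reduct: target class versus all others."""
    return [label == target for label in labels]


# Toy decision table (assumed data): three attributes, three classes.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "a", "b", "c"]

classic = is_reduct(rows, labels, [0, 1])
specific_a = is_reduct(rows, one_vs_rest(labels, "a"), [0])
```

In this toy table, separating class "a" from the rest needs only one attribute, while separating all three classes needs two, which illustrates why class-specific reducts can be shorter than classic ones.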
By
GarcíaBorroto, Milton; VilluendasRey, Yenny; CarrascoOchoa, Jesús Ariel; MartínezTrinidad, José Fco.
Finding a minimal subset of objects that correctly classifies the training set with the nearest neighbor classifier has been an active research area in the Pattern Recognition and Machine Learning communities for decades. Although finding the Minimal Consistent Subset is not feasible in many real applications, several authors have proposed methods to find small consistent subsets. In this paper, we introduce a novel algorithm for this task, based on support graphs. Experiments over a wide range of repository databases show that our algorithm finds consistent subsets with lower cardinality than traditional methods.
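To illustrate what a consistent subset is, the sketch below checks consistency for a 1-nearest-neighbor classifier and builds a small consistent subset with Hart's condensed nearest neighbor rule. This is a classic baseline, not the support-graph algorithm of the paper. It assumes numeric feature tuples, squared Euclidean distance, and no duplicate points with conflicting labels; all function names are hypothetical.

```python
def nearest_label(x, prototypes):
    """Label of the prototype closest to x under squared Euclidean distance."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(x, p[0]))
    return min(prototypes, key=dist)[1]


def is_consistent(subset, training):
    """A subset is consistent if 1-NN over it classifies every training object correctly."""
    return all(nearest_label(x, subset) == y for x, y in training)


def condensed_subset(training):
    """Hart's condensed nearest neighbor rule (baseline, order dependent)."""
    subset = [training[0]]
    changed = True
    while changed:
        changed = False
        for x, y in training:
            # Any object the current subset misclassifies is added to it.
            if nearest_label(x, subset) != y:
                subset.append((x, y))
                changed = True
    return subset


# Toy one-dimensional training set (assumed data).
training = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"), ((1.0,), "b"), ((1.1,), "b")]
subset = condensed_subset(training)
```

The result is consistent but not guaranteed to be minimal, which is the gap that methods for finding smaller consistent subsets try to close.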
By
GarcíaBorroto, Milton; MartínezTrinidad, José Fco; CarrascoOchoa, Jesús Ariel
16 Citations
Emerging pattern–based classification is an active branch of Pattern Recognition. However, despite its simplicity and accurate results, this kind of classification includes an a priori discretization step that may degrade classification accuracy. In this paper, we introduce fuzzy emerging patterns as an extension of emerging patterns to deal with numerical attributes using fuzzy discretization. Based on fuzzy emerging patterns, we propose a new classifier that uses a novel graph organization of patterns. The new classifier outperforms some popular and state-of-the-art classifiers on several UCI repository databases. In a pairwise comparison, it significantly beats every other single classifier.
By
AcostaMendoza, Niusvel; GagoAlonso, Andrés; CarrascoOchoa, Jesús Ariel; MartínezTrinidad, José Francisco; MedinaPagola, José Eladio
1 Citation
Frequent approximate subgraph (FAS) mining has become an important technique in data mining. However, FAS miners produce a large number of FASs, which affects the computational performance of the methods that use them. To solve this problem, several algorithms for mining only maximal or closed patterns have been proposed in the literature. However, there is no such algorithm for mining FASs from multigraph collections. For this reason, in this paper we introduce an algorithm for mining generalized closed FASs from multigraph collections. The proposed algorithm obtains more patterns than the maximal ones but fewer than the closed ones, covering patterns with small frequency differences. In our experiments over two real-world multigraph collections, we show how our proposal reduces the size of the FAS set.
By
LoyolaGonzález, Octavio; MartínezTrinidad, José Fco.; CarrascoOchoa, Jesús Ariel; GarcíaBorroto, Milton
Selecting contrast patterns is an important task for pattern-based classifiers, especially in class imbalance problems. The main reason is that contrast pattern miners commonly extract many patterns with high support for the majority class and only a few patterns, with low support, for the minority class. This biases classification results toward the majority class and yields low accuracy for the minority class. In this paper, we introduce a contrast pattern selection method for class imbalance problems. Our proposal selects all the contrast patterns for the minority class and a certain percentage of the contrast patterns for the majority class. Our experiments over several imbalanced databases show that our proposal selects significantly better contrast patterns, obtaining better AUC results than other approaches reported in the literature.
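The selection rule described here (all minority-class patterns plus a percentage of majority-class patterns) can be sketched as follows. The abstract does not say how the majority-class patterns are ranked or what percentage is used, so ranking by support and the `majority_fraction` parameter are assumptions made for illustration, as are the dictionary layout and the function name.

```python
import math


def select_patterns(patterns, minority, majority_fraction):
    """Keep every minority-class pattern and a fraction of the other patterns.

    Assumption: non-minority patterns are ranked by descending support;
    the source does not state the ranking criterion.
    """
    minority_patterns = [p for p in patterns if p["cls"] == minority]
    majority_patterns = sorted(
        (p for p in patterns if p["cls"] != minority),
        key=lambda p: p["support"],
        reverse=True,
    )
    keep = math.ceil(majority_fraction * len(majority_patterns))
    return minority_patterns + majority_patterns[:keep]


# Toy pattern set (assumed data): two minority and four majority patterns.
patterns = [
    {"id": "m1", "cls": "minority", "support": 0.10},
    {"id": "m2", "cls": "minority", "support": 0.05},
    {"id": "M1", "cls": "majority", "support": 0.90},
    {"id": "M2", "cls": "majority", "support": 0.40},
    {"id": "M3", "cls": "majority", "support": 0.70},
    {"id": "M4", "cls": "majority", "support": 0.20},
]
selected = select_patterns(patterns, "minority", 0.5)
selected_ids = [p["id"] for p in selected]
```

Keeping every minority-class pattern regardless of support is what counteracts the bias toward the majority class described in the abstract.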
By
RodríguezGonzález, Ansel Y.; MartínezTrinidad, José Francisco; CarrascoOchoa, Jesús Ariel; RuizShulcloper, José
3 Citations
Frequent Pattern Mining is an important task because of the relevance of repetitions in data; it is also a fundamental step in Association Rule Mining. Most current algorithms for mining frequent patterns assume that two object subdescriptions are similar if and only if they are equal, but in soft sciences other similarity functions are used. In this work, we focus on the search for frequent patterns in Mixed Data, incorporating similarity between objects. We propose a novel and efficient algorithm to mine frequent similar patterns for a family of similarity functions that fulfill the Downward Closure property, and we also propose another algorithm for the remaining families of similarity functions. Experiments over mixed datasets are reported, and the results are compared against the ObjectMiner algorithm.
By
GarcíaHernández, René A.; MartínezTrinidad, José Fco.; CarrascoOchoa, Jesús Ariel
One of the sequential pattern mining problems is to find the maximal frequent sequences in a database with a β support. In this paper, we propose a new algorithm to find all the maximal frequent sequences in a text instead of a database. Compared with typical sequential pattern mining algorithms, our algorithm avoids the joining, pruning and text scanning steps. Experiments show that it is possible to obtain all the maximal frequent sequences in a few seconds for medium-sized texts.
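To make the target of this problem concrete, the sketch below enumerates maximal frequent sequences in a text by brute force. It is a naive reference for the definition only, not the algorithm proposed in the paper: it rescans the text at every length, which is exactly the kind of step the paper says it avoids. It assumes that a sequence is a run of consecutive whitespace-separated words and that "frequent" means appearing at least `beta` times; the function name is hypothetical.

```python
from collections import Counter


def maximal_frequent_sequences(text, beta):
    """Return the frequent word sequences that are not contained in a longer frequent one."""
    words = text.split()
    frequent = set()
    n = 1
    while True:
        # Count every run of n consecutive words.
        counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
        level = {seq for seq, count in counts.items() if count >= beta}
        if not level:
            # No frequent run of length n implies none of length n + 1.
            break
        frequent |= level
        n += 1

    def contained(short, long_seq):
        if len(short) >= len(long_seq):
            return False
        width = len(short)
        return any(
            long_seq[i:i + width] == short
            for i in range(len(long_seq) - width + 1)
        )

    return {s for s in frequent if not any(contained(s, t) for t in frequent)}


result = maximal_frequent_sequences("a b c a b d a b c", 2)
```

For this toy text and `beta = 2`, the only maximal frequent sequence is `("a", "b", "c")`; the shorter frequent sequences are all contained in it.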
By
GagoAlonso, Andrés; CarrascoOchoa, Jesús Ariel; MedinaPagola, José Eladio; MartínezTrinidad, José Fco.
Frequent connected subgraph mining (FCSM) is an interesting task with wide applications in real life. Most of the previous studies are focused on pruning search subspaces or optimizing the subgraph isomorphism (SI) tests. In this paper, a new property to remove all duplicate candidates in FCSM during the enumeration is introduced. Based on this property, a new FCSM algorithm called gdFil is proposed. In our proposal, the candidate space does not contain duplicates; therefore, we can use a fast evaluation strategy for reducing the cost of SI tests without wasting memory resources. Thus, we introduce a data structure to reduce the cost of SI tests. The performance of our algorithm is compared against other reported algorithms.
By
CarbajalHernández, José Juan; SánchezFernández, Luis P.; SánchezPérez, Luis A.; CarrascoOchoa, Jesús Ariel; MartínezTrinidad, José Francisco
An associative memory is a binary relationship between inputs and outputs, which is stored in a matrix M. In this paper, we propose a modification of the Steinbuch Lernmatrix model in order to process real-valued patterns, avoiding binarization processes and reducing the computational burden. The proposed model is used in experiments with noisy environments, where the performance and efficiency of the memory are demonstrated. A comparison between the proposed and the original model shows a good response and efficiency of the new Lernmatrix in the classification process.
By
GarcíaHernández, René Arnulfo; MartínezTrinidad, José Francisco; CarrascoOchoa, Jesús Ariel
5 Citations
Sequential pattern mining is an important tool for solving many data mining tasks and it has broad applications. However, only a few efforts have been made to extract this kind of pattern from a textual database. Finding these textual patterns is important for text mining problems because they can be extracted from text independently of the language. They are also human-readable patterns or descriptors of the text, which do not lose the sequential order of the words in the document. However, the problem of discovering sequential patterns in a database of documents has special characteristics that make it intractable for most apriori-like candidate-generation-and-test approaches. Recent studies indicate that the pattern-growth methodology could speed up sequential pattern mining. In this paper, we propose a pattern-growth based algorithm (DIMASP) to discover all the maximal sequential patterns in a document database. Furthermore, DIMASP is incremental and independent of the support threshold. Finally, we compare the performance of DIMASP against the GSP, DELISP, GenPrefixSpan and cSPADE algorithms.
