By
Shinoda, Yuji; Yoshida, Kenji; Nakayama, Hirotaka
1 Citations
Adapting content to each student is a major issue in eLearning. From this viewpoint, a learning course, as a series of content, must also be adjusted according to the students’ performance. We propose a method that combines clustering and decision tree learning to construct scenarios of the students’ actions. The global statuses of the students are reflected in the clusters, and their local and sequential actions are reflected in the decision trees. The results of eLearning tests gathered from Japanese junior high school students were processed by our proposed method. We graded the clusters by their adaptation to the trees and selected a set of clusters as a scenario for the students. These scenarios may aid the adjustment and revision of learning courses.
By
Sun, Ning; Yu, Hong
Cluster analysis is an unsupervised learning technique that plays an increasingly important role in data mining. However, one basic and difficult question in clustering is how to determine the number of clusters automatically. The traditional solution is to introduce a single validity index, which may fail because the index is biased toward some specific condition. Moreover, most existing clustering algorithms are based on hard partitioning, which cannot reflect the uncertainty of the data in the clustering process. To address these drawbacks, this paper proposes a method that determines the number of clusters automatically based on three-way decision and multiple validity indexes. The method has three parts: (1) a k-means clustering algorithm is devised to obtain three-way clustering results; (2) multiple validity indexes are employed to evaluate the results, and each evaluated result is weighted according to the mean similarity between the corresponding clustering result and the others, following the idea of the median partition in clustering ensembles; and (3) the comprehensive evaluation results are sorted and the best-ranked k value is selected as the optimal number of clusters. The experimental results show that the proposed method determines the number of clusters automatically better than each of the single evaluation methods used in the fusion.
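As a rough illustration of steps (2) and (3), the sketch below combines several validity indexes into one ranking of candidate k values. It is a simplified stand-in, not the paper's procedure: the scores are made-up numbers, the index names are only labels, and the weights default to equal values rather than the similarity-based weights described in the abstract. The function name `pick_k` and its parameters are hypothetical.

```python
def pick_k(index_scores, higher_is_better, weights=None):
    """Rank candidate k values by a weighted average of normalised index scores.

    index_scores:     {index_name: {k: score}}
    higher_is_better: {index_name: bool}
    weights:          {index_name: float}; assumption: equal weights if omitted
    """
    names = list(index_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    ks = sorted(next(iter(index_scores.values())))
    combined = {k: 0.0 for k in ks}
    for name in names:
        scores = index_scores[name]
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero for constant scores
        for k in ks:
            norm = (scores[k] - lo) / span  # min-max normalise to [0, 1]
            if not higher_is_better[name]:
                norm = 1.0 - norm           # flip so that 1.0 is always best
            combined[k] += weights[name] * norm / total_weight
    best_k = max(ks, key=lambda k: combined[k])
    return best_k, combined


# Hypothetical scores for k = 2..5 from two indexes with opposite directions.
scores = {
    "silhouette": {2: 0.41, 3: 0.62, 4: 0.55, 5: 0.40},
    "davies_bouldin": {2: 1.20, 3: 0.70, 4: 0.80, 5: 1.10},
}
direction = {"silhouette": True, "davies_bouldin": False}
best_k, combined = pick_k(scores, direction)
print(best_k, combined)
```

Normalising each index before averaging keeps one index with a large numeric range from dominating the others, which is one simple way to reduce the bias of any single index.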
By
Lou, Chang; Gao, Xiaofeng; Wu, Fan; Chen, Guihai
3 Citations
Maximizing network lifetime is a central challenge for wireless sensor networks (WSNs). Clustering and routing have been shown to be energy-efficient strategies for extending network lifetime. In this paper, we put forward an energy-consumption model for sensors in WSNs and calculate the network energy consumption over a short period for any given network configuration, including sensor state scheduling, clustering, and routing information. We then address an energy-aware optimal planning problem with area coverage and connectivity constraints, seeking the best sensor scheduling scheme to extend network lifetime. We formulate it as an Integer Linear Programming (ILP) model and add extra constraints to reduce the scale of the model. We use Gurobi to solve this model and compare the basic model and the scale-reduced model with a previous work, OPTALLRCC [6]. The simulation results show that the reduction is necessary and that our model performs better than the previous model.
By
Chen, Haoyuan; Fan, Yali; Jiang, Jing; Chen, Xiang
Predicting users’ mobility trajectories is important for service providers, for example in recommendation systems for tourist routing and in emergency warning. However, previous studies predict the next location merely by observing past individual trajectories, which usually yields poor prediction accuracy. In this paper, POI (Point of Interest) information is used to adjust the weight parameters of the predicted results, which improves their rationality and precision. The cellular towers are first classified into seven types of functional area using POIs. The target user’s next possible functional area can then be inferred, and it acts as a supervision of the final prediction outcome. We use the DP (Dirichlet Process) mixture model to identify similarity between different users and predict users’ locations by leveraging these similar users. The results show that the proposed methods are highly adaptive and precise when used to predict users’ mobility trajectories.
By
Bouguila, Nizar; Ziou, Djemel
17 Citations
Mixture modeling is one of the most useful tools in machine learning and data mining applications. An important challenge when applying finite mixture models is selecting the number of clusters that best describes the data. Recent developments have shown that this problem can be handled by applying nonparametric Bayesian techniques to mixture modeling. Another crucial preprocessing step for mixture learning is the selection of the most relevant features. To tackle these problems, the main approach in this paper consists of storing the knowledge in a generalized Dirichlet mixture model by applying nonparametric Bayesian estimation and inference techniques. Specifically, we extend finite generalized Dirichlet mixture models to the infinite case, in which the number of components and the relevant features do not need to be known a priori. This extension provides a natural representation of uncertainty regarding the challenging problem of model selection. We propose a Markov Chain Monte Carlo algorithm to learn the resulting infinite mixture. Through applications involving text and image categorization, we show that infinite mixture models offer more powerful and robust performance than classic finite mixtures for both clustering and feature selection.
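To make the idea of an unbounded number of components concrete, the sketch below samples cluster assignments from a Chinese restaurant process, a standard construction behind Dirichlet process mixtures. It only illustrates the prior over partitions; it is not the generalized Dirichlet model or the MCMC algorithm from the paper, and the function name `crp_assignments` and the concentration parameter `alpha` are assumptions made for this example.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample a partition of n_items from a Chinese restaurant process.

    Item i joins existing cluster j with probability counts[j] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []  # counts[j] = number of items currently in cluster j
    labels = []
    for i in range(n_items):
        r = rng.uniform(0.0, i + alpha)
        chosen = len(counts)  # default: open a new cluster
        acc = 0.0
        for j, c in enumerate(counts):
            acc += c
            if r < acc:
                chosen = j
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


labels, counts = crp_assignments(n_items=50, alpha=1.0)
print(len(counts), counts)
```

The number of clusters is not an input: it emerges from the sampling, and larger values of `alpha` tend to produce more clusters.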
By
Salah, Aghiles; Rogovschi, Nicoleta; Nadif, Mohamed
Collaborative filtering (CF) systems aim to recommend a set of personalized items to an active user according to the preferences of other, similar users. Many methods have been developed, and some, such as those based on similarity and Matrix Factorization (MF), can achieve very good recommendation accuracy, but unfortunately they are computationally prohibitive. Applying such approaches to real-world applications, in which the available information evolves frequently, is therefore a nontrivial task. To address this problem, we propose a novel, efficient incremental CF system based on a weighted clustering approach. Our system provides high-quality recommendations at a very low computation cost. Experimental results on several real-world datasets confirm the efficiency and effectiveness of our method by demonstrating that it is significantly better than existing incremental CF methods in terms of both scalability and recommendation quality.
By
Kurban, Hasan; Jenne, Mark; Dalkilic, Mehmet M.
2 Citations
Existing data mining techniques, particularly iterative learning algorithms, become overwhelmed by big data. While parallelism is an obvious and usually necessary strategy, we observe that (1) continually revisiting data and (2) visiting all data are two of the most prominent problems, especially for iterative, unsupervised algorithms such as the expectation-maximization algorithm for clustering (EMT). Our strategy is to embed EMT into a nonlinear hierarchical data structure (a heap) that allows us to (1) separate data that needs to be revisited from data that does not and (2) narrow the iteration toward the data that is more difficult to cluster. We call this extended EMT EM*. We show that our EM* algorithm outperforms the EMT algorithm on large real-world and synthetic data sets. We conclude with some theoretical underpinnings that explain why EM* is successful.
By
Bonmati, Ester; Bardera, Anton; Boada, Imma; Feixas, Miquel; Sbert, Mateu
2 Citations
Clustering techniques aim to organize data into groups whose members are similar. A key element of these techniques is the definition of a similarity measure. The information bottleneck method provides a full solution to the clustering problem with no need to define a similarity measure, since a variable $X$ is clustered depending on a control variable $Y$ by maximizing the mutual information between them. In this paper, we propose a hierarchical clustering algorithm based on the information bottleneck method in which, instead of using a control variable, the different possible values of a Markov process are clustered by maximally preserving the mutual information between two consecutive states of the Markov process. These two states can be seen as the input and the output of an information channel that is used as a control process, similarly to how the variable $Y$ is used as a control variable in the original information bottleneck algorithm. We present both agglomerative and divisive versions of our hierarchical clustering approach and two different applications. The first, which quantizes an image by grouping intensity bins of the image histogram, is tested on synthetic, photographic and medical images and compared with hand-labelled images, hierarchical clustering using Euclidean distance, and non-negative matrix factorization methods. The second, which clusters brain regions by grouping them according to their connectivity, is tested on medical data. In all the applications, the results demonstrate the efficacy of the method in obtaining clusters with high mutual information.
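One agglomerative step of this idea can be sketched as follows, assuming the joint distribution of two consecutive states is already available as a matrix (stationary probability times transition probability). The code picks the pair of states whose merge loses the least mutual information. It is a minimal greedy illustration rather than the authors' algorithm, and the function names and the three-state example chain are hypothetical.

```python
import math
from itertools import combinations


def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution given as a matrix."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


def merge_states(joint, a, b):
    """Merge state b into state a (a < b) in both rows and columns."""
    n = len(joint)
    keep = [i for i in range(n) if i != b]
    # Add row b to row a, then column b to column a, and drop index b.
    rows = [[joint[i][j] + (joint[b][j] if i == a else 0.0) for j in range(n)]
            for i in keep]
    return [[row[j] + (row[b] if j == a else 0.0) for j in keep] for row in rows]


def best_merge(joint):
    """Return (mi_loss, a, b) for the pair whose merge loses the least MI."""
    base = mutual_information(joint)
    return min(
        (base - mutual_information(merge_states(joint, a, b)), a, b)
        for a, b in combinations(range(len(joint)), 2)
    )


# Hypothetical chain: states 0 and 1 have identical transition behaviour.
# joint[x][y] = pi[x] * P[x][y] with stationary pi = (0.25, 0.25, 0.5).
joint = [
    [0.025, 0.025, 0.2],
    [0.025, 0.025, 0.2],
    [0.2, 0.2, 0.1],
]
loss, a, b = best_merge(joint)
print(a, b, loss)
```

Repeating `best_merge` followed by `merge_states` until a single state remains would yield a full agglomerative hierarchy; because states 0 and 1 are interchangeable in this example, merging them should cost essentially no mutual information.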
By
Deng, Ze; Hu, Yangyang; Zhu, Mao; Huang, Xiaohui; Du, Bo
24 Citations
Clustering trajectory data is an important way to mine hidden information in moving-object sampling data, for example to understand trends in movement patterns, and it has become highly popular in geographic information applications. In the era of ‘Big data’, current approaches for clustering trajectory data generally do not apply to trajectory big data because of excessive costs in both scalability and computing performance. To address these problems, this study first proposes a new clustering algorithm for trajectory big data, namely TraPOPTICS, by modifying a scalable clustering algorithm for point data (POPTICS). TraPOPTICS employs a spatiotemporal distance function and trajectory indexing to support trajectory data, and it can process trajectory big data in a distributed manner to achieve high scalability. To provide a fast solution for clustering trajectory big data, this study also explores the feasibility of using contemporary general-purpose computing on graphics processing units (GPGPU). The GPGPU-aided clustering approach parallelizes TraPOPTICS with the Hyper-Q feature of the Kepler GPU and massive GPU threads. The experimental results indicate that (1) the TraPOPTICS algorithm has clustering quality comparable to TOPTICS (the state-of-the-art work on clustering trajectories in a centralized fashion) and outperforms TOPTICS by a factor of four on average in terms of scalability, and (2) GTraPOPTICS also has clustering quality comparable to TPOPTICS and further gains about 30 speedup on average for clustering trajectories compared to TraPOPTICS with eight threads. The proposed algorithms exhibit great scalability and computing performance in clustering trajectory big data.
