By
Munteanu, Alexander; Wornowizki, Max
We consider the two-sample homogeneity problem, where the information contained in two samples is used to test the equality of the underlying distributions. In cases where one sample is simulated by a procedure modelling the data-generating process of another, observed sample, a mere rejection of the null hypothesis is unsatisfactory. Instead, the data analyst would like to know how the simulation can be improved. Based on the popular Kolmogorov–Smirnov test and a general mixture model, we propose an algorithm that determines an appropriate correction distribution function. Complementing the simulation sample with a given proportion of observations sampled from this distribution reduces the Kolmogorov–Smirnov distance between the modified and the observed sample. The correction distribution therefore indicates possible improvements to the current simulation process. We prove that our algorithm runs in linear time when applied to sorted samples. We further illustrate its intuitive results on simulated as well as real data sets from astrophysics and bioinformatics.
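The correction algorithm is built around the two-sample Kolmogorov–Smirnov distance. As a hedged illustration of why sorted input permits linear time, the sketch below computes only that distance, in a single merge pass over two pre-sorted samples. The function name `ks_distance_sorted` is hypothetical, and the authors' correction-distribution algorithm itself is not reproduced here.

```python
from typing import Sequence


def ks_distance_sorted(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov distance sup_t |F_x(t) - F_y(t)|.

    Assumes both samples are non-empty and sorted in ascending order, so a
    single merge-style pass suffices and the cost is O(len(x) + len(y)).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        # Next jump point of either empirical distribution function.
        t = min(x[i], y[j])
        # Step past every observation equal to t in both samples (handles ties).
        while i < n and x[i] == t:
            i += 1
        while j < m and y[j] == t:
            j += 1
        # Just after the jump at t, F_x(t) = i/n and F_y(t) = j/m.
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its ECDF equals 1 and the gap can only
    # shrink, so the remaining points need not be visited.
    return d


print(ks_distance_sorted([1, 2, 3, 4], [2, 4]))  # 0.25
```

Handling ties by advancing both samples past the same value before comparing keeps the result equal to the supremum of the ECDF difference even with repeated observations.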
By
Allison, J. S.; Santana, L.; Smit, N.; Visagie, I. J. H.
3 Citations
The exponential distribution is a popular model both in practice and in theoretical work. As a result, a multitude of tests based on varied characterisations have been developed for testing the hypothesis that observed data are realised from this distribution. Many of the recently developed tests contain a tuning parameter, usually appearing in a weight function. In this paper we compare the powers of 20 tests for exponentiality, some containing a tuning parameter and some not. To ensure a fair ‘apples to apples’ comparison between the tests, we employ a data-dependent choice of the tuning parameter for those tests that contain one. The comparisons are conducted for various sample sizes and for a large number of alternative distributions. The results of the simulation study show that the test with the best overall power performance is the Baringhaus and Henze test, followed closely by the test by Henze and Meintanis; both tests contain a tuning parameter. The score test by Cox and Oakes performs best among the tests that do not include a tuning parameter.
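The abstract names the Cox and Oakes score test as the strongest of the tests without a tuning parameter. A minimal sketch of that statistic, assuming the commonly cited form CO_n = n + Σ(1 - Y_j) log Y_j with Y_j = X_j / X̄, and the normal approximation √(6/n)·CO_n/π under the null hypothesis. The two-sided p-value and the name `cox_oakes` are illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def cox_oakes(x: Sequence[float]) -> Tuple[float, float, float]:
    """Cox-Oakes statistic for exponentiality (assumed form, see lead-in).

    Returns (CO_n, z, two-sided p-value), where
        Y_j  = X_j / mean(X)                  (makes the test scale-invariant)
        CO_n = n + sum_j (1 - Y_j) * log(Y_j)
        z    = sqrt(6 / n) * CO_n / pi        (approx. N(0, 1) under H0)
    """
    n = len(x)
    if n == 0 or any(v <= 0 for v in x):
        raise ValueError("data must be non-empty and strictly positive")
    mean = sum(x) / n
    y = [v / mean for v in x]
    co = n + sum((1 - yj) * math.log(yj) for yj in y)
    z = math.sqrt(6 / n) * co / math.pi
    # Two-sided normal p-value: P(|N(0,1)| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return co, z, p
```

Dividing by the sample mean makes the statistic invariant to the unknown rate parameter, which is why no parameter needs to be estimated separately before testing.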
By
Ten Eyck, Patrick; Cavanaugh, Joseph E.
2 Citations
In the logistic regression framework, we develop and investigate three model selection criteria based on cross-validatory analogues of the traditional and adjusted c-statistics. These criteria are designed to estimate three corresponding measures of predictive error: the model misspecification prediction error, the fitting sample prediction error, and the sum of prediction errors. We aim to show that these estimators serve as suitable model selection criteria, facilitating the identification of a model that appropriately balances goodness-of-fit and parsimony while achieving generalizability. We examine the properties of the selection criteria via an extensive simulation study designed as a factorial experiment. We then employ these measures in a practical application based on modeling the occurrence of heart disease.
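The proposed criteria are cross-validatory analogues of the c-statistic. As a hedged reference point only, the sketch below computes the ordinary (non-cross-validated) c-statistic from observed 0/1 outcomes and fitted probabilities; a cross-validatory version would, by assumption, score each observation with a model fitted without it. The adjusted c-statistic and the authors' three criteria are not reproduced, and `c_statistic` is a hypothetical name.

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Concordance (c) statistic for a binary outcome.

    Fraction of (event, non-event) pairs in which the event received the
    higher predicted probability; tied predictions count one half.
    The O(n1 * n0) double loop favours clarity over speed.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of events from non-events.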
By
Schmidberger, Markus; Vicedo, Esmeralda; Mansmann, Ulrich
1 Citations
As microarray data quality can affect each step of the microarray analysis process, quality assessment and control are an integral part of it. Quality assessment detects divergent measurements beyond the acceptable level of random fluctuations. This empirical study identifies associations and correlations among the six quality assessment methods for microarray outlier detection used in the arrayQualityMetrics package, version 2.2.2. For the evaluation, two agreement measures (Cohen’s kappa, applied after a marginal homogeneity criterion, and the AC1 statistic), the Pearson correlation coefficient, and realistic microarray data from the public ArrayExpress database were used. It is possible to assess the quality of a data set using only four of the six currently proposed statistical methods to comprehensively quantify the quality information in large series of microarrays. This saves computation time and reduces decision complexity for the analyst. The newly proposed rule is validated with data sets from biomedical studies.
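Both agreement measures named above compare observed agreement with chance agreement. A minimal sketch for two outlier-detection methods that each flag arrays as outliers (1) or not (0), assuming the standard two-category forms of Cohen's kappa and of the AC1 statistic. The marginal homogeneity check mentioned in the abstract is not implemented, and `kappa_and_ac1` is a hypothetical helper, not part of arrayQualityMetrics.

```python
from typing import Sequence, Tuple


def kappa_and_ac1(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float]:
    """Cohen's kappa and the AC1 statistic for two binary raters.

    a[i] and b[i] are 1 if method A (resp. B) flags array i as an outlier,
    else 0. Both statistics have the form (p_o - p_e) / (1 - p_e) and differ
    only in how the chance agreement p_e is estimated.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("ratings must be non-empty and of equal length")
    p_o = sum(ai == bi for ai, bi in zip(a, b)) / n  # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                  # flag rate of each method
    pe_kappa = pa * pb + (1 - pa) * (1 - pb)         # Cohen: product of marginals
    pi = (pa + pb) / 2                               # mean flag rate
    pe_ac1 = 2 * pi * (1 - pi)                       # AC1 chance agreement, 2 categories
    # Kappa is undefined when both methods flag all arrays or none of them.
    kappa = float("nan") if pe_kappa == 1 else (p_o - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)              # pe_ac1 <= 0.5, never divides by 0
    return kappa, ac1
```

With rare outlier flags the two measures can diverge: for a = [1,1,0,0,0,0,0,0,0,0] and b = [1,0,0,0,0,0,0,0,0,1], raw agreement is 0.8 but kappa is 0.375, while AC1 is about 0.71.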
By
Liquet, Benoît; Saracco, Jérôme
3 Citations
In this paper, we consider a semiparametric regression model involving both a $p$-dimensional quantitative covariable $X$ and a categorical predictor $Z$, and including a dimension reduction of $X$ via $K$ indices $X'\beta_k$. The dependent variable $Y$ can be real or $q$-dimensional. We propose an approach based on $\mathrm{SIR}_\alpha$ and pooled marginal slicing methods in order to estimate the space spanned by the $\beta_k$’s. We establish $\sqrt{n}$-consistency of the proposed estimator. Simulation studies show the numerical qualities of our estimator.
By
Morana, Claudio
1 Citations
A new noise filtering approach, based on flexible least squares (FLS) estimation of an unobserved component local level model, is introduced. In Monte Carlo analysis, the proposed FLS filter performs well independently of the persistence properties of the data and the size of the signal-to-noise ratio, in general outperforming even the Wiener–Kolmogorov filter, which is, in theory, a minimum mean square error estimator. Moreover, a key advantage of the proposed filter relative to available competitors is that it can handle any persistence property of the data without pretesting, while being computationally fast, undemanding, and easy to implement.
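For a local level model, a flexible least squares fit is commonly stated as the level path μ_1, ..., μ_T minimizing Σ_t (y_t - μ_t)² + λ Σ_t (μ_t - μ_{t-1})², trading measurement error against changes in the level. Assuming that criterion, the first-order conditions form the tridiagonal system (I + λD'D)μ = y, with D the first-difference matrix, which the Thomas algorithm solves in O(T). This sketch covers the criterion only; how the paper chooses λ (here a hypothetical fixed input `lam`) and any further details of the proposed filter are not reproduced.

```python
from typing import List, Sequence


def fls_local_level(y: Sequence[float], lam: float) -> List[float]:
    """Level path mu minimizing sum (y_t - mu_t)^2 + lam * sum (mu_t - mu_{t-1})^2.

    Setting the gradient to zero gives (I + lam * D'D) mu = y, where D is the
    first-difference matrix. That matrix is tridiagonal (diagonal 1 + lam at
    both ends and 1 + 2*lam inside, off-diagonals -lam), so the Thomas
    algorithm solves it in O(T). Requires lam >= 0.
    """
    n = len(y)
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if n <= 1:
        return [float(v) for v in y]
    diag = [1 + 2 * lam] * n
    diag[0] = diag[-1] = 1 + lam
    off = -lam  # same value on the sub- and super-diagonal
    # Forward sweep: eliminate the sub-diagonal.
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for t in range(1, n):
        denom = diag[t] - off * c[t - 1]
        c[t] = off / denom if t < n - 1 else 0.0
        d[t] = (y[t] - off * d[t - 1]) / denom
    # Back substitution.
    mu = [0.0] * n
    mu[-1] = d[-1]
    for t in range(n - 2, -1, -1):
        mu[t] = d[t] - c[t] * mu[t + 1]
    return mu
```

With `lam = 0` the estimated level reproduces the data, and larger values of `lam` penalize level changes more heavily, giving a smoother path. The matrix is diagonally dominant for `lam >= 0`, so the sweep needs no pivoting.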
By
Indira, K.; Kanmani, S.
6 Citations
Association rule mining is a data mining task on which a great deal of academic research has been done and for which many algorithms have been proposed. Most methods treat association rule mining as a twofold process, which increases the complexity of the system and takes up more time and space. Evolutionary computation (EC) methods are fast-growing, search-based optimization methods for association rule mining. Among them, particle swarm optimization (PSO) is better suited to mining association rules. The bottleneck of PSO is setting precise values for its control parameters, which is done either through parameter tuning or through parameter control. This paper proposes adaptive methods for the PSO control parameters, namely the acceleration coefficients and the inertia weight, based on estimation of the evolution state and on the fitness value, respectively. When tested on five datasets from the University of California Irvine (UCI) repository, both proposed adaptive methods generated association rules with better accuracy and rule measures than simple PSO.
By
Dinwoodie, I. H.; MacGibbon, Brenda
Summary
A data set of Fraser and Hunter (1975) on types of congenital heart malformations in sibling pairs is analyzed exactly for quasi-independence with Monte Carlo methods. Exact p-values are computed for a test of parameter significance and a test of goodness-of-fit; these contradict the model of quasi-independence and confirm an earlier analysis by MacGibbon (1983).
By
Krämer, Nicole
17 Citations
The aim of this paper is twofold. In the first part, we recapitulate the main results regarding the shrinkage properties of partial least squares (PLS) regression. In particular, we give an alternative proof of the shape of the PLS shrinkage factors. It is well known that some of the factors are greater than 1. We discuss in detail the effect of shrinkage factors on the mean squared error of linear estimators and argue that these results cannot be extended directly to PLS, as it is nonlinear. In the second part, we investigate the effect of shrinkage factors empirically. In particular, we point out that experiments on simulated and real-world data show that bounding the absolute value of the PLS shrinkage factors by 1 seems to lead to a lower mean squared error.
