COMPARISON OF SPURIOUS CORRELATION METHODS USING PROBABILITY DISTRIBUTIONS AND PROPORTION OF REJECTING A TRUE NULL HYPOTHESIS

ABSTRACT

A limitation of spurious correlation analysis, e.g. the Pearson product-moment correlation test, is that the data need to be normally distributed. This research work compares spurious correlation methods under some non-normal probability distributions in order to identify the method with the best degree of association. The methods were compared using the proportions of rejecting a true null hypothesis obtained from the t and z test statistics for testing correlation coefficients. Data from normal, log-normal, exponential and contaminated normal distributions were generated by simulation for different sample sizes. The results indicate that, when the data follow normal, exponential or contaminated normal distributions, Pearson's and Spearman's rank correlation coefficients have the best proportion of rejecting the true null hypothesis. However, when the data are log-normally distributed, only Spearman's rank correlation coefficient has the best proportion of rejecting the true null hypothesis. Thus, Pearson's and Spearman's rank correlation coefficients have the best degree of association under the normal, exponential and contaminated normal distributions, whereas for the log-normal distribution only Spearman's rank correlation coefficient does.
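
The simulation design summarised above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the study's own program: x and y are drawn independently from the same distribution, so the null hypothesis H0: rho = 0 is true, and the proportion of replicates in which H0 is rejected estimates the Type I error rate. The generator parameters (standard normal, log-normal with mu = 0 and sigma = 1, exponential with rate 1, and a contaminated normal that draws from N(0, 3^2) with probability 0.1 and from N(0, 1) otherwise), the function names, alpha = 0.05 and the number of replicates are all assumptions. Only the Fisher z test, z = atanh(r) * sqrt(n - 3), is implemented, because Python's standard library has no Student-t quantile function; the t test based on t = r * sqrt(n - 2) / sqrt(1 - r^2) would need a t critical value with n - 2 degrees of freedom from a statistics library.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """Ranks 1..n; continuous data are assumed, so ties are not handled."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    out = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman(x, y):
    """Spearman's rank correlation r_s: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical generator settings (assumptions, not the study's parameters).
GENERATORS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "log-normal": lambda rng: rng.lognormvariate(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
    # Contaminated normal: N(0, 3^2) with probability 0.1, else N(0, 1).
    "contaminated normal": lambda rng: rng.gauss(0.0, 3.0 if rng.random() < 0.1 else 1.0),
}


def rejection_rate(coef, dist, n, reps=2000, alpha=0.05, seed=1):
    """Proportion of replicates that reject the true H0: rho = 0.

    Uses a two-sided Fisher z test: z = atanh(r) * sqrt(n - 3),
    compared with the N(0, 1) critical value for the given alpha.
    """
    if n <= 3:
        raise ValueError("the Fisher z test needs n > 3")
    rng = random.Random(seed)
    draw = GENERATORS[dist]
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Independent samples, so the null hypothesis is true.
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]
        r = coef(x, y)
        if abs(r) >= 1.0:
            # Perfect (rank) agreement: z is infinite, so H0 is rejected.
            rejections += 1
            continue
        z = math.atanh(r) * math.sqrt(n - 3)
        rejections += abs(z) > crit
    return rejections / reps
```

For example, `rejection_rate(spearman, "log-normal", n=30)` estimates how often Spearman's coefficient falsely signals association for log-normal samples of size 30; a well-behaved method should give a value close to alpha. Applying the Fisher z approximation to Spearman's r_s is itself a simplification made to keep the sketch short.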

CHAPTER ONE

1.0 INTRODUCTION

1.1 Background to the Study

Awareness of the problems related to the statistical analysis of spurious correlation dates back to 1897, when Karl Pearson published his seminal paper on spurious correlation, whose title significantly contains the words "On a form of spurious correlation"; the problem was later raised again by the geologist Chayes (1960).

Historical accounts show that Pearson used the term spurious correlation to "distinguish the correlations of scientific importance from those that were not." The problem, according to Pearson, was that some correlations did not indicate an "organic relationship." Although this term is never defined, the examples he used suggest that a spurious correlation was simply a correlation between two variables that were not causally connected. Moreover, the correlation coefficient measures only the strength of a linear relationship (Johnson and Kotz 1992).

Simplicity and interpretability should be the main criteria when selecting a measure of association. Historically, the Pearson correlation has been the main measure of association in multivariate analysis. It is simple, as it relates only two variables of a random vector, and it concerns only linear transformations in Rⁿ, i.e. a change of scale plus a shift. Its interpretation relies on linear regression ideas, which in turn are related to the geometry of Rⁿ, where covariance appears as a Euclidean inner product in the space of samples (Lovell et al., 2013). All these desirable properties are obtained when the Pearson correlation is used to study association. Correlation between variables can be measured with different indices (coefficients). The three most popular are Pearson's coefficient r, Spearman's rho coefficient rₛ and Kendall's….
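
The geometric reading of the Pearson coefficient described above can be checked numerically. The minimal sketch below uses arbitrary illustrative data (not from this study) and hypothetical helper names. It computes r as the cosine of the angle between the mean-centred sample vectors in Rⁿ, i.e. r = <x - mean(x), y - mean(y)> / (|x - mean(x)| |y - mean(y)|), then shows that a change of scale plus a shift leaves r unchanged (only its sign flips when the scale is negative), whereas a nonlinear transformation such as exp changes it. For contrast, Spearman's rₛ is computed as Pearson's r on the ranks.

```python
import math


def centered(v):
    """Subtract the sample mean from every coordinate."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, w):
    """Euclidean inner product in R^n."""
    return sum(a * b for a, b in zip(u, w))


def pearson(x, y):
    """r as the cosine of the angle between the centered sample vectors."""
    cx, cy = centered(x), centered(y)
    return dot(cx, cy) / math.sqrt(dot(cx, cx) * dot(cy, cy))


def linear(v, scale, shift):
    """Change of scale plus a shift: a -> scale * a + shift."""
    return [scale * a + shift for a in v]


def ranks(v):
    """Ranks 1..n; assumes no ties, which holds for the sample data below."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    out = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman(x, y):
    """Spearman's r_s: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Arbitrary illustrative data, not taken from the study.
x = [1.2, 2.5, 3.1, 4.8, 6.0]
y = [2.0, 2.9, 3.7, 5.5, 6.1]
exp_x = [math.exp(a) for a in x]

r = pearson(x, y)
r_lin = pearson(linear(x, 10.0, -3.0), y)  # positive scale: r unchanged
r_neg = pearson(linear(x, -2.0, 5.0), y)   # negative scale: sign of r flips
r_exp = pearson(exp_x, y)                  # nonlinear map: r changes
rs = spearman(x, y)
rs_exp = spearman(exp_x, y)  # exp is strictly increasing, so ranks and r_s are unchanged
```

The last two lines illustrate the design difference between the coefficients: Pearson's r is invariant only under linear (affine) changes, while a rank-based coefficient is invariant under any strictly increasing transformation.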
