DETECTION OF OUTLIERS IN REAL-TIME KINEMATIC (RTK) GLOBAL POSITIONING SYSTEM (GPS) OBSERVATIONS, A RESEARCH PROJECT TOPIC ON QUANTITY SURVEYING
1.1 BACKGROUND OF STUDY
In carrying out data analysis, it is of the utmost importance to identify outlying observations that deviate markedly from the rest of the dataset before modelling the data; otherwise, aberrant data may lead to model misspecification, biased parameter estimates and incorrect results. In other words, analysing data that are contaminated with outliers is futile, because the outliers can distort the results of the analysis.
Outliers are observations that do not follow the statistical distribution of the bulk of the data and may consequently lead to erroneous results in statistical analysis (Liu et al., 2004). According to Hawkins (1980), an outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism.
Outliers can result from mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply natural deviations in populations (Hodge and Austin, 2004). Detecting them can reveal system faults and fraud before they escalate with potentially catastrophic consequences. Detection can also identify errors so that their contaminating effects can be removed, thereby cleaning the data set before processing.
Outliers can alter the results of an analysis if they are not carefully handled. Identifying and handling outliers is therefore a significant part of data processing, and removing outlying observations can enhance the quality of the data used for statistical inference. Eliminating outliers from the observations has positive effects on the results of data analysis and data mining. Simple statistical estimates, such as the sample mean and standard deviation, can be significantly biased by individual outliers that lie far from the middle of the distribution (Kaya, 2010).
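The effect Kaya (2010) describes can be illustrated with a short sketch using Python's standard statistics module. The height values below are hypothetical repeated RTK readings at a single point, invented for illustration only; one reading carries a deliberately injected blunder of about 0.5 m. The sketch compares the mean and standard deviation with the median, a robust alternative that is not mentioned in the original text.

```python
import statistics

# Hypothetical repeated RTK height readings (metres) at one point.
# The contaminated set adds one injected blunder of about 0.5 m.
clean = [102.431, 102.428, 102.435, 102.430, 102.433, 102.429]
contaminated = clean + [102.930]

def summarize(values):
    """Return (mean, sample standard deviation, median) of the values."""
    return (
        statistics.mean(values),
        statistics.stdev(values),
        statistics.median(values),
    )

for label, data in (("clean", clean), ("contaminated", contaminated)):
    mean, sd, median = summarize(data)
    print(f"{label:>12}: mean={mean:.4f} m  sd={sd:.4f} m  median={median:.4f} m")
```

In this example the single blunder pulls the mean up by roughly 7 cm and inflates the standard deviation by more than an order of magnitude, while the median moves by only half a millimetre.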
The overall objective of outlier identification and removal is to discern odd data whose behaviour is highly anomalous when compared with the rest of the data set, that is, to separate the observations that diverge from the rest of the dataset. Assessing the abnormal behaviour of outliers helps to uncover the valuable knowledge hidden behind them and assists decision making aimed at improving service quality. Outlier identification and removal is applied in several fields, such as fraud detection, intrusion detection, data cleaning and medical diagnosis. Data mining offers both supervised and unsupervised approaches to the problem (Nithya and Caroline, 2014).
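As a concrete, unsupervised example of separating divergent observations, the sketch below applies the modified z-score rule of Iglewicz and Hoaglin, which scores each value by its distance from the median in units of the median absolute deviation (MAD) and flags scores above 3.5. This rule is swapped in purely for illustration and is not presented as the detection method of this project; the function names and the height values are hypothetical.

```python
import statistics

# Modified z-score screening (Iglewicz and Hoaglin), used here only as an
# illustrative robust rule, not as the method adopted in this project.
MAD_SCALE = 0.6745  # makes scores comparable to ordinary z-scores for normal data

def modified_z_scores(values):
    """Return modified z-scores based on the median and the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero, so modified z-scores are undefined")
    return [MAD_SCALE * (v - med) / mad for v in values]

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose absolute modified z-score exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]

# Hypothetical repeated RTK heights (metres) at one point; index 6 is a blunder.
heights = [102.431, 102.428, 102.435, 102.430, 102.433, 102.429, 102.930]
print(flag_outliers(heights))
```

Because both the centre and the spread are estimated with medians, the blunder cannot mask itself by inflating the spread estimate, which is the weakness of a mean and standard deviation based rule.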
Surveying networks are used in many geomatics engineering projects to provide positioning information. In a surveying network, geodetic observations (height differences, distances, angles, directions and GPS baseline components) are measured, and the unknown parameters are then estimated by the method of least squares (Yetkin, 2013).
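For reference, the least squares estimation step can be written in the standard linear (or linearised) Gauss-Markov form. The notation here is an assumption, not taken from the source: A is the design matrix, l the observation vector, x̂ the vector of estimated parameters, v the residual vector, Σ_ll the covariance matrix of the observations, σ0² the a priori variance factor and P the weight matrix. With a diagonal P the objective reduces to the sum of p_i v_i², and A^T P A is assumed to be invertible (no datum defect).

```latex
\mathbf{v} = \mathbf{A}\hat{\mathbf{x}} - \mathbf{l},
\qquad
\mathbf{P} = \sigma_0^{2}\,\boldsymbol{\Sigma}_{ll}^{-1},
\qquad
\Omega(\hat{\mathbf{x}}) = \mathbf{v}^{\mathsf{T}}\mathbf{P}\,\mathbf{v} \;\rightarrow\; \min

\frac{\partial \Omega}{\partial \hat{\mathbf{x}}}
= 2\,\mathbf{A}^{\mathsf{T}}\mathbf{P}\left(\mathbf{A}\hat{\mathbf{x}} - \mathbf{l}\right) = \mathbf{0}
\quad\Longrightarrow\quad
\hat{\mathbf{x}} = \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf{T}}\mathbf{P}\,\mathbf{l}
```

Because x̂ is a linear function of l, any blunder contained in the observations propagates directly into the estimated parameters.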
1.2 STATEMENT OF THE PROBLEM
The least squares technique is the most commonly used parameter estimation tool in geomatics. It works by minimizing the sum of the squares of the weighted residuals. Its advantage is that it gives unbiased, minimum-variance estimates. However, the least squares technique has a limitation: it provides optimal results only when the observations are free from gross errors (blunders) and systematic biases. Unfortunately, such errors are often encountered in practice. Therefore, outlier detection and elimination are essential when conducting spatial data analysis (Yetkin, 2013).
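The sketch below shows this limitation on a small, entirely hypothetical levelling network: a benchmark BM fixed at 100.000 m and two unknown heights H_B and H_C, observed through four height differences with equal weights. It implements weighted least squares in plain Python (normal equations solved by Gaussian elimination, so no external libraries are needed), runs it once with consistent data and once with a 0.100 m blunder injected into the B to C observation. All names, observations and weights are assumptions made for illustration.

```python
# Weighted least squares on a hypothetical levelling network (illustration only).
H_BM = 100.000  # assumed fixed benchmark height, metres

def solve(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def weighted_least_squares(A, l, w):
    """Minimise sum(w_i * v_i**2) with v = A x - l; w holds diagonal weights."""
    n = len(A[0])
    N = [[sum(wk * row[i] * row[j] for row, wk in zip(A, w)) for j in range(n)]
         for i in range(n)]
    u = [sum(wk * row[i] * lk for row, lk, wk in zip(A, l, w)) for i in range(n)]
    x = solve(N, u)
    v = [sum(a * xi for a, xi in zip(row, x)) - lk for row, lk in zip(A, l)]
    return x, v

def build_network(dh_b_to_c):
    """Design matrix, reduced observations and weights for unknowns [H_B, H_C]."""
    A = [
        [1.0, 0.0],    # BM -> B: H_B = H_BM + dh
        [-1.0, 1.0],   # B  -> C: H_C - H_B = dh
        [0.0, 1.0],    # BM -> C: H_C = H_BM + dh
        [1.0, -1.0],   # C  -> B: H_B - H_C = dh
    ]
    l = [H_BM + 1.251, dh_b_to_c, H_BM + 2.011, -0.761]
    w = [1.0, 1.0, 1.0, 1.0]  # equal weights assumed
    return A, l, w

for label, dh in (("clean", 0.759), ("blunder", 0.859)):
    x, v = weighted_least_squares(*build_network(dh))
    residuals = ", ".join(f"{r:+.3f}" for r in v)
    print(f"{label:>7}: H_B={x[0]:.3f} m  H_C={x[1]:.3f} m  residuals=[{residuals}]")
```

With the consistent data the residuals stay at the millimetre level. The injected 0.100 m blunder shifts both estimated heights by about 0.020 m and leaks into every residual, although the largest residual (about 0.059 m) still falls on the corrupted observation. This smearing of a single blunder across the network is one reason formal outlier tests, rather than simple inspection of residuals, are used before accepting a least squares solution.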