Understanding and Implementing Machine Learning Models with Dummy Variables with Low Variance

Abstract

Machine learning plays a growing role in daily life, where it is used to make predictions from data. The data must first be put into a suitable format, and the insights drawn from it depend on the rules we derive, which must stay semantically valid as time and requirements change. Dummy variables are used to represent categorical variables, which are of the object type by default in modeling and cannot be used directly in a prediction model. Handling them well requires understanding why each type of data was collected, because the information gathered defines the objects of the model and the features selected affect its accuracy. In machine learning, categorical variables are computed based on back propagation, and feature selection plays a vital role in managing accuracy. Regression analysis and classification analysis differ in how they use dummy variables. In this chapter, we do not replace the original variables with dummy values; instead, we add the dummy variables as new features. Classification and regression models built on the same features differ substantially in their implementation. We achieved the highest accuracy, 91%, with DBSCAN, a clustering method.
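As a rough illustration of adding dummy variables as new features instead of replacing the original column, the following minimal sketch uses only the Python standard library. The helper names `add_dummy_columns` and `dummy_variance`, the `city` and `income` columns, and the toy records are all hypothetical and are not taken from the chapter. The variance helper is included only because a 0/1 column with proportion p of ones has variance p(1 - p), which is one way to spot low-variance dummies.

```python
from typing import Dict, List


def add_dummy_columns(rows: List[Dict[str, object]], column: str) -> List[Dict[str, object]]:
    """Return copies of `rows` with one 0/1 indicator per category of `column`.

    The original categorical column is kept, so the dummies are
    additional features rather than replacements.
    """
    categories = sorted({str(row[column]) for row in rows})
    augmented = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are not modified
        for category in categories:
            new_row[f"{column}_{category}"] = int(str(row[column]) == category)
        augmented.append(new_row)
    return augmented


def dummy_variance(rows: List[Dict[str, object]], name: str) -> float:
    """Population variance of a 0/1 column, computed as p * (1 - p)."""
    p = sum(row[name] for row in rows) / len(rows)
    return p * (1 - p)


# Hypothetical toy data, for illustration only.
records = [
    {"city": "Delhi", "income": 42.0},
    {"city": "Pune", "income": 38.5},
    {"city": "Delhi", "income": 51.0},
    {"city": "Delhi", "income": 47.2},
]

augmented = add_dummy_columns(records, "city")
for row in augmented:
    print(row)
print("variance of city_Pune:", dummy_variance(augmented, "city_Pune"))
```

In practice, a library routine such as `pandas.get_dummies` is commonly used for the encoding step; the hand-written version here only makes the "add, do not replace" idea explicit.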