
CLASSIFICATION & MISSING VALUES

DATA MINING TECHNIQUES USED ON VOTING PATTERNS


Sadaf Mehmood, Naima Talib

 


 

 

Abstract

In this paper, a number of approaches to classification and to handling missing attribute values are presented and compared on a single data set. Classification is a data mining technique used to predict group membership for data instances. This paper presents some basic classification techniques and methods for dealing with missing values, and then compares their results. The algorithm that produces the highest accuracy is considered the most suitable for the given data set.

Keywords: data mining, data preprocessing, classification algorithm, decision tree analysis, missing attributes, voting patterns

Introduction

Data preprocessing is one of the major steps in the knowledge discovery process. It involves transforming the raw input data into an appropriate form, which may include dealing with missing values. Before classification can be performed, the data must be preprocessed or cleaned to remove noise and redundant values, and the relevant records and attributes must be selected. People are likely to make mistakes when they analyze data or when they try to establish relationships between multiple features. In this scenario the results will not be satisfactory if the data is not accurate, i.e. if it is noisy or contains missing values, because it may not give us the valid information or knowledge that we want or need, or it may even provide misleading information. This makes it difficult to find solutions to certain problems.

Data mining techniques help in this scenario. Data mining approaches are applied to large collections of values to find new and useful patterns that might otherwise remain unknown. Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and meaningful relationships in large data sets, and to summarize them.

Classification is a technique that assigns items in a collection to target classes. The goal of classification is to accurately predict the target class for each case in the data set. For example, for our data set, a classification model could be used to predict whether a representative is a Republican or a Democrat.

Missing values are a common occurrence in data sets, and you need a strategy for dealing with them. A missing value can indicate a number of different things in your data set: perhaps the data was not available or not applicable, or the event did not happen. It may also be that the person who entered or recorded the data did not know the right value, or missed filling it in. Data mining techniques help with missing values. Typically, they ignore the missing values, exclude records or attributes that contain missing values, replace missing values with the mean of the attribute, or infer missing values from the existing values.
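The two exclusion strategies above (dropping records versus dropping attributes) can be sketched in a few lines of Python. This is a minimal illustration on hypothetical toy data, assuming a missing value is represented as None; it is not the code used in the experiments, which were run in WEKA.

```python
from typing import List, Optional

# A record is a list of attribute values; None marks a missing value.
Record = List[Optional[str]]


def drop_incomplete_records(rows: List[Record]) -> List[Record]:
    """Exclude every record that contains at least one missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def drop_incomplete_attributes(rows: List[Record]) -> List[Record]:
    """Exclude every attribute (column) that is missing in any record."""
    if not rows:
        return []
    keep = [j for j in range(len(rows[0]))
            if all(row[j] is not None for row in rows)]
    return [[row[j] for j in keep] for row in rows]


# Hypothetical toy data: three records with three yes/no attributes.
toy = [
    ["y", "n", None],
    ["n", "n", "y"],
    [None, "y", "y"],
]
print(drop_incomplete_records(toy))     # [['n', 'n', 'y']]
print(drop_incomplete_attributes(toy))  # [['n'], ['n'], ['y']]
```

Dropping records keeps every attribute but can discard most of the data when missing values are spread out; dropping attributes keeps every record but loses whole features.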

Our data set is based on identifying voting patterns in the US House of Representatives. Each state in the US is represented in the House in proportion to its population, but each state is entitled to at least one Representative. The total number of Representatives in the data set is 435. The votes recorded are the key votes identified by the CQA (Congressional Quarterly Almanac). The target class of our data set indicates whether a representative is a Democrat or a Republican. The paper that we followed uses only classification techniques. Because the data set contains some missing values, we first apply classification techniques to the data set as it is, and then we remove the missing values using the missing-value techniques built into WEKA. We therefore apply both classification and missing-value techniques to the data set and compare the results.
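To make the data format concrete, the sketch below parses records laid out as in the UCI distribution of this data set. The layout is an assumption stated here, not something the experiment depends on: one comma-separated line per representative, the class label first, followed by 16 votes written as "y", "n" or "?" (missing). The two sample lines are illustrative and follow that assumed layout.

```python
from typing import List, Optional, Tuple

# Assumed line layout: class label, then 16 votes, each 'y', 'n' or '?'.
Instance = Tuple[str, List[Optional[str]]]


def parse_votes(lines: List[str]) -> List[Instance]:
    """Parse comma-separated vote records, mapping '?' to None (missing)."""
    instances = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, *votes = line.split(",")
        instances.append((label, [None if v == "?" else v for v in votes]))
    return instances


def count_missing(instances: List[Instance]) -> int:
    """Total number of missing votes across all instances."""
    return sum(v is None for _, votes in instances for v in votes)


# Illustrative lines in the assumed format.
sample = [
    "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y",
    "democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n",
]
data = parse_votes(sample)
print(len(data), count_missing(data))  # 2 3
```

Mapping "?" to None keeps the parsing step separate from any decision about how missing votes are later handled.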

Methodology

Classification is defined as the task of assigning each data instance to one of a set of predefined target classes.

The methods commonly used for data mining classification tasks can be grouped as follows:

· Decision tree induction methods
   o DecisionStump
   o J48
   o RandomForest
   o RandomTree
· Rule-based methods
   o DecisionTable
   o JRip
   o ZeroR
· Memory-based learning
· Neural networks
· Bayesian networks
   o NaiveBayes
   o NaiveBayesMultinomialText
   o NaiveBayesUpdateable

We use three of these groups (trees, rule-based methods and Bayesian methods) in WEKA and compare them.
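Two of the simplest classifiers in the list, ZeroR and a decision stump, can be sketched directly. The code below is a simplified illustration on hypothetical toy data, not WEKA's implementation: ZeroR predicts the majority class, and the "stump" here is a one-level tree that gives every value of a single nominal attribute (missing included) its own branch. WEKA's DecisionStump instead uses a binary "value versus rest" split, so treat this only as an approximation of the idea.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = List[Optional[str]]  # nominal attribute values; None marks a missing value
Model = Tuple[int, Dict[Optional[str], str], str]  # (attribute, branches, default)


def zero_r(labels: List[str]) -> str:
    """ZeroR baseline: ignore every attribute and predict the majority class."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows: List[Row], labels: List[str]) -> Model:
    """Simplified one-level decision tree over nominal attributes.

    Each observed value of an attribute (None included) becomes a branch that
    predicts the majority class of the training rows reaching it; the attribute
    whose branches classify the most training rows correctly is kept.
    """
    default = zero_r(labels)
    best_attr, best_correct, best_branches = -1, -1, {}
    for j in range(len(rows[0])):
        # Class counts for each value of attribute j.
        by_value: Dict[Optional[str], Counter] = {}
        for row, label in zip(rows, labels):
            by_value.setdefault(row[j], Counter())[label] += 1
        branches = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(c[branches[v]] for v, c in by_value.items())
        if correct > best_correct:
            best_attr, best_correct, best_branches = j, correct, branches
    return best_attr, best_branches, default


def predict_stump(model: Model, row: Row) -> str:
    attr, branches, default = model
    # Values never seen during training fall back to the overall majority class.
    return branches.get(row[attr], default)


# Hypothetical toy data: two yes/no votes per representative.
rows = [["y", "n"], ["y", "y"], ["n", "y"], ["n", "n"], [None, "y"]]
labels = ["republican", "republican", "democrat", "democrat", "democrat"]
model = fit_stump(rows, labels)
print(zero_r(labels))                    # democrat
print(model[0])                          # 0 (the first vote is the better split)
print(predict_stump(model, ["y", "n"]))  # republican
```

ZeroR is useful mainly as a baseline: any classifier worth using should beat the accuracy of always predicting the majority class.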

Missing Attributes:

After classification, we remove missing values using data mining techniques.

Approaches to missing attribute values:

· Ignore attributes that have missing values
· Replace missing values with the mean if the data is numeric, or with the most frequent value if the data is categorical
· Fill in missing values manually based on your domain knowledge
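The second approach (mean for numeric data, most frequent value for categorical data) can be sketched per column as follows. This is a minimal illustration assuming None marks a missing value; WEKA's "Replace missing values" filter follows the same rule, using means and modes computed from the training data.

```python
from collections import Counter
from statistics import mean
from typing import List, Union

Value = Union[float, str, None]  # None marks a missing value


def impute_column(column: List[Value]) -> List[Value]:
    """Replace None with the column mean (numeric) or mode (categorical)."""
    present = [v for v in column if v is not None]
    if not present:
        return list(column)  # nothing observed to impute from
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)  # numeric attribute: use the mean
    else:
        fill = Counter(present).most_common(1)[0][0]  # categorical: use the mode
    return [fill if v is None else v for v in column]


print(impute_column([1.0, None, 3.0]))       # [1.0, 2.0, 3.0]
print(impute_column(["y", None, "y", "n"]))  # ['y', 'y', 'y', 'n']
```

Since every attribute in the voting data is a y/n vote, only the mode branch applies to it.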

 

 

 

Experiment

1. Classification results generated using WEKA:

 

Algorithm                       Correctly classified   Incorrectly classified   Absolute error
Naïve Bayes                     90.1% (392)            9.9% (43)                20%
Decision table                  94.9% (413)            5.1% (22)                20%
AdaBoostM1                      95.4% (415)            4.6% (20)                11%
Attribute selected classifier   95.6% (416)            4.4% (19)                14.6%
JRip                            95.4% (415)            4.6% (20)                17%
J48                             96.3% (419)            3.7% (16)                12%
Random Forest                   96.1% (418)            3.9% (17)                15%
DecisionStump                   95.6% (416)            4.4% (19)                16%

Table 1. Classification results (instance counts in parentheses, out of 435 instances)

 

Table 1 shows the results of applying each algorithm. Classification methods are typically strong in modeling interactions, and it is difficult to recommend any one technique as superior to the others, since the best choice depends on the data set. On this data set, however, J48 gives the best result: it has the highest accuracy (96.3%) and the fewest incorrectly classified instances (16). Its absolute error (12%) is second only to that of AdaBoostM1 (11%).
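Because every algorithm in Table 1 was evaluated on the same 435 instances, the percentage columns follow directly from the instance counts: accuracy = correct / 435 × 100, and the error rate is its complement. The short check below recomputes both from the counts and ranks the algorithms by accuracy.

```python
# Correctly / incorrectly classified instance counts from Table 1.
results = {
    "Naive Bayes": (392, 43),
    "Decision table": (413, 22),
    "AdaBoostM1": (415, 20),
    "Attribute selected classifier": (416, 19),
    "JRip": (415, 20),
    "J48": (419, 16),
    "Random Forest": (418, 17),
    "DecisionStump": (416, 19),
}


def accuracy(correct: int, incorrect: int) -> float:
    """Percentage of instances classified correctly."""
    return 100.0 * correct / (correct + incorrect)


# Rank algorithms from highest to lowest accuracy.
ranked = sorted(results, key=lambda name: accuracy(*results[name]), reverse=True)
for name in ranked:
    acc = accuracy(*results[name])
    print(f"{name:30s} {acc:5.1f}% correct  {100 - acc:4.1f}% error")
```

J48 comes out on top, with Random Forest close behind; several algorithms differ by only one or two instances, so small differences in the ranking should not be over-interpreted.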

2. Missing attribute values using WEKA

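We considered three WEKA filters related to missing values:

· Replace missing values
· Replace missing with user constant
· Replace with missing value

Of these, only the first two fill in missing values; the third does the opposite and introduces missing values into a data set. We used the "Replace missing values" filter, which replaces each missing value of a nominal attribute with that attribute's mode and each missing value of a numeric attribute with its mean. After applying it, no missing values remained in our data set: the screenshots we obtained show 0% missing values in every attribute, so the data set was complete.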
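The "0% missing" check shown in the screenshots corresponds to counting, for each attribute, the share of values that are missing. The sketch below computes that share on hypothetical toy data, assuming None marks a missing value. For the fill step it uses a single constant, which is the idea behind "Replace missing with user constant"; this is an illustrative choice, since the experiment itself used "Replace missing values".

```python
from typing import List, Optional

Row = List[Optional[str]]  # None marks a missing value


def missing_report(rows: List[Row]) -> List[float]:
    """Percentage (0-100) of missing values for each attribute."""
    n = len(rows)
    if n == 0:
        return []
    return [100.0 * sum(row[j] is None for row in rows) / n
            for j in range(len(rows[0]))]


def replace_with_constant(rows: List[Row], constant: str) -> List[Row]:
    """Fill every missing value with one user-supplied constant."""
    return [[constant if v is None else v for v in row] for row in rows]


# Hypothetical toy data: four records, two yes/no attributes.
rows = [["y", None], ["n", "n"], [None, "y"], ["y", "y"]]
print(missing_report(rows))                                    # [25.0, 25.0]
print(missing_report(replace_with_constant(rows, "unknown")))  # [0.0, 0.0]
```

Note that a constant fill removes missing values without guessing them; a classifier then simply sees "unknown" as one more nominal value.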

 

 

 

 

 

Figure: Results without removing missing values (left) and results after removing missing values (right).

Conclusion

Our main objective was to compare classification techniques and to compare methods for dealing with missing attribute values. The results of our experiment are shown in Table 1 and in the screenshots. Among the classifiers, J48 performs best, with the highest accuracy and the fewest incorrectly classified instances. The goal of classification algorithms is to produce more certain, precise and accurate results. Numerous methods have been suggested for building ensembles of classifiers. Classification methods are typically strong in modeling interactions.

References

Schlimmer, J. C. Concept acquisition through representational adjustment. Doctoral dissertation, Department of Information and Computer Science, University of California, Irvine, CA.

Han, J., & Kamber, M. Data Mining: Concepts and Techniques. USA: Morgan Kaufmann Publishers.

UCI Machine Learning Repository, Congressional Voting Records data set. https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records

How To Handle Missing Values In Machine Learning Data With Weka.

http://www.saedsayad.com/missing_values.htm

Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.

Agrawal, R., Imielinski, T., & Swami, A. Mining association rules between sets of items in large databases. ACM SIGMOD Conference, pp. 207-216.

Al-Razgan, M., Al-Khalifa, A. S., & Al-Khalifa, H. S. "Educational data mining: A systematic review of the published literature 2006-2013." In Proc. the 1st International Conference on Advanced Data and Information Engineering, 2013, pp. 711-719.

Sundar, C., Chitradevi, M., & Geetharamani, G. "Classification of Cardiotocogram Data using Neural Network based Machine Learning Technique." International Journal of Computer Applications (0975-888), Volume 47, No. 14, June 2012.

 

 

 

 

 
 
 

 

 
