What is EDA?
EDA stands for exploratory data analysis. It is an approach to, or philosophy of, data analysis that employs a variety of techniques to –
- maximize insight into data sets
- uncover underlying structure
- extract important variables
- detect outliers and anomalies
- test underlying assumptions
- develop parsimonious models
- determine optimal factor settings
It refers to a set of procedures for producing descriptive and graphical summaries of data. The best part about exploratory data analysis is that it lets you examine the data as it is, with as few prior assumptions as possible. It is a useful way of understanding your data and the relationships among variables, and of identifying problems such as data entry errors.
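As a rough illustration of a descriptive summary that also flags a possible data entry error, here is a minimal sketch using only Python's standard library. The `ages` values, the helper names `describe` and `iqr_outliers`, and the 1.5 × IQR cut-off are assumptions chosen for the example. That cut-off is a common rule of thumb, not the only way to detect outliers.

```python
import statistics


def describe(values):
    """Return a small descriptive summary of a list of numbers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
    }


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: 250 is probably a typo for 25.
ages = [23, 25, 27, 29, 31, 33, 35, 250]

summary = describe(ages)
suspects = iqr_outliers(ages)
print(summary)
print("possible data entry errors:", suspects)
```

Notice how the mean is pulled far above the median by the single suspicious value. A gap like that between the two is often the first hint that something needs a closer look.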
There are the following types of data –
- Categorical
  - Nominal
  - Ordinal
- Continuous
  - Interval
  - Ratio
When identifying the type of a variable, ask which of these types of data its values belong to.
Nominal Data – Here values represent discrete units with no inherent order, such as eye colour. Changing the order of the categories does not change their meaning.
Ordinal Data – Here values represent discrete and ordered units, such as small, medium and large. The order is very important, but the distance between units is not defined.
Interval Data – Here values are ordered and the distance between units is the same, but there is no absolute zero. Temperature in degrees Celsius is an example.
Ratio Data – It has all the properties of interval data with the additional property of having an absolute zero, so ratios between values are meaningful. Height and weight are examples.
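To make the differences between these scales concrete, here is a small sketch in plain Python. The shirt sizes, temperatures and weights are made-up values for illustration only. It shows that ordinal data needs an explicit order, that ratios on an interval scale change with the unit, and that ratios on a ratio scale do not.

```python
# Ordinal: the order matters, so we state it explicitly.
SIZE_ORDER = ["small", "medium", "large"]  # hypothetical ordered levels


def size_rank(label):
    """Position of a size label in the agreed order."""
    return SIZE_ORDER.index(label)


shirts = ["large", "small", "medium"]
sorted_shirts = sorted(shirts, key=size_rank)


# Interval: differences are meaningful, ratios are not,
# because the zero point is arbitrary.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low
ratio_fahrenheit = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)

# Ratio: there is a true zero, so the ratio survives a change of unit.
kg_low, kg_high = 2.0, 4.0
ratio_kg = kg_high / kg_low
ratio_grams = (kg_high * 1000) / (kg_low * 1000)

print(sorted_shirts)
print(ratio_celsius, ratio_fahrenheit)  # the two ratios disagree
print(ratio_kg, ratio_grams)            # the two ratios agree
```

The same pair of temperatures looks "twice as hot" in Celsius but not in Fahrenheit, which is why statements like that only make sense for ratio data.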
Why is it important to know the kind of data in a variable?
Statistical methods are designed to work with certain kinds of data and not others. Many of the methods used to analyze continuous data are not the same as the ones used to analyze categorical data. If the data type is unknown or misidentified, there is a real chance of applying the wrong kind of analysis and drawing misleading conclusions. Even a basic summary depends on the type: to describe the centre of a variable you can choose the mean, median or mode, and which one makes sense depends on the kind of data.
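One way to picture this choice is a small helper that picks a measure of centre from the type of data. This is only a sketch under a common rule of thumb (mode for nominal, median for ordinal, mean for interval and ratio). The function name `central_tendency` and the sample values are assumptions for illustration, and in practice the median is often preferred for skewed continuous data too.

```python
import statistics


def central_tendency(values, scale):
    """Pick a measure of centre that is meaningful for the given scale."""
    if scale == "nominal":
        # Only counting is meaningful, so report the most frequent value.
        return statistics.mode(values)
    if scale == "ordinal":
        # Order is meaningful; median_low always returns an observed value.
        return statistics.median_low(values)
    if scale in ("interval", "ratio"):
        # Distances are meaningful, so the arithmetic mean is defined.
        return statistics.fmean(values)
    raise ValueError(f"unknown scale: {scale!r}")


pets = ["cat", "dog", "dog"]           # nominal
ratings = [1, 2, 2, 3, 5]              # ordinal, coded as ranks
heights_m = [1.0, 2.0, 6.0]            # ratio

print(central_tendency(pets, "nominal"))
print(central_tendency(ratings, "ordinal"))
print(central_tendency(heights_m, "ratio"))
```

Asking for the mean of `pets` would not even run, and the mean of `ratings` would run but quietly assume that the gaps between ranks are equal. Knowing the type up front avoids both problems.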
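EDA is a way of doing statistics that summarizes the main characteristics of data, often using visual methods. A statistical model may or may not be used, but the main purpose of EDA is to find out what the data can tell us beyond formal modelling or hypothesis testing tasks. It was promoted by John Tukey. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking the assumptions required for model fitting and hypothesis testing.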