Difference between Anomalies and Outliers


Outlier = a legitimate data point that lies far from the mean or median of a distribution

Anomaly = an illegitimate data point generated by a different process than the one that generated the rest of the data
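To make the distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the data is made up, the function names are hypothetical, and the modified z-score (based on the median and the median absolute deviation, with the commonly used cutoff of 3.5) is just one of many ways to find points that sit far from the center. Note what it can and cannot do: it flags candidates by distance alone, and deciding whether a flagged point is an outlier or an anomaly still requires knowing how the point was generated.

```python
from statistics import median


def modified_z_scores(values):
    """Return the modified z-score of each value (median/MAD based)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Simplification for this sketch: with no spread around the
        # median, treat every point as unremarkable.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_far_points(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, s in zip(values, scores) if abs(s) > threshold]


# Made-up daily order counts, for illustration only.
# Suppose 240 was a real sales spike (an outlier) and -1 is a
# placeholder written by a failed import job (an anomaly).
daily_orders = [102, 98, 105, 99, 101, 97, 103, 100, 240, -1]

candidates = flag_far_points(daily_orders)
print(candidates)
```

Both 240 and -1 are flagged, because both are far from the median. Only knowledge of the process behind each value (a promotion day versus a broken import) tells you which one to keep and which one to discard.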

Ravi Parikh has written a very interesting blog post on this topic, "Garbage In, Garbage Out: How Anomalies Can Wreck Your Data". The post goes into more detail about anomalies and how to detect them with appropriate visualization techniques. He gives an example of detecting election fraud through the following visualization:

[Figure: anomalies_election_fraud]

Do read the full post!

Interesting Question

Do you have the capability to assess data quality? Or even suggest appropriate analysis visualizations to help distinguish between Anomalies and Outliers? … Vijay Ghei

Comments
One Response to “Difference between Anomalies and Outliers”
  1. nklata says:

    I have been following Ravi’s blogs and he does make a lot of interesting and valid observations!
