Difference between Anomalies and Outliers

Outlier = legitimate data point that’s far away from the mean or median in a distribution

Anomaly = illegitimate data point that’s generated by a different process than whatever generated the rest of the data
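"Far away from the mean or median" can be made concrete with a distance score. The sketch below uses the modified z-score, which is based on the median and the median absolute deviation (MAD). That choice of statistic, the 3.5 cutoff, and the function name `flag_far_points` are illustrative assumptions and do not come from the post. The score can only flag a point as far from the rest. It cannot say whether that point is a legitimate outlier or an anomaly produced by a different process.

```python
import statistics

def flag_far_points(values, threshold=3.5):
    """Return indices of values that lie far from the median (hypothetical helper).

    Uses the modified z-score: 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. The 3.5 cutoff is a common rule of thumb,
    assumed here for illustration.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the points equal the median, so the score is undefined.
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Made-up readings: the last value sits far from the rest.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 42.0]
print(flag_far_points(readings))  # [6]
```

The flagged reading of 42.0 could be a genuine spike (an outlier) or a sensor glitch (an anomaly). The statistic is the same in both cases, so telling them apart still depends on knowing how the data was generated.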

Ravi Parikh has written a very interesting blog post on this topic, titled "Garbage In, Garbage Out: How Anomalies Can Wreck Your Data". It goes into more depth on anomalies and how to detect them with the right visualization techniques. He gives an example of detecting election fraud through the following visualization:


Do read the full post!

Interesting Question

Do you have the capability to assess data quality, or even to suggest appropriate analyses and visualizations that help distinguish anomalies from outliers? … Vijay Ghei


