I'm quite curious by nature, and when asked if VIM + SQL + PowerBI could find objects outside of the building shell, I jumped on the chance to learn a new skill + help a customer out. So, as part of my ongoing learning journey in data analysis and wanting to share it with you all here's a data packed story about my journey into something called Interquartile Range! for the last couple of years I've been exploring BIM and how we can extract meaningful insights from it, and the 3D Geometric parts of BIM often present challenges that standard data doesn't, and one of those challenges is dealing with outliers. So, today, I thought I'd share my recent experience with detecting outliers in 3D data using Power BI. It's been an interesting adventure, and I hope this sharing will help you, as it helped me, in your data analysis journey!
Oh also, you can just test the VIM Outliers report here, and if you're really keen, create a report yourself on your Windows desktop...
An Introduction to the Interquartile Range (IQR)
While I started with implementing Standard Deviations, I wasn't happy with the result, and so I upped my game and jumped into the concept of the Interquartile Range (IQR). I found that IQR's statistical measure used to identify outliers - data points that deviate from the 'expected' range of values in a dataset worked a lot better.
The concept is rooted in the idea of 'quartiles'. In a sorted dataset:
The IQR is simply the difference between Q3 and Q1. It gives a measure of where the 'middle half' of the data lies. Generally, outliers are identified as values that fall below Q1 - 1.5xIQR or above Q3 + 1.5xIQR.
In BIM projects, it turns out not all outliers are equal, and it would be valuable to categorize them based on how much they deviate from the typical range. This led me to come up with a multi-level approach to outlier identification. I honed in on the numbers :
This segmentation of outliers helped me understand the severity of deviation and marked the data analysis more nuanced. Outliers are often discarded as anomalies, but they can provide novel insights and, as I learned, deserve more investigation.
Then I had to get my IQR idea into something usable for the spatial data in Power BI. Here's a piece of Power Query M code snippet that illustrates this process. Thanks to the VIM data model in SQL I had some sweet 3D geometry nodes and world space coordinates data to work with, and I needed to identify the outliers among these data points.
First, I began by computing the basic statistical parameters such as Min, Max, and Average for each dimension (X, Y, Z) of the 3D geometry nodes I was working with:
Next, I computed the IQR for each dimension:
Finally, I classified the outliers using the multi-level system I described earlier:
Here, I added a new column "Outlier" to the table. The volume of a 3D object was used to determine whether it's an outlier and if so, its outlier level.
This journey of exploring 3D data and working with outlier detection in Power BI has certainly opened my eyes to neat ways to slice and dice VIM in SQL. I've learned not just how to technically implement these concepts but also about the importance of outliers in our data. Remember, outliers are not just errors to be dismissed but may hold unique insights. They are worth our attention. Keep learning and happy analyzing!
Unlimited access for a single user with 2GB/mo uploads and fast customer support.