  Article
June 30, 2023

# Detecting Outliers in the 3D part of BIM using Power BI

I'm quite curious by nature, and when asked if VIM + SQL + PowerBI could find objects outside of the building shell, I jumped on the chance to learn a new skill + help a customer out. So, as part of my ongoing learning journey in data analysis and wanting to share it with you all here's a data packed story about my journey into something called Interquartile Range! for the last couple of years I've been exploring BIM and how we can extract meaningful insights from it, and the 3D Geometric parts of BIM often present challenges that standard data doesn't, and one of those challenges is dealing with outliers. So, today, I thought I'd share my recent experience with detecting outliers in 3D data using Power BI. It's been an interesting adventure, and I hope this sharing will help you, as it helped me, in your data analysis journey!

Oh also, you can just test the VIM Outliers report here, and if you're really keen, create a report yourself on your Windows desktop...

An Introduction to the Interquartile Range (IQR)

While I started with implementing Standard Deviations, I wasn't happy with the result, and so I upped my game and jumped into the concept of the Interquartile Range (IQR). I found that IQR's statistical measure used to identify outliers - data points that deviate from the 'expected' range of values in a dataset worked a lot better.

The concept is rooted in the idea of 'quartiles'. In a sorted dataset:

• The first quartile (Q1) is the middle value between the smallest number and the median.
• The second quartile (Q2) is the median of the data.
• The third quartile (Q3) is the middle value between the median and the highest value.

The IQR is simply the difference between Q3 and Q1. It gives a measure of where the 'middle half' of the data lies. Generally, outliers are identified as values that fall below Q1 - 1.5xIQR or above Q3 + 1.5xIQR.

### Implementing a Multi-Level Outlier Classification

In BIM projects, it turns out not all outliers are equal, and it would be valuable to categorize them based on how much they deviate from the typical range. This led me to come up with a multi-level approach to outlier identification. I honed in on the numbers :

• Close outliers: Values falling between 1.8xIQR and 2.9xIQR from Q1 or Q3.
• Mid outliers: Values falling between 2.9xIQR and 4.5xIQR from Q1 or Q3.
• Far outliers: Extreme outliers that fall more than 4.5xIQR from Q1 or Q3.

This segmentation of outliers helped me understand the severity of deviation and marked the data analysis more nuanced. Outliers are often discarded as anomalies, but they can provide novel insights and, as I learned, deserve more investigation.

### Getting it all hooked up in Power BI

Then I had to get my IQR idea into something usable for the spatial data in Power BI. Here's a piece of Power Query M code snippet that illustrates this process. Thanks to the VIM data model in SQL I had some sweet 3D geometry nodes and world space coordinates data to work with, and I needed to identify the outliers among these data points.

First, I began by computing the basic statistical parameters such as Min, Max, and Average for each dimension (X, Y, Z) of the 3D geometry nodes I was working with:

###### // Calculate the basic statistical parameters    Stats = Table.Group(Source, {"Dimension"}, {        {"MinX", each List.Min([WorldSpace.Box.Min.X]), type number},        {"MaxX", each List.Max([WorldSpace.Box.Max.X]), type number},        {"AvgX", each List.Average([WorldSpace.Box.X.Center]), type number},        ...        // Repeat the steps for Y, Z dimensions        ...    })in    Stats

Next, I computed the IQR for each dimension:

###### // Calculate the Interquartile Range (IQR)    Q1 = #"Grouped Rows"[Median]{0},    Q3 = #"Grouped Rows"[Median]{2},    IQR = Q3 - Q1,    // Add the calculated IQR back to the table    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "IQR", each IQR)in    #"Added Custom"

Finally, I classified the outliers using the multi-level system I described earlier:

###### // Detect and Classify Outliers    Outliers = Table.AddColumn(#"Added Index", "Outlier",        each if [Volume] < (Q1 - 1.8*IQR) or [Volume] > (Q3 + 1.8*IQR) then            if [Volume] < (Q1 - 2.9*IQR) or [Volume] > (Q3 + 2.9*IQR) then                if [Volume] < (Q1 - 4.5*IQR) or [Volume] > (Q3 + 4.5*IQR) then "3"                else "2"            else "1"        else "0"    )// Expand the data in Power Query Editor    #"Expanded Outlier" = Table.ExpandRecordColumn(Outliers, "Outlier", {"Type"}, {"Outlier.Type"})in    #"Expanded Outlier"

Here, I added a new column "Outlier" to the table. The volume of a 3D object was used to determine whether it's an outlier and if so, its outlier level.

This journey of exploring 3D data and working with outlier detection in Power BI has certainly opened my eyes to neat ways to slice and dice VIM in SQL. I've learned not just how to technically implement these concepts but also about the importance of outliers in our data. Remember, outliers are not just errors to be dismissed but may hold unique insights. They are worth our attention. Keep learning and happy analyzing! ## Unlock faster business intelligence with VIM Cloud    