Today, I am going to talk about large Power BI data models. A lot of times I explain my people about the scalability of the in-memory DAX engine that is responsible for holding and compressing the data in a Power BI model, I get the question what is the limit. What is like a maximum amount of data (rows?) that a PBI model can hold.
And it is actually hard to ‘just’ answer. We know that previously in the PBI service (in the cloud) the limit has been 1 GB for a pbix file. Currently, the limit has been uplifted to 10 GB. But, how much is exactly 10 GB of a PBI data model or file?
Because we have to understand, the limit 10 GB, now, is not how much memory a file consumes, as some might think. The model can actually consume MORE memory that the file size of the ‘sleeping’ (saved on disk) file.
Well, let’s look at a test I made for this purpose, recently. I took the standard ContosoRetailDW SQL database (you can download here).
Using the ‘FactSales’ which has around 3,5 million rows… is not quite enough. Therefore, I got my hands dirty and producing some SQL code I enlarged the table quite a bit making it little over 100 million rows.
Thus for the technical: the data table was 3.406.089 rows, now is 108.994.848, thus enlarging it 32 times.
Now, I did the standard, open up a Power BI instance, and import the data!
So, it took around 15 min, locally, to import my data into the Power BI desktop app. And let us now see the memory consumption – around 6 gigs.
It is to be said that the memory consumption of a pbix file can vary – in my example going from 6 GB to little over 3 GB in … same file, same data, no refresh or any other.
But, as said there is no limit (in the PBI service) on how much memory a PBI model consumes, rather how big the file is. So, I am going to save the file and see its size.
2.34 GB … quite hard to predict based on memory consumption, huh? But, this is the file size on which the limit is imposed for the Power BI service.
2.34 gigs to the limit 10 GB… there is still some room for the size of the data model. The model could be 4.27 times bigger to still be allow under the limit imposed.
And one important point should be considered, this is using just Power BI only, without other enterprise-like services in the Microsoft BI stack as Analysis Services (which has no hard limit except the VM or service you buy) or a SQL Data Warehouse underneath. And with the rising question of long-lasting data refreshes, the recently introduced Power BI Premium feature Incremental refresh, has come to help clients reduce the refresh time of large data models significantly! (Not to mention speed the development time, because only a small portion of data has to imported when building the model.)
All in all this is a test example, and results may vary between projects. Understanding the in-memory engine that is responsible for compression of data, a big role in it plays the column data dictionary size (number of distinct values). Be sure to check Power BI performance tips when before if you have a large pbix file but not as much rows:
- tall, narrow tables are faster (and consume less memory/space)
- do not overuse calculated columns (only in small tables)
- check the column data types (text instead of number consumes more RAM)
- develop a model (that resembles) as a star schema