Go Beyond the 10 GB Power BI Capacity

Power BI free lets you have 1 GB of data per account. On the other hand, pro gives ten times that or 10 GB. Nevertheless, in many organizations, you have a much larger data size than 10 GB. Sometimes, you are dealing with Terabytes, sometimes more Petabytes. Would Power BI restrict you to the visualization?

Short Response; No! In this post, we’ll clarify the problems you face when dealing with big datasets. As with any other solution, there are pros and cons of working with Power BI with large data sets.

Without further ado, let us explore them together.

Compressed Structure of Power BI

First of all, we have to mention that Power BI effectively compresses every dataset before loading it into memory. This is one of the major advantages of the xVelocity In-memory engine that Power BI, Power Pivot, and SSAS Tabular have built upon. The compression happens automatically, which means you don’t need to set a setup or allow Power BI to do something. Each dataset will be compressed at a reasonable level when you import the data into the Power BI.

We have imported CSV file with a data size of ~800 MB as an example. The same file took around 8Mb in Power BI. That’s remarkable compression.

However, the compression rate is not always that way. It depends on the type of data set, the column type of data, and some other details. The bottom line in this section is that if you have a data set that’s more than 10 GB but it doesn’t go past that mark extensively, consider loading it into Power BI and see how compact it would be. Maybe you get it under the 10GB cap.

Live Connection

We may interact live with some data sources with Power BI, on-site or in the cloud. Okay, here’s the trick. If your data set is large, then use Live contact! A simple trick, but very useful.

You can have a SQL Server, Oracle, or whatever data source you want with whatever amount of data you need and build a live link to it with Power BI. The live connection won’t import data into the Power BI model. Live connection brings the metadata and data structure into Power BI, and then you can access the data based on that. A question will be sent to the data source with each visualization, and the answer will be carried within Power BI.

Live Connection to SQL Server On-Premises

Consider an example of such Live connection to the on-premises database of SQL Server. We have large table database, comprising 48 million data records. The table itself is about ~800 MB.

Live Connection to SQL Server On-Premises

We know it’s not more than 10 GB, but even if we have data table of 10 TB scale, the process would be the same. For this example, this table is large enough to demonstrate how live connections work.

Selecting this table (or any other tables in the data set) and clicking Load will show you the option to choose between Live connection or Import. DirectQuery means Live Link.

Choose between Live connection or Import in Power BI Interface

After creating the live connection, you can create a relationship in the model, or build a visualization for that.

Create a relationship in the model, or build a visualization

There are, however, some limitations with Live connection, which we will explore later in this post.

** For the Live connection to on-premises data sources, you need to have Power BI Enterprise or Personal Gateway installed and configured.

Live Connection to Azure

There are also a lot of Azure data sources that you can use for the Azure SQL Database live connection, which is the similar SQL Server on-site database engine (with some differences, of course) but on the cloud. Azure SQL Data Warehouse is the framework of the cloud database, which supports both unstructured and structured data. Azure SQL DW is capable of expanding the database’s computing engine irrespective of its storage capacity. And Sparks on Azure HDInsight is another azure source of data that can be used on azure for live access to the big data framework.

Azure SQL Data Warehouse Connect

Limitations of Live Connection

A live link is ideal for connecting to large data sets. It will not load the data into the database, making the solution for Power BI quite limited. There are some drawbacks to this type of connection, like no:

  • Support for DAX,
  • Data model formatting in Power BI’s Data tab,
  • Support for multiple data sources, and
  • Support for Power Q&A.

Let’s go one by one through them.

No DAX Support

There will be no Data tab in Power BI with Live link to construct measured ratios, columns, or tables. In the stage of the data source, you have to build all the calculations. This means creating all measured columns or measurements in SQL Server source tables if you have SQL Server as the source. Your modeling should be performed via the source of the data. You could only put in Power BI desktop.

No DAX Support

Formatting is also not available via the Power BI Modeling tab. So, if you want to set decimal points, or set column type, it is not possible to do it within Power BI. You need to handle all of these using the data source (or Power Query, read further down).

Formatting is also not available via the Power BI Modeling tab

Full Visualization Support

Thankfully, Live communication mode fully supports the Power BI visualization portion. The underlying reason is that, in Power BI, visualization is a separate device. This is one of the amazing reasons why a system built on top of separate components usually works better than a product all in one with no underlying component. You can create any simulation, and you are not going to be constrained at all.

Multiple Data Sources is Not Supported

One of the great advantages of Power BI is that you can integrate data sets from multiple sources, multiple databases or files can participate in the creation of a model. Unfortunately, you won’t be able to have data from more than one data source when working with a Live connection. If you want to get data on the same server from another database, you will be facing messages shown below.

Multiple Data Sources is Not Supported

Import Data to add another data source, which is not useful for the scenario of working with large data sets.

No Power Q&A

As you know, one of the features of the Power BI website is natural language questions and answering engines called Power Q&A. This functionality isn’t included in Power BI Live Connection (at the time of writing this post). Your dashboard won’t have a Power Q&A question box at the top when you have a live connection.

No Power Q&A

Don’t Forget the Power Query

Thankfully, Live Connection also provides Power Request. It helps you to enter tables, flatten them if you need them, apply data transformation, and prepare the data as you wish. Power Query can also set the data types to interpret the Power BI model more familiarly.

DimProduct, DimProductSubCategory, and DimProductCategory joined to bring all fields in one table

You can see in the screenshot above that joined DimProduct, DimProductSubCategory, and DimProductCategory to bring all fields in one table; DimProduct. And all of that happened with Live Link.

Optimization at Data Source Level

A live connection to the source of data means the report sends queries to the sources of the data. Data Sources are different with regard to response time. Tabular SSAS could produce faster results and a slower standard SQL Server database. Don’t forget that all efficiency and indexing tips should be considered carefully when working with a Live link. When you deal with SQL Server, for example, find proper indexing, column-store indexes, and many other tips for optimization and performance tuning.

Just for very small example of performance tuning, this is the result we get when we have standard index of 48 Million records on my table.

Example of performance tuning

It takes minutes and seconds to run normal pick amount from my table with 48 million records. The same query will respond in less than second when we have clustered index of the column store; and significantly improved performance when we have a Clustered Column Store index on the same table with the same amount of data rows.

It takes 4 minutes and 4 seconds to run a normal pick amount from a table with 48 million records.

We won’t teach you all performance tuning in this post because that’s an entirely different subject on its own. The most important thing is that; performance tuning for each source of data is different. Oracle, SQL Server, and SSAS performance tunings are different. For this section, your buddy would be Google, and for you to review a huge amount of free content available on the Internet.

Final Words

We hope you enjoyed this post; you can share your feedback and your opinion about the article in the comment section below.

Leave a Comment

ProofFactor