Power BI free lets you have 1 GB of data per account and pro 10 GB. Nevertheless, in many organizations, you have a much larger data size than 10 GB, sometimes you are dealing with Terabytes, sometimes more Petabytes. Would Power BI restrict you to the visualization? Short Response; No! In this post, we’ll clarify the problems you face when dealing with big data sets. As with any other solution, there are pros and cons of working with Power BI with large data sets, let us explore them together.
Compressed Structure of Power BI
First of all, we have to mention that Power BI effectively compresses every data set before loading into memory, this is one of the major advantages of the xVelocity In-memory engine that Power BI, Power Pivot and SSAS Tabular have built upon. Compression happens automatically, which means you don’t need to set a setup or allow Power BI to do something. Each data set will be compressed at a reasonable level when you import the data into the Power BI.
We have imported a CSV file with a data size of ~800 MB as an example. We had only 8 MB of Power BI file to deal with when it loaded into Power BI. That’s a remarkable compression.
However, the compression rate is not always that way; it depends on the type of data set, the column type of data, and some other details. The bottom line in this section is that if you have a data set that’s more than 10 GB and won’t grow dramatically, then consider loading it into Power BI and see how compact it would be, maybe you get it under a cap of 10 GB and then you’re good to go.
We may interact live with some data sources with Power BI, on-site or in the cloud. Okay, here’s the trick; if your data set is large, then use Live contact! Simple trick but useful. You can have a SQL Server, Oracle or whatever data source you want with whatever amount of data you need and build a live link to it with Power BI. The live connection won’t import data into the Power BI model. Live connection brings the metadata and data structure into Power BI, and then you can access data based on that. A question will be sent to the data source with each visualization, and the answer will be carried.
Live Connection to SQL Server On-Premises
Consider an example of such a Live connection to the on-premises database of SQL Server. We have a large table database, comprising 48 million data records. The table itself is about ~800 MB.
We know it’s not more than 10 GB, but even if we have a data table of 10 TB scale, the process would be the same. For this example, this table is large enough to demonstrate how live connections work.
Selecting this table (or any other tables in the data set) and clicking Load will show you the option to choose between Live connection or Import. DirectQuery means Live Link.
After creating the live connection, you can create a relationship in the model, or build visualization for that.
There are, however, some limitations with Live connection, which we will explore later in this post.
** For the Live connection to on-premises data sources, you need to have Power BI Enterprise or Personal Gateway installed and configured.
Live Connection to Azure
There are also a lot of Azure data sources that you can use for the Azure SQL Database live connection, which is the similar SQL Server on-site database engine (with some differences, of course) but on the cloud. Azure SQL Data Warehouse is the framework of the cloud database, which supports both unstructured and structured data. Azure SQL DW is capable of expanding the database’s computing engine irrespective of its storage capacity. And Sparks on Azure HDInsight is another azure source of data that can be used on azure for live access to the big data framework.
Limitations of Live Connection
A live link is ideal for connecting to large data sets and will not load the data into the database, so the solution for Power BI would be very limited. But there are some drawbacks to this type of connection, such as; no support for DAX, no data model formatting in Power BI’s Data tab, no support for multiple data sources, and no support for Power Q&A. Let’s go one by one through them.
No DAX Support
There will be no Data tab in Power BI with Live link to construct measured ratios, columns, or tables. In the stage of the data source, you have to build all the calculations. This means creating all measured columns or measurements in SQL Server source tables if you have SQL Server as the source. Your modeling should be performed via the source of the data. You could only put in Power BI desktop.
Formatting is also not available via the Power BI Modeling tab. So, if you want to set decimal points, or set a column type, it is not possible to use Power BI. You need to handle all of these using the data source (or Power Query, read further down).
Full Visualization Support
Thankfully, Live communication mode fully supports the Power BI visualization portion. The underlying reason is that, in Power BI, visualization is a separate device. This is one of the amazing reasons why a system built on top of separate components usually works better than a product all in one with no underlying component. You can create any simulation, and you are not going to be constrained at all.
Multiple Data Sources is Not Supported
One of the great advantages of Power BI is that you can integrate data sets from multiple sources, multiple databases or files can participate in the creation of a model. Unfortunately, you won’t be able to have data from more than one data source when working with a Live connection. If you want to get data on the same server from another database, you will be facing messages shown below.
Import Data to add another data source, which is not useful for the scenario of working with large data sets.
No Power Q&A
As you know, one of the features of the Power BI website is natural language questions and answering engines called Power Q&A. This functionality isn’t included in Power BI Live Connection (at the time of writing this post). Your dashboard won’t have a Power Q&A question box at the top when you have a live connection.
Don’t Forget the Power Query
Thankfully, Live Connection also provides Power Request. It helps you to enter tables, flatten them if you need them, apply data transformation, and prepare the data as you wish. Power Query can also set the data types to interpret the Power BI model more familiarly.
You can see in the screenshot above that I joined DimProduct, DimProductSubCategory, and DimProductCategory to bring all fields in one table; DimProduct. And all of that happened with Live Link.
Optimization at Data Source Level
A live connection to the source of data means the report sends queries to the sources of the data. Data Sources are different with regard to response time. Tabular SSAS could produce faster results and a slower standard SQL Server database. Don’t forget that all efficiency and indexing tips should be considered carefully when working with a Live link. When you deal with SQL Server, for example, find proper indexing, column-store indexes, and many other tips for optimization and performance tuning.
Just for a very small example of performance tuning, this is the result we get when we have a standard index of 48 Million records on my table.
It takes 4 minutes and 4 seconds to run a normal pick amount from my table with 48 million records. The same query will respond in less than a second when we have clustered index of the column store; and significantly improved performance when we have a Clustered Column Store index on the same table with the same amount of data rows.
We won’t teach you all performance tuning in this post because that’s an entirely different subject on its own. The most important thing is that; performance tuning for each source of data is different. Oracle, SQL Server, and SSAS performance tunings are different. For this section, your buddy would be Google, and for you to review a huge amount of free content available on the Internet.
We hope you enjoyed this post; you can share your feedback and your opinion about the article in the comment section below.