EVALUATEANDLOG AND DAX ENGINE OPTIMIZATIONS

This is the fourth and last article on the EvaluateAndLog function. While DAX developers were excited about the prospect of printing the values of a portion of a DAX expression, many early adopters of the function were puzzled by a mysterious message they ran into from time to time: “EvaluateAndLog function is not executed due to optimization.” Another common question has been: “While it’s great to be able to see the values of a sub-expression used in my calculation, is there a way to see the values of other expressions which are not needed to calculate the final result?” I will answer both questions in this post.

For today’s DAX examples, I will be building on top of the three-table (‘Sales’, ‘Customer’, ‘Product’) model I used last time and will provide a link to the final pbix file at the bottom.

Let’s start with a simple DAX measure

Measure6 = SumX('Sales', EvaluateAndLog('Sales'[Sales Amount]))

If you add [Measure6] to a card visual and expect to see the list of values from the [Sales Amount] column that contribute to the sum, you will see instead what seems like an error message, Figure 1.

Figure 1.

The message tells users that instead of first calculating the 'Sales'[Sales Amount] sub-expression and then calculating the sum, the DAX Engine was able to send the entire SumX as a single calculation to the storage engine, either the Vertipaq Engine or the DirectQuery Engine. In other words, instead of reading all the rows of the 'Sales' table into the DAX Engine and then adding up the values of the [Sales Amount] column one row at a time, the DAX Engine simply asks the storage engine to return the sum directly. The catch is that the EvaluateAndLog function lives solely in the DAX Engine and is never pushed down to the storage engine. If the separation of the DAX Engine and the Vertipaq Engine is confusing because in your mind the two work together as the Power BI Engine, imagine instead a DirectQuery model where all three tables are hosted in a relational database: there is obviously no effective way for the DAX Engine to inject EvaluateAndLog into a single SQL query so that it returns a set of intermediate results in addition to the requested aggregation. Whenever the DAX Engine can push a bigger expression down to the storage engine, any EvaluateAndLog embedded within it cannot serialize intermediate values from inside the storage engine and is therefore optimized away.
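When that happens, the DAXEvaluationLog event is still emitted, but instead of intermediate values its JSON payload carries the special notice. A sketch of what such a payload might look like follows; only the "notice" text is quoted from the actual message, and the other property is illustrative:

```json
{
  "expression": "'Sales'[Sales Amount]",
  "notice": "EvaluateAndLog function is not executed due to optimization."
}
```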

The DAX Engine has implemented a large number of optimizations. Pushing as many operations as possible down to the storage engine is one of the most important ones. Here is another example:

Measure7 = Calculate(
    Sum('Sales'[Sales Amount]),
    EvaluateAndLog('Product'[Color] = "Red")
)

If we add [Measure7] to a card visual, we will receive the same notice as in the case of [Measure6]. That’s because the DAX Engine is able to fold the entire Calculate expression into a single storage engine request that joins the ‘Sales’ table with the ‘Product’ table, applies the [Color] = “Red” filter, aggregates the values of the [Sales Amount] column, and finally returns a filtered sum value. Since the DAX Engine is more likely to optimize commonly written expressions, users will run into the special notice more often on frequently used expression patterns.

An expression alone does not determine whether or not an optimization happens; the evaluation context also influences the decision. Let’s use an example adapted from a very good question asked by David Bianconi on LinkedIn and define a measure as

Measure8 = Calculate(
    Sum('Sales'[Sales Amount]),
    EvaluateAndLog(
        CalculateTable('Customer', 'Customer'[Education] In {"College", "Partial College"})
    )
)

Add the 'Customer'[Education] column and [Measure8] to a table visual and we get Figure 2. DAX Debug Output receives two DAXEvaluationLog events because [Measure8] is calculated twice: once in the context of the rows for the distinct values of the 'Customer'[Education] column, and once for the total row. EvaluateAndLog serialized the table value in one event, Figure 3, but was optimized away in the other, Figure 4. Clearly the DAX Engine decided that it could optimize in one case but not the other because the contexts are different.

Figure 2.
Figure 3.
Figure 4.

Since optimizations are essential for reasonable query performance in many basic scenarios, the product team has chosen to retain some optimizations despite the presence of EvaluateAndLog. We may turn on even more optimizations in future releases to prevent EvaluateAndLog from causing disastrously expensive calculations, but we will reconsider the decision if we hear strong feedback from the community that users would rather see the intermediate results despite the potential performance risk.

One of the obvious optimizations in the DAX Engine is that the expression of an unused variable is not evaluated. But the product team has decided to make an exception for the EvaluateAndLog function. If the entire expression of an unused variable is wrapped in EvaluateAndLog, the expression will be evaluated! This feature enables DAX developers to print the values of arbitrary DAX expressions even though they don’t contribute to the final result. One can easily imagine possible applications of this feature. We can now print any contextual information in the middle of a DAX expression.

We’ll end this article with an example that takes advantage of the unused variable trick. If we define [Measure9] as

Measure9 = AverageX(
    EvaluateAndLog(Values('Sales'[Order Date])),
    Calculate(DistinctCount('Sales'[ProductKey]))
)

and add the above measure to a card visual, we will see that EvaluateAndLog is optimized away. This one may be hard to understand because the DAX Engine wasn’t able to push the entire average of distinct counts down to the Vertipaq Engine. On the other hand, the DAX Engine clearly needed to first calculate the distinct counts by different values of the [Order Date] column before it could calculate the average. As it turns out, the AverageX function is smart enough that, instead of getting the list of dates and the list of distinct counts in two separate steps, it combines them into a single step, thereby optimizing away a dedicated step just to retrieve the dates. That’s clever, but it doesn’t help us see the list of dates if we really want to. To get what we want, we can introduce an unused variable. If we rewrite [Measure9] as

Measure9 =
Var _unused = EvaluateAndLog(Values('Sales'[Order Date]), "_unused", 1000)
Return
AverageX(Values('Sales'[Order Date]), Calculate(DistinctCount('Sales'[ProductKey])))

We will see all the dates that had product sales, Figure 5.

Figure 5.

Today we examined the interactions between the EvaluateAndLog function and optimizations inside the DAX Engine. Key takeaways from today’s lessons are:

  • The DAX Engine performs many optimizations by reducing a multi-step calculation into a single-step one.
  • The optimizations in today’s article are strictly from the perspective of the DAX Engine. A single-step calculation in the DAX Engine may take many steps to compute in storage engines.
  • The EvaluateAndLog function is confined to the DAX Engine and may be optimized away, in which case a special notice is included in the JSON output of the DAXEvaluationLog event.
  • The DAX Engine will calculate the expression of an unused variable if its outermost function is EvaluateAndLog.

External Links

Download the latest version of Power BI Desktop if you want to try out today’s examples.

The pbix file used in today’s examples can be downloaded from here.

The current version of DAX Debug Output is here. You must have admin rights on the machine and agree to install the software from an unknown publisher. Download both DAXDebugOutputSetup.msi and setup.exe to your machine and run setup.


UNDERSTAND THE OUTPUT OF EVALUATEANDLOG FUNCTION OF TABLE EXPRESSIONS

Last week, we learned how to interpret the output of the EvaluateAndLog function of scalar expressions. Albeit interesting, the new function would be much less useful if it could only return scalar values. After all, Power BI users can always define a scalar expression as a measure and then study its values in various contexts. The real power of EvaluateAndLog is that it can also wrap around any table expressions, the output of which was much harder to visualize until now.

One of the key takeaways from last week’s post is that a scalar expression produces many values in the presence of input columns. Each unique combination of values of input columns causes a reevaluation of the expression. The same is true for table expressions, which means the same table expression can generate multiple tables in a single query. Such dynamic behavior is what makes DAX powerful and hard to learn at the same time. With the help of the EvaluateAndLog function, we can now, for the first time, see all the intermediate tables instead of just imagining them in our head.

Let’s try a table-valued EvaluateAndLog with a simple measure:

Measure4 = CountRows(EvaluateAndLog('Sales'))

We inject EvaluateAndLog inside the CountRows function to see which rows are counted.

⚠️ For this week’s examples, we will be building on top of last week’s 3-table model, the diagram of which is shown again in Figure 1. That’s why our first example is already the 4th measure in the model. A link to the updated pbix file is provided at the end of this article.

⚠️ If you want to try the examples in this post on your own, make sure that you have installed the most recent update of the August 2022 release of Desktop which contains a critical bug fix. The version of your Desktop must be 2.108.997.0 or later.

Figure 1.

Adding ‘Product'[Color] and [Measure4] to a table visual, we get Figure 2 and two DAXEvaluationLog events, Figures 3 and 4. There are eight tables in Figure 3, one for each product color, and only one table in Figure 4 corresponding to the total value of the table visual. Unlike scalar expressions, a table expression has an “outputs” property in its JSON document which is a list of columns that define the schema of the table.

Figure 2.
Figure 3.
Figure 4.

How to present more than one table from the same table expression to users in DAX Debug Output is an interesting challenge. Since all the tables share the same schema, I chose to show them in a single data grid, just like the multiple values from a single scalar expression are displayed. This also happens to be the way such data is stored in the DAX Engine: rows from tables calculated from the same DAX expression are saved in a single data cache along with the corresponding values of the input columns. To make it easier for users to differentiate between rows from different tables, I added three extra columns to the data grid, whose headers are in light and dark grayish blue colors:

  • [Table Number] indicates to which table the row belongs;
  • [Row Number] indicates which row in the table;
  • [Row Count] indicates the total number of rows in the table.

Since data tables can potentially contain a lot of columns and rows, in which case they will be very expensive to serialize, the EvaluateAndLog function by default serializes only up to 10 rows per table into the JSON output, but it also serializes the total number of rows as the value of the “rowCount” JSON property to let users know if any rows were skipped. Users can use the optional <max rows> parameter of the function to change the limit. Since intermediate tables are not sorted by the DAX Engine, there is no general way of controlling which subset of rows shows up in the output. However, you can use functions like Filter, TopN, or Sample to select a subset of rows you want to see, e.g.

CountRows(EvaluateAndLog(Sample(10, 'Sales', 'Sales'[ProductKey], Asc)))

But injecting Filter/TopN/Sample functions changes the result of the CountRows function, so this is not a desirable workaround. Fortunately, there is a trick to gain better control over the output rows without changing the result of the original expression; I’ll show how to do it next week.

In addition to the row count limit per table, the DAX Engine also limits the maximum size of the JSON text. Once the JSON output becomes too big, the DAX Engine stops the serialization, except to make sure that the result is still a valid JSON document. Moreover, the DAX Engine inserts a special JSON property to inform users of the truncation. For example, the following expression

CountRows(EvaluateAndLog(CrossJoin('Product', 'Customer'), "large output", 10000))

would produce a large JSON string with the following property toward the end.

"notice": "Data truncated due to event size limit."

Currently, the threshold of the JSON size limit is set at 1,000,000 characters, but it’s subject to change in the future. Large JSON strings are slow to parse and display in the data grid of DAX Debug Output. When that happens during your debugging session, you can check the “Parse JSON on demand” option to switch from automatic parsing to on-demand parsing. If the hard limit disappoints you because you want to use the DAX Engine’s JSON serialization ability to export large amounts of calculated data, be warned that this is not a primary function of Power BI, so performance will become an issue when the data volume is large. That said, there are a couple of extra easter eggs hidden in the product which may allow you to do so anyway when they become publicly available.

Let’s end the post with a real-life debugging example. Around the time when I was writing the first post, I was asked to help debug a measure that didn’t return the desired result. In order to show the expression using our demo model, I rewrote the problematic DAX expressions into the following two measures:

Uniq Products = DistinctCount('Sales'[ProductKey])

Measure5 =
Var _temp = Summarize('Sales', 'Sales'[CustomerKey], "Uniq Products", [Uniq Products])
Var _currCustomer = SelectedValue('Sales'[CustomerKey])
Var _prevCustomer = TopN(1, Filter(_temp, 'Sales'[CustomerKey] < _currCustomer), 'Sales'[CustomerKey], Desc)
Return [Uniq Products] - MinX(_prevCustomer, [Uniq Products])

The expression for [Measure5] is complex and inefficient but the idea was to show the contrast between an old way and a new way (yet another easter egg hidden in the product) of calculating the difference of measure values between the current row and the previous row in a table visual.

[Measure5] works in four steps:

  1. Calculate a two-column table of customers and the corresponding numbers of unique products purchased by them.
  2. Find the current customer in the evaluation context.
  3. Find the previous customer from the intermediate table in Step 1 ordered by the ‘Sales'[CustomerKey] column in ascending order.
  4. Calculate the difference between the numbers of unique products purchased by the current customer and by the previous customer.

But when both measures were added to a table visual, the delta values were clearly wrong, Figure 5. By the way, since the expression is O(N²), I had to apply a filter on the 'Sales'[CustomerKey] column to limit the number of rows to 30 for the calculation to finish quickly.

Figure 5.

To find out what’s wrong with the expression, let’s wrap EvaluateAndLog around the Summarize function to see if we even get the first step right. We’d like to see a single table with all 30 customers and their product counts so we can navigate from any customer to the previous one. But the actual EvaluateAndLog output, Figure 6, consists of 30 single-row tables, one table per customer. That’s because in DAX a simple table name by itself means a filtered table. Since 'Sales'[CustomerKey] is in the filter context, the first parameter to Summarize, 'Sales', is filtered by the current [CustomerKey] value in the filter context, which limits 'Sales' to only those transactions belonging to a single customer. Summarize then groups the filtered 'Sales' by [CustomerKey] to return a single row. This analysis can easily be verified by wrapping EvaluateAndLog directly around 'Sales' to see, Figure 7, that it only returns rows that match the values of the input column. The fix is to replace 'Sales' with AllSelected('Sales') as the first parameter to the Summarize function, after which we get the desired output, Figure 8, as well as the correct intermediate table of Step 1, Figure 9.
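As a sketch of the fix described above, the corrected measure changes only the first argument of Summarize (straight quotes substituted for readability):

```dax
Measure5 =
Var _temp = Summarize(AllSelected('Sales'), 'Sales'[CustomerKey], "Uniq Products", [Uniq Products])
Var _currCustomer = SelectedValue('Sales'[CustomerKey])
Var _prevCustomer = TopN(1, Filter(_temp, 'Sales'[CustomerKey] < _currCustomer), 'Sales'[CustomerKey], Desc)
Return [Uniq Products] - MinX(_prevCustomer, [Uniq Products])
```

AllSelected removes the per-row [CustomerKey] filter while honoring any filters applied outside the visual, so _temp contains all 30 customers in every cell of the table visual.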

Figure 6.
Figure 7.
Figure 8.
Figure 9.

Today, we studied by examples the table version of the EvaluateAndLog function. Key takeaways from today’s lesson are:

  • Both scalar and table expressions are reevaluated for each unique combination of values of the input columns and can therefore generate multiple values or tables in a single query.
  • EvaluateAndLog imposes a default limit of 10 rows per table but includes a “rowCount” property in the JSON document to inform users of the true size of the table. Since the table is not sorted, the 10 rows are, from the user’s perspective, randomly selected.
  • EvaluateAndLog imposes a limit of roughly 1,000,000 characters per JSON string but will keep the final output a valid JSON document.
  • EvaluateAndLog inserts a special “notice” JSON property when its output is truncated due to the size limit.

External Links

Download the latest version of Power BI Desktop if you want to try out today’s examples.

The pbix file used in today’s examples can be downloaded from here.

The current version of DAX Debug Output is here. You must have admin rights on the machine and agree to install the software from an unknown publisher. Download both DAXDebugOutputSetup.msi and setup.exe to your machine and run setup.


Understand the Output of EvaluateAndLog Function of Scalar Expressions

This is the second article in a series on the introduction to the EvaluateAndLog function. As I explained last time, EvaluateAndLog can be used to print to the debug output the intermediate result of any DAX expression. In this article, we will learn how to interpret the output of the function. We first study the output of the function when the first argument is a scalar expression, which returns a single value, be it a number, a string, a datetime, etc. Next week we will cover the output when the first argument is a table expression.

All DAX expressions used here are based on a simple 3-table model shown in Figure 1. The tables and data are imported from one of Microsoft’s AdventureWorksDW sample databases. A link to the pbix file used for the examples is provided at the end of the article.

Figure 1.

Let’s start with a simple measure, [Measure1], and add it to a table visual, Figure 2.

[Measure1] = EvaluateAndLog(Count('Product'[ProductKey]), "Measure1")

Figure 2.

In DAX Debug Output, there are two DAXEvaluationLog events, as shown in Figures 3 and 4. This means that the expression is calculated twice: once, Figure 3, for each color of the product, and once, Figure 4, for the value of the total row. In case you wonder whether the two calculations are independent of each other, the answer is yes. Even though the DAX Engine could derive the total value by simply adding up all the per-color values, that’s not the correct way of calculating total values in the general case, e.g. when the aggregation function is non-additive, such as DistinctCount. That being said, the DAX Engine does have the ability to eliminate the separate calculation of the total value when the aggregation function is additive, e.g. Sum. But in the case of [Measure1], the presence of the EvaluateAndLog function caused the DAX Engine to choose a less optimal but more general execution plan. This is the key reason that Microsoft enables the function only in Desktop for debugging purposes but not in production environments, and why we advise users to remove obsolete instances of EvaluateAndLog from their DAX expressions whenever possible. When Microsoft is ready to enable the function in the Power BI service, the product team will give users full control over the behavior of the function in different environments. By the way, whether or not the EvaluateAndLog function disables a particular optimization in the DAX Engine is subject to change at the discretion of the product team. We will discuss the interactions between the EvaluateAndLog function and various DAX optimizations in a future article.

Figure 3.
Figure 4.

If you compare Figures 2 and 3 carefully, you can see that the numbers in both figures match but the order of product colors is different. That’s because the DAX Engine calculates the values of [Measure1] first and then sorts the rows in a separate step afterwards. The EvaluateAndLog function records the results when the values are calculated but before the output rows are sorted. Luckily, you can click any header in the data grid of DAX Debug Output to sort the rows by that header.

If you inspect the output of EvaluateAndLog in JSON format, you can see a property named “inputs”, the value of which is a list of column names. There is also a “data” property whose value is a list of “input” and “output” pairs. The value of the “input” property is a list of values corresponding to the columns in the “inputs” property. The value of the “output” property is the calculated value of the DAX expression for the given “input” values. DAX Debug Output uses different header colors to help users tell apart input and output columns in the data grid. But what are those input columns? Where do they come from? They are the columns in the evaluation context that affect the result of the calculation and that generate multiple unique combinations of values, each of which causes the scalar expression to calculate a new value in the intermediate result.
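Putting these properties together, the JSON document for [Measure1] is shaped roughly like the sketch below. The property names come from the description above; the concrete column name and values are made up for illustration:

```json
{
  "inputs": ["'Product'[Color]"],
  "data": [
    { "input": ["Black"], "output": 89 },
    { "input": ["Red"], "output": 99 }
  ]
}
```

Each element of "data" corresponds to one row of the data grid in DAX Debug Output: the "input" values fill the input columns and the "output" value fills the output column.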

To better understand input columns, let’s change the visual to a matrix and add the [Education] column from the ‘Customer’ table to Columns, Figure 5. If you examine the DAXEvaluationLog events from the matrix query, you will see that they stay the same as in Figures 3 and 4. That’s because [Measure1] counts the [ProductKey] column in the ‘Product’ table and is therefore not affected by columns from the ‘Customer’ table. As a result, the 'Customer'[Education] column is not counted as an input column to the Count('Product'[ProductKey]) expression even though it is a part of the matrix query. What if we add a filter 'Product'[Size] = "L" to the visual? There is no change to the number of rows in the output of the DAXEvaluationLog events even though the filter on the 'Product'[Size] column does change the values of the calculation. As a rule of thumb, input columns most likely come from columns added to visual axes or columns from table parameters of DAX iteration functions.

Figure 5.

Let’s use a more complex example to see how we can leverage the EvaluateAndLog function to solve one of the puzzles often encountered by students of DAX. We define a measure, named [Measure2], to calculate the average number of unique products purchased on each day.

[Measure2] = AverageX(Values('Sales'[Order Date]), EvaluateAndLog(DistinctCount('Sales'[ProductKey]), "daily unique products"))

When added to a card visual, Figure 6, the value of the calculation doesn’t look right. Even without examining the data in the ‘Sales’ table, it’s highly suspicious for an average calculation to return a perfect whole number.

Figure 6.

Before we inspect the intermediate result labeled “daily unique products”, let’s first predict what we should see. For the AverageX function to calculate an average value, it needs a list of numbers, one per day because the first argument is a single-column table of distinct order dates, therefore we should see a list of dates and corresponding distinct counts. But the actual output, Figure 7, is surprisingly a single value, 158. It tells us that the expression, DistinctCount('Sales'[ProductKey]), doesn’t depend on any input column, in particular not the 'Sales'[Order Date] column, and therefore returns the same value for all the rows of the table expression Values('Sales'[Order Date]). That explains why the output value is a whole number: the average of the same number, regardless of the number of dates, is the number itself.

Figure 7.

Those who already understand the difference between row context and filter context know that this is where the famous Calculate function is needed to turn row context into filter context. We fix the bug in the DAX expression by wrapping the Calculate function around the DistinctCount function. Afterwards, DAX Debug Output displays the expected list of values per day, Figure 8, and the card visual, Figure 9, shows a more reasonable average value of 21.17.

[Measure2] = AverageX(Values('Sales'[Order Date]), EvaluateAndLog(Calculate(DistinctCount('Sales'[ProductKey])), "daily unique products"))

Figure 8.
Figure 9.

This quick debugging exercise shows that you can use the EvaluateAndLog function to see the mysterious evaluation context in action. Input columns are an important, and often difficult to understand, part of the evaluation context. Now we have a way to visualize them.

DAX Debug Output colors the headers of the input columns orange and the header of the output column maroon. When there are multiple input columns, DAX Debug Output alternates dark and light orange colors on their headers, Figure 10.

Figure 10.

Today, we learned how to interpret the output of the EvaluateAndLog function when its first argument is a scalar expression. Key takeaways from today’s lesson are:

  • A scalar expression is calculated against the combinations of values of a set of input columns.
  • An input column is a column in the evaluation context that produces a set of distinct values, each of which causes the scalar expression to calculate a new value.
  • An input column is typically one from the visual axes or is added by a DAX iteration function.
  • A pure filter column in the evaluation context is not considered an input column even if it changes the result of the calculation because its distinct values don’t cause the scalar expression to be evaluated multiple times.
  • For a given scalar expression, when there is no input column, you will see a single value in the debug output; when there are input columns, you will see multiple values, one per unique combination of the input column values.
  • Intermediate results are not sorted. You can use DAX Debug Output’s sorting feature to help see the results.
  • For a given query, the same EvaluateAndLog expression can generate multiple DAXEvaluationLog events, one per unique combination of input columns as specified in the query.

External links

  • The pbix file used in today’s examples can be downloaded from here.
  • The current version of DAX Debug Output is here. You must have admin rights on the machine and agree to install the software from an unknown publisher. Download both DAXDebugOutputSetup.msi and setup.exe to your machine and run setup.


Introduce the DAX EvaluateAndLog Function

There is a hidden gem in the August 2022 release of Power BI Desktop: a secret new DAX function, EvaluateAndLog, that is not listed in the feature summary of the official blogpost. Yet this function will become a great learning and debugging tool for anyone who writes DAX code in the years to come. Microsoft has a good reason not to announce this function publicly yet. There are a couple of known bugs and, more importantly, the work for intellisense support in the DAX formula bar has not been completed. Therefore, if you try the function today, you won’t see it on the suggestion list and, if you type the function name yourself, you will see a red squiggly line underlining the function. But the expression should calculate just fine in the latest version of Desktop. Ironically, the lack of intellisense support is really a blessing in disguise in my opinion. I’ll explain why I think so at the end of this blogpost.

So what is EvaluateAndLog for? Let’s say you define a measure [Profit] with the following expression and add it to the report.

[Profit] = Sum('Sales'[SalesAmount]) - Sum('Sales'[Cost])

You want to double check if the numbers you see on the report are correct. In the past, you would add the two columns [SalesAmount] and [Cost] as values to your report temporarily to inspect the numbers. Now there is a new way without changing your visual: you can temporarily change your measure definition to

[Profit] = EvaluateAndLog(Sum('Sales'[SalesAmount])) - EvaluateAndLog(Sum('Sales'[Cost]))

The measure will return the same result so your visual doesn’t change, but the modified expression will cause the DAX Engine to send the intermediate results of both Sum('Sales'[SalesAmount]) and Sum('Sales'[Cost]) to a secret channel, which I call “debug output”, for you to inspect. In other words, you can use EvaluateAndLog to send the intermediate result of any portion of your DAX expression to debug output without changing the result of the original expression. Let that sink in: any intermediate result of your DAX expression. Can you imagine how powerful this is? Not only can you see the values of the two individual Sums before the subtraction, but you can also use EvaluateAndLog('Sales') to inspect some of the raw data that feeds into them! I’ll explain how to do it in a future blogpost as this feature is too big to be covered in a single one.

At this point, some people might have realized that EvaluateAndLog is like the “print debugging statement” in other programming languages. As a professional software engineer for my entire career, I have witnessed firsthand how debugging tools and practices have advanced by leaps and bounds over the past decades. In spite of all that progress, the “print debugging statement” still accounts for most actual debugging activities in the real world because no other tool can replace the convenience and versatility of a simple “print” and its many variations in different programming languages. So EvaluateAndLog opens the door to a whole new world of DAX learning and debugging.

Now that you are all hyped up, let’s start with the syntax of EvaluateAndLog:
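Based on how the function is used throughout this series, its signature can be sketched as follows; the placeholder parameter names are mine, and the second and third arguments are optional:

```dax
EvaluateAndLog(<scalar or table expression> [, <label>] [, <max rows>])

// Example: label the logged value so it is easy to find in the trace output
// [Profit] = EvaluateAndLog(Sum('Sales'[SalesAmount]), "sales amount") - Sum('Sales'[Cost])
```

The label becomes part of the emitted event, and the max rows argument controls how many rows of a table result are serialized.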

You can wrap the function around any portion of your DAX expression that returns either a scalar value or a table. When the expression is calculated, DAX Engine sends the intermediate result in JSON format as the content of the TextData column of an Analysis Services Trace Event, called “DAX Evaluation Log”.

Now that we have covered the mechanics of EvaluateAndLog, the challenge is how to retrieve and visualize the output of the function. I believe that, in due time, Power BI Desktop and other third-party tools will add the ability to display the output of the function in a user-friendly fashion. Before it happens, if you must use an officially supported tool from Microsoft, you can download and install the Microsoft SQL Server Profiler which is a component of the free SQL Server Management Studio (SSMS). Afterwards, you have to find out the port number of Power BI Desktop, connect SQL Server Profiler to Desktop, subscribe to the DAX Evaluation Log event and other relevant events, and then you can see the output of the function. Below is a screenshot of SQL Server Profiler displaying the output of EvaluateAndLog.

Going the SQL Server Profiler route is neither easy nor user-friendly. To deliver the benefits of the new way of debugging DAX to the Power BI community sooner, I rolled up my software engineer sleeves, learned some basic UI programming, and developed a Windows desktop application on my own. You can download the free software, named DAX Debug Output, from here and install it on the same machine as Power BI Desktop. You need admin rights on the machine and must agree to install software from an unknown publisher, as this is my own project, not a Microsoft product. Below is an annotated screenshot of the main window of the application. I will explain how the application works in my next blogpost.

EvaluateAndLog is currently enabled only in Power BI Desktop and disabled in other Power BI products. In environments where it is disabled, the function does nothing other than pass through the result of its child expression. When enabled, the function may disable some DAX optimizations, so it’s imperative that you delete it from your DAX code once you are done debugging. This is where the red squiggly line, caused by the current lack of IntelliSense support, comes in handy.

It’s a convenient way to remind you that you should fix the expression before publishing it to Power BI service. Although the function has no impact in the service today, it could cause performance degradations once Microsoft enables the function in more production environments.

We have barely scratched the surface of EvaluateAndLog but this is already a long post. I plan to write more posts on this subject in the following weeks. In the meantime, if you can’t wait to play with the function, here is the link to DAX Debug Output again. Download both DAXDebugOutputSetup.msi and setup.exe to your machine and run setup.

https://github.com/pbidax/DAXDebugOutput/releases/tag/v0.1.0-beta

Introduce the DAX EvaluateAndLog Function

DAX “IN” Operator

It’s hard to keep a secret nowadays. Although Microsoft has not officially announced it, people have discovered on their own that DAX has a new IN operator. Let me explain why the Power BI team has not made this public yet.

  • The IntelliSense work has not been completed yet, so although a DAX expression containing the new syntax will be accepted by the engine, it will be underlined with a red squiggly in the DAX editor.

[Screenshot: the IN operator underlined with a red squiggly in the DAX editor]

  • Users cannot use the new syntax directly as a setfilter argument to the CALCULATE family of functions. So the following syntactic shortcut is not yet accepted by the DAX engine.
CALCULATE([Total Sales], 'Product'[Color] IN { "Red", "Green" }) // Not working right now.

You have to use the full version instead:

CALCULATE(
    [Total Sales],
    FILTER(ALL('Product'[Color]), [Color] IN { "Red", "Green" })
)

If you are unfazed by these limitations, which should be addressed early next year, feel free to enjoy the new convenience in DAX today.

'Product'[Color] IN { "Red", "Green" }

or

('Product'[Color], 'Customer'[Country]) IN { ("Red", "Russia"), ("Green", "Italy") }

By the way, since DAX is a functional language, there is an equivalent new function CONTAINSROW:

CONTAINSROW({ "Red", "Blue" }, 'Product'[Color])

or

CONTAINSROW(
    { ("Red", "Russia"), ("Green", "Italy") },
    'Product'[Color],
    'Customer'[Country]
)

The examples above show off not just the IN operator, but also a new way to construct a table expression from a list of values or tuples. As with other DAX functions and operators, the new syntax is fully composable as illustrated in the following examples of valid DAX expressions:

'Product'[Color] IN VALUES('Product2'[Color])
{ VALUES('Date'[Year]) - 1, VALUES('Date'[Year]), VALUES('Date'[Year]) + 1 }

Warning: Although the following syntax is also accepted by the DAX engine today, it is not by design, so you should use curly braces {} rather than parentheses () to enclose a list of values or tuples; only the individual tuples themselves should be enclosed in ().

'Product'[Color] IN ( "Red", "Green" ) // Do not do this. Replace () by {} instead.

 


Show Text Attributes in Tooltips

Power BI allows users to add measures to tooltips, and many users want to add text columns as well. Today, if you drag a text column to the tooltips, a count measure is created, which is not very useful. Until Power BI enhances the tooltip feature to let users drag text columns into tooltips directly, you can make it happen today by creating DAX measures. Let’s say you want to add ‘Table'[TextColumn] to tooltips. Create a measure as below.

[TextColumn Tooltip] = TOPN(1, VALUES('Table'[TextColumn]), [TextColumn], ASC)

Now you can drag the newly created measure to tooltips. I have created a sample model with such a tooltip based on AdventureWorksDW database. As you can see in the picture below, when users hover over an individual product, the tooltips include the [Description] column for that product.

[Screenshot: tooltip showing the [Description] column for the hovered product]
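One caveat of the TOPN pattern is that it shows only the alphabetically first value when the filter context contains more than one text value. If you would rather see all of them, a variation built on CONCATENATEX can concatenate the distinct values (the measure name below is just an example, not from the original post):

```dax
TextColumn Tooltip All =
CONCATENATEX(
    VALUES('Table'[TextColumn]),  -- distinct values in the current filter context
    'Table'[TextColumn],
    ", "                          -- separator between values
)
```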

 


Connect any number of tables together via a common column

One of the key powers of Power BI is the ability to bring data from different sources into a single model and then join it together by creating relationships among tables, so that you can perform analysis and create reports across all the data in the unified model. But a lot of people are confused when they run into one of the errors below while trying to create a relationship between two tables inside Power BI Desktop:

  • We cannot create a relationship between ‘Table A'[Column X] and ‘Table B'[Column Y]. This could be because there is missing intermediate data to connect the two columns.
  • You can’t create a relationship between these two columns because one of the columns must have unique values.
  • You can’t create a relationship between these two columns because the ‘Column X’ column in the ‘Table A’ table contains null values.

Those who are familiar with relational database theory can appreciate the sound principles underlying those error messages, but many users of Power BI may be at a loss as to what to do next. While Power BI Desktop may make creating relationships easier in the future by helping users resolve those errors automatically, there are things you can do today to connect the two tables together. The same methodology can be generalized to connect any number of tables through a common column.

Assume a user has imported five different tables containing various information and transactions about products, all of which identify each product by a [Product] column, but the user is unable to create relationships among the tables as he or she is blocked by the errors listed above. The user wants to connect all five tables together via the [Product] column as illustrated by the conceptual schema in Figure 1.

 

Figure 1.

To do so we need to use DAX to create a calculated table in the middle so that the five original tables can each create a relationship between itself and the calculated table. I have created a mock model in this pbix file to demonstrate the technique.

Create a calculated table by clicking the New Table button as shown in Figure 2.

Figure 2.

The DAX expression is

All Products =
FILTER(
    DISTINCT(
        UNION(
            DISTINCT(Table1[Product]),
            DISTINCT(Table2[Product]),
            DISTINCT(Table3[Product]),
            DISTINCT(Table4[Product]),
            DISTINCT(Table5[Product])
        )
    ),
    NOT(ISBLANK([Product]))
)

This DAX formula takes the unique values from the [Product] column of each of the five tables, unions them together, removes duplicate values, and finally removes the BLANK value if there is one. You can use the same formula in any model by changing the number of tables and the names of tables and columns. Now you can create five relationships, one between each of the five original tables and the newly created calculated table. Figure 3 shows the final result in the Relationships view. You can see that every relationship allows filters to flow both ways, as indicated by the double arrows on each relationship line.

Figure 3.

Now that all fields in the model are connected, you can freely add any column from any table to your report and slice and dice to your heart’s content; things will just work as if by magic. Moreover, the calculated table in the middle will automatically recalculate each time you refresh data in the original tables.

 


Simple Filter in DAX Measures

Today’s post is about different ways to set a simple filter in DAX. Although there are plenty of writings covering these patterns, I still see DAX authors getting confused on a regular basis, so in this post I am going to clarify the differences among the patterns. The examples are based on a pbix file created from the Adventure Works database that you can download from here. It helps if you have learned about filter context before, but you should still be able to follow the general idea outlined here without that knowledge.

Setting a basic filter is one of the first things people do when writing a DAX measure. Let’s say you want to write a measure to calculate the total sales on red products. The two common DAX patterns I saw people use are:

Pattern 1. Calculate(Sum([Sales Amount]), 'Product'[Color] = "Red")

Pattern 2. Calculate(Sum([Sales Amount]), Filter('Product', [Color] = "Red"))

Both approaches work as expected in simple models and reports, but from time to time people run into trouble with such simple DAX measures and have difficulty figuring out the underlying reason.

Although Patterns 1 and 2 look very similar, they actually have different semantics in DAX. The differences become obvious when you add the [Color] column and both measures to a flat table chart as shown in Figure 1. While Pattern 1 returns sales of red products in all rows, Pattern 2 returns blank values in all rows except for the red color row and the grand total row.

Figure 1.

Under the hood, Pattern 1 is simply a syntax shortcut for the following DAX expression:

Pattern 1.1. Calculate(Sum([Sales Amount]), Filter(All('Product'[Color]), [Color] = "Red"))

The key difference is the implied All function in Pattern 1. Because of the All function, Pattern 1 does nothing more than replace any prior filter on the [Color] column with the new filter [Color] = "Red". On the other hand, the lack of the All function in Pattern 2 means the 'Product' table is first filtered by any existing filters from the filter context and then the remaining rows are further filtered by the predicate [Color] = "Red". This explains the blank values in Figure 1: since the 'Product' table is first filtered down to the rows matching the color of the current row in the flat table and then further filtered by the predicate [Color] = "Red", the net result is empty if a row has a color other than red; the grand total row has no filter coming from the flat table, therefore only the predicate takes effect.

In addition to the unwanted results in Figure 1, Pattern 2 also suffers from an unintended performance hit even when it delivers correct results. For example, if a report has a page filter of 'Date'[Calendar Year] = 2008, the 'Product' table will be filtered down to all products sold in 2008 before the predicate [Color] = "Red" is applied. That means the DAX engine needs to scan the 'Internet Sales' fact table just to calculate the Filter function itself; few people would want to scan a fact table only to set a filter on a dimension table.
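As an aside, if what you actually want is Pattern 2’s intersect-with-existing-filters behavior for a single column, but without paying for a scan of the fact table, a commonly recommended alternative is a column predicate wrapped in KEEPFILTERS. The sketch below is my own illustration, not from the original post:

```dax
Pattern 2a =
CALCULATE(
    SUM([Sales Amount]),
    KEEPFILTERS('Product'[Color] = "Red")  -- intersects with, rather than
                                           -- replaces, any existing filter
                                           -- on the [Color] column
)
```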

The All function allows a column name or a table name as its argument. So the following pattern also achieves the same effect as Pattern 1 does.

Pattern 3. Calculate(Sum([Sales Amount]), Filter(All('Product'), [Color] = "Red"))

The difference between Pattern 3 and Pattern 1 is that Pattern 3 replaces filters on all columns of the 'Product' table in the filter context, whereas Pattern 1 only replaces the prior filter on the [Color] column. As a result, if you have several columns from the 'Product' table in the report, all of them will be removed from the filter context; the only remaining filter on the 'Product' table is [Color] = "Red", as shown in Figure 2. As you can see in the picture, the values of measure [m – Sum Filter Pattern 3] are not affected by the filter [Class] = "L" from the flat table.

Figure 2.

Today we have discussed the differences among three ways of setting a simple filter in DAX. Although all three patterns may work in most cases and any one of them may be the right solution for a particular report, if you are not sure which one to use, I would start with Pattern 1, or its equivalent Pattern 1.1, due to its minimal side effect.

 


Handle Complex Relationships inside Power BI Desktop

Introduction

Modern self-service BI products like Power Pivot in Excel and Power BI Desktop are often used to mash up data from different sources. Once multiple tables are imported into the model, users create relationships among them so that they can easily slice and dice data using any column from related tables. Users of Power BI Desktop who only create relationships in the Manage Relationships dialog box (Figure 1) or by connecting columns in the Diagram View (Figure 2) often run into constraints which block them from creating the relationships they need. For example, users cannot create a relationship by joining more than one column from each table. In this blog post, I am going to show you a way to work around such constraints inside Power BI Desktop.

Figure 1.

Figure 2.

In Power BI Desktop, users can only create a relationship between a pair of tables by joining two columns, one from each table. Furthermore, at least one of the columns cannot contain any duplicate values or blank values; otherwise, you get errors seen in Figure 3 or Figure 4.

Figure 3.

Figure 4.

Users who are familiar with relational database technologies will recognize that relationships are modeled after foreign-key-to-primary-key constraints. In relational databases, a primary key column uniquely identifies each record in the table; it must contain unique values and cannot contain NULL values.

If you have already imported tables into the model but cannot create relationships due to the above requirements, you are typically told to work around the constraints by

  • creating a calculated column to combine multiple join columns into a single one so you can create relationships between two calculated columns;
  • introducing a third table that contains unique and non-blank values of the join columns and then creating two relationships between each of the original tables and the third table so that the original tables are related indirectly.

The introduction of bi-directional cross-filtering and calculated tables in Power BI Desktop has made both techniques easier to implement than in Power Pivot. Today I want to show you another way of working around the constraints: sometimes you can join two tables inside Query Editor and then bring a single table with all the columns you need into the model.

The test data for the two examples I am going to show can be downloaded from here. You can also download the final pbix file after we have completed the two examples from here.

First Example

In our first example, we want to create a relationship between the two [ProjectID] columns as shown in Figure 5. [ProjectID] is unique in Table1 except sometimes it is blank when a program has been created without any project yet.

Figure 5.

Assume we want to perform some cost analysis on each program. Although we cannot create a relationship between Table1 and Table2 due to the blank value, it’s sufficient if we can bring [ProgramID] into Table2. Here is how to achieve this in Power BI Desktop.

Once you have selected Table1 and Table2 in the Navigator, click the Edit button, as shown in Figure 6.

Figure 6.

Select Table1 and then click the Properties button in the ribbon as shown in Figure 7.

Figure 7.

Uncheck “Enable load to report” and then click OK, see Figure 8. We bring in Table1 just so we can merge its [ProgramID] column into Table2; afterwards it has served its purpose and is no longer needed when we build reports.

Figure 8.

Select Table2 in Query Editor and then click the Merge Queries in the ribbon, see Figure 9.

Figure 9.

In the Merge dialog box, click [ProjectID] under Table2, select Table1 in the dropdown box, click [ProjectID] under Table1, select “Full Outer (all rows from both)” as Join Kind, and then click OK. We have chosen “Full Outer” just in case we want to build reports including programs without any projects yet.

Figure 10.

A new column has been added to Table2 where each cell shows the value Table, representing the matching rows from Table1. The previous step only makes the columns of Table1 available in Table2; we still have to pick the individual columns we care about. Click the button at the upper right corner as shown in Figure 11.

Figure 11.

Since we just want to bring [ProgramID] into Table2 with the same name, keep only ProgramID checked, uncheck the other two boxes and then click OK, see Figure 12.

Figure 12.

Table2 now has the [ProgramID] column, we can close Query Editor, shown in Figure 13, and start building reports.

Figure 13.

Second Example

As a second example, we examine Table3 and Table4 as shown in Figure 14. We want to do some analysis on Table3, but we also need the [Principal Amount] column from Table4.

Figure 14.

We cannot create a relationship directly since we must join by both the [Entity Code] column and the [Fiscal Month] column. Although we can create a calculated column on each table to combine [Entity Code] and [Fiscal Month], it’s an artificial and strange thing to do. Instead we will leverage Query Editor’s flexible ability to merge two tables. The steps are identical to those in the first example except that we pick two columns from each table in the Merge dialog box. You can select multiple columns from a single table by holding the Ctrl button and then clicking the columns you want. Note that the order of clicking is important as it determines which pairs of columns will be matched during the merge. The UI conveniently labels the columns with numbers in the order you click, as shown in Figure 15.

Figure 15.

Conclusion

Today I described a way to work around limitations of relationships by merging tables in the Query Editor before importing them into a model. Strictly speaking, this technique is not the same as creating relationships but, if used correctly, often gives us what we need. Users who are familiar with Power Query Formulas know that they can join tables in more flexible and powerful ways than allowed by the UI. By packaging two powerful engines, one from Power Query and one from Power Pivot, into a single product, Power BI Desktop gives users maximum versatility to shape their data any way they want.


Use Calculated Table to Figure out Monthly Subscriber Churn

Power BI Desktop 2015 SU09 introduced a really cool new feature called calculated tables. Those of you who are already familiar with calculated columns will share my excitement since we now have a flexible, intuitive, and visual way of solving common business problems which were very hard, if not impossible, to tackle previously. Today I am going to show you how to leverage calculated tables to display monthly total number of subscribers starting from a table listing each user account’s start and end dates. The sample data is taken from Kathi Kellenberger’s blog post T-SQL Window Functions Speed Phreakery. If you are a SQL junkie, you should read Kathi’s blog to learn a few clever ways of writing ever faster T-SQL queries to calculate monthly total subscriber numbers from the original data. A sample of the source data is shown in Figure 1.

Registration_ID FirstName LastName DateJoined DateLeft
15 Norma Lynch 1/4/2004 8/20/2008
19 Bobbie Brennan 1/5/2004 9/20/2006
22 Terra Cabrera 1/5/2004 4/14/2007
25 Alicia Navarro 1/6/2004 12/3/2007
55 Lauren Everett 1/10/2004 12/5/2005

Figure 1.

Our goal is to find out the total number of subscribers at the end of each month taking into account the total number of subscribers joined and left from the beginning of time to that month. Figure 2 shows the desired result in the first ten months of 2004.

Month PeopleJoined PeopleLeft Subscribers
1/31/2004 167 0 167
2/29/2004 133 0 300
3/31/2004 144 0 444
4/30/2004 155 0 599
5/31/2004 156 0 755
6/30/2004 136 1 890
7/31/2004 138 0 1028
8/31/2004 132 0 1160
9/30/2004 163 0 1323
10/31/2004 135 0 1458

Figure 2.

Power BI Desktop is a very versatile tool; there is more than one way of transforming the original table into the result we want. I could perform all the calculations and transformations in the Query Editor window using operations enabled by the UI plus a little of the M programming language. But today I am going to show you how to do this in the Data Tools window using the DAX programming language. DAX runs on top of the blazingly fast, column-based, in-memory Vertipaq engine, so you are likely to get the best possible performance playing with your data this way. Besides, in the current release of Power BI Desktop, the DAX formula bar comes with powerful IntelliSense, which makes authoring a DAX expression so much easier. For full disclosure, I have been a key contributor to DAX development from the beginning, so I have the ulterior motive of promoting the language I helped create. Unlike Kathi in her blog, I am not trying to develop the fastest possible solution through clever language trickery; instead, I am going to show you how to arrive at the final solution in a step-by-step fashion so you can see the intermediate results after each step.

If you want to try out the steps on your own from the very beginning and have a SQL Server readily available, please download the script SubscriptionList.sql from Kathi’s blog page to create sample data table in your SQL Server and then import the data into Power BI Desktop, otherwise, I have created a PBIX file, MonthlySubscriberCount1.pbix, that has already imported the data into a table ‘Registrations’ for your convenience. While I provided in this blog all DAX expressions for you to copy and paste, I encourage you to type them yourselves to learn how easy it is to enter DAX expressions with the help of IntelliSense.

First, let’s see the original data in the Data Tools window by clicking the Data button, see Figure 3.


Figure 3.

In the table ‘Registrations’, we see a list of individual accounts and the dates they joined and left the subscription in the two columns [DateJoined] and [DateLeft]. The data in the table should be similar to Figure 1. While each account must have a real date for the [DateJoined] column, its [DateLeft] column can be empty if the account is still active. Since we want to summarize data at month level, let’s add two calculated columns to the table to show the months in which each subscriber joined and left. We will use DAX EOMONTH function to calculate the last day of the month corresponding to the [DateJoined] or [DateLeft] columns.
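For reference, EOMONTH takes a date and a month offset and returns the last day of the resulting month:

```dax
EOMONTH(DATE(2004, 1, 4), 0)   -- Jan 31, 2004: same month, last day
EOMONTH(DATE(2004, 1, 4), 1)   -- Feb 29, 2004: one month later (leap year)
EOMONTH(DATE(2004, 1, 4), -1)  -- Dec 31, 2003: one month earlier
```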


Figure 4.

Click the New Column button as shown in Figure 4, and then enter the following DAX formula to add a calculated column called [MonthJoined].

MonthJoined = EOMONTH(Registrations[DateJoined], 0)

Now repeat the steps above to add another calculated column [MonthLeft].

MonthLeft = EOMONTH(Registrations[DateLeft], 0)

After we added the two columns, we have in the table both the date and the month (last day of the month to be precise) for which a subscriber joined or left, see Figure 5. We can now find out the total number of subscribers joined or left in each month.


Figure 5.

Click the New Table button as shown in Figure 6, and then enter the DAX formula shown below to add a calculated table called ‘MonthlyJoined’.


Figure 6.

MonthlyJoined =
SUMMARIZECOLUMNS(
	Registrations[MonthJoined],
	"CountJoined", COUNTROWS(Registrations)
)

Now repeat the steps above to add another calculated table ‘MonthlyLeft’.

MonthlyLeft =
SUMMARIZECOLUMNS(
	Registrations[MonthLeft],
	"CountLeft", COUNTROWS(Registrations)
)

We get two new tables in the model as shown in Figures 7 and 8 respectively. The two tables are also listed under Fields on the right side of the window, see Figure 9.


Figure 7.


Figure 8.


Figure 9.

The formulas of the two calculated tables we just created used two DAX functions, SUMMARIZECOLUMNS and COUNTROWS. SUMMARIZECOLUMNS is an extremely versatile and powerful function. It will take a few blog posts just to explore that function alone. But for today’s purpose, all you need to know is that if you simply supply the columns and calculations you care about as arguments to the function, it will return a new table with those columns and calculated values in the output. While the calculations can be arbitrarily complex DAX expressions, for today’s blog post we only need basic aggregations such as summing the values of a column or counting the number of rows in a table. Note that the columns listed as arguments in SUMMARIZECOLUMNS can be any column from any table in the current model, DAX engine will magically stitch them together by following the relationships defined between tables. If you look at the DAX expression for the ‘MonthlyJoined’ table, we supplied as input arguments one column ‘Registrations’[MonthJoined] and one calculation COUNTROWS(‘Registrations’). SUMMARIZECOLUMNS returns a table of columns. If an argument is a column, it automatically keeps the column name in its output, therefore, we only need to supply new column names for output columns based on calculations. In this case, we picked the name “CountJoined” for the calculation.
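To give a flavor of that versatility, here is a hypothetical query (the table and column names are illustrative and not from this model) that groups by columns from two different tables and computes two aggregations at once:

```dax
EVALUATE
SUMMARIZECOLUMNS(
    'Product'[Color],        -- grouping column from one table
    'Date'[Year],            -- grouping column from another table
    "Total Sales", SUM('Sales'[Sales Amount]),
    "Order Count", COUNTROWS('Sales')
)
```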

Warning: The new column names for calculations are enclosed in double quotes, not square brackets.

The COUNTROWS function simply returns the number of rows in its table argument. The magic in our example is that COUNTROWS doesn’t return a static count of all rows in the ‘Registrations’ table but rather a dynamic count of the rows corresponding to the current value of the [MonthJoined] or [MonthLeft] column. Digging into how this magic happens would take some really technical posts, so that’s for another time. For most users, just list the columns and aggregations you want to see in the output and things will work out just right.

Now if you switch to the ‘MonthlyLeft’ table, you should see a row with an empty value for the [MonthLeft] column, as shown in Figure 10. That’s because many subscribers never left, so their [MonthLeft] value is blank.

Figure 10.
Let’s remove that row by modifying the definition of the table as shown below. The changes should be self-evident and expose one power of DAX: its functions are composable. We filter the output table of SUMMARIZECOLUMNS row by row and remove the row with a blank value for the [MonthLeft] column.

MonthlyLeft = 
FILTER(
	SUMMARIZECOLUMNS(
		Registrations[MonthLeft],
		"CountLeft", COUNTROWS(Registrations)
	),
	NOT(ISBLANK(Registrations[MonthLeft]))
)

Although it’s great to see the counts of subscribers joined or left per month, the information is spread across two separate tables; we need to somehow combine them into a single table so we can see the two counts side by side for a given month. Those of you with a SQL background are likely thinking about searching for a DAX join function by now. While it is possible to write a DAX expression to join the two tables together, let me show you a different way that is faster and arguably more intuitive. We are going to leverage a fundamental feature of Power BI Desktop: creating relationships between tables.
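For the curious, one possible DAX-only join is sketched below. This is my own illustration, and it assumes every month that appears in ‘MonthlyLeft’ also appears in ‘MonthlyJoined’; we will take the relationship route instead.

```dax
MonthlyJoinedAndLeftSketch =
ADDCOLUMNS(
    MonthlyJoined,
    "CountLeft",
        -- for the current month, sum the matching counts from MonthlyLeft
        SUMX(
            FILTER(
                MonthlyLeft,
                MonthlyLeft[MonthLeft] = MonthlyJoined[MonthJoined]
            ),
            MonthlyLeft[CountLeft]
        )
)
```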

Click Manage Relationships as shown in Figure 11, then click the New button as shown in Figure 12. In the Create Relationship dialog box, select table ‘MonthlyJoined’ and column [MonthJoined] as the first end of the relationship, and select table ‘MonthlyLeft’ and column [MonthLeft] as the second end of the relationship. Click Advanced options to check whether Cross filter direction is set to Both by default, and if not, change it to Both as shown in Figure 13. Click the OK button to accept the new relationship and then click the Close button to close the Manage Relationships dialog box.


Figure 11.


Figure 12.


Figure 13.

Now we are ready to create yet another calculated table to bring the columns [CountJoined] and [CountLeft] together. Click New Table, shown in Figure 6, and enter the following DAX expression to create a new calculated table called ‘MonthlyJoinedAndLeft’. The newly created calculated table is shown in Figure 14.

MonthlyJoinedAndLeft =
SUMMARIZECOLUMNS(
	MonthlyJoined[MonthJoined],
	"CountJoined", SUM(MonthlyJoined[CountJoined]),
	"CountLeft", SUM(MonthlyLeft[CountLeft])
)


Figure 14.

Since the new table has only a single month column, it no longer makes sense to call it [MonthJoined], so let’s rename it to [Month]. We do that by expanding the ‘MonthlyJoinedAndLeft’ table on the right, right-clicking the [MonthJoined] column, and selecting Rename from the dropdown menu as shown in Figure 15.


Figure 15.

We are almost there. Let’s add a column to the newly created table to show the difference between the number of subscribers joined and the number that left. The DAX formula is:

CountNetJoined =
MonthlyJoinedAndLeft[CountJoined] - MonthlyJoinedAndLeft[CountLeft]

The table now looks like Figure 16.


Figure 16.

We are ready to count the number of active subscribers by the end of each month. The math is to add up all net joined subscribers up to the current month. The DAX formula for the calculated column is

Subscribers =
SUMX(
	FILTER(
		MonthlyJoinedAndLeft,
		[Month] <= EARLIER([Month])
	),
	[CountNetJoined]
)

This formula is almost self-explanatory except for that mysterious EARLIER function. Without going into the gory details, it’s enough to say that for each row in the table the formula starts a new scan of the ‘MonthlyJoinedAndLeft’ table, keeps only those rows whose value of the [Month] column is on or before the value of the same column of the current row, and then sums up all the values of the [CountNetJoined] column. The final result is shown in Figure 17 and if you have tried out the T-SQL queries in Kathi’s blog you can verify that our results match the T-SQL ones.


Figure 17.
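If the EARLIER function feels opaque, later versions of DAX support variables, which let you capture the current row’s month explicitly. This equivalent rewrite is my own illustration, not part of the original walkthrough:

```dax
Subscribers =
VAR CurrentMonth = MonthlyJoinedAndLeft[Month]  -- value from the current row
RETURN
    SUMX(
        FILTER(
            MonthlyJoinedAndLeft,
            MonthlyJoinedAndLeft[Month] <= CurrentMonth
        ),
        MonthlyJoinedAndLeft[CountNetJoined]
    )
```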

Last but not least, we can’t call it Power BI without a nice chart to show off our calculations. Since the source data is generated, if we plot total subscribers per month we will see a boring linear growth of subscribers over time. So I created an Area Chart showing monthly count of subscribers joined and left respectively. The chart is shown in Figure 18, and since Power BI is very easy to use, I’ll leave it to you to figure out how to create the chart on your own. You can download the final MonthlySubscriberCount2.pbix file to examine all the calculated tables and their DAX formulas.


Figure 18.
