INTRODUCING DAX WINDOW FUNCTIONS (PART 2)

This is the second installment in our two-part series on DAX window functions, announced in the December 2022 release of Power BI Desktop. I will first answer questions I have received from the community about topics covered in the first article, and then go into more technical detail about how these functions are implemented, to help advanced users understand their performance characteristics when working with large volumes of data.

Just like last time, all examples will be based on the Adventure Works DW 2020 DAX sample model. The final pbix file that includes all DAX expressions used for this blog post can be downloaded from here.

THE TABLE TO BE SORTED CANNOT HAVE DUPLICATE ROWS

Several people have reported running into errors when trying the window functions on fact tables. Let's look at an example by first adding a calculated table of three columns to the model, defined by the following DAX expression, and then adding all three columns to a table visual as shown in Figure 1.

CalcTable = SUMMARIZECOLUMNS(
    'Customer'[Country-Region],
    'Date'[Fiscal Quarter],
    TREATAS({"Canada", "United States"}, 'Customer'[Country-Region]),
    "Quantity", [Total Quantity]
)
Figure 1.

Next, we want to create a measure that returns the [Quantity] value of the previous row. However, if you define the measure using the DAX expression below, you'll get the error shown in Figure 2: the relation parameter, ALLSELECTED('CalcTable'), may have duplicate rows.

Prev Qtr = CALCULATE(
    SUM(CalcTable[Quantity]),
    OFFSET(
        -1,
        ALLSELECTED('CalcTable'),
        ORDERBY([Fiscal Quarter]),
        PARTITIONBY([Country-Region])
    )
)
Figure 2.

Two questions immediately come to mind:

  1. What could go wrong if the <table> parameter had duplicate rows?
  2. Since ‘CalcTable’ clearly has no duplicate rows, why did we get the error in the first place?

To answer the first question, note that all window functions rely on sorted rows in each partition to determine the absolute or relative positions of the current row and the corresponding output row(s). If the table had duplicate rows, what position numbers would be assigned to the tied rows? Even if the DAX Engine assigned consecutive position numbers to a set of tied rows, when the current row from the context matches the tied rows, which position number should the DAX Engine pick? One may argue that although the error is understandable when a current row is required, why does the INDEX function still raise the error even though it does not require a current row to determine its output row? Imagine a scenario where the index number points to one of the tied rows. Should the DAX Engine return just one row, or all the tied rows as the TOPN function does? Due to these potential ambiguities, the DAX Engine does not allow duplicate rows in the table to be sorted, and therefore it needs a way to find out whether a table may have the problem.

Although the DAX Engine can analyze a DAX table expression to determine whether it has duplicate rows, it relies on metadata saved in the model to find out whether a model table, including a calculated table, has duplicate rows. That means dimension tables are known to have no duplicate rows, because they must have a primary key column serving as the endpoint of a relationship. But fact tables, and any standalone tables, are treated as potentially having duplicate rows. The DAX Engine does not attempt to detect row uniqueness at runtime because uniqueness may change after the next data refresh. Since our example truly has no duplicate rows and the underlying table is quite small, we can use the DISTINCT function as a workaround to assure the DAX Engine that there are no duplicate rows, at the cost of some performance overhead.

Prev Qtr = CALCULATE(
    SUM(CalcTable[Quantity]),
    OFFSET(
        -1,
        DISTINCT(ALLSELECTED('CalcTable')),
        ORDERBY([Fiscal Quarter]),
        PARTITIONBY([Country-Region])
    )
)

AVOID CIRCULAR DEPENDENCY ERROR IN CALCULATED COLUMNS

Last time, we said that the DAX Engine uses all the columns from the <table> parameter to determine the current partition and the current row. This behavior poses a special challenge when using the window functions to define calculated columns, because one of the most natural ways to specify the <table> parameter in a calculated column expression is simply using the name of the hosting table. For example, if I were to add a column to the 'Customer' table to return the previous customer within the same city, I could use the following expression.

Prev Customer = SELECTCOLUMNS(
    OFFSET(-1,
        'Customer',
        ORDERBY([Customer ID]),
        PARTITIONBY([City])
    ),
    [Customer]
)

Note that this is a calculated column expression, not a measure expression.

I used a shortcut syntax of the SELECTCOLUMNS function that is not yet supported by the formula bar, so you will see a red squiggly line under [Customer] if you try the expression yourself, but it is accepted by the DAX Engine.

Currently, the expression is accepted by the Desktop due to a product defect, but it may cause an error in the service, where the bug fix has already been deployed or will be deployed soon. The problem is that the table reference 'Customer' includes all columns in the table. Since window functions are supposed to use all columns to calculate the current row, the value of the calculated column expression depends on the values of all columns of 'Customer', including itself, hence a circular dependency error! Even if the DAX Engine removed the current calculated column from consideration, so that only the rest of the columns from the table were taken into account, we would quickly run into circular dependency again if we were to add a second calculated column such as [Next Customer]: [Prev Customer] would depend on all other columns including [Next Customer], while the latter would depend on all other columns including the former. Until the product team introduces a less verbose option, it's recommended that all calculated columns use the SELECTCOLUMNS function to explicitly list the columns from the hosting table needed for the calculation. So we should modify the DAX expression of [Prev Customer] by using the shortcut version of SELECTCOLUMNS again:

Prev Customer = SELECTCOLUMNS(
    OFFSET(-1, 
        SELECTCOLUMNS('Customer', [CustomerKey], [Customer ID], [City], [Customer]),
        ORDERBY([Customer ID]),
        PARTITIONBY([City])
    ),
    [Customer]
)

You must always include the key column of the dimension table, [CustomerKey] in this case, to satisfy the no-duplicate-rows requirement described in the previous section.

A MORE ACCURATE DESCRIPTION OF THE APPLY SEMANTICS

Recall from last time that apply semantics, as outlined in the section WHAT IF THERE IS NO CURRENT PARTITION OR CURRENT ROW, refers to the way the DAX Engine returns potentially more than one output row when there is no clear current row or current partition in the context. Alberto Ferrari from the SQLBI team pointed out a flaw in the steps. Rest assured that the outlined steps were not the actual algorithm implemented in the DAX Engine, just an attempt to give users a mental model for interpreting the result when apply semantics kicks in. Below is a second attempt to explain how apply semantics works:

  1. Find all possible rows of the columns from <table>, taking into account the current evaluation context, e.g. applying all filters.
  2. For each row from #1, locate the current partition and then the current row within the partition, then perform steps #3 and #4.
  3. Calculate the result row(s) as defined by the semantics of the window function.
  4. Output the row(s) as long as they have not been output already.
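
To see these steps in action, here is a hypothetical query sketch of mine (assuming the sample model and no outer filters) that evaluates OFFSET with no current row available in the context:

EVALUATE
OFFSET(
    -1,
    ALL('Date'[Fiscal Year]),
    ORDERBY([Fiscal Year])
)

With nothing to bind to, every row of the table is a potential current row, so the union of the per-row outputs is returned: every fiscal year except the last one.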

DEBUG WINDOW FUNCTIONS BY PRINTING INTERMEDIATE RESULTS

Window functions are among the more complex table functions. In the past, it was usually hard to figure out what went wrong when a formula containing a table function didn't return the expected results. Luckily, three new DAX functions, EVALUATEANDLOG, TOCSV, and TOJSON, were released just in time to make it easy to see the output of any table expression. For example, if I were to use the above DAX expression as is, without wrapping 'Customer' in ALLSELECTED, to create a measure instead of a calculated column, I would get an empty visual after adding the measure to a table visual, with the total row turned off, alongside columns [City], [Customer ID], and [Customer]. Using two EVALUATEANDLOG functions to inspect both the input and the output of the OFFSET function, I can see in DAX Debug Output the empty outputs of the OFFSET function, Figure 3, and many single-row tables, one per input row, as the input to OFFSET, Figure 4. The bug lies in the fact that OFFSET, in our case, is supposed to partition and sort the same static table independent of the input rows; instead, the table expression is filtered by them and therefore produces a different table for each input row. How to interpret the output of the EVALUATEANDLOG function is a big topic in its own right. If you are interested, see my previous blog posts, 1, 2, 3, 4.

Figure 3.
Figure 4.
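
For reference, here is a sketch of how the two EVALUATEANDLOG calls can be placed; the measure name and the labels are my own, and the expression is otherwise the measure version described above:

Prev Customer Debug = SELECTCOLUMNS(
    EVALUATEANDLOG(
        OFFSET(
            -1,
            EVALUATEANDLOG('Customer', "OFFSET input"),
            ORDERBY([Customer ID]),
            PARTITIONBY([City])
        ),
        "OFFSET output"
    ),
    [Customer]
)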

IMPLEMENTATION DETAILS THAT MAY AFFECT PERFORMANCE

You can skip the rest of the article unless you need to pass a large number of rows, e.g. more than a couple of million, to the window functions. For advanced users who are aware of the division of labor between the formula engine and the storage engine: DAX window functions are implemented strictly inside the formula engine. That means window functions are not pushed down into the Vertipaq Engine for import models, nor folded into SQL for DirectQuery models. The rationale for this design decision was the observation that window functions are typically applied on top of aggregated values instead of the raw data in the fact tables.

One consequence of implementing all window functions in the formula engine is that the formula engine must have a copy of all the rows of the input table. Even if the input table is an imported table, the formula engine still copies the table rows into its own spool structure instead of reusing the data in the Vertipaq Engine. Moreover, the formula engine creates additional in-memory data structures to facilitate fast mappings from any current row, as determined by the context, to the output rows of the <table> parameter. Figure 5 depicts the in-memory data structures created by the formula engine to implement a window function that takes an input row from the context and calculates the output row(s) from a static table. The diagram consists of three spools:

  1. The rightmost spool stores the raw data from the static table.
  2. The middle spool maps partition-by columns to sorted arrays. Each array stores sorted rows within a partition.
  3. The leftmost spool maps matching columns to the corresponding partition and the index of the row in the sorted array.

Although Figure 5 shows just one example, all window functions are implemented following a similar pattern, with more or less memory used as required by the semantics of the specific function. As one can imagine, the memory overhead can be quite significant when there are many columns and rows in the <table> parameter, so users should strive to minimize both to reduce memory consumption.

Figure 5.
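
One practical way to trim the footprint is to project away columns that are not needed for partitioning, ordering, or identifying rows before handing the table to the window function. Below is a sketch based on the earlier [Prev Qtr] measure, using the SELECTCOLUMNS shortcut syntax mentioned in the calculated column section, and assuming the two remaining columns uniquely identify the rows:

Prev Qtr Lean = CALCULATE(
    SUM(CalcTable[Quantity]),
    OFFSET(
        -1,
        DISTINCT(
            SELECTCOLUMNS(ALLSELECTED('CalcTable'), [Country-Region], [Fiscal Quarter])
        ),
        ORDERBY([Fiscal Quarter]),
        PARTITIONBY([Country-Region])
    )
)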

CONCLUSION

DAX window functions have generated a lot of excitement because they enable users to define calculations over a set of rows related to the current row. Even though they were released during the holiday season, the DAX community has already provided valuable feedback for the product team to further refine and enrich the design and functionality. DAX developers can expect to see continuous improvements to these functions in the upcoming new year.


Introducing DAX Window Functions (Part 1)

The December 2022 release of Power BI Desktop includes three new DAX functions: OFFSET, INDEX, and WINDOW. They are collectively called window functions because they are closely related to SQL window functions, a powerful feature of the SQL language that allows users to perform calculations on a set of rows related to the current row. Because these functions are often used for data analysis, they are sometimes called analytical functions. In contrast, DAX, a language invented specifically for data analysis, had been missing similar functionality. As a result, users found it hard to write cross-row calculations, such as the difference of the values of a column between two rows or the moving average of the values of a column over a set of rows. Oftentimes, even when there is a way to perform such calculations, the resulting expressions are convoluted and cause the DAX Engine to consume too much time and memory, and therefore don't scale to larger numbers of data points. For these reasons, the DAX product team is super-excited to present the first batch of window functions as an early Christmas gift to the DAX community. Like their SQL counterparts, the DAX window functions are powerful yet more complex than most other DAX functions, and therefore require more effort to learn. In this blogpost, I'll describe the syntax and semantics of these functions with a couple of concrete examples so that you have the correct mental model when you work with them. In my next blogpost, I'll dive deeper under the cover to expose some of the inner workings of these functions to help you design your own solutions with good performance.

All examples will be based on the Adventure Works DW 2020 DAX sample model.

A taste of DAX window functions

Let’s jump right in and create the first report using the OFFSET function.

  • First add columns 'Customer'[Customer], 'Date'[Date], and measure [Total Quantity] to a simple table report.
  • Next apply a filter to limit the rows to customers with multiple sales dates.
  • Now comes the interesting part: define a measure [Curr – Prev] that, for each customer and sales date, calculates the delta of [Total Quantity] between the current sales date and the previous sales date.
  • Add [Curr – Prev] to the table to see the result in Figure 1.
  • It can be easily seen that the delta values are all correct. I have also verified that the total value, 100, is the sum of all the delta values.
Curr - Prev =
    [Total Quantity] -
    CALCULATE(
        [Total Quantity],
        OFFSET(
            -1,
            SUMMARIZE(ALLSELECTED('Sales'), Customer[Customer], 'Date'[Date]),
            ORDERBY('Date'[Date]),
            KEEP,
            PARTITIONBY(Customer[Customer])
        )
    )
Figure 1.

Advantages of the window functions

When the OFFSET function was first leaked a couple of months ago, some users questioned its usefulness. They argued that they could achieve the same results using existing DAX functions such as time intelligence functions or setting appropriate filters in the CALCULATE function in the following fashion:

CALCULATE(
    ...,
    VAR Curr = VALUES([OrderByColumn])
    RETURN
    FILTER(ALL([OrderByColumn]), [OrderByColumn] = Curr - 1)
)

But window functions are more generic and powerful in that

  • The order-by values don’t need to be continuous.
  • Users can order by multiple columns, e.g. first by [Year] and then by [MonthNumberOfYear]; see the sketch after this list.
  • Users can divide rows of a table into partitions and then sort the rows in each partition independently. You can see from our first example that the [Date] values are different in the two partitions in Figure 1.
  • Window functions offer simpler, more consistent syntax and better performance than previous solutions using the FILTER function. For those who have studied computer science, the difference in time and space complexity is O(N·log N) or even O(N) for window functions vs O(N²) for hand-crafted FILTER expressions.
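
For instance, here is a sketch of the multi-column ordering mentioned in the second bullet; the measure name and the [Year] and [MonthNumberOfYear] columns on the 'Date' table are my own assumptions:

Prev Month Qty = CALCULATE(
    [Total Quantity],
    OFFSET(
        -1,
        ALLSELECTED('Date'[Year], 'Date'[MonthNumberOfYear]),
        ORDERBY([Year], ASC, [MonthNumberOfYear], ASC)
    )
)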

Syntax of window functions

Although each window function has its own unique signature, they all follow the same pattern and share common parameters, as shown in Figure 2. <row-selection> is the only required parameter and defines the unique characteristics of each function. See Figure 3 for the different ways to select rows from a partition.

Figure 2.
Figure 3.

Any function that takes more than a couple of input parameters appears daunting to users. For that reason, window functions may look too complex to beginners. But the good news is that most of the parameters are optional. Moreover, the DAX product team has made all optional parameters skippable even if they are not the last parameter. As long as you don't mind some red squiggly lines in the Desktop formula bar, the following DAX expressions are all accepted by the DAX Engine.

Some of the shortcut syntaxes are not yet accepted by the formula bar, so you will see the red squiggly line underneath, but they are valid according to the DAX Engine. More work is planned for them to be accepted by the formula bar in a future release.

  • The KEEP keyword in between <order-by> and <partition-by> can be omitted.
    e.g. OFFSET(-1, ALLSELECTED('Date'), ORDERBY([Date]), PARTITIONBY([Fiscal Year]))
  • <table> can be omitted if <order-by> is present. In this case all columns in <order-by> and <partition-by> must belong to the same table.
    e.g. INDEX(1, ORDERBY('Date'[Date]), PARTITIONBY('Date'[Fiscal Year]))
  • <order-by> can be omitted. In this case the DAX Engine will automatically inject order-by columns.
    e.g. WINDOW(2, ABS, -2, ABS, ALL('Date'), PARTITIONBY([Fiscal Year]))
  • <from-type> and <to-type> can be omitted in the WINDOW function. In this case the type defaults to REL. The formula bar already supports skipping these parameters completely.
    e.g. WINDOW(-3, -1, ALL('Date'))

Shortcut syntaxes

How do window functions work

The list below describes the logical steps performed by each window function:

  1. Take all rows of the table as specified by the <table> parameter.
  2. Divide the rows into separate partitions by the unique values of the partition-by columns.
  3. Sort the rows within each partition according to the order-by columns and sorting directions.
  4. Determine the current partition and, if necessary, the current row within the partition.
  5. Return zero, one or more rows within the current partition.
    • OFFSET returns 0 or 1 row at a certain distance from the current row.
    • INDEX returns 0 or 1 row at a fixed position in the current partition.
    • WINDOW returns all the rows in between a lower bound and an upper bound. Either bound is a row at a certain distance from the current row or at a fixed position in the current partition.

In addition to the general rules listed above, there are some special, yet common use cases, which are worth calling out:

  • When the <table> parameter is omitted, DAX Engine derives the table expression from the order-by and partition-by columns as ALLSELECTED(<order-by columns>, <partition-by columns>). In this case all columns must be from the same table.
  • When the <partition-by> parameter is omitted, the entire table is treated as a single partition.
  • When the <order-by> parameter is omitted, DAX Engine will order by all the columns in the table. This is convenient when there is only a single column in the table, but not recommended when there is more than one column, in which case it's good practice to explicitly specify the order-by columns so the sort order is fully controlled by the user.
  • When the user-specified order-by columns are insufficient to determine the order of all the rows, i.e. there can be ties among some rows, DAX Engine will automatically append additional order-by columns from the table until total order is achieved. If this is not possible because the table has no key columns and therefore may have duplicate rows, DAX Engine will return an error. Users should always provide enough order-by columns to achieve total order if they want full control.
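
Putting these rules together, here is a hedged sketch of a trailing three-date moving average per customer using WINDOW; it reuses the table expression and the [Total Quantity] measure from the earlier [Curr – Prev] example, and the measure name is my own:

Moving Avg 3 = AVERAGEX(
    WINDOW(
        -2, REL, 0, REL,
        SUMMARIZE(ALLSELECTED('Sales'), Customer[Customer], 'Date'[Date]),
        ORDERBY('Date'[Date]),
        KEEP,
        PARTITIONBY(Customer[Customer])
    ),
    [Total Quantity]
)

For each customer and date in the context, WINDOW returns the current row and up to two preceding rows within the customer's partition, and AVERAGEX averages [Total Quantity] over them.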

How is the current partition or the current row determined

Since all window functions depend on the current partition and/or the current row to work properly, how do they know what the current partition or the current row is? In comparison, a SQL query always scans a set of rows in the FROM clause, so the current row is natural for SQL window functions. On the DAX side, iteration functions such as SUMX, FILTER, SELECTCOLUMNS, GENERATE, etc. also work one row at a time, so a window function can leverage that to determine the current partition and current row. For example, when I was writing this blog, someone asked how to filter a given table of sales over dates to keep only those rows with consecutive sales above a threshold. This is a very typical business problem for window functions to solve. If we extract one partition, corresponding to 'Customer'[Customer] = "Antonio Bennett", from the table in Figure 1, we get the table in Figure 4. If we want to find the consecutive rows where [Quantity] >= 2, i.e. the highlighted rows, we can use the following DAX query to achieve the result:

DEFINE
VAR _Table = 
    SUMMARIZECOLUMNS(
        'Date'[Date], 
        TREATAS({"Antonio Bennett"}, 'Customer'[Customer]), 
        "Quantity", [Total Quantity]
    )
EVALUATE
    FILTER(
        _Table,
        [Quantity] >= 2 &&
        (
            SELECTCOLUMNS(OFFSET(-1, _Table, ORDERBY([Date])), [Quantity]) >= 2 ||
            SELECTCOLUMNS(OFFSET(1, _Table, ORDERBY([Date])), [Quantity]) >= 2
        )
    )
ORDER BY [Date]
Figure 4.

In the DAX query above, both the FILTER function and the two OFFSET functions scan the rows from the same table variable, _Table, so it’s pretty easy to see that the two OFFSET functions would use the current row added by FILTER to the row context. In general, there is an evaluation context when a window function is calculated, so the window function will derive the current partition and the current row from its evaluation context.

To determine the current partition, a window function would look for the partition-by columns from row context or from the grouping columns in the filter context. If a match is found, it will use the values of those columns in the context as the key for the partition.

Similarly, to determine the current row, a window function uses the same strategy, but this time checks all columns from the <table> parameter. To be precise, only model columns from <table> are considered, because only model columns can be added to the filter context.

Figure 5 illustrates how an OFFSET function fetches column values from the filter context or the row context to determine the current partition and the current row within the partition. Please note that all columns from the <table> parameter are used to identify the current row, regardless of which columns are needed for order-by. In this example, the [IsHoliday] column isn't part of order-by or partition-by but is still needed to identify a row. Of course, the DAX Engine may optimize [IsHoliday] away from the key columns needed to identify the current row, because there is a functional dependency between [IsHoliday] and [Date], but that's a performance improvement that won't alter the result of the logical model.

Is it possible for a current row from the context to have no match among the rows from the <table>? Absolutely. The rows from the outer context and the rows from the <table> parameter of a window function are independent of each other in the general case. There can be more rows in the context than rows from the inner table, or vice versa. This is very different from SQL window functions, which are tightly coupled with the main query.

Figure 5.

What if there is no current partition or current row

In the previous example, we were lucky in that the evaluation context happened to have all the columns necessary to identify the current row. But what if some columns are missing, or, worse yet, all columns are absent? DAX expressions must work in all contexts; that's the fundamental reason measures can be reused in any report. The DAX engine team has implemented a concept for window functions, called apply semantics, so that they not only don't fail when there isn't enough information in the evaluation context to identify a unique partition or row, but even return meaningful results in common scenarios. The name, apply semantics, was inspired by the CROSS APPLY operator of T-SQL.

Below is the logic of the apply semantics assuming a window function requires the current row:

  1. Divide all columns from <table> into those that can bind to the evaluation context and those that cannot.
  2. Build an iterator to return rows from the outer context corresponding to the bound columns in #1.
  3. For each row R1 from #2, build an iterator to return all possible rows corresponding to the unbound columns in #1 that exist together with R1. For example, if [Column] is the only unbound column, iterate over all rows returned by VALUES([Column]).
  4. For each row R2 from #3, combine R1 and R2 into R. R has values for all columns in #1 and is therefore a valid current row.
  5. Use R to locate the current partition and then the current row within the partition.
  6. Calculate the result row(s) as defined by the semantics of the window function.
  7. Output the row(s) as long as they have not been output already.

As you can see, apply semantics effectively enumerates all valid current partitions/rows in a given context, calculates the regular output of the window function for each partition/row, and then returns the union of the outputs without duplicates. As a result, OFFSET and INDEX may return more than one row, and WINDOW may return more rows than the size of the window. Going back to Figure 1: in order to calculate the value of [Curr – Prev] in the grand total row, the DAX Engine iterates over all valid combinations of [Customer] and [Date], shifts to the previous [Date] value within the given [Customer] partition, outputs all rows in the table except the last row in each partition, and then uses that as a filter to calculate the difference between the values of measure [Total Quantity] with and without the filter.

Apply semantics poses a potential performance risk when the unbound columns come from different dimension tables and may therefore produce a big cross-join in contexts without sufficient filters. DAX authors should pay close attention when they use window functions in this fashion.

Summary

The advent of window functions in DAX opens a floodgate of opportunities for Power BI users to solve complex, even previously intractable, problems in an efficient, uniform, and elegant way. They can be used to perform a wide range of calculations on sets of data, e.g.

  • Compute running totals and running averages
  • Find best and worst performers
  • Investigate trends across time
  • Calculate contributions to the whole, such as commission percentages

Moreover, this is just the initial release with limitations and known issues. The DAX product team is actively working on additional improvements and new features to enrich this category of functions so users can expect more exciting capabilities to arrive in the near future.


EVALUATEANDLOG AND DAX ENGINE OPTIMIZATIONS

This is the fourth and last article on the EvaluateAndLog function. While DAX developers were excited about the prospect of printing the values of a portion of a DAX expression, many early adopters of the function were puzzled by a mysterious message they ran into from time to time: "EvaluateAndLog function is not executed due to optimization." Another question on people's minds is: "While it's great to be able to see the values of a sub-expression used in my calculation, is there a way to see the values of other expressions which are not needed to calculate the final result?" I will answer both questions in this post.

For today’s DAX examples, I will be building on top of the three-table (‘Sales’, ‘Customer’, ‘Product’) model I used last time and will provide a link to the final pbix file at the bottom.

Let’s start with a simple DAX measure

Measure6 = SumX('Sales', EvaluateAndLog('Sales'[Sales Amount]))

If you add [Measure6] to a card visual and expect to see the list of values from the [Sales Amount] column that contribute to the sum, you will see instead what seems like an error message, Figure 1.

Figure 1.

The message tells users that instead of first calculating the 'Sales'[Sales Amount] sub-expression and then calculating the sum, the DAX Engine was able to send the entire SumX as a single calculation to the storage engine, either the Vertipaq Engine or the DirectQuery Engine. In other words, instead of reading all the rows of the 'Sales' table into the DAX Engine and adding the values of the [Sales Amount] column one row at a time, the DAX Engine simply asks the storage engine to return the sum directly. This implies that the EvaluateAndLog function is confined to the DAX Engine and never pushed down to the storage engine. For those who are confused by the separation of the DAX Engine and the Vertipaq Engine, because in their mind the two work together as the Power BI Engine, it may be easier to imagine a DirectQuery model where all three tables are hosted in a relational database. There is obviously no effective way for the DAX Engine to push EvaluateAndLog into a single SQL query to retrieve a set of intermediate results in addition to the requested aggregation. So if the DAX Engine is able to push a bigger expression down to the storage engine, any EvaluateAndLog embedded within it cannot serialize intermediate values while inside the storage engine and is therefore optimized away.

The DAX Engine has implemented a large number of optimizations. Pushing as many operations as possible down to the storage engine is one of the most important ones. Here is another example:

Measure7 = Calculate(
    Sum('Sales'[Sales Amount]),
    EvaluateAndLog('Product'[Color] = "Red")
)

If we add [Measure7] to a card visual, we will receive the same notice as in the case of [Measure6]. That's because the DAX Engine is able to fold the entire Calculate expression into a single storage engine request that joins the 'Sales' table with the 'Product' table, applies the [Color] = "Red" filter, aggregates the values of the [Sales Amount] column, and finally returns a filtered sum. Since the DAX Engine is more likely to optimize commonly written expressions, users will run into the special notice more often on frequently used expression patterns.

An expression alone does not determine whether an optimization happens; the evaluation context also influences the decision. Let's use an example adapted from a very good question asked by David Bianconi on LinkedIn, by defining a measure as

Measure8 = Calculate(
    Sum('Sales'[Sales Amount]),
    EvaluateAndLog(
        CalculateTable('Customer', 'Customer'[Education] In {"College", "Partial College"})
    )
)

Add the 'Customer'[Education] column and [Measure8] to a table visual and we get Figure 2. DAX Debug Output receives two DAXEvaluationLog events because [Measure8] is calculated twice: once in the context of the rows of the distinct values of the 'Customer'[Education] column and once in the total row. EvaluateAndLog serialized the table value in one event, Figure 3, but was optimized away in the other, Figure 4. Clearly the DAX Engine decided that it was able to optimize in one case but not the other because the contexts are different.

Figure 2.
Figure 3.
Figure 4.

Since optimizations are essential for reasonable query performance in many basic scenarios, the product team has chosen to retain some optimizations despite the presence of EvaluateAndLog. We may turn on even more optimizations in future releases to prevent EvaluateAndLog from causing disastrously expensive calculations. But we will reconsider the decision if we hear strong feedback from the community that users would rather see the intermediate result despite the potential risk.

One of the obvious optimizations in the DAX Engine is that the expression of an unused variable is not evaluated. But the product team has decided to make an exception for the EvaluateAndLog function. If the entire expression of an unused variable is wrapped in EvaluateAndLog, the expression will be evaluated! This feature enables DAX developers to print the values of arbitrary DAX expressions even though they don’t contribute to the final result. One can easily imagine possible applications of this feature. We can now print any contextual information in the middle of a DAX expression.

We’ll end this article with an example that takes advantage of the unused variable trick. If we define [Measure9] as

Measure9 = AverageX(
    EvaluateAndLog(Values('Sales'[Order Date])),
    Calculate(DistinctCount('Sales'[ProductKey]))
)

and add the above measure to a card visual, we will see that EvaluateAndLog is optimized away. This one may be hard to understand because the DAX Engine wasn't able to push the entire average of distinct counts down to the Vertipaq Engine. On the other hand, the DAX Engine clearly needed to first calculate the distinct counts by different values of the [Order Date] column before it could calculate the average. As it turns out, the AverageX function is smart enough that, instead of getting the list of dates and the list of distinct counts in two separate steps, it combines them into a single step, hence optimizing away a dedicated step just to retrieve the dates. That's clever and all, but it doesn't help us see the list of dates if we really want to. To get what we want, we can introduce an unused variable. If we rewrite [Measure9] as

Measure9 =
Var _unused = EvaluateAndLog(Values('Sales'[Order Date]), "_unused", 1000)
Return
AverageX(Values('Sales'[Order Date]), Calculate(DistinctCount('Sales'[ProductKey])))

We will see all the dates that had product sales, Figure 5.

Figure 5.

Today we examined the interactions between the EvaluateAndLog function and optimizations inside the DAX Engine. Key takeaways from today’s lessons are:

  • The DAX Engine performs many optimizations by reducing a multi-step calculation into a single-step one.
  • The optimizations in today’s article are strictly from the perspective of the DAX Engine. A single-step calculation in the DAX Engine may take many steps to compute in storage engines.
  • The EvaluateAndLog function is confined to the DAX Engine and may be optimized away, in which case a special notice is included in the JSON output of the DAXEvaluationLog event.
  • The DAX Engine will calculate the expression of an unused variable if its outermost function is EvaluateAndLog.

External Links

Download the latest version of Power BI Desktop if you want to try out today’s examples.

The pbix file used in today’s examples can be downloaded from here.

The current version of DAX Debug Output is here. You must have admin rights on the machine and agree to install the software from an unknown publisher. Download both DAXDebugOutputSetup.msi and setup.exe to your machine and run setup.


UNDERSTAND THE OUTPUT OF EVALUATEANDLOG FUNCTION OF TABLE EXPRESSIONS

Last week, we learned how to interpret the output of the EvaluateAndLog function for scalar expressions. Albeit interesting, the new function would be much less useful if it could only return scalar values. After all, Power BI users can always define a scalar expression as a measure and then study its values in various contexts. The real power of EvaluateAndLog is that it can also wrap around any table expression, the output of which was much harder to visualize until now.

One of the key takeaways from last week's post is that a scalar expression produces many values in the presence of input columns: each unique combination of values of the input columns causes a reevaluation of the expression. The same is true for table expressions, which means the same table expression can generate multiple tables in a single query. Such dynamic behavior is what makes DAX powerful and hard to learn at the same time. With the help of the EvaluateAndLog function, we can now, for the first time, see all the intermediate tables instead of just imagining them in our heads.

Let’s try a table-valued EvaluateAndLog with a simple measure:

Measure4 = CountRows(EvaluateAndLog('Sales'))

We inject EvaluateAndLog inside the CountRows function to see which rows are counted.

⚠️ For this week's examples, we will be building on top of last week's three-table model, whose diagram is shown again in Figure 1. That's why our first example is already the 4th measure in the model. A link to the updated pbix file is provided at the end of this article.

⚠️ If you want to try the examples in this post on your own, make sure that you have installed the most recent update of the August 2022 release of Desktop which contains a critical bug fix. The version of your Desktop must be 2.108.997.0 or later.

Figure 1.

Adding 'Product'[Color] and [Measure4] to a table visual, we get Figure 2 and two DAXEvaluationLog events, Figures 3 and 4. There are eight tables in Figure 3, one for each product color, and only one table in Figure 4 corresponding to the total value of the table visual. Unlike scalar expressions, a table expression has an "outputs" property in its JSON document which is a list of columns that define the schema of the table.

Figure 2.
Figure 3.
Figure 4.

Presenting more than one table from the same table expression in DAX Debug Output is an interesting challenge. Since all the tables share the same schema, I chose to show them in a single data grid, just like the multiple values from a single scalar expression are displayed. This also happens to be the way such data is stored in the DAX Engine: rows from tables calculated from the same DAX expression are saved in a single data cache along with the corresponding values of the input columns. To make it easier for users to differentiate between rows from different tables, I added three extra columns to the data grid, whose headers are in light and dark grayish blue colors:

  • [Table Number] indicates to which table the row belongs;
  • [Row Number] indicates which row in the table;
  • [Row Count] indicates the total number of rows in the table.

Since data tables can potentially contain a lot of columns and rows, in which case they would be very expensive to serialize, the EvaluateAndLog function by default serializes only up to 10 rows per table into the JSON output, but it also serializes the total number of rows as the value of the "rowCount" JSON property to let users know if any rows are skipped. Users can use the optional <max rows> parameter of the function to change the limit. Since intermediate tables are not sorted by the DAX Engine, there is no general way to control which subset of rows shows up in the output. However, you can use functions like Filter, TopN, or Sample to select a subset of rows you want to see, e.g.

CountRows(EvaluateAndLog(Sample(10, 'Sales', 'Sales'[ProductKey], Asc)))

But injecting Filter/TopN/Sample functions would change the result of the CountRows function, so it is not a desirable workaround. Fortunately, there is a trick to gain better control over the output rows without changing the result of the original expression; I'll show how to do it next week.

In addition to the row count limit per table, the DAX Engine also limits the maximum size of the JSON text. Once the JSON output becomes too big, the DAX Engine stops the serialization, except to make sure that the result is a valid JSON document. Moreover, the DAX Engine inserts a special JSON property to inform users of the truncation. For example, the following expression

CountRows(EvaluateAndLog(CrossJoin('Product', 'Customer'), "large output", 10000))

would produce a large JSON string with the following property toward the end.

"notice": "Data truncated due to event size limit."

Currently, the threshold of the JSON size limit is set at 1,000,000 characters, but it's subject to change in the future. Large JSON strings are slow to parse and display in the data grid of DAX Debug Output. When that is the case during your debugging session, you can check the "Parse JSON on demand" option to switch from automatic parsing to on-demand parsing. If the hard limit disappoints you because you want to use the DAX Engine's JSON serialization ability to export large amounts of calculated data, be warned that this is not a primary function of Power BI, so performance will become an issue when the data volume is large; that said, there are a couple of extra easter eggs hidden in the product which may allow you to do so anyway when they become publicly available.

Let’s end the post with a real-life debugging example. Around the time when I was writing the first post, I was asked to help debug a measure that didn’t return the desired result. In order to show the expression using our demo model, I rewrote the problematic DAX expressions into the following two measures:

Uniq Products = DistinctCount('Sales'[ProductKey])

Measure5 =
Var _temp = Summarize('Sales', 'Sales'[CustomerKey], "Uniq Products", [Uniq Products])
Var _currCustomer = SelectedValue('Sales'[CustomerKey])
Var _prevCustomer = TopN(1, Filter(_temp, 'Sales'[CustomerKey] < _currCustomer), 'Sales'[CustomerKey], Desc)
Return [Uniq Products] - MinX(_prevCustomer, [Uniq Products])

The expression for [Measure5] is complex and inefficient but the idea was to show the contrast between an old way and a new way (yet another easter egg hidden in the product) of calculating the difference of measure values between the current row and the previous row in a table visual.

[Measure5] works in four steps:

  1. Calculate a two-column table of customers and the corresponding numbers of unique products purchased by them.
  2. Find the current customer in the evaluation context.
  3. Find the previous customer from the intermediate table in Step 1 ordered by the ‘Sales'[CustomerKey] column in ascending order.
  4. Calculate the difference between the numbers of unique products purchased by the current customer and by the previous customer.

But when both measures were added to a table visual, the delta values were clearly wrong, Figure 5. By the way, since the expression is O(N²), I had to apply a filter on the 'Sales'[CustomerKey] column to limit the number of rows to 30 for the calculation to finish quickly.

Figure 5.

To find out what's wrong with the expression, let's wrap EvaluateAndLog around the Summarize function to see whether we even get the first step right. We'd like to see a single table with all 30 customers and their product counts so we can navigate from any customer to the previous one. But the actual EvaluateAndLog output consists of 30 single-row tables, one table per customer. That's because in DAX a simple table name by itself means a filtered table. Since 'Sales'[CustomerKey] is in the filter context, the first parameter to Summarize, 'Sales', is filtered by the current [CustomerKey] value in the filter context, which limits 'Sales' to only those transactions belonging to a single customer. Summarize then groups the filtered 'Sales' by [CustomerKey] to return a single row. This analysis can be easily verified by wrapping EvaluateAndLog directly around 'Sales' to see, Figure 8, that it only returns rows that match the values of the input column. The fix is to replace 'Sales' with AllSelected('Sales') as the first parameter to the Summarize function, after which we get the desired output, Figure 8, as well as the correct intermediate table of step 1, Figure 9.

Figure 6.
Figure 7.
Figure 8.
Figure 9.
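
For completeness, here is the corrected measure with the fix applied; only the first parameter of Summarize changes:

Measure5 =
Var _temp = Summarize(AllSelected('Sales'), 'Sales'[CustomerKey], "Uniq Products", [Uniq Products])
Var _currCustomer = SelectedValue('Sales'[CustomerKey])
Var _prevCustomer = TopN(1, Filter(_temp, 'Sales'[CustomerKey] < _currCustomer), 'Sales'[CustomerKey], Desc)
Return [Uniq Products] - MinX(_prevCustomer, [Uniq Products])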

Today, we studied by examples the table version of the EvaluateAndLog function. Key takeaways from today’s lesson are:

  • Both scalar and table expressions are reevaluated for each unique combination of values of the input columns and can therefore generate multiple values or tables in a single query.
  • EvaluateAndLog imposes a default limit of 10 rows per table but includes a "rowCount" property in the JSON document to inform users of the true size of the table. Since the table is not sorted, the 10 rows are randomly selected from the user's perspective.
  • EvaluateAndLog imposes a limit of roughly 1,000,000 characters per JSON string but will keep the final output a valid JSON document.
  • EvaluateAndLog inserts a special “notice” JSON property when its output is truncated due to the size limit.

External Links

Download the latest version of Power BI Desktop if you want to try out today’s examples.

The pbix file used in today’s examples can be downloaded from here.

The current version of DAX Debug Output is here. You must have admin rights on the machine and agree to install the software from an unknown publisher. Download both DAXDebugOutputSetup.msi and setup.exe to your machine and run setup.


Understand the Output of EvaluateAndLog Function of Scalar Expressions

This is the second article in a series introducing the EvaluateAndLog function. As I explained last time, EvaluateAndLog can be used to print the intermediate result of any DAX expression to the debug output. In this article, we will learn how to interpret the output of the function. We first study the output when the first argument is a scalar expression, which returns a single value, be it a number, a string, a datetime, etc. Next week we will cover the output when the first argument is a table expression.

All DAX expressions used here are based on a simple 3-table model shown in Figure 1. The tables and data are imported from one of Microsoft’s AdventureWorksDW sample databases. A link to the pbix file used for the examples is provided at the end of the article.

Figure 1.

Let’s start with a simple measure, [Measure1], and add it to a table visual, Figure 2.

[Measure1] = EvaluateAndLog(Count('Product'[ProductKey]), "Measure1")

Figure 2.

In DAX Debug Output, there are two DAXEvaluationLog events, as shown in Figures 3 and 4. This means the expression is calculated twice: once, Figure 3, for each color of the product, and once, Figure 4, for the value of the total row. In case you wonder whether the two calculations are independent of each other, the answer is yes. Even though the DAX Engine could derive the total value by simply adding all the per-color values, that's not the correct way of calculating total values in the general case, e.g. when the aggregation function is non-additive, such as DistinctCount. That being said, the DAX Engine does have the ability to eliminate the separate calculation of the total value when the aggregation function is additive, e.g. Sum. But in the case of [Measure1], the presence of the EvaluateAndLog function caused the DAX Engine to choose a less optimal but more general execution plan. This is the key reason that Microsoft enables the function only in Desktop for debugging purposes but not in production environments, and why we advise users to remove obsolete instances of EvaluateAndLog from their DAX expressions whenever possible. When Microsoft is ready to enable the function in the Power BI service, the product team will give users full control over the behavior of the function in different environments. By the way, whether or not the EvaluateAndLog function disables a particular optimization in the DAX Engine is subject to change at the discretion of the product team. We will discuss the interactions between the EvaluateAndLog function and various DAX optimizations in a future article.

Figure 3.
Figure 4.

If you compare Figures 2 and 3 carefully, you can see that the numbers in both figures match but the order of product colors is different. That's because the DAX Engine calculates the values of [Measure1] first and then sorts the rows in a separate step afterwards. The EvaluateAndLog function records the results when the values are calculated but before the output rows are sorted. Luckily, you can click any header in the data grid of DAX Debug Output to sort the rows by that header.

If you inspect the output of EvaluateAndLog in JSON format, you can see a property named "inputs", the value of which is a list of column names. There is also a "data" property whose value is a list of "input" and "output" pairs. The value of the "input" property is a list of values corresponding to the columns in the "inputs" property. The value of the "output" property is the calculated value of the DAX expression for the given "input" values. DAX Debug Output uses different header colors to help users tell apart input and output columns in the data grid. But what are those input columns? Where do they come from? The answer is that they are the columns in the evaluation context that affect the result of the calculation and that generate multiple unique combinations of values, each of which causes the scalar expression to calculate a new value in the intermediate result.
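
To make the shape concrete, below is a simplified, hypothetical sketch of such a payload; the property names follow the description above, while the colors and counts are invented for illustration:

{
  "inputs": [ "'Product'[Color]" ],
  "data": [
    { "input": [ "Black" ], "output": 89 },
    { "input": [ "Blue" ], "output": 28 }
  ]
}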

To better understand input columns, let's change the visual to a matrix and add the [Education] column from the 'Customer' table to Columns, Figure 5. If you examine the DAXEvaluationLog events from the matrix query, you will see that they stay the same as in Figures 3 and 4. That's because [Measure1] counts the [ProductKey] column in the 'Product' table and is therefore not affected by columns from the 'Customer' table. As a result, the 'Customer'[Education] column is not counted as an input column to the Count('Product'[ProductKey]) expression, even though it is part of the matrix query. What if we add a filter 'Product'[Size] = "L" to the visual? There is no change to the number of rows in the output of the DAXEvaluationLog events, even though the filter on the 'Product'[Size] column does change the values of the calculation. As a rule of thumb, input columns most likely come from columns added to visual axes or from columns in the table parameters of DAX iteration functions.

Figure 5.

Let’s use a more complex example to see how we can leverage the EvaluateAndLog function to solve one of the puzzles often encountered by students of DAX. We define a measure, named [Measure2], to calculate the average number of unique products purchased on each day.

[Measure2] = AverageX(Values('Sales'[Order Date]), EvaluateAndLog(DistinctCount('Sales'[ProductKey]), "daily unique products"))

When added to a card visual, Figure 6, the value of the calculation doesn’t look right. Even without examining the data in the ‘Sales’ table, it’s highly suspicious for an average calculation to return a perfect whole number.

Figure 6.

Before we inspect the intermediate result labeled "daily unique products", let's first predict what we should see. For the AverageX function to calculate an average value, it needs a list of numbers, one per day, because the first argument is a single-column table of distinct order dates; therefore we should see a list of dates and the corresponding counts. But the actual output, Figure 7, is surprisingly a single value, 158. It tells us that the expression, DistinctCount('Sales'[ProductKey]), doesn't depend on any input column, in particular not the 'Sales'[Order Date] column, and therefore returns the same value for all the rows of the table expression Values('Sales'[Order Date]). That explains why the output value is a whole number: the average of the same number, regardless of the number of dates, is the number itself.

Figure 7.

Those who already understand the difference between row context and filter context know this is where the famous Calculate function is needed to turn row context into filter context. We fix the bug in the DAX expression by wrapping the Calculate function around the DistinctCount function. Afterwards, DAX Debug Output displays the expected list of values per day, Figure 8, and the card visual, Figure 9, shows a more reasonable average value of 21.17.

[Measure2] = AverageX(Values('Sales'[Order Date]), EvaluateAndLog(Calculate(DistinctCount('Sales'[ProductKey])), "daily unique products"))

Figure 8.
Figure 9.

This quick debugging exercise shows us that you can use the EvaluateAndLog function to see the mysterious evaluation context in action. Input columns are an important, and often difficult to understand, part of the evaluation context. Now we have a way to visualize them.

DAX Debug Output colors the headers of the input columns orange and the header of the output column maroon. When there are multiple input columns, DAX Debug Output alternates dark and light orange colors on their headers, Figure 10.

Figure 10.

Today, we learned how to interpret the output of the EvaluateAndLog function when its first argument is a scalar expression. Key takeaways from today’s lesson are:

  • A scalar expression is calculated against the combinations of values of a set of input columns.
  • An input column is a column in the evaluation context that produces a set of distinct values, each of which causes the scalar expression to calculate a new value.
  • An input column is typically one from the visual axes or is added by a DAX iteration function.
  • A pure filter column in the evaluation context is not considered an input column even if it changes the result of the calculation because its distinct values don’t cause the scalar expression to be evaluated multiple times.
  • For a given scalar expression, when there is no input column, you will see a single value in the debug output; when there are input columns, you will see multiple values, one per unique combination of the input column values.
  • Intermediate results are not sorted. You can use DAX Debug Output's sorting feature to help see the results.
  • For a given query, the same EvaluateAndLog expression can generate multiple DAXEvaluationLog events, one per unique combination of input columns as specified in the query.

External links

  • The pbix file used in today’s examples can be downloaded from here.
  • The current version of DAX Debug Output is here. You must have admin rights on the machine and agree to install the software from an unknown publisher. Download both DAXDebugOutputSetup.msi and setup.exe to your machine and run setup.


Introduce the DAX EvaluateAndLog Function

There is a hidden gem in the August 2022 release of Power BI Desktop: a secret new DAX function, EvaluateAndLog, that is not listed in the feature summary of the official blogpost. Yet this function will become a great learning and debugging tool for anyone who writes DAX code in the years to come. Microsoft has a good reason not to announce this function publicly yet. There are a couple of known bugs and, more importantly, the work for intellisense support in the DAX formula bar has not been completed. Therefore, if you try the function today, you won't see it on the suggestion list, and, if you type the function name yourself, you will see the red squiggly line underneath the function. But the expression should calculate just fine in the latest version of Desktop. Ironically, the lack of intellisense support is really a blessing in disguise in my opinion. I'll explain why I think so at the end of this blogpost.

So what is EvaluateAndLog for? Let’s say you define a measure [Profit] with the following expression and add it to the report.

[Profit] = Sum('Sales'[SalesAmount]) - Sum('Sales'[Cost])

You want to double check if the numbers you see on the report are correct. In the past, you would add the two columns [SalesAmount] and [Cost] as values to your report temporarily to inspect the numbers. Now there is a new way without changing your visual: you can temporarily change your measure definition to

[Profit] = EvaluateAndLog(Sum('Sales'[SalesAmount])) - EvaluateAndLog(Sum('Sales'[Cost]))

The measure will return the same result, so your visual doesn't change, but the modified expression will cause the DAX Engine to send the intermediate results of both Sum('Sales'[SalesAmount]) and Sum('Sales'[Cost]) to a secret channel, which I call "debug output", for you to inspect. In other words, you can use EvaluateAndLog to send the intermediate result of any portion of your DAX expression to debug output without changing the result of the original expression. Let that sink in: any intermediate result of your DAX expression. Can you imagine how powerful this is? Not only can you see the values of the two individual Sum's before the subtraction, but you can also use EvaluateAndLog('Sales') to inspect some of the raw data that feeds into the two Sum's! I'll explain how to do it in a future blogpost as this feature is too big to be covered in a single one.

At this point, some people might have realized that EvaluateAndLog is like the “print debugging statement” in other programming languages. Having been a professional software engineer my entire career, I have witnessed firsthand how debugging tools and practices have advanced by leaps and bounds over the past decades. In spite of all that progress, the “print debugging statement” still accounts for most actual debugging activities in the real world because no other tool can replace the convenience and versatility of a simple “print” and its many variations in different programming languages. So EvaluateAndLog opens the door to a whole new world of DAX learning and debugging.

Now that you are all hyped up, let’s start with the syntax of EvaluateAndLog:
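
As far as the engine accepts it today, the signature looks like this:

EvaluateAndLog(<scalar or table expression>, [Label], [MaxRows])

The second and third arguments are optional: [Label] is a text tag that helps you find the corresponding event in the debug output, and [MaxRows] caps how many rows of a table result are logged; to the best of my knowledge it defaults to 10.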

You can wrap the function around any portion of your DAX expression that returns either a scalar value or a table. When the expression is calculated, the DAX Engine sends the intermediate result in JSON format as the content of the TextData column of an Analysis Services trace event called “DAX Evaluation Log”.

Now that we have covered the mechanics of EvaluateAndLog, the challenge is how to retrieve and visualize the output of the function. I believe that, in due time, Power BI Desktop and other third-party tools will add the ability to display the output of the function in a user-friendly fashion. Until that happens, if you must use an officially supported tool from Microsoft, you can download and install Microsoft SQL Server Profiler, which is a component of the free SQL Server Management Studio (SSMS). Afterwards, you have to find out the port number of Power BI Desktop, connect SQL Server Profiler to Desktop, and subscribe to the DAX Evaluation Log event and other relevant events; then you can see the output of the function. Below is a screenshot of SQL Server Profiler displaying the output of EvaluateAndLog.

Going the SQL Server Profiler route is neither easy nor user-friendly. To deliver the benefits of the new way of debugging DAX to the Power BI community sooner, I rolled up my software engineer sleeves, learned some basic UI programming, and developed a Windows desktop application on my own. You can download the free software, named DAX Debug Output, from here and install it on the same machine as Power BI Desktop. You need to have admin rights on the machine and agree to install the software from an unknown publisher as this is my own project, not a Microsoft product. Below is an annotated screenshot of the main window of the application. I will explain how the application works in my next blogpost.

EvaluateAndLog is currently enabled only in Power BI Desktop but disabled in other Power BI products. In the environments where it is disabled, the function does nothing other than pass through the result of its child expression. When enabled, the function may disable some DAX optimizations, so it’s imperative that you delete the function from your DAX code after you are done with debugging. This is where the red squiggly line due to the current lack of intellisense support comes in handy: it’s a convenient reminder that you should fix the expression before publishing it to the Power BI service. Although the function has no impact in the service today, it could cause performance degradations once Microsoft enables the function in more production environments.

We have barely scratched the surface of EvaluateAndLog, but this is already a long post. I plan to write more posts on this subject in the following weeks. In the meantime, if you can’t wait to play with the function, here is the link to DAX Debug Output again. Download both DAXDebugOutputSetup.msi and setup.exe to your machine and run setup.

https://github.com/pbidax/DAXDebugOutput/releases/tag/v0.1.0-beta


DAX “IN” Operator

It’s hard to keep a secret nowadays. Although Microsoft has not officially announced it, people have discovered on their own that DAX has a new IN operator. Let me explain why the Power BI team has not made this public yet.

  • The IntelliSense work has not been completed yet, so although a DAX expression containing the new syntax will be accepted by the engine, it will be underlined with a red squiggly in the DAX editor.


  • Users cannot use the new syntax directly as a setfilter argument to the CALCULATE family of functions. So the following syntactic shortcut is not yet accepted by the DAX engine.
CALCULATE([Total Sales], 'Product'[Color] IN { "Red", "Green" }) // Not working right now.

You have to use the full version instead:

CALCULATE(
    [Total Sales],
    FILTER(ALL('Product'[Color]), [Color] IN { "Red", "Green" })
)

If you are unfazed by these limitations, which should be addressed early next year, feel free to enjoy the new convenience in DAX today:

'Product'[Color] IN { "Red", "Green" }

or

('Product'[Color], 'Customer'[Country]) IN { ("Red", "Russia"), ("Green", "Italy") }

By the way, since DAX is a functional language, there is an equivalent new function CONTAINSROW:

CONTAINSROW({ "Red", "Blue" }, 'Product'[Color])

or

CONTAINSROW(
    { ("Red", "Russia"), ("Green", "Italy") },
    'Product'[Color],
    'Customer'[Country]
)

The examples above show off not just the IN operator, but also a new way to construct a table expression from a list of values or tuples. As with other DAX functions and operators, the new syntax is fully composable as illustrated in the following examples of valid DAX expressions:

'Product'[Color] IN VALUES('Product2'[Color])
{ VALUES('Date'[Year]) - 1, VALUES('Date'[Year]), VALUES('Date'[Year]) + 1 }
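
Because the value and tuple lists are just table expressions, they also compose with functions that accept tables. For instance, here is a sketch, reusing the [Total Sales] measure from the earlier examples, that pushes a tuple list into the filter context through TREATAS:

CALCULATE(
    [Total Sales],
    TREATAS({ ("Red", "Canada") }, 'Product'[Color], 'Customer'[Country])
)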

Warning: although the following syntax is also accepted by the DAX engine today, it is not by design, so you should always use curly braces {} to enclose a list of values or tuples; only the individual tuples should be enclosed in parentheses ().

'Product'[Color] IN ( "Red", "Green" ) // Do not do this. Replace () by {} instead.

Show Text Attributes in Tooltips

Power BI allows users to add measures to tooltips. Many users want to add text columns as well. Today, if you drag a text column to the tooltips, a count measure is created, which is not very useful. Before Power BI enhances the tooltip feature to let users drag text columns into tooltips and show their contents, you can make it happen today by creating DAX measures. Let’s say you want to add 'Table'[TextColumn] to tooltips. Create a measure as below.

[TextColumn Tooltip] = TOPN(1, VALUES('Table'[TextColumn]), [TextColumn], ASC)

Now you can drag the newly created measure to tooltips. I have created a sample model with such a tooltip based on the AdventureWorksDW database. As you can see in the picture below, when users hover over an individual product, the tooltip includes the [Description] column for that product.

Figure: product description shown in the tooltip.
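If more than one text value can be in scope, for example on a subtotal row, a variation based on CONCATENATEX shows all of the values instead of just the alphabetically first one. Here is a sketch against the same hypothetical 'Table'[TextColumn]; the measure name is mine:

[TextColumn Tooltip All] =
CONCATENATEX(
    VALUES('Table'[TextColumn]),  // distinct text values in the current filter context
    'Table'[TextColumn],          // the text to concatenate
    ", ",                         // separator
    'Table'[TextColumn], ASC      // concatenate in alphabetical order
)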

Connect any number of tables together via a common column

One of the key powers of Power BI is the ability to bring data from different sources into a single model and then join them together by creating relationships among tables to perform analysis and create reports across all the data in the unified model. But a lot of people are confused when they run into one of the errors below while trying to create a relationship between two tables inside Power BI Desktop:

  • We cannot create a relationship between 'Table A'[Column X] and 'Table B'[Column Y]. This could be because there is missing intermediate data to connect the two columns.
  • You can’t create a relationship between these two columns because one of the columns must have unique values.
  • You can’t create a relationship between these two columns because the 'Column X' column in the 'Table A' table contains null values.

Those who are familiar with relational database theory can appreciate the sound principles underlying those error messages. But many users of Power BI may be at a loss as to what to do next. While Power BI Desktop may make creating relationships easier in the future by helping users resolve those errors automatically, there are things you can do today to connect the two tables together. The same methodology can be generalized to connect any number of tables together through a common column.

Assume a user has imported five different tables containing various information and transactions about products, all of which identify each product by a [Product] column, but the user is unable to create relationships among the tables as he or she is blocked by the errors listed above. The user wants to connect all five tables together via the [Product] column as illustrated by the conceptual schema in Figure 1.

 

Figure1
Figure 1.

To do so, we need to use DAX to create a calculated table in the middle so that each of the five original tables can have a relationship with the calculated table. I have created a mock model in this pbix file to demonstrate the technique.

Create a calculated table by clicking the New Table button as shown in Figure 2.

ClickNewTable
Figure 2.

The DAX expression is

All Products =
FILTER(
    DISTINCT(
        UNION(
            DISTINCT(Table1[Product]),
            DISTINCT(Table2[Product]),
            DISTINCT(Table3[Product]),
            DISTINCT(Table4[Product]),
            DISTINCT(Table5[Product])
        )
    ),
    NOT(ISBLANK([Product]))
)

This DAX formula takes the unique values from the [Product] column of each of the five tables, unions them together, removes duplicate values, and finally removes the BLANK value if there is one. You can use the same formula in any model by changing the number of tables and the names of tables and columns. Now you can create five relationships between each of the five original tables and the newly created calculated table. Figure 3 shows the final result in the Relationships view. You can see that every relationship allows filters to flow both ways, as indicated by the bidirectional arrows.

Figure 3.

Now that all fields in the model are connected, you can freely add any column from any table to your report and slice and dice to your heart’s content; things will just work as if by magic. Moreover, the calculated table in the middle will automatically recalculate each time you refresh data in the original tables.

Simple Filter in DAX Measures

Today’s post is about different ways to set a simple filter in DAX. Although there are plenty of writings covering these patterns, I still see DAX authors get confused on a regular basis. In this post, I am going to clarify the differences among the patterns. The examples are based on a pbix file created from the Adventure Works database that you can download from here. It helps if you have learned about filter context before, but you should still be able to follow the general idea outlined here without that knowledge.

Setting a basic filter is one of the first things people do when writing a DAX measure. Let’s say you want to write a measure to calculate the total sales of red products. The two common DAX patterns I see people use are:

Pattern 1. Calculate(Sum([Sales Amount]), 'Product'[Color] = "Red")

Pattern 2. Calculate(Sum([Sales Amount]), Filter('Product', [Color] = "Red"))

Both approaches work as expected in simple models and reports, but from time to time people run into trouble with such simple DAX measures and have difficulty figuring out the underlying reason.

Although Patterns 1 and 2 look very similar, they actually have different semantics in DAX. The differences become obvious when you add the [Color] column and both measures to a flat table chart as shown in Figure 1. While Pattern 1 returns sales of red products in all rows, Pattern 2 returns blank values in all rows except for the red color row and the grand total row.

Figure 1.

Under the hood, Pattern 1 is simply a syntax shortcut for the following DAX expression:

Pattern 1.1. Calculate(Sum([Sales Amount]), Filter(All('Product'[Color]), [Color] = "Red"))

The key difference is the implied All function in Pattern 1. Because of the All function, Pattern 1 does nothing more than replace any prior filter on the [Color] column with the new filter [Color] = "Red". On the other hand, the lack of the All function in Pattern 2 means the 'Product' table is first filtered by any existing filters from the filter context and then the remaining rows are further filtered by the predicate [Color] = "Red". This explains the blank values in Figure 1: the 'Product' table is first filtered down to the rows matching the color of the current row in the flat table and then further filtered by the predicate [Color] = "Red", so the net result is empty whenever a row has a color other than red; the grand total row has no filter coming from the flat table, therefore only the predicate takes effect.

In addition to the unwanted results in Figure 1, Pattern 2 also suffers from an unintended performance hit even when it delivers correct results. For example, if a report has a page filter on 'Date'[Calendar Year] = 2008, the 'Product' table will be filtered down to all products sold in 2008 before the predicate [Color] = "Red" is applied. That means the DAX engine needs to scan the 'Internet Sales' fact table just to calculate the Filter function itself. Few people would want to scan a fact table only to set a filter on a dimension table.
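If what you actually want is Pattern 2’s intersect-with-existing-filters behavior on the [Color] column alone, without iterating the whole 'Product' table, wrapping the predicate in KeepFilters is the usual rewrite; a sketch:

Calculate(Sum([Sales Amount]), KeepFilters('Product'[Color] = "Red"))

This keeps any prior filter on [Color] and intersects it with "Red", so non-red rows still return blank, but the DAX engine never has to touch the fact table just to evaluate the filter argument.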

The All function allows a column name or a table name as its argument. So the following pattern achieves the same effect as Pattern 1 in Figure 1, although the two are not equivalent:

Pattern 3. Calculate(Sum([Sales Amount]), Filter(All('Product'), [Color] = "Red"))

The difference between Pattern 3 and Pattern 1 is that Pattern 3 replaces the filters on all columns of the 'Product' table in the filter context, while Pattern 1 only replaces a prior filter on the [Color] column. As a result, if you have several columns from the 'Product' table in the report, all of their filters will be removed from the filter context, and the only remaining filter on the 'Product' table is [Color] = "Red", as shown in Figure 2. As you can see in the picture, the values of the measure [m – Sum Filter Pattern 3] are not affected by the filter [Class] = "L" from the flat table.

Figure 2.

Today we have discussed the differences among three ways of setting a simple filter in DAX. Although all three patterns may work in most cases and any one of them may be the right solution for a particular report, if you are not sure which one to use, I would start with Pattern 1, or its equivalent Pattern 1.1, due to its minimal side effects.