Slicing Inflation with Power BI
- Tim Millar
- Apr 17
- 13 min read
I've been a software engineer for over 30 years, with a lot of experience in C#, C++ and SQL Server and more recently in Go, AWS and Postgres. We all understand the benefits of analyzing large amounts of data visually to help us see and understand trends. Interactive screens that let users zoom in on aspects of the data can add significant value to the analysis. For this reason, I was drawn to Power BI and thought it would be good to use data I was familiar with for this first project.
A local grocery chain (King Soopers, affiliated with Kroger and City Market) offers the option on its website to download past receipts, as long as you checked out with their savings card. Below I describe how I gathered the data for the report. I also detail some of the steps of putting the report together, along with some different approaches I took to produce the report I had envisioned.
Data Collection
To capture the data, I logged into their site, went to each receipt from the past couple of years one at a time, and brought up its details. There was only a print option, so I set up print-to-file to capture the data in files.
I then wrote a C# program to parse the data into fields and post it to two database tables, Receipts and Receipt Items.
The “Receipts” Power BI report
At a high level, I envisioned a report with three pages: a summary screen where a date range could be selected and a visual would display purchase amounts and savings; a details pane where individual items or groups of items could be selected to focus on details of those items; and a price changes view showing items that were purchased both near the start date and near the end date.
For the Power BI report, on the main summary page, I wanted to show total money spent on groceries for each month and also show the amount saved from coupons, etc., in the same graph:

Summary
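As a rough sketch, the two series in that visual could come from measures like the following. The Receipts column names here are just for illustration and may differ from the actual model:

Total Spent = SUM ( Receipts[total_amount] )   // assumed column: the receipt total
Total Saved = SUM ( Receipts[total_savings] )  // assumed column: coupon and card savings

These two measures would then be plotted by 'Calendar'[Year Month] in a column or bar visual (the exact visual type is a guess on my part).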
The objective of the Details page is to allow the user to choose any one or more purchased items to home in on the amount spent on those items and the price changes over the selected date range. It also allows analyzing similar products together:

Details
The date range selected on the summary page applies to the Details and Price Changes pages as well. The products chosen on the Details page will also filter the products on the Price Changes page.
The purpose of the Price Changes visual is to capture the difference between the price near the start of the period and the price near the end of the period, as well as to summarize this data for all the products selected on the Details page. Since I rarely shop more than once per week and often don't buy the same items every week, the data is often sparse, and it would be unlikely to have many products purchased on the start date and the end date. I wanted to extend the dates for both the start and end into a start date range and an end date range. In the following paragraphs, I discuss the range and logic around the start date, but the same concepts were used for the end date.
The way I did this was to calculate the number of days between the start and end dates. I then took 5% of that result and looked for prices within that many days before and after the start date. The price for a product within this range that is closest to the start date is used as the period beginning price for that item. Products that were not purchased within the starting and ending date ranges are not considered in the Price Changes data and do not appear in the table visual.
To illustrate this for the data shown in the screenshots of this blog, the date range is 2022-8-13 through 2023-11-4, a span of 448 days. Taking 5% of that span gives a tolerance of 23 days on either side, creating a 46-day window around each date. In the Details page, eight flavors of 8 oz. yogurt were selected. Three of those items were purchased within the starting and ending date ranges of the period. The bottom two visualizations show all the purchase dates and related prices for each of the selected items around the start and end dates respectively. They contain supporting data to verify the contents of the Price Changes grid and would probably be eliminated in a production report.

Price Changes
High Yield Reports with Minimal Time Investment
Writing the first two pages (Summary and Details) was amazingly simple and followed some of the patterns I learned when taking online courses on Power BI basics, modeling, etc. (See the end of the blog for a link to some of them.) The concept of slicers was new to me, but once I got started, it was intuitive, and the fact that slicers automatically filter data in other visuals based on the table relationships made it very simple to create the visuals I planned in the conceptual stage. Basically, a slicer, like the date slicer on the Summary screen or the list selection slicer (for items), is a visual whose data selections are used to filter (slice) the data in other visuals.
Getting the slicers to affect visuals on other pages took a little research, but one of the courses I took mentioned it was possible, and implementing it was straightforward.
Power BI Modeling
Coming from SQL, I found that the concepts of duplicating tables, removing unneeded columns, removing duplicate rows, and not fully normalizing the data took some getting used to. But after using Power BI for a while, I could see why they made sense.
Data modeling is a very important step of building the report and I could see from some of the online courses how modeling well can make the other steps in building the report simpler.

Model View
The first step I took when creating the model was to get data from the Receipts and ReceiptItems tables in the database. More than one course explained the importance of creating a date table, which I did with the CALENDARAUTO function. I added calculated columns for Year Month and Year Month Num to be used in the visual on the Summary page.
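To make this concrete, a minimal sketch of the date table and the two calculated columns might look like the following; the exact format strings and sort logic are just for illustration, not necessarily what the report uses:

Calendar = CALENDARAUTO () // one row per date, spanning all dates found in the model

// calculated columns added to the Calendar table
Year Month = FORMAT ( 'Calendar'[Date], "yyyy MMM" ) // e.g. "2023 Nov", used as the axis label
Year Month Num = YEAR ( 'Calendar'[Date] ) * 100 + MONTH ( 'Calendar'[Date] ) // e.g. 202311, used to sort Year Month chronologically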
The item description is in the ReceiptItems table. I found that from time to time the description on the store's receipt changed slightly, which made some data on the report look like there were multiple different items even though the UPC code was the same across receipts. I solved this by normalizing the item description into its own table: a copy of ReceiptItems reduced to distinct UPC codes, which forces the table to have a single description per item.
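Whether you do this with Power BI's duplicate and remove-duplicates steps or in DAX, the idea is the same. A rough DAX sketch of such a table (with illustrative column names) could look like this:

ItemDescription =
ADDCOLUMNS (
    DISTINCT ( ReceiptItems[upc_code] ), // one row per distinct UPC code
    "Item Description",
        // pick one description per UPC (MAX also works on text columns)
        CALCULATE ( MAX ( ReceiptItems[item_description] ) )
)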
Another challenge I ran into was that I was constantly needing to get the order date from the Receipt table since it was not in the ReceiptItems table. In this case, I de-normalized the data by adding the order date to ReceiptItems with a calculated column: order_date = RELATED(receipts[order_date]). This simplified a lot of the code I needed to write when it came to the Price Changes visual.
Challenges
The real challenges started when trying to create the Price Changes visual on the tab with the same name. I didn’t want to give up on the concept of expanding the start and end dates into ranges to make up for my data not having prices on every date. When I started, I was using Microsoft Fabric and I solved this problem using a SQL stored procedure to get the prices I was looking for within my calculated start and end date ranges. Then using this stored procedure as a SQL Endpoint, I got the data inside Power BI directly from the Fabric Lake House. I learned some of the differences between Azure SQL and SQL Server SQL along the way. It wasn’t too bad once I found some workarounds from using variables and temp tables, namely using Common Table Expressions (CTEs).
In the middle of all of this, my access to the Fabric Lake House stopped working due to a bug in Fabric, and I thought it would be a good time to figure out how to solve the same problem using just Power BI, without the stored procedure. This led me to take a deeper dive into DAX (Data Analysis eXpressions), which I found quite confusing. I found enough information through Copilot and other searches to get very close to what I was attempting. One of the challenges is that using date ranges around the start and end of the period means the calculation has to go outside the dates selected with the slicer.
I started to break down the problem by creating two additional tables, BegPrices and EndPrices, both based on Receipt Items. Before adding measures (similar to calculated columns, but evaluated at query time, so that by default they respect filtering from slicers), the columns used were order_date, order_number, price, and upc_code. These tables allowed me to have date ranges with associated prices around the beginning and end of the chosen period. The data from these tables can be seen in the two “Prices” table visualizations at the bottom of the Price Changes tab (see the screenshot above). The main reason for these visuals is to display the columns and measures and verify the data in the Price Changes visual at the top of the page.
Even though I was close to what I had envisioned for the Price Changes page, I couldn’t quite get all the details right. At this point I bought the book The Definitive Guide to DAX* to better understand how the engine is working under the hood. The two issues I was having that I hoped I could resolve while reading the book were:
- On the Price Changes page, it was not always picking the price closest to the start date if there were multiple purchases within the start range.
- The totals of the Price Change percent columns did not always correctly sum the detail rows displayed. The details were correct.
Often when writing code in other programming languages, one can search for a code snippet online and adjust it to accomplish the given task. With DAX, it's not quite the same. I found my results were close to what I wanted, but I didn't know why the results were a little off. Here is a quote from the book on page 8: "You need to study DAX theory and thoroughly understand how evaluation contexts work before you can write good DAX code. If you do not have a proper theoretical foundation, you will find that DAX either computes values like magic or it computes strange numbers that make no sense."
At this point, I'm going to delve into some technical revelations that allowed me to produce the correct results on the Price Changes screen that evaded me at first since the screen is more advanced than a typical visual and requires a knowledge of what the DAX engine is doing. Feel free to skip the rest of this section if you're not interested in the technical explanations.
This is just a short blog, so I won't get into much of the theory, but in the case of the two issues I was trying to solve, I'll discuss how understanding the DAX engine rules helped me resolve my problems. In a Power BI report, the user sets the filter context by selecting values in the visual slicers. The other visuals, such as bar charts and tables, will automatically honor the filters that apply to their data. In the case of my Price Changes visual, I wanted to extend the start date to be a date range surrounding the start date. This means for that particular calculation, the filter context would have to be changed to allow dates before the start date.
Programmatically, there are only two functions in DAX that can change the filter context: CALCULATE and CALCULATETABLE. From an example I got online, I started by using the FILTER function to set the date range. I tried many different variations with it and never got the results I was expecting. When I changed the following measure to use CALCULATE instead of FILTER, it produced the desired result. This measure returns the price for the current product from the receipt closest to the start date (as a reminder from above, in this example, it could be up to 23 days before or after the start date). Since it's a measure and not a calculated column, it honors filters set by slicers that are not overridden by CALCULATE (or CALCULATETABLE):
Beg Price = // price closest to beginning of period
VAR BegDate = MIN ( 'Calendar'[Date] ) // start date of the period
VAR PeriodInDays =
    // days between the start and end dates
    DATEDIFF ( MIN ( 'Calendar'[Date] ), MAX ( 'Calendar'[Date] ), DAY )
// how many days on either side of the start date we will tolerate and
// still call it the start date price
VAR PriceToleranceDays = ROUNDUP ( PeriodInDays / 20, 0 )
RETURN
    CALCULATE (
        // needed to guarantee a single value for CALCULATE (TOPN with 1
        // as the first parameter ensures only one row):
        MAXX (
            // pick the date (and price) closest to the start (absolute value
            // to get closest to the start date -- could be before or after)
            SELECTCOLUMNS (
                TOPN ( 1, BegPrices, ABS ( BegPrices[order_date] - BegDate ), ASC ),
                "price", BegPrices[price] // only return the price column
            ),
            [price]
        ),
        // add a filter for dates around the start date:
        DATESBETWEEN (
            'BegPrices'[order_date],
            BegDate - PriceToleranceDays,
            BegDate + PriceToleranceDays
        ),
        // removes only the Calendar filter to allow going beyond the date
        // range from the slicer, but keeps all other filters selected
        ALLCROSSFILTERED ( 'Calendar' )
    )
Some notes about this measure:
- There are a lot of comments in the code to explain each statement.
- The code itself is not complicated. It is the rules around what the engine does that are essential to understanding the behavior the code dictates.
- The three "variables" (really constants) are evaluated in the filter context outside the CALCULATE statement. DAX evaluates variables lazily, when they are first used, but in the context where they are defined, before it is changed by the removal of the 'Calendar' filter and the addition of the filter for order dates within the tolerance period around the start date.
- CALCULATE is different from most functions in that the filter parameters (starting with the second parameter) are evaluated before the expression in the first parameter (the MAXX function and everything within it).
- MAXX is only needed because the first argument of CALCULATE must evaluate to a single value. [price] is the only column returned, and DAX treats a table of one row and one column as a single value.
- The TOPN statement picks the single date, and its related price, closest to the start date. If there happen to be multiple purchases for the same item on the same day, or on days equidistant from the start date, it will pick the first one it comes to. The ABS (absolute value) causes the expression to pick the price with the date closest to the start date, whether before or after it.
The other problem I was seeing was that in the Weighted Price Change percent column, the total was often different from what I expected. I wanted the total to be the sum of the detail Weighted Price Change percent values. The details were correct and represent the price change attributed to the current row's product. The Weighted Multiplier column represents how much a particular product should affect the total price index change, based on how much was spent on that product during the time period relative to all the products considered. Only those products that had a price near the start of the period and also near the end of the period were considered. The Annualized Price Change percent total was also off, but I knew that if I fixed the Weighted total, it would also be correct. The most important values on this visual are the totals for the two rightmost columns.
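I won't reproduce the actual Weighted Multiplier measure here, but conceptually it is a share-of-spend weight. A simplified sketch of that idea, ignoring the restriction to products priced near both ends of the period and assuming the amount spent comes from ReceiptItems[price], might look like this:

Weighted Multiplier Sketch =
VAR SpentOnThisProduct = SUM ( ReceiptItems[price] ) // spend for the product on the current row
VAR SpentOnAllSelected =
    CALCULATE (
        SUM ( ReceiptItems[price] ),
        ALLSELECTED ( itemDescription ) // all products chosen on the Details page
    )
RETURN
    DIVIDE ( SpentOnThisProduct, SpentOnAllSelected ) // this product's share of the total spend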
I could tell DAX was applying the same formula to compute the total as it was for computing the details. It didn't occur to me until I discovered others online struggling with the same issue around totals that what I had to do was to put branching logic in the measure to treat the total line differently. The reason I couldn't use the same formula for the total values was that to compute the detail Weighted Price Change percent, it was using the value in the Price Change percent column. That was perfect for the details, but the total in that column (which is not shown in the visual) is a meaningless value.
Below is the measure that correctly reports the total. When the report was calculating the incorrect total, the measure was computing the value as [Price Change] * [Weighted Multiplier] for both details and the total. You can see in this measure that the details use the above formula, but on the total row, the value is a sum of the details:
Weighted Price Change =
IF (
    // true for detail rows; false for the total row
    HASONEFILTER ( itemDescription[UPC] ),
    // detail row calculation (true condition)
    [Price Change] * [Weighted Multiplier],
    // for the total line, bypass the default logic and calculate it as the
    // sum of the detail percentages (false condition)
    SUMX (
        itemDescription,
        [Price Change] * [Weighted Multiplier]
    )
)
HASONEFILTER returns true for detail rows and false for the total row; I won't go into more detail on it here.
The SUMX function is an iterator, as are other functions that end in 'X'. It loops through a table (itemDescription), evaluating the expression for each detail row with the existing filters in place.
It is important that the table specified for SUMX is itemDescription, since this is the table the visual is based on (one row per product). A measure needs to be defined in a table, but under the covers it is independent of that table. In fact, both of these measures are defined in receiptItems, since that's what made sense to me when defining them: they both relate to prices and amounts, which come from the receipt items. However, if receiptItems were specified as the SUMX table, the total could be quite different, since that table has multiple rows for the same product.
DAX is much different from any other language I've encountered and I found that until I started to understand what was going on in the DAX engine, I was wasting a lot of time looking for solutions to similar problems and tweaking the formulas to try and produce what seemed so simple conceptually. I still have a lot to learn, but I can at least now see how the values in my report are being produced.
Future Considerations
- Rather than using an external C# program to clean the data, use R or Python from Power BI to read the data from the files to produce the Receipts and Receipt Items tables.
Conclusion and Links
Power BI is a great tool for writing visual reports. If the data is fairly clean and the desired visuals are relatively straightforward, reports can be created quickly and with ease. In my experience, stepping outside of basic reporting and adding custom logic that requires more complicated DAX introduces a steep learning curve. The main reason for this is that a lot of logic happens under the hood to provide automatic joining of data from different tables and filtering of visual data from slicers. While this hidden logic makes writing straightforward reports simple, it requires a deeper knowledge of how DAX works to accomplish more complicated functionality.
The Power BI file for the project discussed in the blog can be downloaded here:
It can be opened with the desktop version of Power BI.
The desktop version of Power BI can be downloaded here for free: Power BI Desktop
The online version of the project: Online Receipts Report
Most of the online courses I went through were from Pragmatic Works: Creating A Microsoft Fabric End-To-End Solution
*Russo, M., Ferrari, A. (2020). The Definitive Guide to DAX (2nd ed.)
About the author, Tim Millar:
I am a senior software engineer with over 20 years of experience designing, developing and implementing applications primarily in the financial industry. The tech stack I've used has included C# and SQL Server on Azure and Go and Postgres on AWS. Feel free to reach out to me directly: tim@myndcorepartners.com