Power Query - cache shared nodes
Update Power Query in Excel to take advantage of caching in cases where a parent node refers to a child node that has already been refreshed (as exists in Power BI desktop today).
This issue creates significant performance problems with refresh times when creating highly interdependent financial and operational models. This is a show stopper from a usability and customer acceptance standpoint.
I wanted to share with you that the new query caching mechanism in Excel has been deployed to Office Insiders starting from Excel version 1801 (build 9001.*). At this stage, we would like to allow some “baking” time as we monitor the feature health metrics.
In this scope, we need your help to ensure a quality release! I encourage you to try the following scenarios and share your feedback:
1. Run Refresh All on a complex workbook with multiple query dependencies. Does it work faster?
2. Run Refresh All on a complex workbook with multiple query dependencies. Do you see any issues with your data?
3. Refresh a single query several times. Do you see any issues with your data?
- The Excel Team
@Adam - anyone with 1801. The only people that won't ultimately get it are those that purchased a perpetual Office 2016 license.
It will be in Office 2019 this fall too.
In an earlier response you indicated that this was available for users with version 1801 and beyond.
Does that only apply to Office Insiders, or anyone with that version? I am on the semi-annual channel.
Chad kukorola commented
@Neil Good, if you can use incremental refresh capabilities that have recently been released, then using Azure would allow you to setup a much better process for bigger models.
Check out https://docs.microsoft.com/en-us/power-bi/service-premium-incremental-refresh or related articles. I believe it’s still in preview and only for Power BI Premium.
I don’t/can’t use it currently, some am left with workarounds for my larger models.
Neil Good commented
Is there any way working with Azure can alleviate this issue? Lots of people seem to have CSV files which need to be imported/supplemented so would going via Azure sort this out?
@Ed - thanks for the response. I think PQ is an amazing tool. We have started heavily integrating it into our finance and accounting operations and would not be able to do some of the things we do without it.
I only wish MSFT had built PQ into Excel a decade ago. Keep improving it - this is where things are going!
Chad kukorola commented
I work almost exclusively with .csv files, and pulling in from File and append.
The two algorithms begin for run-length encoding and dictionary encoding consume the time on refresh. Each time it’s refreshed, it must run these again. For example, I have a model with 3 million record fact table, a few dimensions (one using same source as fact table). Refresh time is about 12-13 minutes.
Now, you need to know it’s 46 fields in the fact table (needed), and the processor is very old. On my newer machines with a newer processor, it’s a little more than half that time.
The compression and mapping algorithms must run regardless (in excel it auto partitions at 1 million records), then it’s the hardware that matters; fast processor, large processor cache, and as fast of RAM as you can buy.
Until we can legit partition when needed, those are your bottlenecks on refresh. However, bottom line; WOW, what a great capability regardless!
@nickolas, it depends heavily on your data source. If it is a bunch of CSV files, then those are pulled in. If they are on Sharepoint/Onedrive, that can take even longer. If it is SQL Server, then it depends on how efficient your queries are and take advantage of query folding. This is one of those "it depends" based on the source (of which Power Query supports several dozen) and how efficiently you are using them.
Nicolas Bransier commented
Thanks @Ed. My models fall in the latter category :(
It's disappointing and frustrating as I was hoping this change would improve the overall performance of the refresh on any models. Looks like it's not the case.
Does anyone know what is happening at the "Retrieving Data" phase, which takes very long in my case? It doesn't seem to be retrieving data since the data download starts after this phase. So what is it retrieving and why does it take so long?
@Nicolas - not all models will see an improvement. If your model calls the same data repeatedly through queries that are referenced by other queries, this enhancement will cache the data from the first query and let the 2nd, 3rd, etc. use it without pulling the data again.
If your models don't do much/any of that, then this wouldn't show much of an improvement if any.
How can Microsoft treat users of perpetual licenses this way ?! You sold the product with an error, which leads us to inability to use it, and do not want to fix it free. Are you waiting for a riot?
Nicolas Bransier commented
Hi Excel Team,
I have version 1803 and unfortunately do not see a change in the refresh performance with my models. It seems that the "Retrieving data" phase is taking the longest time. Is there a way to troubleshoot what is happening during this phase? or Does the improvement in performance only applies to new models?
@adam - it is in build 1801 of Excel or later for Office 365 users. Insiders, Monthly Targeted, and Monthly users should have already. Deferred channel may already have or be getting soon depending on how the IT department is rolling it out.
If you have Office 2016 perpetual license, you'll either have to purchase Office 365, or purchase Office 2019 perpetual when it comes it. AFAIK, it won't be pushed out to 2016 perpetual licenses.
When will we know that this is being pushed out to the general Excel audience?
I can't wait for this new improvement.
Will this be posted as an update to Office 2016 and the like (Power Query normally gets updates), when fully live?
...or will we need to update to Office 2019 or get office 365 for this improved version?
7.3 mb workbook, 1 main query loading to workbook, merging with 8 connection only queries, several custom columns and various transformations, 1 query loading to workbook no merges no transformations, 1 query loading to workbook 1 change type transformation. All query sources are an Excel 2016 workbook.
Excel 2016 365 64-bit v. 1807 (build 10325.*) < 15 seconds, multiple tests under Win 7 i5 3rd gen
Excel 2016 MSO 64 bit (16.0.4266.100) under Win 10 i7 7th gen: times vary, with first test at about 4 minutes and the 3rd test in a row closer to 1.5 minutes.
So, the changes to Excel seem to have dramatically sped up query processing times.
Hi Excel Team. I am excited about the new query caching mechanism for Excel. I was looking forward to reading feedback, but I see none here yet. Is there some other area where it is posted?
Prem Wolanin commented
Dear Excel team, is there a chance we'll see an updated PQ addon for Excel 2013?
@Tim Hoogenboom, me too!
Tim Hoogenboom commented
I use MS Office Pro Plus 2013 and still have to download and install Power Query. Am I right that I will never see the improvement in future PQ download/installs?
Dear Excel Team,
this is really important and I'm not sure if this is being addressed with the issue as described here:
I found strong indications that the refresh order of queries is not based on the interdependencies of queries but the chronology of when the queries were created! If this is true I'd urge you to not limit your efforts on the refresh performance but also on the order in which the queries are refreshed! I guess most users are not aware of that - in contrast to what the visualization feature in Excel for the interdependencies of queries suggests - the 'Refresh All' feature would potentially not have a logic implemented which would refresh the queries in the order of their interdependencies.
In my case of a complex hierarchy of queries I found these strong indications for the described behavior when I ran 'Refresh All' multiple times and watched the order in which the queries were being refreshed. Because of the complexity and the slow performance of the refreshes I recognized that at least one query was refreshed at the end though it actually was in between other queries with regards to its interdependencies. When I had discovered this I also did some research online and found multiple forum threads which seemed to deal with the same issue, suggesting that one would have to run a refresh one query by one using VBA to enforce the correct order.
Since I did this research a couple of weeks ago and I didn't save my findings I unfortunately can't provide further details right now. I just found this uservoice thread and wanted to share my recent experiences immediately to avoid that this aspect might be overlooked. Despite the fact that I can't provide better details right now I expect that you are pretty aware of if my observations make sense or not with respect to what you know about architecture of the 'Refresh All' functionality.
I'm looking forward to your thoughts on this.