Power Query - cache shared nodes
Update Power Query in Excel to take advantage of caching in cases where a parent node refers to a child node that has already been refreshed (as exists in Power BI desktop today).
This issue creates significant performance problems with refresh times when creating highly interdependent financial and operational models. This is a show stopper from a usability and customer acceptance standpoint.
We are happy to announce that the feature is finally available on Production starting from July fork (build 16.0.10726.*) for Office 365 subscribers.
Thank you all for your votes and the feedback.
- Excel Team
@Margaret - as per the Excel team comment below, Office 2019 has this feature, so any issues you are experiencing with long runtimes for queries would be unrelated to cached nodes.
Margaret F. Barr commented
I don't have e 365, I have MS Office Pro Plus 2019, and am leaning Power Query. The length of time for refresh is terrible. Not everyone had the internet connections to do 365. I thought 2019 would be the saving grace...NOT so far. Please do something. I have been on the MS Power Query Forum about this, and there is no resolution.
Just to clarify, the fix will be available for Office 365 customers starting from July fork (build 16.0.10726.*) and for Office 2019 perpetual customers.
The fix is not be available for Office 2016 perpetual customers, or for customers running Office 2013 or Office 2010.
- Excel Team
Hi Excel Team,
This is perfect news, thank you so much for taking our wishes in consideration and rolling it out!
Will an update also be available for earlier versions, e.g. Office 2013 Pro Plus?
Thank you and kind regards,
This was released to the general public in office 365 in Feb/Mar 2018. It will improve old queries. You don't have to rewrite, but it will only improve some queries where Excel was going back to the server repeatedly for the same info because the data was used in multiple places. There are tons of scenarios where this won't help at all as Excel was never calling for the data multiple times in the first place.
I have not been able to find out if this made it into Excel 2019 though. That is based on 1803, so kind of border line as to whether the feature is there or not, but it is definitely in any Office 365 for Windows build out now. Even deferred channel should have it has that is only 6 months behind current versions.
Jim Kemp commented
There seems to be some confusion over whether this improvement is in "mainstream" production for all users or just for "Office Insiders". I, myself (not an "Office Insider"), have not seen the kinds of significant improvements people have been posting about. Is it possible that a) the improvement is not yet released to the general public or, b) the improvement only affects power query files built after the improvement was released?
I am currently using version 16.0.8431.2366 in my organisation... does this include the related improvement? I'm trying to rule this out as a performance issue with my queries.
It took me 20-25 mins to refresh in Excel 2016 as compared to less than 1min in Power BI Desktop!
This was rolled out early 2018 for office 365 and I believe was included in office 2019 when it shipped
Hi, just checking if this is complete and rolled out. Is it?
Hi All! I watched trace log of the version, and have to say that a number of exceptions can be found in the logs.
All of them a happens when there are operations with files of different kind. So far I see exceptions while reading excel files and in case of using cash of web-requests. I think exceptions a thrown during using the cash, because, when I refresh a multi-step query in PQ UI, the exception happens only after the second launch of a function, which makes a web request. Again - all the exceptions can be seen only in trace logs, while you get correct data in result of request. BUT - these exceptions defenitely slow down the work - there are a number of them, and PQ ignores Table.Buffer when it see such exceptions (I can see that in the logs as well).
The error message is:
ExceptionType: System.IO.EndOfStreamException, mscorlib, Version=188.8.131.52, Culture=neutral, PublicKeyToken=b77a5c561934e089
Message: Чтение после конца потока невозможно.
в System.IO.BinaryReader.FillBuffer(Int32 numBytes)
в Microsoft.Mashup.EngineHost.Services.FileSystemAccessHelper.TryIgnoringAccessExceptions[T](Func`1 func, IEvaluationConstants evaluationConstants, T& result, Exception& accessException)
Adam Bender commented
I have noticed significant speed improvements since the changes recently. Using Win 10 64 bit, Office 365 64 bit, connecting to a MS SQL server.
Previewing or making changes to a table with 33M rows and around 50 columns was nearly impossible before, or it would take 5 minutes per step. Now it takes around 20-30 seconds to get the preview and only a few seconds per step.
Just wanted to follow up on this to see if there has been any progress with this update? The last update from the Excel Team was early in July. I also see that there has been no updates to PQ for Excel 2013 since July; whereas PowerBI has received numerous updates. Will these recent updates be flowed through to Power Query?
But it would be definitely interesting to understand, why do you make 2 equal request, and not just 1? Of course is much better in comparison with the current standard release, where you make from 4 to 6 equal requests, but why you decided to make 2 instead of 1?
Guys, you definitely have improved web requests, thanks for that! At least from that point of view Excel now works better than PBI Desktop ;-). So now, if I make any number of references to a web query in Excel (loaded to a model, of course), there is NO any additional requests to a web source while refreshing the model. And in PBIDesctop I still have additional requests per reference. Just cool for Excel users!
I think this is obvious to anyone using PQ along with PowerPivot, but let's just make this distinction really clear:
1- If you are not using PowerPivot, always make sure your table is not loading to the data model ("load to" window)
2- If you are using PowerPivot, make sure you ONLY load to the data model the queries that you absolutely need.
In my case, with a large, complex data set, I made sure to create data model specific final queries where the last step was to "Remove other columns". That serves as a gatekeeper to stop any unnecessary columns from taking up space in my data model.
Another tip for PowerPivot users: download the PowerPivot Utilities add-in to measure the weight of each column, usage of your columns and metrics, etc. It is a fantastic tool for optimizations.
To everyone that's unchecking the load to data model box; the reason it speeds up the refresh is because it's not importing the data to Power Pivot which of course has it's own downsides if you are managing millions of rows and want to generate relationships between the tables.
Neil Good commented
I agree with Darrell Ripkowski below - having the Load to Data Model unticked does make it lad a lot faster. No idea why of course! As Laura says, Power Query is very good but I get the feeling MS do not want to go down this road.....
Love the capability of Power Query, but I am a prisoner to the s . l . o . w . . . update time. Yes I have 5M rows, appending from 25 underlying tables, but it takes an unbelievable amount of time to update. I would abandon it all together, if it had not revolutionized the reporting capability to the Vice Presidents of the company I am contracted at. I am a consultant, so I must do what they ask, and not offer opinions about the time wasted. To them the result is phenomenal, no matter how long it takes. : (
Carlos Cortinas commented
Hi Darrel Ripkowski, can you tell me please where a I found that Store button or checkbox, please?