Make a simple safer version of VLOOKUP and INDEX MATCH
The majority of Excel users use VLOOKUP to get exact matches from a row or column, It would be good if they all used INDEX MATCH but it's more complex so how about this simpler version for the masses.
=GETMATCH(ReferenceToMatch, MatchColumn/Row, GetColumn/Row, Optional Value if no match)
If seen a few other references to updating VLOOKUPS and there are lots of opportunities, but it would be good to address the simple most widely used issue first.
Thanks for the suggestion Wyn! Thanks also to all the contributors to this thread for the thoughtful discussion and debate.
Of the many lookup function requests, this is currently one of the highest ranking on UserVoice. Please keep the votes rolling in if you’d like to see this, it helps us prioritize new feature work.
JoeMcD [MS XL]
The dynamic match_type is meant to overcome the limitations of using a constant match_type. Assuming a constant match_type, MATCH(,,0) is slow even on sorted arrays while MATCH(,,±1) relies on the lookup array always being sorted. The dynamic match_type solves this by always using the fastest search method that still returns the correct result (i.e. ±1 for sorted lookup arrays and 0 otherwise).
As long as you are using helper cells, the dynamic match_type only needs to be calculated once per lookup column, not once per MATCH. Thus a dynamic match_type only costs us 2(N − 1) comparisons, or about 4 average-case exact-match searches, for any number of MATCHes. BTW, I updated the file that I linked.
For anyone thinking that I am getting off-topic, my current stance is that GETMATCH should have a match_type. Without a match_type, GETMATCH would be forced to always use a linear search, which is not the fastest search method for sorted arrays. If GETMATCH has a match_type, then we can provide a dynamic match_type to speed up exact-match searches on sorted lookup arrays without relying on the lookup array always being sorted.
Your dynamic sort state defeats any benefits from binary search.
For unsorted matching, all MATCH does is iterate through its 2nd arg (which should be a 1D list) comparing it to its 1st arg, stops when it finds a match, and returns the match's index. If no match is found, it returns #N/A.
For ascending sorted matching, all MATCH does is check that its 1st arg is >= the first item in its 2nd arg (if not it returns #N/A quickly), then checks if its 1st arg is > than the last item in its 2nd arg (if so, returns the last item's index quickly). If still going, MATCH checks its 1st arg against the middle item in its 2nd arg list, and effectively replaces the whole list with the portion within which its 1st arg falls. It repeats this last bit until it finds an exact match or until item K is < 1st arg and item K+1 > 1st arg. Descending sorted matching works the opposite of this.
With N items in the list, unsorted matching involves no more than N comparisons, and sorted matching involves no more than INT(LOG(N,2))+2 comparisons. Your dynamic matching would involve 2(N-1) comparisons before the matching would begin. How could that aid efficiency?
I'm an idiot. My main argument for not using match types was that if you use, say, MATCH(,,1), then you are assuming that the lookup column will always be in ascending order. However, I never thought of making the match_type dynamic to adapt to the sort state of the lookup array. MATCH(,,±1) is fine. The way that I was using it is not. Now that I realize this, I see that my sort arrow suggestion was unnecessary, so I deleted it.
To set up a dynamic match_type, see the file linked below:
Hopefully the default match_type for GETMATCH is the exact match, since it does not require a sorted lookup array.
Maybe for you most large tables have filters. Not for me. One of the models I work with several times a day has a 2000+ row table in a hidden worksheet. Filters in hidden worksheets would be kinda pointless. There are over 6,000 lookup formulas into this table in other worksheets. I want that table sorted, and I DON'T want lookup functions which would only use binary search when there was an autofilter showing it was sorted.
You're talking about interrelating various pieces of functionality. I'd prefer keeping separate pieces of functionality as orthogonal as possible.
Also, tables can include columns with formulas. Autofiltered tables can be sorted on columns containing formulas, and the autofilter arrow will show sorted. However, those formulas could recalculate and change values, rendering the column no longer sorted, but the autofilter would still show it sorted. Point: relying on autofilter sorted state isn't robust. I've seen too many BIG MISTAKES arising from small assumptions in spreadsheets to be comfortable relying on any assumptions.
My Tables comment refers to looking up using columns other than the first one, and if those other columns were sorted, MATCH could use binary search. Your whole first paragraph in you previous comment showed you didn't understand that this was the situation to which I was referring.
Tables that are big enough for the binary search to save significant time usually have filters on. At some point, before you do the binary search, you need to sort the lookup range. What I am suggesting is that this sorted state is stored as the sort arrow, and if an edit breaks the sort, then remove the arrow.
This way, whenever the sort arrow is there, you know that the range is sorted. Then you don't have to check if the list is sorted by comparing each item with the next. You instead check the sort state by looking if there is a sort arrow, which takes constant time. The sort state wouldn't even use additional memory, since Excel already has a sort arrow, but it currently behaves more uselessly from what I'm suggesting.
I hope that that clears up my side of the argument. Also, here is a picture of the sort arrow:
Lastly, I'm not sure why you mentioned "Tables can be sorted on columns other than the first one.". I never made that assumption, but VLOOKUP does.
MATCH with 3rd argument +/- 1 is meant for sorted ranges, and that's what I meant. Tables can be sorted on columns other than the first one. Also, I understand AND EXPECT +/- 1 to be used only when one KNOWS lists are sorted.
LOOKUP always assumes sorted lists. VLOOKUP and HLOOKUP allow for tables sorted on first column or row, respectively, in ascending order or unordered, but not sorted in descending order. MATCH allows ascending, descending or unordered, and that's part of its strength compared to the lookup functions.
The reason for indicating sorted or unsorted is the algorithm the lookup functions and MATCH use. If you know you have sorted lists, those functions can use binary search, which is O(log(N)). For unsorted lists, those functions have to use linear search, which is O(N). I've dealt with workbooks with lots of unnecessary linear search; it can really slow down recalc.
Finally, checking whether a list is sorted is O(N). Meaning NOTHING GAINED checking whether a list is sorted in order to use binary search. You should only use the binary search variants when you KNOW lists are sorted. That is, there's no sensible alternative to KNOWING whether a list is sorted, then TELLING Excel to treat it as sorted for lookups.
Harlan, if you can think of a use for MATCH(,,±1) in unsorted data, please tell me. Otherwise, we can assume that MATCH(,,±1) is only useful in sorted data, which is its greatest weakness. Why should the user of MATCH(,,±1) have to assume that the data will always be sorted? Even if you add sort detection to your MATCH(,,±1) formula, you still have to manually sort the data if it is not sorted (assuming no UDFs). This is why the match types, as we know them, should not be implemented in GETMATCH.
Of course, one main use for MATCH(,,1) is to get the position of the highest match that is ≤ our lookup value. MATCH(,,-1) is similar. If the match types behaved this way but without the requirement for sorted data, then they'd be OK in GETMATCH.
Another reason why you might want GETMATCH to have match types is to speed up searches in sorted data. I think that lookup functions should switch to binary search if they detect that the column is sorted, rather than be told which search method to use. See the link below:
Totally agree Vlookuo in 2010 version doesn't work as updates and patches are applied
Vlookup alternative for neophytes to match up datat from one spreadsheet to another based on a common data link. End-users scream at us that Vlookup is impossible unless they are programmers. I agree Vlookup is just way too complicated.
Paul K Dick commented
Of the GETMATCH proposal, I especially like the last paramater (value if not found). I think its a pain to have to code an IFERROR with each lookup I do. I must admit that the IFERROR is a **BIG** improvement to ISERROR.
VLOOKUP (and HLOOKUP) still work well for threshold lookups on sorted tables, though INDEX+MATCH also works for that. Indeed, since MATCH can handle ascending and descending orders, INDEX+MATCH is more flexible for that too.
OK, find, time to deprecate the lookup functions, but no need for GETMATCH without providing at least the flexibility of INDEX(...,MATCH(...,...,1)), INDEX(...,MATCH(...,...,0)) and INDEX(...,MATCH(...,...,-1)).
For me, the absolute ideal would be structured references from Excel, regular expressions from LibreOffice Calc, and the FILTER function from Google Sheets. Maybe also the resurrection of SQL.REQUEST and SQL queries against tables.
I should point out that GETMATCH and FILTER don't entirely cover the same scope. GETMATCH is for simple lookups, but for the more complicated stuff, we need FILTER. I'd be happy enough if we had both. Anything besides VLOOKUP.
GETMATCH being the only function aside from IFERROR to provide an alternative return value doesn't make sense. If all the lookup functions did so, much better.
Excel is becoming as encrusted as PL/I back in the bad old days of the early 1970s, just before C and its descendants took over. At the very least, it's time to deprecate the Lotus 1-2-3-like database functions and the rather pointless *IF/*IFS functions in favor of the truly more useful and not difficult to explain FILTER function.
While variants of formulas can serve experienced users brilliantly we still need basic safe and simple formulas for the general Excel user population.
That's who my suggestion is aimed at. Keep it simple with one formula with one purpose.
Simple to use, simple to review / audit
Furthering Kenneth's point, FILTER would eliminate the need for
and very likely over half of actual use of LARGE, SMALL and ROW.
As for Wyn's GETMATCH, just add a 4th argument to LOOKUP to let it do exact and threshold matching. Make that optional 4th arge 1/0/-1 like MATCH, so the lookup array/range could be sorted in either order.
To summarize what Harlan said, FILTER can easily handle multi-criterion lookups and can simulate a "get the nth match" function using INDEX(FILTER(),n). When n = 1, it acts like a multi-criterion INDEX/MATCH.
For me, that's only half of the beauty of FILTER. The other half is doing away with functions like SUMIFS or MAXIFS. FILTER is like a lemma. It's not very useful on its own, but it is extremely versatile when paired with other functions.
Wyn, I don't entirely agree with your idea of trying to integrate IFERROR into a lookup function, since there are other functions that you could try to integrate. For example, you could integrate INDEX by having an "nth match" parameter, or MATCH by having a parameter to indicate whether or not the range being searched is sorted.
However, integrating IFERROR into a lookup function is better than the approach that Microsoft seemed to take. IFNA was recently introduced to deal with the "no match" scenario for VLOOKUP and MATCH. It's like IFERROR, but it's specific to the #N/A error.
Hi Harlan, I like the idea of a multiple criteria lookup function and I think there are a few suggestions for this elsewhere on the UserVoice.
I think both our approaches are the same just the syntax is re-arranged?
The simpler the better, the more options that are provided to users the more daunting most users find it and the more likely they end up with an incorrect formula.
My suggestion of having an optional value if no match again does away with the need for an IFERROR wrapper.
If you have Google Sheet's FILTER,
=INDEX(X:X,MATCH(foo,G:G,0)) == =GETMATCH(foo,G:G,X:X) == =FILTER(X:X,G:G=foo)
Note that if GETMATCH returned #N/A when no match was found, it's easy enough to handle alternative return values using IFERROR.
However, even better,
=INDEX(X:X,MATCH(TRUE,IF(G:G=foo,J:J=bar),0)) == =FILTER(X:X,G:G=foo,J:J=bar)
For that matter,
=INDEX(X:X,SMALL(IF(G:G=foo,ROW(X:X)),n)) == =INDEX(FILTER((X:X,G:G=foo),n)
An equivalent for Google Sheets's FILTER function would be more flexible than INDEX+MATCH or GETMATCH (which is LOOKUP with exact matching).
Glenn Russell commented
Looks good to me!
Harlan, if you use index match then any column can be the key field. That's why I prefer that approach to VLOOKUP. This suggested formula would make that even simpler.