Make a simple safer version of VLOOKUP and INDEX MATCH
The majority of Excel users use VLOOKUP to get exact matches from a row or column, It would be good if they all used INDEX MATCH but it's more complex so how about this simpler version for the masses.
=GETMATCH(ReferenceToMatch, MatchColumn/Row, GetColumn/Row, Optional Value if no match)
If seen a few other references to updating VLOOKUPS and there are lots of opportunities, but it would be good to address the simple most widely used issue first.
Thanks for the suggestion Wyn! Thanks also to all the contributors to this thread for the thoughtful discussion and debate.
Of the many lookup function requests, this is currently one of the highest ranking on UserVoice. Please keep the votes rolling in if you’d like to see this, it helps us prioritize new feature work.
JoeMcD [MS XL]
Tangent first. I agree about the annoyance of Excel STUPIDLY adding the worksheet name to cell references in the same worksheet as the formula. If you're in SheetX!X99, start typing a formula, select a range in another worksheet, then select a range in SheetX, Excel should be smart enough to understand that it doesn't need the worksheet qualifier. At the very least there should be an option to drop the unnecessary worksheet qualifier.
MATCHIFS? What would it be? Would it return the topmost/leftmost row/column index of the first instance in which all criteria were satisfied? Or would it be a generalized lookup? The former might be interesting, but would always have to use linear search, though it could also use short-circuit evaluation, so if the Nth item in range1 didn't satisfy criteria1, the corresponding items in the other ranges wouldn't need to be checked. Yet another way to perform lookups is problematic, and it gets back to something like the FILTER function in Google Sheets. For me, as general as possible beats all alternatives.
Good suggestion Corey, a MATCHIFS could be a good option. I would prefer the ability to select the lookup value first as creating the formula flows better and avoids that Sheet1! annoyance when you jump back to a sheet while creating a formula. However, I can see the logic in consistency with SUMIFS and COUNTIFS
Corey Becker commented
I think it would make more sense to keep in consistent with SUMIFS/COUNTIFS and allow multiple criteria.
=GETMATCH(return_range, criteria_range1, criteria1, [criteria_range2, criteria2], ...)
In that case, my sort arrow suggestion would have complemented your suggestion perfectly… if I didn't delete it. -_-' It would allow for fast searches on sorted ranges without putting any burden on the formula user.
I would really like a simple and safe formula that does not require a match type. I.e exact match only. Speed is not an issue for 90% of users, simplicity is. I agree another formula similar to GetMatch with a match type would be useful but that's not what I'm proposing or voting for here
The dynamic match_type is meant to overcome the limitations of using a constant match_type. Assuming a constant match_type, MATCH(,,0) is slow even on sorted arrays while MATCH(,,±1) relies on the lookup array always being sorted. The dynamic match_type solves this by always using the fastest search method that still returns the correct result (i.e. ±1 for sorted lookup arrays and 0 otherwise).
As long as you are using helper cells, the dynamic match_type only needs to be calculated once per lookup column, not once per MATCH. Thus a dynamic match_type only costs us 2(N − 1) comparisons, or about 4 average-case exact-match searches, for any number of MATCHes. BTW, I updated the file that I linked.
For anyone thinking that I am getting off-topic, my current stance is that GETMATCH should have a match_type. Without a match_type, GETMATCH would be forced to always use a linear search, which is not the fastest search method for sorted arrays. If GETMATCH has a match_type, then we can provide a dynamic match_type to speed up exact-match searches on sorted lookup arrays without relying on the lookup array always being sorted.
Your dynamic sort state defeats any benefits from binary search.
For unsorted matching, all MATCH does is iterate through its 2nd arg (which should be a 1D list) comparing it to its 1st arg, stops when it finds a match, and returns the match's index. If no match is found, it returns #N/A.
For ascending sorted matching, all MATCH does is check that its 1st arg is >= the first item in its 2nd arg (if not it returns #N/A quickly), then checks if its 1st arg is > than the last item in its 2nd arg (if so, returns the last item's index quickly). If still going, MATCH checks its 1st arg against the middle item in its 2nd arg list, and effectively replaces the whole list with the portion within which its 1st arg falls. It repeats this last bit until it finds an exact match or until item K is < 1st arg and item K+1 > 1st arg. Descending sorted matching works the opposite of this.
With N items in the list, unsorted matching involves no more than N comparisons, and sorted matching involves no more than INT(LOG(N,2))+2 comparisons. Your dynamic matching would involve 2(N-1) comparisons before the matching would begin. How could that aid efficiency?
I'm an idiot. My main argument for not using match types was that if you use, say, MATCH(,,1), then you are assuming that the lookup column will always be in ascending order. However, I never thought of making the match_type dynamic to adapt to the sort state of the lookup array. MATCH(,,±1) is fine. The way that I was using it is not. Now that I realize this, I see that my sort arrow suggestion was unnecessary, so I deleted it.
To set up a dynamic match_type, see the file linked below:
Hopefully the default match_type for GETMATCH is the exact match, since it does not require a sorted lookup array.
Maybe for you most large tables have filters. Not for me. One of the models I work with several times a day has a 2000+ row table in a hidden worksheet. Filters in hidden worksheets would be kinda pointless. There are over 6,000 lookup formulas into this table in other worksheets. I want that table sorted, and I DON'T want lookup functions which would only use binary search when there was an autofilter showing it was sorted.
You're talking about interrelating various pieces of functionality. I'd prefer keeping separate pieces of functionality as orthogonal as possible.
Also, tables can include columns with formulas. Autofiltered tables can be sorted on columns containing formulas, and the autofilter arrow will show sorted. However, those formulas could recalculate and change values, rendering the column no longer sorted, but the autofilter would still show it sorted. Point: relying on autofilter sorted state isn't robust. I've seen too many BIG MISTAKES arising from small assumptions in spreadsheets to be comfortable relying on any assumptions.
My Tables comment refers to looking up using columns other than the first one, and if those other columns were sorted, MATCH could use binary search. Your whole first paragraph in you previous comment showed you didn't understand that this was the situation to which I was referring.
Tables that are big enough for the binary search to save significant time usually have filters on. At some point, before you do the binary search, you need to sort the lookup range. What I am suggesting is that this sorted state is stored as the sort arrow, and if an edit breaks the sort, then remove the arrow.
This way, whenever the sort arrow is there, you know that the range is sorted. Then you don't have to check if the list is sorted by comparing each item with the next. You instead check the sort state by looking if there is a sort arrow, which takes constant time. The sort state wouldn't even use additional memory, since Excel already has a sort arrow, but it currently behaves more uselessly from what I'm suggesting.
I hope that that clears up my side of the argument. Also, here is a picture of the sort arrow:
Lastly, I'm not sure why you mentioned "Tables can be sorted on columns other than the first one.". I never made that assumption, but VLOOKUP does.
MATCH with 3rd argument +/- 1 is meant for sorted ranges, and that's what I meant. Tables can be sorted on columns other than the first one. Also, I understand AND EXPECT +/- 1 to be used only when one KNOWS lists are sorted.
LOOKUP always assumes sorted lists. VLOOKUP and HLOOKUP allow for tables sorted on first column or row, respectively, in ascending order or unordered, but not sorted in descending order. MATCH allows ascending, descending or unordered, and that's part of its strength compared to the lookup functions.
The reason for indicating sorted or unsorted is the algorithm the lookup functions and MATCH use. If you know you have sorted lists, those functions can use binary search, which is O(log(N)). For unsorted lists, those functions have to use linear search, which is O(N). I've dealt with workbooks with lots of unnecessary linear search; it can really slow down recalc.
Finally, checking whether a list is sorted is O(N). Meaning NOTHING GAINED checking whether a list is sorted in order to use binary search. You should only use the binary search variants when you KNOW lists are sorted. That is, there's no sensible alternative to KNOWING whether a list is sorted, then TELLING Excel to treat it as sorted for lookups.
Harlan, if you can think of a use for MATCH(,,±1) in unsorted data, please tell me. Otherwise, we can assume that MATCH(,,±1) is only useful in sorted data, which is its greatest weakness. Why should the user of MATCH(,,±1) have to assume that the data will always be sorted? Even if you add sort detection to your MATCH(,,±1) formula, you still have to manually sort the data if it is not sorted (assuming no UDFs). This is why the match types, as we know them, should not be implemented in GETMATCH.
Of course, one main use for MATCH(,,1) is to get the position of the highest match that is ≤ our lookup value. MATCH(,,-1) is similar. If the match types behaved this way but without the requirement for sorted data, then they'd be OK in GETMATCH.
Another reason why you might want GETMATCH to have match types is to speed up searches in sorted data. I think that lookup functions should switch to binary search if they detect that the column is sorted, rather than be told which search method to use. See the link below:
Totally agree Vlookuo in 2010 version doesn't work as updates and patches are applied
Vlookup alternative for neophytes to match up datat from one spreadsheet to another based on a common data link. End-users scream at us that Vlookup is impossible unless they are programmers. I agree Vlookup is just way too complicated.
Paul K Dick commented
Of the GETMATCH proposal, I especially like the last paramater (value if not found). I think its a pain to have to code an IFERROR with each lookup I do. I must admit that the IFERROR is a **BIG** improvement to ISERROR.
VLOOKUP (and HLOOKUP) still work well for threshold lookups on sorted tables, though INDEX+MATCH also works for that. Indeed, since MATCH can handle ascending and descending orders, INDEX+MATCH is more flexible for that too.
OK, find, time to deprecate the lookup functions, but no need for GETMATCH without providing at least the flexibility of INDEX(...,MATCH(...,...,1)), INDEX(...,MATCH(...,...,0)) and INDEX(...,MATCH(...,...,-1)).
For me, the absolute ideal would be structured references from Excel, regular expressions from LibreOffice Calc, and the FILTER function from Google Sheets. Maybe also the resurrection of SQL.REQUEST and SQL queries against tables.
I should point out that GETMATCH and FILTER don't entirely cover the same scope. GETMATCH is for simple lookups, but for the more complicated stuff, we need FILTER. I'd be happy enough if we had both. Anything besides VLOOKUP.
GETMATCH being the only function aside from IFERROR to provide an alternative return value doesn't make sense. If all the lookup functions did so, much better.
Excel is becoming as encrusted as PL/I back in the bad old days of the early 1970s, just before C and its descendants took over. At the very least, it's time to deprecate the Lotus 1-2-3-like database functions and the rather pointless *IF/*IFS functions in favor of the truly more useful and not difficult to explain FILTER function.
While variants of formulas can serve experienced users brilliantly we still need basic safe and simple formulas for the general Excel user population.
That's who my suggestion is aimed at. Keep it simple with one formula with one purpose.
Simple to use, simple to review / audit
Furthering Kenneth's point, FILTER would eliminate the need for
and very likely over half of actual use of LARGE, SMALL and ROW.
As for Wyn's GETMATCH, just add a 4th argument to LOOKUP to let it do exact and threshold matching. Make that optional 4th arge 1/0/-1 like MATCH, so the lookup array/range could be sorted in either order.