Preserve source quoting and separator when saving CSV.
Currently, Excel alters CSV files to it's internal CSV format whenever the file is saved, even if no changes were made to the file. For example, if I open a CSV file like below (anonymized version of a AWS credentials file):
User Name,Access Key Id,Secret Access Key
And then I merely open the file and save it without making any changes, Excel will alter file, removing the quotes:
User Name,Access Key Id,Secret Access Key
In this particular example, the change is harmless but for more complex CSV files, simply opening them in Excel is sufficient to break other tools that expect CSV in a particular format. CSV is a poorly defined format so the only reliable way to work with it is preserving the source quoting and separator usage.
Great suggestion – CSV hasn’t been an area we’ve been looking into recently, but we’ll reconsider. It looks like this is starting to get some votes fairly quickly. If you like the idea, then please keep the votes coming! That’s the best way to help get this onto our engineering schedule.
John [MS XL]
Perhaps the best solution will be to use templates (formatted and saved as text with extension e.g. xlsxt or so) with instructions as to how to handle particular csv file, store templates in Excel templates folder.
At an attempt to open a csv Excel should check for the templates, and offer a list of them to the user. There should be a default template, as an example filled with all available options. When user opens csv for a first time (or 3 times), there should be a message that default template (in folder... ) is being used. If only default template is used, after a few times the message will not bother users anymore. Yet, everyone will learn about the feature.
Templates should indicate handling of all possible situations:
- at opening: show / hide this template warning
conversion/preserving of numbers to dates,
dropping/keeping of leading zeroes,
use of column separator ( , or. or other ASCII, who cares?),
use of decimal point in csv
preserving of ‘ as entered,
preserving spaces in csv (formatting as text),
preserving REAL formats as entered (e.g. 1.200000E-2),
- saving (to csv): saving columns with numbers (specify col.#) with leading zeroes,
formatting any real as prescribed (e.g. -1.10000E2),
use of encoding (if saved as csv)
Template should also indicate rules to save standard xlsx file as csv (appropriate dialog required).
1. Satisfy to virtually any weird formatting setup of original csv files
2. Easy to share particular csv pattern between customers / team members
3. Allow to handle differently csv files from different vendors (I do know a few of those cases).
4. Allow user’s company to write and use own template for their csv files
5. System is adaptable for future needs, if any.
Some false thought: this system, if implemented will promote all kind of weird csv formats in the world.
Answer: perhaps this concern can be dismissed because xlsx format is far more useful vs. csv. CSV is only used only when really needed. Any vendor, once csv format is chosen, will strictly maintain it – if you work with that vendor – adapt for formatting. To be done just once, quite simple, not a problem.
Far better than today’s need (in my case) to open csv with Word (set as default) then use macros to reformat it into something “unhurtable”, only then open with Excel. And reverse process to save as csv. I did it quite a few times… Time consuming… Expensive… It should be fixed…
Biggest problem for us is that we regularly use pipe characters as delimiters as comma is a permitted value in our data. There is no way to edit and save a pipe-delimited file in Excel that is simple and safe. This has been a problem forever and I'm a little bit surprised by it.
It's now July 2020. Doesn't look like you've looked into it enough:
I have an 18 digit number in a txt document. I import as comma delimited, forcing text format for the number's column. It displays all 18 digits. Good. I save as .csv and reopen. The text formatting is gone and the number is now in scientific notation with a loss of presision (the last three numbers are 000. BAD!
I start over. I enter the formula ="="""&B2&"""", which results in =”345234523454353456” being displayed.
Upon creating the formula the data is presented as =”345234523454353456” literally.
After saving as csv and reopening, it loses one level of translation and is presented as 34523452345435 without the = and “”. Passable solution for just a one-time open, maybe?
After closing and reopening again it loses that last level and reverts to scientific notation.
Solution: If you can present a currency format of $123.00 and still manipulate it as a number, you can convert ALL numbers, including over 15 digits to display as their text equivalent by default, right justified, and still manipulate them behind the scenes as numbers.
This business of losing precision when displaying large numbers and losing text formatting when saving to .csv has to stop.
Real world reason: Sometimes real estate parcel numbers are 18 digits long. We need to actually use all 18 digits, ya? Converting them to scientific notation, and/or saving them to .csv to upload to a crm or mailing company, only to have the last three digits converted to 000 is problematic to say it nicely.
Kevin Bouwman commented
We deal with this every week. We use Sage 100 erp and it expects CSV files in the format it exports them in. Excel is constantly messing with this; picking up double quotes and commas it should ignore, seeing part codes in rows with only numeric digits as a number, dropping leading zeros, when the code is alphanumeric. In general I am for all of the points being made here to make CSV easier to work with. I have lost many days worth of labor to this issue.
Another example is to have something like
in one of the cells. Upon save, it will become 2.4 and it breaks my tool that uses data from CSV and that new value is unexpected and not handled correctly (and no, I cannot adjust the tool as it compares that value to another value it gets from 3rd party over which I don't have control )
Kenneth Barber commented
This post mentions the idea of changing the export format of dates in CSV files.
Paul Whitman commented
Many analysis tools out there don't speak native XLS and only speak CSV. This is why strong CSV support is important: not because we like saving documents in CSV (I dislike it), but because we have to deal with information in CSV.
This is massive. I'd also REALLY like some save/export options, similar to dba tools (Access/SQLD/TOAD etc), define a delimiter (pipe/tab), quoted values. I'd find you, and hug you for a very long time if you could make these happen. :)