Convert spreadsheet (Excel) to *.csv during import

I have long lists of data in spreadsheets and have the problem that Mathematica crashes regularly during import of the .xlsx files. Is it possible within Mathematica to take a certain tab of a .xlsx file, convert it in Mathematica to .csv and then import the .csv file to work with it in Mathematica?

I don’t want to convert the .xlsx files to .csv manually before the import, since this costs a lot of time.

=================

  

 

Can you describe the crash cause? Might be more useful in the long run.
– Yves Klett
Jun 26 ’15 at 7:03

1

 

I’ve seen you have accepted my answer. Would you mind to let us know what you ended up to do and if that worked? I think this kind of feedback is what makes the site valuable and keeps people motivitated to answer questions. It is also encouraged to answer own questions in such cases describing in more detail what you did. My answer is kind of vague and probably only of limited value, so I would even be happy if you’d accept your own answer instead of mine 🙂
– Albert Retey
Jun 26 ’15 at 17:54

=================

1 Answer
1

=================

The simplest thing to try would be to just import one sheet at a time. Do you see crashes even when only importing one sheet at a time?

If that still crashes I see several things you could try and am not sure which would work best for you:

Use .NET/Link and COM

You could use .NET/Link to access excel via COM if you are on Windows to do what you want. But that would mean to know/learn some of the object model and methods of the Excel COM interface. There are some examples in the documentation for .NET/Link to get you started, and of course this site has also at least one example here. That way you’d not necessarily have to export to csv but most probably can receive the content of one sheet (or parts of that) directly.

Use external (command line) programs

Alternatively you could search command line programs which can extract data per worksheet from xlsx files. You could then run them from within Mathematica with Run or similar functions and import the results. On Windows this would include solutions which use the windows scripting host to run excel from the command line, one example of how to do that is here (you’ll find plenty when searching)…

Unpack XLSX as ZIP

.xlsx format is actually a zip archive of some files, where the data is in xml files. So you could try to extract the archive and import the data from those xml files. Here is a very simplified approach which would import the values of the first worksheet in an xls file with just numbers in it:

Cases[Import[“file.xlsx”, {“ZIP”,”xl\\worksheets\\sheet1.xml”}],
XMLElement[“v”, _, v_] :> ToExpression[v], Infinity]

to preserve the table structure and handle other data than pure numbers you’d have to put in some extra effort to analyze the symbolic XML-expression…

  

 

An .NET example can be found here.
– Sjoerd C. de Vries
Jun 26 ’15 at 15:03

  

 

@SjoerdC.deVries: I added that link to my answer, thanks for the hint, I was too lazy to do the obvious and search this site…
– Albert Retey
Jun 26 ’15 at 17:50

  

 

Thank you! It was very helpful
– Saesun Kim
May 6 at 15:45