OK, long shot, but... might be worth it...
"We found a problem with some content... " ... "The workbook cannot be opened or repaired by Microsoft Excel because it is corrupt." Sometimes flashes up 10-15 repair attempts before failing.
I have no idea what Excel thinks is corrupt. There are no macros. The content is primarily one pretty big table that goes out to lookups in other tables (lots of XLOOKUP, INDEX/MATCH) to bring back values and do some mild maths. Quite a lot of error-catching, IFERROR and suchlike, to limit the inevitable divisions by zero. A couple of pivot tables so I can do quick error checking, but not much.
What's weird is...
The files are, by my standards, BIG. As .xlsx they are about 500 MB, as .xlsb about 330 MB. The big table is about 500K lines deep (but not very wide). This problem began to appear a few weeks back and I had a couple of workarounds. Clearing the Excel cache seemed to work, but now doesn't. Working on the logic while the big table was empty, then dropping the data in and saving .xlsb and .xlsx versions, also worked.
I'm in trouble now though, because the ~50 MB "empty" version is throwing the same error, so my workaround isn't working.
The files can be opened on Apple Macs, and they can be read by Tableau. So as I say, I don't think they're actually corrupt. But I don't have a clue how to proceed... of course, I have a few older versions, and they open, but my whole working method is kinda unreliable now.
- Any thoughts on what might be the problem? Functions to avoid, things to look out for?
- The empty version seems kinda bloated to me. There are really only a few tables in it; fewer than 12, and none of them deeper than 1,000 lines. Is there hidden cruft I should be trying to trim off?
- Best of all - how to open/recover my files and have a working method that... works...
Yep, we're trying to move the whole thing to a Python environment sometime. Great, but I'm a subject matter person, not a coding/IT person, and I need to progress the job.
Thanks in advance...
Y
You may have thousands of empty rows at the bottom of your sheets, or columns to the right - happens when you paste stuff in sometimes.
Go to the first blank row at the bottom of the sheet, press CTRL-SHIFT-DOWN ARROW to select everything below, then right-click and choose Delete (hitting the Delete key only clears contents; it doesn't remove the rows).
Do the same for columns: go to the first blank column on the right of the sheet, press CTRL-SHIFT-RIGHT ARROW, then right-click and Delete.
Then save as a new file name and see what happens.
My god, what have you done?
I suggest splitting the spreadsheet up in a way that hopefully jettisons any bad state inside the data structures behind the spreadsheet that is probably causing the bug. Wouldn’t surprise me if it’s related to saving a giant spreadsheet over and over.
1. Copy/paste the data table(s) into a clean spreadsheet, ideally using "Paste as Values" to leave anything else behind. Save as "data.xlsx" or whatever. Or you could export it as a .csv file and then import it into a clean spreadsheet.
2. Create a second empty spreadsheet in the same folder called “logic.xlsx” or whatever. Build your formulae here and when selecting cell ranges, switch over to the window for “data.xlsx” and select them. Excel will reference the other spreadsheet.
Hopefully both will save fine, and you can keep saving logic.xlsx without churning the save on the big data file. If one does fail to save, you’ve narrowed it down.
> Yep, we're trying to move the whole thing to a Python environment
This is the way.
> Great, but I'm a subject matter person, not a coding/IT person, and I need to progress the job.
Well I hope my cargo cult fixes work but I think the universe is trying to tell you to learn Python!
Does it open in LibreOffice?
One thing that will grossly inflate Excel file sizes is range names linked to other files; these can be created accidentally when importing data from other workbooks. Have a look in the list (Formulas > Name Manager) and delete those that aren't needed. Problems with largish files can also occur if Excel is installed as 32-bit instead of 64-bit, which may explain your PC vs Mac difference?
Ooh, interesting ideas to try. Thanks folks! I'll report back...
Save multiple copies, including one as a .csv.
.csv is a text-file version that everything can open.
It means you have a safe backup.
csv won't preserve much other than the data
If Wintertree's idea of copying the data values into a clean spreadsheet doesn't work, is there any mileage in turning off the auto-calculate option?
In the Formulas tab in the ribbon, under "Calculation Options" you can select "Manual" to stop Excel from automatically calculating formulas.
I created a quick table of random data, 3 cells wide by 500K long, with a fourth column just averaging the three values, and it's 35 MB.
As a csv it's 25 MB, so 35 MB is probably reasonable for a table of that size.
Not sure how you're getting to 500 MB. I can only assume it has lots of varying cell formatting from all the drag-downs you've been doing. I'd be surprised if a values-only copy into a clean workbook didn't make a massive improvement. I wouldn't split the files, however; I find Excel quite clunky when referencing data from another spreadsheet, so it may add additional problems.
> csv won't preserve much other than the data
csv will preserve all of the data. It won't preserve some of the logic, but logic is usually easy and quick to re-create once the thinking has already been done.
It sounds like the original is full of macros.
OP said there are no macros
The lookups are VLOOKUPs in all but name. They sound pretty complicated.
Is it saved locally or in Sharepoint?
Edit: It took all my willpower not to ask whether you've tried turning it off and back on again
It's locally saved. It doesn't look up anything external, all the lookups are in tables in other tabs.
It has no complex or conditional formatting (no need, and I'm aware complex formatting tends to mess stuff up.) Just the table format that comes when you select table.
No macros. No COUNTIFs or SUMIFs or anything that needs repetitive sweeps down a column. A couple of pivot tables so I can do quick checks.
Calculation has been set to manual for months now, it's a huge time-save. Takes about eight minutes to run.
Mostly XLOOKUPs, some two-way INDEX/MATCH. No VLOOKUPs or HLOOKUPs; they're obsolete.
64-bit setup.
It's got ~500K lines because that's the (compressed) minimum size of data for what I'm doing. Width: about 60 columns, of which the first 20 are flat input data and the last 40 are the value-add.
I don't need to recover the data; I've got that safe because I've been trying to follow good practice throughout. Looks like I'll have to do a clean rebuild, which might or might not be a major faff rebuilding/relinking all the lookup references. Now the thing is a bit more mature, it shouldn't need a thousand more saves (is that where the problem's arising?)
If anyone's got any other "good practice for big Excel" tips, gratefully received! It turns out I'm the power user in my organisation (slightly surprising/alarming), the MS SARA recovery app is a complete waste of time, and despite the huge university of training courses we are supposed to learn from, there's not much of actual relevance...
Y
Really, I know you don't want to hear this but... have you any idea how straightforward and robust that application would be in Access? (Other relational databases are available, but Access probably has the simplest interface.) Excel was designed to do one job - financial modelling. The fact that it has been abused by millions of users, abetted by Microsoft, to handle large volumes of data doesn't make it right.
I think the amount of time spent by people trying to use Excel for something it was never designed to do, and the resulting frustration, time wasting and resulting errors, is beyond decent contemplation.
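For a taste of what that lookup logic looks like in a relational database, here's a small sketch using sqlite3 (chosen only because it ships with Python; other databases are available, and the tables and columns here are invented for illustration). The per-row XLOOKUP pattern becomes a single JOIN over all rows at once:

```python
import sqlite3

# An in-memory database standing in for the workbook's tabs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, product TEXT, qty INTEGER)")
con.execute("CREATE TABLE prices (product TEXT, unit_price REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "widget", 10), (2, "gadget", 3)])
con.executemany("INSERT INTO prices VALUES (?, ?)",
                [("widget", 2.5), ("gadget", 7.0)])

# Instead of an XLOOKUP copied down 500K rows, one JOIN does the lot.
rows = con.execute("""
    SELECT o.id, o.product, o.qty * p.unit_price AS total
    FROM orders o
    JOIN prices p ON p.product = o.product
    ORDER BY o.id
""").fetchall()
# rows is [(1, 'widget', 25.0), (2, 'gadget', 21.0)]
```

There's no recalculation step and no used-range bloat; the database only stores what you put in it.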
> It's locally saved. It doesn't look up anything external, all the lookups are in tables in other tabs.
Try saving it in SharePoint/OneDrive (or similar) and sharing it with someone else to see if they get the same issues?
> Really, I know you don't want to hear this but... have you any idea how straightforward and robust that application would be in Access? (Other relational databases are available, but Access probably has the simplest interface.) Excel was designed to do one job - financial modelling. The fact that it has been abused by millions of users, abetted by Microsoft, to handle large volumes of data doesn't make it right.
> I think the amount of time spent by people trying to use Excel for something it was never designed to do, and the resulting frustration, time wasting and resulting errors, is beyond decent contemplation.
Like this:
No idea. But it's a worthwhile thought, thanks. It feels like the entry barrier is lower and the skill more valuable (more widely applicable) than Python. I'll look into it... it's not a thought I'd have ever had for myself...
Designing the tables is key, but interestingly it's neither complicated nor technical - 'inspired common sense', it has been called. If you need some pointers, feel free to PM me - at one point I was going to write a book, working title 'Throw Away That Spreadsheet!', but I only ever managed the first chapter.
Appreciated, thank you. I've just had an hour on how it would be implemented in Access from our internal resource... for whom we'd have to pay. But it looks a lot more accessible than I had expected, and a more valuable transferable skillset than being a mediocre Python coder...
Got some thinking to do...
Y
> at one point I was going to write a book, working title 'Throw away that spreadsheet!' But only ever managed the first chapter.
Did you start writing it in Excel and after one chapter it got too big??
OP: have you saved a copy without the pivot tables to see if that makes a difference?
> Appreciated, thank you. I've just had an hour on how it would be implemented in Access from our internal resource... for whom we'd have to pay. But it looks a lot more accessible than I had expected, and a more valuable transferable skillset than being a mediocre Python coder...
> Got some thinking to do...
> Y
Get a quote from your internal resource for the implementation, including time spent with you shadowing his work so you can take it over when he's done. This would be money well spent, and has the potential to keep your project on track.
Good luck.
Kept on adding features!