After some thinking I've decided to implement my solution in Excel, in fact, in the very same spreadsheet that was given to me. I didn't use any external scripts, nor any macros. Everything was done using some clever column positioning and standard Excel formulas.
My input data looked something like this (not what was given to me for obvious reasons):
Without going into details of the data, or the analysis of the data fixes, lets assume that the ID column in conjunction with the version and status columns is the primary key for each row (as well as in the database). What I wanted to produce is an update SQL statement for certain rows that I identified as needing fixing.
I wanted the update statement to look something like this:
To break this down, I wanted to set a new status and/or version for an existing record, with these rules:
- Update can update the status OR the version
- Update can update BOTH status and version
- Only some rows were to be updated (as marked)
To aid in this, I added three new columns to my spreadsheet: NeedUpdate, NewVersion, NewStatus. The NeedUpdate column would designate whether an update statement should be generated for that row, the two other columns contained new values to set in the update statement.
The resulting spreadsheet took on this form after my analysis (example values only, not representative of actual values):
Now I was ready to implement the SQL generator. I did this in a new worksheet. The original worksheet was called 'Export Worksheet', which is important for the formulas shown below.
Before writing the formulas, I broke down the update SQL statement into the following parts (one column per part):
- Update clause - 'update user.table1 set'
- 1st value set e.g. version=99
- Set list comma separator (if both status and version are being set)
- 2nd value set e.g. status='AAA'
- Where clause
- Where clause conditions
Looking back on this, the where clause and it's conditions could have been combined.
Now for each of the cell formulas, I did a check to see whether that row was to be updated e.g. if the F2 cell in my data worksheet had the value 'y'. If it did, the cell would either get a value from the formula or further conditional processing was applied by combining IF statements.
The set list comma separator was a special case, it could only appear if both the status and version columns were being updated, for this I used the IF function in combination with the ISBLANK function to check whether values in the data workbook were set that corresponded to these variables.
The formulas ended up looking like this (split over multiple lines for readability here):
This produced output like this:
Generating all of the SQL was a matter of selecting the first row and dragging down. Any rows from the data worksheet that needed updates had SQL generated, others just got a blank line.
At this point it was a matter of copy/pasting the worksheet into SQL Developer to execute the queries.
This is nice and simple and doesn't require any shell scripts or using the SQL Loader. This also means that the spreadsheet is completely reusable, just paste new data values into it and the SQL is automatically generated for you!