Igor's Blog

One of my tasks recently has been to analyse erroneous duplicate data in our production system and to come up with a number of data fixes that could be executed using simple SQL update statements. The sanitised data was provided to me as an Excel spreadsheet containing just enough information to work out what should be done with the duplicate, but not enough to identify who the data belongs to. This meant that I could work on it outside of the production environment, but it also meant that I needed to provide a way for the production operations team to reproduce what I did on real data and with relative ease.

After some thinking I've decided to implement my solution in Excel, in fact, in the very same spreadsheet that was given to me. I didn't use any external scripts, nor any macros. Everything was done using some clever column positioning and standard Excel formulas.

My input data looked something like this (not what was given to me for obvious reasons):

Without going into details of the data, or the analysis of the data fixes, lets assume that the ID column in conjunction with the version and status columns is the primary key for each row (as well as in the database). What I wanted to produce is an update SQL statement for certain rows that I identified as needing fixing.

I wanted the update statement to look something like this:
update user.table1 set status='XXX', version=99 where id='YYY' and version=ZZZ and status='AAA';
update user.table1 set status='XXX' where id='YYY' and version=ZZZ and status='AAA';
update user.table1 set version=99 where id='YYY' and version=ZZZ and status='AAA';

To break this down, I wanted to set a new status and/or version for an existing record, with these rules:
  • Update can update the status OR the version
  • Update can update BOTH status and version
  • Only some rows were to be updated (as marked)

To aid in this, I added three new columns to my spreadsheet: NeedUpdate, NewVersion, NewStatus. The NeedUpdate column would designate whether an update statement should be generated for that row, the two other columns contained new values to set in the update statement.

The resulting spreadsheet took on this form after my analysis (example values only, not representative of actual values):

Now I was ready to implement the SQL generator. I did this in a new worksheet. The original worksheet was called 'Export Worksheet', which is important for the formulas shown below.

Before writing the formulas, I broke down the update SQL statement into the following parts (one column per part):
  • Update clause - 'update user.table1 set'
  • 1st value set e.g. version=99
  • Set list comma separator (if both status and version are being set)
  • 2nd value set e.g. status='AAA'
  • Where clause
  • Where clause conditions

Looking back on this, the where clause and it's conditions could have been combined.

Now for each of the cell formulas, I did a check to see whether that row was to be updated e.g. if the F2 cell in my data worksheet had the value 'y'. If it did, the cell would either get a value from the formula or further conditional processing was applied by combining IF statements.

The set list comma separator was a special case, it could only appear if both the status and version columns were being updated, for this I used the IF function in combination with the ISBLANK function to check whether values in the data workbook were set that corresponded to these variables.

The formulas ended up looking like this (split over multiple lines for readability here):
 A1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
"update user.table1 set",
 B1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
IF(ISBLANK('Export Worksheet'!G2),
CONCATENATE("version=", 'Export Worksheet'!G2)),
 C1 cell
=IF(ISBLANK('Export Worksheet'!H2),
IF(ISBLANK('Export Worksheet'!G2),
 D1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
IF(ISBLANK('Export Worksheet'!H2),
CONCATENATE("status='", 'Export Worksheet'!H2,"'")),
 E1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
 F1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
"id='", 'Export Worksheet'!A2,
"' and version=", 'Export Worksheet'!C2,
" and status='", 'Export Worksheet'!D2, "';"),

This produced output like this:

Generating all of the SQL was a matter of selecting the first row and dragging down. Any rows from the data worksheet that needed updates had SQL generated, others just got a blank line.

At this point it was a matter of copy/pasting the worksheet into SQL Developer to execute the queries.

This is nice and simple and doesn't require any shell scripts or using the SQL Loader. This also means that the spreadsheet is completely reusable, just paste new data values into it and the SQL is automatically generated for you!


comments powered by Disqus
Other posts you may like...
Programming, DIY, Games, Hacks, Tech and more.

Recent Blog Posts

Learn How to Add Disclaimer to All Outgoing Emails in Office 365

How to securely host your web site for free with Google Cloud and CloudFlare (part 3)

How to securely host your web site for free with Google Cloud and CloudFlare (part 2)

How to securely host your web site for free with Google Cloud and CloudFlare (part 1)

Beware of the subtle difference in date format strings between MySQL and SQLite

Postman reporting Unexpected 'N' error in response body

Adding extra shelves to the Ikea Detolf display cabinet

Posting array data from a web form to a Jersey REST service

How to avoid double encoding XML entities with Apache Commons StringEscapeUtils

Add your commonly used fonts to macOS Mail font drop down

Recent Galleries

Space Food - Chocolate Ice Cream with Chocolate Chips

Legeod Star Wars AT-DP kit

DIY spare parts computer build with a RAIDMAX Anura case

Fake 'Lepin' brand Lego packaging

Hardwood garden bench with clear resin void filler

Fixing a 3D printer extruder that stopped heating up

Easily increase disk space in a Lenovo Ideapad 100S 14" laptop with an M.2 SSD

Making a multi-piece 3D printed solder spool holder stand

DIY indoor apartment grow light wiring

Good Friday Electronics fun Easter Bunny LED PCB Kit IBEABU-01.0

My Other Web Sites

Igor and Elise's Travels
Riverside Expressway Cam
300 George St Blogumentary

My Online Tools

UUID to OID Converter
Guru JSON-RPC Tester
Extrudifier Object Designer
Travel ┬ÁBlog

Blogs and Friends

Matt Moores Blog
Georgi's FlatPress Guide
Perplexing Permutations
The Security Sleuth
Ilia Rogatchevski
Travelling Fairy

Blog Activity

Blog Activity
Please follow me for more great content!
Don't show this again