Igor's Blog

One of my tasks recently has been to analyse erroneous duplicate data in our production system and to come up with a number of data fixes that could be executed using simple SQL update statements. The sanitised data was provided to me as an Excel spreadsheet containing just enough information to work out what should be done with the duplicate, but not enough to identify who the data belongs to. This meant that I could work on it outside of the production environment, but it also meant that I needed to provide a way for the production operations team to reproduce what I did on real data and with relative ease.

After some thinking I've decided to implement my solution in Excel, in fact, in the very same spreadsheet that was given to me. I didn't use any external scripts, nor any macros. Everything was done using some clever column positioning and standard Excel formulas.

My input data looked something like this (not what was given to me for obvious reasons):

Without going into details of the data, or the analysis of the data fixes, lets assume that the ID column in conjunction with the version and status columns is the primary key for each row (as well as in the database). What I wanted to produce is an update SQL statement for certain rows that I identified as needing fixing.

I wanted the update statement to look something like this:
update user.table1 set status='XXX', version=99 where id='YYY' and version=ZZZ and status='AAA';
update user.table1 set status='XXX' where id='YYY' and version=ZZZ and status='AAA';
update user.table1 set version=99 where id='YYY' and version=ZZZ and status='AAA';

To break this down, I wanted to set a new status and/or version for an existing record, with these rules:
  • Update can update the status OR the version
  • Update can update BOTH status and version
  • Only some rows were to be updated (as marked)

To aid in this, I added three new columns to my spreadsheet: NeedUpdate, NewVersion, NewStatus. The NeedUpdate column would designate whether an update statement should be generated for that row, the two other columns contained new values to set in the update statement.

The resulting spreadsheet took on this form after my analysis (example values only, not representative of actual values):

Now I was ready to implement the SQL generator. I did this in a new worksheet. The original worksheet was called 'Export Worksheet', which is important for the formulas shown below.

Before writing the formulas, I broke down the update SQL statement into the following parts (one column per part):
  • Update clause - 'update user.table1 set'
  • 1st value set e.g. version=99
  • Set list comma separator (if both status and version are being set)
  • 2nd value set e.g. status='AAA'
  • Where clause
  • Where clause conditions

Looking back on this, the where clause and it's conditions could have been combined.

Now for each of the cell formulas, I did a check to see whether that row was to be updated e.g. if the F2 cell in my data worksheet had the value 'y'. If it did, the cell would either get a value from the formula or further conditional processing was applied by combining IF statements.

The set list comma separator was a special case, it could only appear if both the status and version columns were being updated, for this I used the IF function in combination with the ISBLANK function to check whether values in the data workbook were set that corresponded to these variables.

The formulas ended up looking like this (split over multiple lines for readability here):
 A1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
"update user.table1 set",
 B1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
IF(ISBLANK('Export Worksheet'!G2),
CONCATENATE("version=", 'Export Worksheet'!G2)),
 C1 cell
=IF(ISBLANK('Export Worksheet'!H2),
IF(ISBLANK('Export Worksheet'!G2),
 D1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
IF(ISBLANK('Export Worksheet'!H2),
CONCATENATE("status='", 'Export Worksheet'!H2,"'")),
 E1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
 F1 cell
=IF(EXACT('Export Worksheet'!F2, "y"),
"id='", 'Export Worksheet'!A2,
"' and version=", 'Export Worksheet'!C2,
" and status='", 'Export Worksheet'!D2, "';"),

This produced output like this:

Generating all of the SQL was a matter of selecting the first row and dragging down. Any rows from the data worksheet that needed updates had SQL generated, others just got a blank line.

At this point it was a matter of copy/pasting the worksheet into SQL Developer to execute the queries.

This is nice and simple and doesn't require any shell scripts or using the SQL Loader. This also means that the spreadsheet is completely reusable, just paste new data values into it and the SQL is automatically generated for you!


, ,
About — I'm an enthusiastic software engineer and consultant interested in many fields including J2EE, programming, electronics, 3D printing, video games, wood working and gardening.
See my Resume for more information.
The views expressed in this blog are my own and not those of my employer.
comments powered by Disqus
My other posts you may like...
Programming, DIY, Games, Hacks, Tech and more.
Follow me on...
Current and Past Projects
See my Resume


RSS Feed

My Other Web Sites

Igor and Elise's Travels
Riverside Expressway Cam
StrFunc() Online
300 George St Blogumentary
Guru JSON-RPC Tester
Extrudifier Object Designer

Recent Blog Posts

Binding extra variables with PHP PDO returns no results

New Nintendo 3DS XL, R4i SDHC RTS Lite and DSDoom

Older versions of JD-GUI crashing on macOS Sierra

How does Seeed Fusion stack up as a low budget PCB fab

Where to find the last item (Robbie's lost lyrics) in GravityFalls Lake level

Detecting print requests to print contents of a DIV with JavaScript

Using Google DFP with AdSense on responsive pages

How to change the email subject line in Gmail

Force PHP GD library to load JPEG images that have invalid data

How to uninstall Samsung Smart Switch from your Mac

Recent Galleries

Space Food - Chocolate Ice Cream with Chocolate Chips

Legeod Star Wars AT-DP kit

DIY spare parts computer build with a RAIDMAX Anura case

Fake 'Lepin' brand Lego packaging

Hardwood garden bench with clear resin void filler

Fixing a 3D printer extruder that stopped heating up

Easily increase disk space in a Lenovo Ideapad 100S 14" laptop with an M.2 SSD

Making a multi-piece 3D printed solder spool holder stand

DIY indoor apartment grow light wiring

Good Friday Electronics fun Easter Bunny LED PCB Kit IBEABU-01.0

Top Categories

Blogs I follow

Matt Moores Blog
Georgi's FlatPress Guide
Perplexing Permutations
The Security Sleuth


RAWS Parts Online
Alpha Dimensions Hosting
Kristensen Photography
Ilia Rogatchevski
Travelling Fairy

Blog Activity

Blog Activity
Follow me on... 
...or subscribe for updates!

Don't show this again