Monday, June 09, 2008

Quick Groovy Scripting

Recently I have to port some data from mainframe database to SQL based db for testing purposes. I have started with some text report files generated from mainframe. I have fond of using unix awk, grep for these kind of data munging. Also used perl and ruby for some scripting activities in the past. But given that I had to do this on windows and also with fading knowledge of perl, thought of getting in donw with groovy. Since eclipse also supports groovy it became easy to start with.

I got something running which spits SQL statements (using println) for every line of the input. Sooner my eclipse console started eating the output because of the buffer size for console display I had in my settings! Though I had the huge monolithic script which works fine, I cannot able to get the output in single shot. I had to rerun them in parts to get the final collective output. This slowed me on tweaking the final script. Given we didn't have much re-factoring support in eclipse, I couldn't either easily extract them as functions as I could in Java. But I am able to use a more powerful tool i.e. define a closure immediately and redirect the inputs to the println statements to a File without much changes to the original script.
println "insert into table_name (col1, col2, col3) into values (${col1},'${col2}', ${col3})"

def file = new File( "C:\output.txt")

def println = { line ->  file.append(line)}

Just adding these two line saved me a lot of time, also now I can switch back to see the output in command line or to capture them in a file very easily.

Other things that helped me to get things done quickly are the ability to refer the variables inside the string directly as " '${col2}'". This is especially useful where I have to qualify the column of string data type with quotes, otherwise for which I have to use endless escaping and + con-catenations!

Also for the next script I did, I started writing in small classes than single file, so made things easier to change at last minute.  Another gotcha for beginner for the groovy script is the use of '=='. Remember in groovy use of '==' is actually converted to this.equals(that) before the execution. I ran into endless self-recursive calls as I used the == for reference comparison as we do in Java.

As I got the script completed there were lot of duplicate SQL statements in the output. As we get errors due to integrity constraints in database, I have to find some way to remove duplicate statements. In unix, I normally use `uniq` to get this done. Since I have to get that done quickly, i just looped thru the output file and added each line to the Set and dumped it back out to remove the duplicates.

Being used Perl, and Ruby in the past I know the libraries support in perl or ruby are far huge when compared to groovy. But the single fact that I have used to Java in past years and have to work with windows, Groovy was a life saver!

N.B. No data conversion is possible without effective use of Regular Expressions. I did used regular expressions to format the input files before running groovy scripts against them. I used Textpad to do  find/replace with regular expressions. The regular expression support in eclipse editor find/replace tool still needs improvements before could it could be really useful.

Recommended Blog Posts