Igor Kromin |   Consultant. Coder. Blogger. Tinkerer. Gamer.

NOTE: This article is 3 years or older so its information may no longer be relevant. Read on at your own discretion!
I've been looking at some data in CSV files that unfortunately had commas within quotes, signifying that those commas should not be used as field delimiters. Since I was doing this on macOS, I ran into a limitation of the macOS awk command: it doesn't support gawk's FPAT feature, which makes it easy to split data by field content using a regex. Then I came across this article that really helped with my use case.
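For comparison, here is a rough sketch of what the FPAT approach could look like where gawk (4.0 or later) is installed. It will not work with the stock macOS awk. The function name csv_to_psv_gawk is made up for illustration, and the pattern is the commonly cited one for simple CSV, so it assumes there are no escaped quotes inside quoted fields.

```shell
# Hypothetical helper: convert simple CSV on stdin to pipe-separated output.
# Requires gawk; FPAT describes what a field looks like instead of what separates fields.
csv_to_psv_gawk() {
  gawk 'BEGIN {
    # A field is either a run of non-comma characters or a double-quoted string
    FPAT = "([^,]*)|(\"[^\"]+\")"
    OFS = "|"
  }
  {
    for (i = 1; i <= NF; i++) {
      # Strip the surrounding quotes from quoted fields
      if (substr($i, 1, 1) == "\"") $i = substr($i, 2, length($i) - 2)
    }
    $1 = $1   # force the record to be rebuilt with OFS
    print
  }'
}

# Example usage with the sample data from this post
if command -v gawk >/dev/null 2>&1; then
  printf '%s\n' '1,2,"3,4",5' '1,2,3,4,5' | csv_to_psv_gawk
fi
```

The rest of this post avoids FPAT entirely so that it runs on the awk that ships with macOS.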

What I wanted to do was turn a CSV file that had data like this:
 Input CSV File
1,2,"3,4",5
1,2,3,4,5


...into something more usable for me like this:
 Output PSV File
1|2|3,4|5
1|2|3|4|5


Using the last example in the article I've linked to above, I came up with a script to convert the CSV file (with commas in quotes) to a pipe-separated file (PSV). With a bit of Bash and Awk, the code looks like this:
 Bash and Awk
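INPUT_CSV_FILE="input_file.csv"
OUTPUT_PSV_FILE="output_file.txt"
awk 'BEGIN { l=0 }   # l counts fields printed so far across all lines
{
  c=0          # c counts fields printed on the current line
  $0=$0","     # append a comma so every field ends with a delimiter
  while($0) {
    # take the next field: a quoted string or a run of non-comma characters
    match($0,/ *"[^"]*" *,|[^,]*,/)
    f=substr($0,RSTART,RLENGTH)
    # strip surrounding spaces and quotes and the trailing comma
    gsub(/^ *"?|"? *,$/,"",f)
    c++
    if (c == 1 && l > 0) printf "\n"   # start a new output line
    if (c > 1) printf "|"              # separate fields with a pipe
    printf("%s", f)
    l++
    $0=substr($0,RLENGTH+1)            # drop the consumed field
  }
}' "$INPUT_CSV_FILE" > "$OUTPUT_PSV_FILE"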


It's a bit ugly, but it works.
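To see what the loop is doing, here is a small sketch that prints each chunk that match() peels off the front of the line, next to the field left after gsub() has cleaned it up. The show_fields name is made up for this illustration; the two regular expressions are the same ones used in the script above.

```shell
# Hypothetical helper: show the raw chunk and the cleaned field for each loop step.
show_fields() {
  awk '{
    $0 = $0 ","                              # make every field end with a comma
    while ($0) {
      match($0, / *"[^"]*" *,|[^,]*,/)       # quoted field, or anything up to a comma
      raw = substr($0, RSTART, RLENGTH)
      f = raw
      gsub(/^ *"?|"? *,$/, "", f)            # drop quotes, padding and the comma
      printf "%s -> %s\n", raw, f
      $0 = substr($0, RLENGTH + 1)           # continue with the rest of the line
    }
  }'
}

printf '%s\n' '1,2,"3,4",5' | show_fields
```

For the first sample line this should print:

```
1, -> 1
2, -> 2
"3,4", -> 3,4
5, -> 5
```

The quoted alternative wins for the third field because awk picks the longest match at the leftmost position, so the comma inside the quotes is never treated as a delimiter.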



-i
