Igor Kromin |   Consultant. Coder. Blogger. Tinkerer. Gamer.

I've been looking at some data in CSV files that unfortunately had commas inside quoted fields, where the commas are part of the data and should not be treated as field delimiters. Since I was doing this on macOS, I ran into a limitation of the stock macOS awk: it doesn't support gawk's FPAT feature, which makes it easy to split a record by describing what a field looks like with a regex, rather than what separates the fields. Then I came across this article that really helped with my use case.
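For comparison, here is a minimal sketch of what the FPAT approach could look like where gawk is available (for example, installed through Homebrew). The function name `csv_to_psv_fpat` is made up for illustration, and the pattern is the simple one that assumes every field holds at least one character and that quoted fields contain no escaped quotes.

```shell
# Hypothetical helper (the name is an assumption): reads CSV on stdin and
# writes pipe-separated output on stdout. Requires gawk, not the stock macOS awk.
csv_to_psv_fpat() {
  gawk 'BEGIN {
    # A field is either a run of non-commas or a double-quoted string.
    # Assumption: no empty fields and no escaped quotes inside quoted fields.
    FPAT = "([^,]+)|(\"[^\"]+\")"
    OFS = "|"
  }
  {
    for (i = 1; i <= NF; i++)
      if (substr($i, 1, 1) == "\"")
        $i = substr($i, 2, length($i) - 2)   # drop the surrounding quotes
    $1 = $1                                  # force the record to be rebuilt with OFS
    print
  }'
}

# Only attempt the demo when gawk is actually installed.
if command -v gawk >/dev/null 2>&1; then
  printf '%s\n' '1,2,"3,4",5' '1,2,3,4,5' | csv_to_psv_fpat
fi
```

With gawk present, the two sample lines should come out as `1|2|3,4|5` and `1|2|3|4|5`.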

What I wanted to do was turn a CSV file that had data like this:
 Input CSV File
1,2,"3,4",5
1,2,3,4,5


...into something more usable for me like this:
 Output PSV File
1|2|3,4|5
1|2|3|4|5


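Using the last example in the article I've linked to above, I came up with a script to convert the CSV file (with commas inside quotes) to a pipe-separated values (PSV) file. It peels one field at a time off the front of each line, using a regex that matches either a quoted field or a plain one. With a bit of Bash and awk, the code looks like this: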
 Bash and Awk
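INPUT_CSV_FILE="input_file.csv"
OUTPUT_PSV_FILE="output_file.txt"
awk '{
    c = 0                                # fields printed so far on this line
    $0 = $0 ","                          # append a comma so every field ends with one
    while ($0 != "") {
        # Match the next field: a quoted field (which may contain commas) or a plain one
        match($0, / *"[^"]*" *,|[^,]*,/)
        f = substr($0, RSTART, RLENGTH)
        gsub(/^ *"?|"? *,$/, "", f)      # strip padding, surrounding quotes and the trailing comma
        if (++c > 1) printf "|"          # pipe between fields, never before the first
        printf "%s", f
        $0 = substr($0, RLENGTH + 1)     # drop the field that was just consumed
    }
    printf "\n"                          # terminate every output line, including the last
}' "$INPUT_CSV_FILE" > "$OUTPUT_PSV_FILE"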


It's a bit ugly, but it works.
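To try the approach without creating any files, the same awk logic can be wrapped in a small function that reads from stdin. This is only a sketch: the name `csv_to_psv` is made up for illustration, and it shares the script's assumptions (no escaped `""` quotes inside quoted fields, no pipe characters in the data).

```shell
# Hypothetical wrapper (the name is an assumption): CSV on stdin, PSV on stdout.
csv_to_psv() {
  awk '{
    c = 0
    $0 = $0 ","                          # sentinel comma so every field ends with one
    while ($0 != "") {
      match($0, / *"[^"]*" *,|[^,]*,/)   # quoted field, or plain field
      f = substr($0, RSTART, RLENGTH)
      gsub(/^ *"?|"? *,$/, "", f)        # strip padding, quotes and the trailing comma
      if (++c > 1) printf "|"
      printf "%s", f
      $0 = substr($0, RLENGTH + 1)       # drop the consumed field
    }
    printf "\n"
  }'
}

# Feed it the sample data from above.
printf '%s\n' '1,2,"3,4",5' '1,2,3,4,5' | csv_to_psv
# 1|2|3,4|5
# 1|2|3|4|5
```

Empty fields are preserved, so `a,,b` becomes `a||b`, and spaces around a quoted field are dropped along with the quotes.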



-i
