Igor Kromin |   Consultant. Coder. Blogger. Tinkerer. Gamer.

I've been working with some CSV files in which certain fields contained commas inside double quotes, meaning those commas are part of the data and should not be treated as field delimiters. I was doing this on macOS and ran into a limitation: the stock macOS awk command doesn't support gawk's FPAT feature, which makes it easy to split records by field content (described with a regex) rather than by delimiter. Then I came across this article that really helped with my use case.

What I wanted to do was turn a CSV file that had data like this:
 Input CSV File
1,2,"3,4",5
1,2,3,4,5


...into something more usable for me like this:
 Output PSV File
1|2|3,4|5
1|2|3|4|5
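
For comparison, here is roughly what the same conversion looks like where gawk and FPAT are available. This is a minimal sketch that assumes gawk is installed; the function name csv_to_psv_gawk is made up for illustration, and the pattern is the basic one that requires every field to contain at least one character, so empty fields and leading spaces are not handled.

```shell
# Hypothetical helper (the name is illustrative).
# Requires gawk; the stock macOS awk does not understand FPAT.
csv_to_psv_gawk() {
  gawk -v FPAT='([^,]+)|("[^"]+")' -v OFS='|' '{
    # FPAT describes a field: a run of non-commas, or a double-quoted string
    for (i = 1; i <= NF; i++) gsub(/^"|"$/, "", $i)   # drop the enclosing quotes
    $1 = $1   # force $0 to be rebuilt with OFS even if no field was changed
    print
  }' "$@"
}
```

Called as csv_to_psv_gawk input_file.csv (or with the CSV on standard input), it should turn the sample input above into the pipe-separated output shown.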


Using the last example in the article I've linked to above, I came up with a script that converts the CSV file (with commas inside quotes) to a pipe-separated file (PSV). With a bit of Bash and Awk, the code looks like this:
 Bash and Awk
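INPUT_CSV_FILE="input_file.csv"
OUTPUT_PSV_FILE="output_file.txt"
awk '{
  c = 0                  # number of fields printed so far on this line
  $0 = $0 ","            # append a comma so every field ends with one
  while ($0 != "") {
    # Take the next field: a quoted string, or a run of non-comma characters
    match($0, / *"[^"]*" *,|[^,]*,/)
    f = substr($0, RSTART, RLENGTH)
    # Strip surrounding spaces and quotes plus the trailing comma
    gsub(/^ *"?|"? *,$/, "", f)
    if (c++ > 0) printf "|"
    printf "%s", f
    $0 = substr($0, RLENGTH + 1)
  }
  printf "\n"
}' "$INPUT_CSV_FILE" > "$OUTPUT_PSV_FILE"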


It's a bit ugly, but it works.
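
To see why it works, it helps to look at a single pass of the loop. The sketch below applies the same match and gsub calls to the remainder of the first sample line once the fields 1 and 2 have been consumed. The helper name first_field_step is made up for illustration. Because awk picks the longest match starting at the leftmost position, the quoted alternative wins over the plain "up to the next comma" alternative whenever a field starts with a quote.

```shell
# Hypothetical helper (the name is illustrative): run one iteration of the
# field-extraction loop on the string passed as the first argument.
first_field_step() {
  printf '%s\n' "$1" | awk '{
    match($0, / *"[^"]*" *,|[^,]*,/)        # longest match wins: the quoted form
    f = substr($0, RSTART, RLENGTH)
    print "matched: " f
    gsub(/^ *"?|"? *,$/, "", f)             # strip quotes, spaces, trailing comma
    print "cleaned: " f
    print "rest:    " substr($0, RLENGTH + 1)
  }'
}

first_field_step '"3,4",5,'
# matched: "3,4",
# cleaned: 3,4
# rest:    5,
```

The pattern "[^"]*" stops at the first closing quote, so a field that contains doubled quotes ("") as an escape would not be extracted correctly by this approach.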



-i
