Igor Kromin |   Consultant. Coder. Blogger. Tinkerer. Gamer.

I've been working with some CSV files in which certain fields contain commas inside quotes, meaning those commas should not be treated as field delimiters. Since I was doing this on macOS, I ran into a limitation of the stock macOS awk command: it doesn't support gawk's FPAT feature, which makes it easy to split fields by content using a regex. Then I came across this article that really helped with my use case.
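For comparison, here is roughly what the FPAT approach looks like when gawk is available. This is only a sketch: it assumes GNU awk is installed as `gawk`, the function name `fpat_demo` is my own, and the pattern follows the CSV example in the gawk manual, so it does not cope with empty fields or escaped quotes.

```shell
# Assumes GNU awk is installed as `gawk`; the stock macOS awk has no FPAT support.
fpat_demo() {
  # FPAT describes what a field looks like: a run of non-commas, or a quoted string.
  gawk -v FPAT='([^,]+)|("[^"]+")' -v OFS='|' '{
    for (i = 1; i <= NF; i++) gsub(/^"|"$/, "", $i)  # drop the surrounding quotes
    $1 = $1   # force the record to be rebuilt with OFS
    print
  }'
}

if command -v gawk >/dev/null 2>&1; then
  printf '1,2,"3,4",5\n' | fpat_demo
else
  echo "gawk not found; skipping FPAT demo" >&2
fi
```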

What I wanted to do was turn a CSV file that had data like this:
 Input CSV File
1,2,"3,4",5
1,2,3,4,5


...into something more usable for me like this:
 Output PSV File
1|2|3,4|5
1|2|3|4|5


Using the last example in the article I've linked to above, I came up with a script to convert the CSV file (with commas in quotes) to a pipe-separated file (PSV). With a bit of Bash and Awk, the code looks like this:
 Bash and Awk
INPUT_CSV_FILE="input_file.csv"
OUTPUT_PSV_FILE="output_file.txt"
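awk '{
  c = 0            # number of fields printed so far on this line
  $0 = $0 ","      # add a trailing comma so every field ends with a delimiter
  while ($0 != "") {
    # Match the next field: a quoted one (may contain commas) or a plain one
    match($0, / *"[^"]*" *,|[^,]*,/)
    f = substr($0, RSTART, RLENGTH)
    # Strip surrounding spaces and quotes, plus the trailing comma
    gsub(/^ *"?|"? *,$/, "", f)
    if (++c > 1) printf "|"
    printf "%s", f
    # Drop the field that was just consumed
    $0 = substr($0, RSTART + RLENGTH)
  }
  printf "\n"      # end every record, including the last one, with a newline
}' "$INPUT_CSV_FILE" > "$OUTPUT_PSV_FILE"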


It's a bit ugly, but it works.
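To sanity check the approach, the same matching loop can be run against the two sample lines from the top of the post. This is a minimal sketch: the function name `csv_to_psv` is my own invention, it reads from stdin rather than a file, and it assumes fields never contain escaped quotes (`""`) or embedded newlines, which this regex approach doesn't handle.

```shell
# Hypothetical helper wrapping the same matching loop; reads CSV on stdin.
csv_to_psv() {
  awk '{
    c = 0; line = ""
    $0 = $0 ","                       # every field now ends with a comma
    while ($0 != "") {
      match($0, / *"[^"]*" *,|[^,]*,/)  # quoted field or plain field
      f = substr($0, RSTART, RLENGTH)
      gsub(/^ *"?|"? *,$/, "", f)     # strip quotes, padding and the comma
      if (c++ > 0) line = line "|"
      line = line f
      $0 = substr($0, RSTART + RLENGTH)
    }
    print line
  }'
}

result=$(printf '1,2,"3,4",5\n1,2,3,4,5\n' | csv_to_psv)
printf '%s\n' "$result"
```

With the sample input, this should print `1|2|3,4|5` followed by `1|2|3|4|5`, matching the output file shown earlier.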



-i
