Igor Kromin |   Consultant. Coder. Blogger. Tinkerer. Gamer.

| Views: 40
I've been looking at some data in CSV files that unfortunately had commas within quotes, signifying that those commas should not be used as field delimiters. Since I was doing this on macOS, I've come across an issue with the macOS awk command in that it doesn't support the FPAT feature of gawk which makes it easy to split data by content using regex. Then I came across this article that really helped with my use case.

What I wanted to do was turn a CSV file that had data like this:
 Input CSV File
1,2,"3,4",5
1,2,3,4,5


...into something more usable for me like this:
 Output PSV File
1|2|3,4|5
1|2|3|4|5


Using the last example in the article I've linked to above I came up with a script to convert the CSV file (with commas in quotes) to a pipe separated file (PSV). With a bit of Bash and Awk, the code looks like this:
 Bash and Awk
INPUT_CSV_FILE="input_file.csv"
OUTPUT_PSV_FILE="output_file.txt"
awk 'BEGIN { l=0 }
{
c=0
$0=$0","
while($0) {
match($0,/ *"[^"]*" *,|[^,]*,/)
f=substr($0,RSTART,RLENGTH)
gsub(/^ *"?|"? *,$/,"",f)
c++
if (c == 1 && l > 0) printf "\n"
if (c > 1) printf "|"
printf("%s", f)
l++
$0=substr($0,RLENGTH+1)
}
}' "$INPUT_CSV_FILE" > "$OUTPUT_PSV_FILE"


It's a bit ugly, but it works.



-i

Have comments or feedback on what I wrote? Please share them below! Found this useful? Consider sending me a small tip.
comments powered by Disqus
Other posts you may like...