You need to parse a plain-text string or file that's in a format similar to comma-delimited format, but whose delimiters are strings other than commas and newlines.
When you call a CSV::Reader method, you can specify a string to act as the field separator (the string between columns) and a string to act as the row separator (the string between rows). FasterCSV.parse accepts the same settings as simulated keyword arguments, :col_sep and :row_sep. Between them, these options should let you parse most formats similar to the comma-delimited format:
```ruby
require 'csv'
pipe_separated = "1|2ENDa|bEND"
CSV::Reader.parse(pipe_separated, '|', 'END') { |r| r.each { |c| puts c } }
# 1
# 2
# a
# b

require 'rubygems'
require 'faster_csv'
FasterCSV.parse(pipe_separated, :col_sep => '|', :row_sep => 'END') do |r|
  r.each { |c| puts c }
end
# 1
# 2
# a
# b
```
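In Ruby 1.9 and later, FasterCSV became the standard csv library under the name CSV, and CSV::Reader no longer exists. A minimal sketch of the same parse on a modern Ruby, assuming the csv library is installed (it ships with Ruby, as a bundled gem since Ruby 3.4):

```ruby
require 'csv'

pipe_separated = "1|2ENDa|bEND"

# col_sep and row_sep are real keyword arguments in the modern CSV API.
rows = CSV.parse(pipe_separated, col_sep: '|', row_sep: 'END')
rows.each { |r| r.each { |c| puts c } }
```

The options have the same names as FasterCSV's, so code written against FasterCSV usually needs only the class name changed.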
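Value-delimited formats tend to differ along three axes: the field separator (usually a comma), the row separator (usually a newline), and the quote character (usually a double quote).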
Like Reader methods, Writer methods accept custom values for the field and row separators.
```ruby
data = [[1, 2, 3], ['A', 'B', 'C'], ['do', 're', 'mi']]
open('first3.csv', 'w') do |output|
  CSV::Writer.generate(output, ':', '-END-') do |writer|
    data.each { |x| writer << x }
  end
end
open('first3.csv') { |input| input.read() }
# => "1:2:3-END-A:B:C-END-do:re:mi-END-"

FasterCSV.open('first3.csv', 'w', :col_sep => ':', :row_sep => '-END-') do |output|
  data.each { |x| output << x }
end
open('first3.csv') { |input| input.read() }
# => "1:2:3-END-A:B:C-END-do:re:mi-END-"
```
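The modern CSV class (Ruby 1.9 and later) takes the same options when writing. A sketch, assuming a modern Ruby, that builds the output in memory with CSV.generate instead of writing to first3.csv:

```ruby
require 'csv'

data = [[1, 2, 3], ['A', 'B', 'C'], ['do', 're', 'mi']]

# Build the whole document as a String, using ':' between fields
# and '-END-' after every row.
output = CSV.generate(col_sep: ':', row_sep: '-END-') do |csv|
  data.each { |row| csv << row }
end
puts output
```

None of these fields contain a separator or a quote mark, so the writer emits them unquoted.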
It's rare that you'll need to override the quote character, and neither csv nor fastercsv will let you do it: both libraries hardcode the quote character to the double quote. If you need to parse a format that uses a different quote character, the simplest thing to do is subclass FasterCSV and override its init_parsers method.
Change the regular expression assigned to @parsers[:csv_row], replacing every double quote with the quote character you want. The most common alternate quote character is the single quote; to use it, you'd write an init_parsers method like this:
```ruby
require 'rubygems'
require 'faster_csv'

class MyFasterCSV < FasterCSV
  def init_parsers(options)
    super
    @parsers[:csv_row] =
      / \G(?:^|#{Regexp.escape(@col_sep)})     # anchor the match
        (?: '((?>[^']*)(?>''[^']*)*)'          # find quoted fields
            |                                  # … or …
            ([^'#{Regexp.escape(@col_sep)}]*)  # unquoted fields
        )/x
  end
end

MyFasterCSV.parse("1,'2,3',4") { |r| puts r }
# 1
# 2,3
# 4
```
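The csv library that ships with Ruby 1.9 and later (the former FasterCSV) accepts a :quote_char option, so on a modern Ruby the subclass shouldn't be needed. A minimal sketch, assuming a modern Ruby:

```ruby
require 'csv'

# quote_char replaces the double quote; inside a quoted field, a doubled
# quote_char stands for one literal quote character.
single_quoted = "1,'2,3',4\n'a''b',c,d\n"
rows = CSV.parse(single_quoted, quote_char: "'")
rows.each { |r| p r }
```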
Some value-delimited files are simply corrupt: they were generated by programs that didn't think to escape quote marks or to quote cells with embedded delimiters. Neither csv nor fastercsv can parse these files, because they're ambiguous or invalid.
```ruby
missing_quotes = %{20051002, Alice says, "I saw that!"}
CSV::Reader.parse(missing_quotes) { |r| r.each { |c| puts c } }
# CSV::IllegalFormatError: CSV::IllegalFormatError

unescaped_quotes = %{20051002, "Alice says, "I saw that!""}
FasterCSV.parse(unescaped_quotes) { |r| r.each { |c| puts c } }
# FasterCSV::MalformedCSVError: Unclosed quoted field.
```
Your best strategy for dealing with this kind of file is to use regular expressions to massage the data into a form fastercsv can parse, or to parse it with String#split and deal with any quoting problems afterwards. Either way, your code will have to handle the particular quirks of the data you're trying to parse.
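A minimal sketch of the regular-expression approach, assuming a hypothetical quirk in the data: every line is an ID, a comma, and one free-text field whose commas and quotes were never escaped. The hypothetical repair_line method rewrites such a line into valid CSV, which the modern CSV.parse_line (Ruby 1.9 and later) can then read. It would misfire on data with other quirks, such as a text field that legitimately begins and ends with two separate quoted phrases.

```ruby
require 'csv'

# Hypothetical helper. Assumes every line is "<id>,<free text>", where the
# free text may contain unescaped commas and double quotes.
def repair_line(line)
  match = /\A([^,]*),\s*(.*)\z/.match(line.chomp)
  raise ArgumentError, "unexpected line: #{line.inspect}" unless match
  id, text = match[1], match[2]
  # If the producer wrapped the whole text in quotes, drop that outer pair.
  if text.length >= 2 && text.start_with?('"') && text.end_with?('"')
    text = text[1..-2]
  end
  # Re-quote the text field, doubling embedded quotes as CSV requires.
  %{#{id},"#{text.gsub('"', '""')}"}
end

missing_quotes   = %{20051002, Alice says, "I saw that!"}
unescaped_quotes = %{20051002, "Alice says, "I saw that!""}

[missing_quotes, unescaped_quotes].each do |line|
  p CSV.parse_line(repair_line(line))
end
```

Both corrupt lines from the example above come out as the same two fields: the ID and the full quotation.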