![]() System : Linux host44.registrar-servers.com 4.18.0-513.18.1.lve.2.el8.x86_64 #1 SMP Sat Mar 30 15:36:11 UTC 2024 x86_64 User : vapecompany ( 2719) PHP Version : 7.4.33 Disable Function : NONE Directory : /proc/self/root/opt/alt/ruby34/share/gems/gems/csv-3.3.2/doc/csv/recipes/ |
Upload File : |
== Recipes for Parsing \CSV These recipes are specific code examples for specific \CSV parsing tasks. For other recipes, see {Recipes for CSV}[./recipes_rdoc.html]. All code snippets on this page assume that the following has been executed: require 'csv' === Contents - {Source Formats}[#label-Source+Formats] - {Parsing from a String}[#label-Parsing+from+a+String] - {Recipe: Parse from String with Headers}[#label-Recipe-3A+Parse+from+String+with+Headers] - {Recipe: Parse from String Without Headers}[#label-Recipe-3A+Parse+from+String+Without+Headers] - {Parsing from a File}[#label-Parsing+from+a+File] - {Recipe: Parse from File with Headers}[#label-Recipe-3A+Parse+from+File+with+Headers] - {Recipe: Parse from File Without Headers}[#label-Recipe-3A+Parse+from+File+Without+Headers] - {Parsing from an IO Stream}[#label-Parsing+from+an+IO+Stream] - {Recipe: Parse from IO Stream with Headers}[#label-Recipe-3A+Parse+from+IO+Stream+with+Headers] - {Recipe: Parse from IO Stream Without Headers}[#label-Recipe-3A+Parse+from+IO+Stream+Without+Headers] - {RFC 4180 Compliance}[#label-RFC+4180+Compliance] - {Row Separator}[#label-Row+Separator] - {Recipe: Handle Compliant Row Separator}[#label-Recipe-3A+Handle+Compliant+Row+Separator] - {Recipe: Handle Non-Compliant Row Separator}[#label-Recipe-3A+Handle+Non-Compliant+Row+Separator] - {Column Separator}[#label-Column+Separator] - {Recipe: Handle Compliant Column Separator}[#label-Recipe-3A+Handle+Compliant+Column+Separator] - {Recipe: Handle Non-Compliant Column Separator}[#label-Recipe-3A+Handle+Non-Compliant+Column+Separator] - {Quote Character}[#label-Quote+Character] - {Recipe: Handle Compliant Quote Character}[#label-Recipe-3A+Handle+Compliant+Quote+Character] - {Recipe: Handle Non-Compliant Quote Character}[#label-Recipe-3A+Handle+Non-Compliant+Quote+Character] - {Recipe: Allow Liberal Parsing}[#label-Recipe-3A+Allow+Liberal+Parsing] - {Special Handling}[#label-Special+Handling] - {Special Line Handling}[#label-Special+Line+Handling] - {Recipe: Ignore Blank Lines}[#label-Recipe-3A+Ignore+Blank+Lines] - {Recipe: Ignore Selected Lines}[#label-Recipe-3A+Ignore+Selected+Lines] - {Special Field Handling}[#label-Special+Field+Handling] - {Recipe: Strip Fields}[#label-Recipe-3A+Strip+Fields] - {Recipe: Handle Null Fields}[#label-Recipe-3A+Handle+Null+Fields] - {Recipe: Handle Empty Fields}[#label-Recipe-3A+Handle+Empty+Fields] - {Converting Fields}[#label-Converting+Fields] - {Converting Fields to Objects}[#label-Converting+Fields+to+Objects] - {Recipe: Convert Fields to Integers}[#label-Recipe-3A+Convert+Fields+to+Integers] - {Recipe: Convert Fields to Floats}[#label-Recipe-3A+Convert+Fields+to+Floats] - {Recipe: Convert Fields to Numerics}[#label-Recipe-3A+Convert+Fields+to+Numerics] - {Recipe: Convert Fields to Dates}[#label-Recipe-3A+Convert+Fields+to+Dates] - {Recipe: Convert Fields to DateTimes}[#label-Recipe-3A+Convert+Fields+to+DateTimes] - {Recipe: Convert Fields to Times}[#label-Recipe-3A+Convert+Fields+to+Times] - {Recipe: Convert Assorted Fields to Objects}[#label-Recipe-3A+Convert+Assorted+Fields+to+Objects] - {Recipe: Convert Fields to Other Objects}[#label-Recipe-3A+Convert+Fields+to+Other+Objects] - {Recipe: Filter Field Strings}[#label-Recipe-3A+Filter+Field+Strings] - {Recipe: Register Field Converters}[#label-Recipe-3A+Register+Field+Converters] - {Using Multiple Field Converters}[#label-Using+Multiple+Field+Converters] - {Recipe: Specify Multiple Field Converters in Option :converters}[#label-Recipe-3A+Specify+Multiple+Field+Converters+in+Option+-3Aconverters] - {Recipe: Specify Multiple Field Converters in a Custom Converter List}[#label-Recipe-3A+Specify+Multiple+Field+Converters+in+a+Custom+Converter+List] - {Converting Headers}[#label-Converting+Headers] - {Recipe: Convert Headers to Lowercase}[#label-Recipe-3A+Convert+Headers+to+Lowercase] - {Recipe: Convert Headers to Symbols}[#label-Recipe-3A+Convert+Headers+to+Symbols] - {Recipe: Filter Header Strings}[#label-Recipe-3A+Filter+Header+Strings] - {Recipe: Register Header Converters}[#label-Recipe-3A+Register+Header+Converters] - {Using Multiple Header Converters}[#label-Using+Multiple+Header+Converters] - {Recipe: Specify Multiple Header Converters in Option :header_converters}[#label-Recipe-3A+Specify+Multiple+Header+Converters+in+Option+-3Aheader_converters] - {Recipe: Specify Multiple Header Converters in a Custom Header Converter List}[#label-Recipe-3A+Specify+Multiple+Header+Converters+in+a+Custom+Header+Converter+List] - {Diagnostics}[#label-Diagnostics] - {Recipe: Capture Unconverted Fields}[#label-Recipe-3A+Capture+Unconverted+Fields] - {Recipe: Capture Field Info}[#label-Recipe-3A+Capture+Field+Info] === Source Formats You can parse \CSV data from a \String, from a \File (via its path), or from an \IO stream. ==== Parsing from a \String You can parse \CSV data from a \String, with or without headers. ===== Recipe: Parse from \String with Headers Use class method CSV.parse with option +headers+ to read a source \String all at once (may have memory resource implications): string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" CSV.parse(string, headers: true) # => #<CSV::Table mode:col_or_row row_count:4> Use instance method CSV#each with option +headers+ to read a source \String one row at a time: CSV.new(string, headers: true).each do |row| p row end Output: #<CSV::Row "Name":"foo" "Value":"0"> #<CSV::Row "Name":"bar" "Value":"1"> #<CSV::Row "Name":"baz" "Value":"2"> ===== Recipe: Parse from \String Without Headers Use class method CSV.parse without option +headers+ to read a source \String all at once (may have memory resource implications): string = "foo,0\nbar,1\nbaz,2\n" CSV.parse(string) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] Use instance method CSV#each without option +headers+ to read a source \String one row at a time: CSV.new(string).each do |row| p row end Output: ["foo", "0"] ["bar", "1"] ["baz", "2"] ==== Parsing from a \File You can parse \CSV data from a \File, with or without headers. ===== Recipe: Parse from \File with Headers Use class method CSV.read with option +headers+ to read a file all at once: string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" path = 't.csv' File.write(path, string) CSV.read(path, headers: true) # => #<CSV::Table mode:col_or_row row_count:4> Use class method CSV.foreach with option +headers+ to read one row at a time: CSV.foreach(path, headers: true) do |row| p row end Output: #<CSV::Row "Name":"foo" "Value":"0"> #<CSV::Row "Name":"bar" "Value":"1"> #<CSV::Row "Name":"baz" "Value":"2"> ===== Recipe: Parse from \File Without Headers Use class method CSV.read without option +headers+ to read a file all at once: string = "foo,0\nbar,1\nbaz,2\n" path = 't.csv' File.write(path, string) CSV.read(path) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] Use class method CSV.foreach without option +headers+ to read one row at a time: CSV.foreach(path) do |row| p row end Output: ["foo", "0"] ["bar", "1"] ["baz", "2"] ==== Parsing from an \IO Stream You can parse \CSV data from an \IO stream, with or without headers. ===== Recipe: Parse from \IO Stream with Headers Use class method CSV.parse with option +headers+ to read an \IO stream all at once: string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" path = 't.csv' File.write(path, string) File.open(path) do |file| CSV.parse(file, headers: true) end # => #<CSV::Table mode:col_or_row row_count:4> Use class method CSV.foreach with option +headers+ to read one row at a time: File.open(path) do |file| CSV.foreach(file, headers: true) do |row| p row end end Output: #<CSV::Row "Name":"foo" "Value":"0"> #<CSV::Row "Name":"bar" "Value":"1"> #<CSV::Row "Name":"baz" "Value":"2"> ===== Recipe: Parse from \IO Stream Without Headers Use class method CSV.parse without option +headers+ to read an \IO stream all at once: string = "foo,0\nbar,1\nbaz,2\n" path = 't.csv' File.write(path, string) File.open(path) do |file| CSV.parse(file) end # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] Use class method CSV.foreach without option +headers+ to read one row at a time: File.open(path) do |file| CSV.foreach(file) do |row| p row end end Output: ["foo", "0"] ["bar", "1"] ["baz", "2"] === RFC 4180 Compliance By default, \CSV parses data that is compliant with {RFC 4180}[https://www.rfc-editor.org/rfc/rfc4180] with respect to: - Row separator. - Column separator. - Quote character. ==== Row Separator RFC 4180 specifies the row separator CRLF (Ruby <tt>"\r\n"</tt>). Although the \CSV default row separator is <tt>"\n"</tt>, the parser also by default handles row separator <tt>"\r"</tt> and the RFC-compliant <tt>"\r\n"</tt>. ===== Recipe: Handle Compliant Row Separator For strict compliance, use option +:row_sep+ to specify row separator <tt>"\r\n"</tt>, which allows the compliant row separator: source = "foo,1\r\nbar,1\r\nbaz,2\r\n" CSV.parse(source, row_sep: "\r\n") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] But rejects other row separators: source = "foo,1\nbar,1\nbaz,2\n" CSV.parse(source, row_sep: "\r\n") # Raised MalformedCSVError source = "foo,1\rbar,1\rbaz,2\r" CSV.parse(source, row_sep: "\r\n") # Raised MalformedCSVError source = "foo,1\n\rbar,1\n\rbaz,2\n\r" CSV.parse(source, row_sep: "\r\n") # Raised MalformedCSVError ===== Recipe: Handle Non-Compliant Row Separator For data with non-compliant row separators, use option +:row_sep+. This example source uses semicolon (<tt>";"</tt>) as its row separator: source = "foo,1;bar,1;baz,2;" CSV.parse(source, row_sep: ';') # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] ==== Column Separator RFC 4180 specifies column separator COMMA (Ruby <tt>","</tt>). ===== Recipe: Handle Compliant Column Separator Because the \CSV default comma separator is ',', you need not specify option +:col_sep+ for compliant data: source = "foo,1\nbar,1\nbaz,2\n" CSV.parse(source) # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] ===== Recipe: Handle Non-Compliant Column Separator For data with non-compliant column separators, use option +:col_sep+. This example source uses TAB (<tt>"\t"</tt>) as its column separator: source = "foo,1\tbar,1\tbaz,2" CSV.parse(source, col_sep: "\t") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] ==== Quote Character RFC 4180 specifies quote character DQUOTE (Ruby <tt>"\""</tt>). ===== Recipe: Handle Compliant Quote Character Because the \CSV default quote character is <tt>"\""</tt>, you need not specify option +:quote_char+ for compliant data: source = "\"foo\",\"1\"\n\"bar\",\"1\"\n\"baz\",\"2\"\n" CSV.parse(source) # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] ===== Recipe: Handle Non-Compliant Quote Character For data with non-compliant quote characters, use option +:quote_char+. This example source uses SQUOTE (<tt>"'"</tt>) as its quote character: source = "'foo','1'\n'bar','1'\n'baz','2'\n" CSV.parse(source, quote_char: "'") # => [["foo", "1"], ["bar", "1"], ["baz", "2"]] ==== Recipe: Allow Liberal Parsing Use option +:liberal_parsing+ to specify that \CSV should attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields: source = 'is,this "three, or four",fields' CSV.parse(source) # Raises MalformedCSVError CSV.parse(source, liberal_parsing: true) # => [["is", "this \"three", " or four\"", "fields"]] === Special Handling You can use parsing options to specify special handling for certain lines and fields. ==== Special Line Handling Use parsing options to specify special handling for blank lines, or for other selected lines. ===== Recipe: Ignore Blank Lines Use option +:skip_blanks+ to ignore blank lines: source = <<-EOT foo,0 bar,1 baz,2 , EOT parsed = CSV.parse(source, skip_blanks: true) parsed # => [["foo", "0"], ["bar", "1"], ["baz", "2"], [nil, nil]] ===== Recipe: Ignore Selected Lines Use option +:skip_lines+ to ignore selected lines. source = <<-EOT # Comment foo,0 bar,1 baz,2 # Another comment EOT parsed = CSV.parse(source, skip_lines: /^#/) parsed # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] ==== Special Field Handling Use parsing options to specify special handling for certain field values. ===== Recipe: Strip Fields Use option +:strip+ to strip parsed field values: CSV.parse_line(' a , b ', strip: true) # => ["a", "b"] ===== Recipe: Handle Null Fields Use option +:nil_value+ to specify a value that will replace each field that is null (no text): CSV.parse_line('a,,b,,c', nil_value: 0) # => ["a", 0, "b", 0, "c"] ===== Recipe: Handle Empty Fields Use option +:empty_value+ to specify a value that will replace each field that is empty (\String of length 0); CSV.parse_line('a,"",b,"",c', empty_value: 'x') # => ["a", "x", "b", "x", "c"] === Converting Fields You can use field converters to change parsed \String fields into other objects, or to otherwise modify the \String fields. ==== Converting Fields to Objects Use field converters to change parsed \String objects into other, more specific, objects. There are built-in field converters for converting to objects of certain classes: - \Float - \Integer - \Date - \DateTime - \Time Other built-in field converters include: - +:numeric+: converts to \Integer and \Float. - +:all+: converts to \DateTime, \Integer, \Float. You can also define field converters to convert to objects of other classes. ===== Recipe: Convert Fields to Integers Convert fields to \Integer objects using built-in converter +:integer+: source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, headers: true, converters: :integer) parsed.map {|row| row['Value'].class} # => [Integer, Integer, Integer] ===== Recipe: Convert Fields to Floats Convert fields to \Float objects using built-in converter +:float+: source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, headers: true, converters: :float) parsed.map {|row| row['Value'].class} # => [Float, Float, Float] ===== Recipe: Convert Fields to Numerics Convert fields to \Integer and \Float objects using built-in converter +:numeric+: source = "Name,Value\nfoo,0\nbar,1.1\nbaz,2.2\n" parsed = CSV.parse(source, headers: true, converters: :numeric) parsed.map {|row| row['Value'].class} # => [Integer, Float, Float] ===== Recipe: Convert Fields to Dates Convert fields to \Date objects using built-in converter +:date+: source = "Name,Date\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2001-02-03\n" parsed = CSV.parse(source, headers: true, converters: :date) parsed.map {|row| row['Date'].class} # => [Date, Date, Date] ===== Recipe: Convert Fields to DateTimes Convert fields to \DateTime objects using built-in converter +:date_time+: source = "Name,DateTime\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2020-05-07T14:59:00-05:00\n" parsed = CSV.parse(source, headers: true, converters: :date_time) parsed.map {|row| row['DateTime'].class} # => [DateTime, DateTime, DateTime] ===== Recipe: Convert Fields to Times Convert fields to \Time objects using built-in converter +:time+: source = "Name,Time\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2020-05-07T14:59:00-05:00\n" parsed = CSV.parse(source, headers: true, converters: :time) parsed.map {|row| row['Time'].class} # => [Time, Time, Time] ===== Recipe: Convert Assorted Fields to Objects Convert assorted fields to objects using built-in converter +:all+: source = "Type,Value\nInteger,0\nFloat,1.0\nDateTime,2001-02-04\n" parsed = CSV.parse(source, headers: true, converters: :all) parsed.map {|row| row['Value'].class} # => [Integer, Float, DateTime] ===== Recipe: Convert Fields to Other Objects Define a custom field converter to convert \String fields into other objects. This example defines and uses a custom field converter that converts each column-1 value to a \Rational object: rational_converter = proc do |field, field_context| field_context.index == 1 ? field.to_r : field end source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, headers: true, converters: rational_converter) parsed.map {|row| row['Value'].class} # => [Rational, Rational, Rational] ==== Recipe: Filter Field Strings Define a custom field converter to modify \String fields. This example defines and uses a custom field converter that strips whitespace from each field value: strip_converter = proc {|field| field.strip } source = "Name,Value\n foo , 0 \n bar , 1 \n baz , 2 \n" parsed = CSV.parse(source, headers: true, converters: strip_converter) parsed['Name'] # => ["foo", "bar", "baz"] parsed['Value'] # => ["0", "1", "2"] ==== Recipe: Register Field Converters Register a custom field converter, assigning it a name; then refer to the converter by its name: rational_converter = proc do |field, field_context| field_context.index == 1 ? field.to_r : field end CSV::Converters[:rational] = rational_converter source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, headers: true, converters: :rational) parsed['Value'] # => [(0/1), (1/1), (2/1)] ==== Using Multiple Field Converters You can use multiple field converters in either of these ways: - Specify converters in option +:converters+. - Specify converters in a custom converter list. ===== Recipe: Specify Multiple Field Converters in Option +:converters+ Apply multiple field converters by specifying them in option +:converters+: source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n" parsed = CSV.parse(source, headers: true, converters: [:integer, :float]) parsed['Value'] # => [0, 1.0, 2.0] ===== Recipe: Specify Multiple Field Converters in a Custom Converter List Apply multiple field converters by defining and registering a custom converter list: strip_converter = proc {|field| field.strip } CSV::Converters[:strip] = strip_converter CSV::Converters[:my_converters] = [:integer, :float, :strip] source = "Name,Value\n foo , 0 \n bar , 1.0 \n baz , 2.0 \n" parsed = CSV.parse(source, headers: true, converters: :my_converters) parsed['Name'] # => ["foo", "bar", "baz"] parsed['Value'] # => [0, 1.0, 2.0] === Converting Headers You can use header converters to modify parsed \String headers. Built-in header converters include: - +:symbol+: converts \String header to \Symbol. - +:downcase+: converts \String header to lowercase. You can also define header converters to otherwise modify header \Strings. ==== Recipe: Convert Headers to Lowercase Convert headers to lowercase using built-in converter +:downcase+: source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, headers: true, header_converters: :downcase) parsed.headers # => ["name", "value"] ==== Recipe: Convert Headers to Symbols Convert headers to downcased Symbols using built-in converter +:symbol+: source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, headers: true, header_converters: :symbol) parsed.headers # => [:name, :value] parsed.headers.map {|header| header.class} # => [Symbol, Symbol] ==== Recipe: Filter Header Strings Define a custom header converter to modify \String fields. This example defines and uses a custom header converter that capitalizes each header \String: capitalize_converter = proc {|header| header.capitalize } source = "NAME,VALUE\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, headers: true, header_converters: capitalize_converter) parsed.headers # => ["Name", "Value"] ==== Recipe: Register Header Converters Register a custom header converter, assigning it a name; then refer to the converter by its name: capitalize_converter = proc {|header| header.capitalize } CSV::HeaderConverters[:capitalize] = capitalize_converter source = "NAME,VALUE\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, headers: true, header_converters: :capitalize) parsed.headers # => ["Name", "Value"] ==== Using Multiple Header Converters You can use multiple header converters in either of these ways: - Specify header converters in option +:header_converters+. - Specify header converters in a custom header converter list. ===== Recipe: Specify Multiple Header Converters in Option :header_converters Apply multiple header converters by specifying them in option +:header_converters+: source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n" parsed = CSV.parse(source, headers: true, header_converters: [:downcase, :symbol]) parsed.headers # => [:name, :value] ===== Recipe: Specify Multiple Header Converters in a Custom Header Converter List Apply multiple header converters by defining and registering a custom header converter list: CSV::HeaderConverters[:my_header_converters] = [:symbol, :downcase] source = "NAME,VALUE\nfoo,0\nbar,1.0\nbaz,2.0\n" parsed = CSV.parse(source, headers: true, header_converters: :my_header_converters) parsed.headers # => [:name, :value] === Diagnostics ==== Recipe: Capture Unconverted Fields To capture unconverted field values, use option +:unconverted_fields+: source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n" parsed = CSV.parse(source, converters: :integer, unconverted_fields: true) parsed # => [["Name", "Value"], ["foo", 0], ["bar", 1], ["baz", 2]] parsed.each {|row| p row.unconverted_fields } Output: ["Name", "Value"] ["foo", "0"] ["bar", "1"] ["baz", "2"] ==== Recipe: Capture Field Info To capture field info in a custom converter, accept two block arguments. The first is the field value; the second is a +CSV::FieldInfo+ object: strip_converter = proc {|field, field_info| p field_info; field.strip } source = " foo , 0 \n bar , 1 \n baz , 2 \n" parsed = CSV.parse(source, converters: strip_converter) parsed # => [["foo", "0"], ["bar", "1"], ["baz", "2"]] Output: #<struct CSV::FieldInfo index=0, line=1, header=nil> #<struct CSV::FieldInfo index=1, line=1, header=nil> #<struct CSV::FieldInfo index=0, line=2, header=nil> #<struct CSV::FieldInfo index=1, line=2, header=nil> #<struct CSV::FieldInfo index=0, line=3, header=nil> #<struct CSV::FieldInfo index=1, line=3, header=nil>