Reading and Writing CSV files

Arrow provides a fast CSV reader that ingests external data into Arrow Tables or a stream of Arrow RecordBatches.

Reading CSV files

Data in a CSV file can be read either as a single Arrow Table using TableReader or streamed as RecordBatches using StreamingReader. See Tradeoffs for a comparison of the two methods.

Both readers require an arrow::io::InputStream instance representing the input file. Their behavior can be customized using a combination of ReadOptions, ParseOptions, and ConvertOptions.

TableReader

#include "arrow/csv/api.h"

{
   // ...
   arrow::io::IOContext io_context = arrow::io::default_io_context();
   std::shared_ptr<arrow::io::InputStream> input = ...;

   auto read_options = arrow::csv::ReadOptions::Defaults();
   auto parse_options = arrow::csv::ParseOptions::Defaults();
   auto convert_options = arrow::csv::ConvertOptions::Defaults();

   // Instantiate TableReader from input stream and options
   auto maybe_reader =
     arrow::csv::TableReader::Make(io_context,
                                   input,
                                   read_options,
                                   parse_options,
                                   convert_options);
   if (!maybe_reader.ok()) {
     // Handle TableReader instantiation error...
   }
   std::shared_ptr<arrow::csv::TableReader> reader = *maybe_reader;

   // Read table from CSV file
   auto maybe_table = reader->Read();
   if (!maybe_table.ok()) {
     // Handle CSV read error
     // (for example a CSV syntax error or failed type conversion)
   }
   std::shared_ptr<