Reading and Writing CSV files
Arrow provides a fast CSV reader allowing ingestion of external data to create Arrow Tables or a stream of Arrow RecordBatches.
Reading CSV files
Data in a CSV file can either be read in as a single Arrow Table using
TableReader or streamed as RecordBatches using
StreamingReader. See Tradeoffs for a
discussion of the tradeoffs between the two methods.
Both these readers require an arrow::io::InputStream instance
representing the input file. Their behavior can be customized using a
combination of ReadOptions,
ParseOptions, and ConvertOptions.
TableReader
#include "arrow/csv/api.h"

{
   // ...
   arrow::io::IOContext io_context = arrow::io::default_io_context();
   std::shared_ptr<arrow::io::InputStream> input = ...;

   auto read_options = arrow::csv::ReadOptions::Defaults();
   auto parse_options = arrow::csv::ParseOptions::Defaults();
   auto convert_options = arrow::csv::ConvertOptions::Defaults();

   // Instantiate TableReader from input stream and options
   auto maybe_reader =
     arrow::csv::TableReader::Make(io_context,
                                   input,
                                   read_options,
                                   parse_options,
                                   convert_options);
   if (!maybe_reader.ok()) {
     // Handle TableReader instantiation error...
   }
   std::shared_ptr<arrow::csv::TableReader> reader = *maybe_reader;

   // Read table from CSV file
   auto maybe_table = reader->Read();
   if (!maybe_table.ok()) {
     // Handle CSV read error
     // (for example a CSV syntax error or failed type conversion)
   }
std::shared_ptr<