View Source csv (util v1.1.5)

CSV file parsing functions

Link to this section Summary

Types

Options for loading data to a database.
{load_type, Type}
Type of loading to perform. recreate will replace the table by atomically dropping the old one, creating/loading the new one, and replacing the table. replace will do an insert by using REPLACE INTO statement. ignore_dups will use INSERT IGNORE INTO statement to ignore records with duplicate keys. update_dups will do an INSERT INTO and ON DUPLICATE KEY UPDATE, so that the old records are updated and the new ones are inserted.
{create_table, Allow}
Allow to create a table if it doesn't exist
{col_types, Map}
Types of data for all or some columens. The Map is in the format: ColName::binary() => ColInfo, where ColInfo is ColType | {ColType, ColLen::integer()}, and ColType is: date | datetime | integer | float | blob | number.
{transforms, Map}
Value transformation function for columns. The Map is in the format: ColName::binary() => fun((Value::term()) -> term()).
{batch_size, Size}
Number of records per SQL insert/update/replace call
{blob_size, Size}
Threshold in number of bytes at which a VARCHAR field is defined as BLOB
{save_create_sql_to_file, Filename::string()}
Save CREATE TABLE sql statement to a file
guess_types
When specified, the function will try to guess the type of data in columns instead of treating all data as string fields. The possible data typed guessed: integer, float, date, datetime, number, string
{guess_limit_rows, Limit}
Limit the number of rows for guessing the column data types
{max_nulls_pcnt, Percent}
A percentage threshold of permissible NULLs in a column (0-100), above which the column data type is forced to be treated as string
{primary_key, Fields}
Names of primary key fields in the created table
{drop_temp_table, boolean()}
When true (default), temp table is dropped.
{encoding, Encoding}
The name of the encoding to use for storing data. For the list of permissible values [see this link](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html)
verbose
Print additional details to stdout
CSV Parsing Options.
fix_lengths
if a record has a column count greater than what's found in the header row, those extra columns will be dropped, and if a row has fewer columns, empty columns will be added.
binary
Return fields as binaries (default)
list
Return fields as lists
{open, list()}
Options given to file:open/2
{columns, Names}
Return data only in given columns
{converters, [{Col, fun((ColName, Value) -> NewValue)| {rex, RegEx, Replace}]}
Data format converter. If Col is all, the same converting function is used for all columns. If the converter is a {rex, RegEx, Replace} then the regular expression replacement will be run on a value in the requested column.

Functions

Guess the type of data by its value
Guess data types of fields in the given CSV list of rows obtained by parsing a CSV file with parse(File,[fix_lengths]). The function returns a list of tuples {Type, MaxFieldLen, NumOfNulls}, where the Type is a field type, MaxFieldLen is the max length of data in this column, and NumOfNulls is the number of rows with empty values in this column.
Load CSV data from a File to a MySQL database. Tab is the name of a table where to load data. MySqlPid is the pid of a MySQL database connection returned by mysql:start_link/1. The data in the table is replaced according to {import_type, Type}:
  • recreate - The table is entirely replaced with the data from file. The data from the file is loaded atomically - i.e. either the whole file loading succeeds or fails. This is accomplished by first loading data to a temporary table, and then using the database's ACID properties to replace the target table with the temporary table.
  • replace - Use "REPLACE INTO" instead of "INSERT INTO" existing table
  • ignore_dups - The insert in the existing table is performed and the records with duplicate keys are ignored
  • update_dups - The insert in the existing table is performed and the records with duplicate keys are updated
  • upsert - The insert/update in the existing table is performed without creating a temporary table
NOTE: this function requires https://github.com/mysql-otp/mysql-otp.git
Get max field lengths for a list obtained by parsing a CSV file with parse_csv_file(File,[fix_lengths]).
Parse a CSV file using default options.
Parse a given CSV file.
Parse a CSV line

Link to this section Types

-type load_options() ::
    [{load_type, recreate | replace | ignore_dups | update_dups} |
     {col_types, #{binary() => ColType :: atom() | {ColType :: atom(), ColLen :: integer()}}} |
     {batch_size, integer()} |
     {blob_size, integer()} |
     {create_table, boolean()} |
     {save_create_sql_to_file, string()} |
     {transforms, #{binary() => fun((term()) -> term())}} |
     {guess_types, boolean()} |
     {guess_limit_rows, integer()} |
     {max_nulls_pcnt, float()} |
     {primary_key, PKColumns :: binary() | [binary() | list()]} |
     {drop_temp_table, boolean()} |
     {encoding, string() | atom()} |
     {verbose, boolean() | integer()}].
Options for loading data to a database.
{load_type, Type}
Type of loading to perform. recreate will replace the table by atomically dropping the old one, creating/loading the new one, and replacing the table. replace will do an insert by using REPLACE INTO statement. ignore_dups will use INSERT IGNORE INTO statement to ignore records with duplicate keys. update_dups will do an INSERT INTO and ON DUPLICATE KEY UPDATE, so that the old records are updated and the new ones are inserted.
{create_table, Allow}
Allow to create a table if it doesn't exist
{col_types, Map}
Types of data for all or some columens. The Map is in the format: ColName::binary() => ColInfo, where ColInfo is ColType | {ColType, ColLen::integer()}, and ColType is: date | datetime | integer | float | blob | number.
{transforms, Map}
Value transformation function for columns. The Map is in the format: ColName::binary() => fun((Value::term()) -> term()).
{batch_size, Size}
Number of records per SQL insert/update/replace call
{blob_size, Size}
Threshold in number of bytes at which a VARCHAR field is defined as BLOB
{save_create_sql_to_file, Filename::string()}
Save CREATE TABLE sql statement to a file
guess_types
When specified, the function will try to guess the type of data in columns instead of treating all data as string fields. The possible data typed guessed: integer, float, date, datetime, number, string
{guess_limit_rows, Limit}
Limit the number of rows for guessing the column data types
{max_nulls_pcnt, Percent}
A percentage threshold of permissible NULLs in a column (0-100), above which the column data type is forced to be treated as string
{primary_key, Fields}
Names of primary key fields in the created table
{drop_temp_table, boolean()}
When true (default), temp table is dropped.
{encoding, Encoding}
The name of the encoding to use for storing data. For the list of permissible values [see this link](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html)
verbose
Print additional details to stdout
-type parse_options() ::
    [fix_lengths | binary | list |
     {open, list()} |
     {columns, [binary() | string()]} |
     {converters,
      [{binary() | string() | all,
        fun((binary(), binary()) -> binary()) | {rex, binary(), binary()}}]}].
CSV Parsing Options.
fix_lengths
if a record has a column count greater than what's found in the header row, those extra columns will be dropped, and if a row has fewer columns, empty columns will be added.
binary
Return fields as binaries (default)
list
Return fields as lists
{open, list()}
Options given to file:open/2
{columns, Names}
Return data only in given columns
{converters, [{Col, fun((ColName, Value) -> NewValue)| {rex, RegEx, Replace}]}
Data format converter. If Col is all, the same converting function is used for all columns. If the converter is a {rex, RegEx, Replace} then the regular expression replacement will be run on a value in the requested column.

Link to this section Functions

-spec guess_data_type(binary()) -> {null | date | datetime | integer | float | string, term(), string()}.
Guess the type of data by its value
Link to this function

guess_data_types(HasHeaderRow, Rows)

View Source
-spec guess_data_types(HasHeaderRow :: boolean(), Rows :: [Fields :: [binary()]]) ->
                    {Type :: string | integer | number | float | date | datetime,
                     MaxFieldLen :: integer(),
                     NumOfNulls :: integer()}.
Guess data types of fields in the given CSV list of rows obtained by parsing a CSV file with parse(File,[fix_lengths]). The function returns a list of tuples {Type, MaxFieldLen, NumOfNulls}, where the Type is a field type, MaxFieldLen is the max length of data in this column, and NumOfNulls is the number of rows with empty values in this column.
Link to this function

load_to_mysql(File, Tab, MySqlPid, Opts)

View Source
-spec load_to_mysql(File :: string(), Tab :: string(), MySqlPid :: pid(), Opts :: load_options()) ->
                 {Columns :: list(), AffectedCount :: integer(), RecCount :: integer()}.
Load CSV data from a File to a MySQL database. Tab is the name of a table where to load data. MySqlPid is the pid of a MySQL database connection returned by mysql:start_link/1. The data in the table is replaced according to {import_type, Type}:
  • recreate - The table is entirely replaced with the data from file. The data from the file is loaded atomically - i.e. either the whole file loading succeeds or fails. This is accomplished by first loading data to a temporary table, and then using the database's ACID properties to replace the target table with the temporary table.
  • replace - Use "REPLACE INTO" instead of "INSERT INTO" existing table
  • ignore_dups - The insert in the existing table is performed and the records with duplicate keys are ignored
  • update_dups - The insert in the existing table is performed and the records with duplicate keys are updated
  • upsert - The insert/update in the existing table is performed without creating a temporary table
NOTE: this function requires https://github.com/mysql-otp/mysql-otp.git
Link to this function

max_field_lengths(HasHeaderRow, Rows)

View Source
-spec max_field_lengths(HasHeaderRow :: boolean(), Rows :: [Fields :: list()]) -> [Len :: integer()].
Get max field lengths for a list obtained by parsing a CSV file with parse_csv_file(File,[fix_lengths]).
-spec parse(string()) -> [[binary()]].
Parse a CSV file using default options.
-spec parse(binary() | string(), parse_options()) -> [[string()]].
Parse a given CSV file.
-spec parse_line(binary()) -> list().
Parse a CSV line