View Source csv (util v1.1.5)

CSV file parsing functions

Link to this section Summary

Types

load_options/0

Options for loading data to a database.

{load_type, Type}: Type of loading to perform. recreate will replace the table by atomically dropping the old one, creating/loading the new one, and replacing the table. replace will do an insert by using REPLACE INTO statement. ignore_dups will use INSERT IGNORE INTO statement to ignore records with duplicate keys. update_dups will do an INSERT INTO and ON DUPLICATE KEY UPDATE, so that the old records are updated and the new ones are inserted.
{create_table, Allow}: Allow to create a table if it doesn't exist
{col_types, Map}: Types of data for all or some columens. The Map is in the format: ColName::binary() => ColInfo, where ColInfo is ColType | {ColType, ColLen::integer()}, and ColType is: date | datetime | integer | float | blob | number.
{transforms, Map}: Value transformation function for columns. The Map is in the format: ColName::binary() => fun((Value::term()) -> term()).
{batch_size, Size}: Number of records per SQL insert/update/replace call
{blob_size, Size}: Threshold in number of bytes at which a VARCHAR field is defined as BLOB
{save_create_sql_to_file, Filename::string()}: Save CREATE TABLE sql statement to a file
guess_types: When specified, the function will try to guess the type of data in columns instead of treating all data as string fields. The possible data typed guessed: integer, float, date, datetime, number, string
{guess_limit_rows, Limit}: Limit the number of rows for guessing the column data types
{max_nulls_pcnt, Percent}: A percentage threshold of permissible NULLs in a column (0-100), above which the column data type is forced to be treated as string
{primary_key, Fields}: Names of primary key fields in the created table
{drop_temp_table, boolean()}: When true (default), temp table is dropped.
{encoding, Encoding}: The name of the encoding to use for storing data. For the list of permissible values [see this link](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html)
verbose: Print additional details to stdout

parse_options/0

CSV Parsing Options.

fix_lengths: if a record has a column count greater than what's found in the header row, those extra columns will be dropped, and if a row has fewer columns, empty columns will be added.
binary: Return fields as binaries (default)
list: Return fields as lists
{open, list()}: Options given to file:open/2
{columns, Names}: Return data only in given columns
{converters, [{Col, fun((ColName, Value) -> NewValue)| {rex, RegEx, Replace}]}: Data format converter. If Col is all, the same converting function is used for all columns. If the converter is a {rex, RegEx, Replace} then the regular expression replacement will be run on a value in the requested column.

Functions

guess_data_type(S)

Guess the type of data by its value

guess_data_types(HasHeaderRow, Rows)

Guess data types of fields in the given CSV list of rows obtained by parsing a CSV file with parse(File,[fix_lengths]). The function returns a list of tuples {Type, MaxFieldLen, NumOfNulls}, where the Type is a field type, MaxFieldLen is the max length of data in this column, and NumOfNulls is the number of rows with empty values in this column.

load_to_mysql(File, Tab, MySqlPid, Opts)

Load CSV data from a File to a MySQL database. Tab is the name of a table where to load data. MySqlPid is the pid of a MySQL database connection returned by mysql:start_link/1. The data in the table is replaced according to {import_type, Type}:

recreate - The table is entirely replaced with the data from file. The data from the file is loaded atomically - i.e. either the whole file loading succeeds or fails. This is accomplished by first loading data to a temporary table, and then using the database's ACID properties to replace the target table with the temporary table.
replace - Use "REPLACE INTO" instead of "INSERT INTO" existing table
ignore_dups - The insert in the existing table is performed and the records with duplicate keys are ignored
update_dups - The insert in the existing table is performed and the records with duplicate keys are updated
upsert - The insert/update in the existing table is performed without creating a temporary table

NOTE: this function requires https://github.com/mysql-otp/mysql-otp.git

max_field_lengths(HasHeaderRow, Rows)

Get max field lengths for a list obtained by parsing a CSV file with parse_csv_file(File,[fix_lengths]).

parse(File)

Parse a CSV file using default options.

parse(File, Opts)

Parse a given CSV file.

parse_line(Line)

Parse a CSV line

Link to this section Types

load_options/0

-type load_options() ::
    [{load_type, recreate | replace | ignore_dups | update_dups} |
     {col_types, #{binary() => ColType :: atom() | {ColType :: atom(), ColLen :: integer()}}} |
     {batch_size, integer()} |
     {blob_size, integer()} |
     {create_table, boolean()} |
     {save_create_sql_to_file, string()} |
     {transforms, #{binary() => fun((term()) -> term())}} |
     {guess_types, boolean()} |
     {guess_limit_rows, integer()} |
     {max_nulls_pcnt, float()} |
     {primary_key, PKColumns :: binary() | [binary() | list()]} |
     {drop_temp_table, boolean()} |
     {encoding, string() | atom()} |
     {verbose, boolean() | integer()}].

Options for loading data to a database.

{load_type, Type}: Type of loading to perform. recreate will replace the table by atomically dropping the old one, creating/loading the new one, and replacing the table. replace will do an insert by using REPLACE INTO statement. ignore_dups will use INSERT IGNORE INTO statement to ignore records with duplicate keys. update_dups will do an INSERT INTO and ON DUPLICATE KEY UPDATE, so that the old records are updated and the new ones are inserted.
{create_table, Allow}: Allow to create a table if it doesn't exist
{col_types, Map}: Types of data for all or some columens. The Map is in the format: ColName::binary() => ColInfo, where ColInfo is ColType | {ColType, ColLen::integer()}, and ColType is: date | datetime | integer | float | blob | number.
{transforms, Map}: Value transformation function for columns. The Map is in the format: ColName::binary() => fun((Value::term()) -> term()).
{batch_size, Size}: Number of records per SQL insert/update/replace call
{blob_size, Size}: Threshold in number of bytes at which a VARCHAR field is defined as BLOB
{save_create_sql_to_file, Filename::string()}: Save CREATE TABLE sql statement to a file
guess_types: When specified, the function will try to guess the type of data in columns instead of treating all data as string fields. The possible data typed guessed: integer, float, date, datetime, number, string
{guess_limit_rows, Limit}: Limit the number of rows for guessing the column data types
{max_nulls_pcnt, Percent}: A percentage threshold of permissible NULLs in a column (0-100), above which the column data type is forced to be treated as string
{primary_key, Fields}: Names of primary key fields in the created table
{drop_temp_table, boolean()}: When true (default), temp table is dropped.
{encoding, Encoding}: The name of the encoding to use for storing data. For the list of permissible values [see this link](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html)
verbose: Print additional details to stdout

parse_options/0

-type parse_options() ::
    [fix_lengths | binary | list |
     {open, list()} |
     {columns, [binary() | string()]} |
     {converters,
      [{binary() | string() | all,
        fun((binary(), binary()) -> binary()) | {rex, binary(), binary()}}]}].

CSV Parsing Options.

fix_lengths: if a record has a column count greater than what's found in the header row, those extra columns will be dropped, and if a row has fewer columns, empty columns will be added.
binary: Return fields as binaries (default)
list: Return fields as lists
{open, list()}: Options given to file:open/2
{columns, Names}: Return data only in given columns
{converters, [{Col, fun((ColName, Value) -> NewValue)| {rex, RegEx, Replace}]}: Data format converter. If Col is all, the same converting function is used for all columns. If the converter is a {rex, RegEx, Replace} then the regular expression replacement will be run on a value in the requested column.

Link to this section Functions

guess_data_type(S)

-spec guess_data_type(binary()) -> {null | date | datetime | integer | float | string, term(), string()}.

Guess the type of data by its value

guess_data_types(HasHeaderRow, Rows)

-spec guess_data_types(HasHeaderRow :: boolean(), Rows :: [Fields :: [binary()]]) ->
                    {Type :: string | integer | number | float | date | datetime,
                     MaxFieldLen :: integer(),
                     NumOfNulls :: integer()}.

load_to_mysql(File, Tab, MySqlPid, Opts)

-spec load_to_mysql(File :: string(), Tab :: string(), MySqlPid :: pid(), Opts :: load_options()) ->
                 {Columns :: list(), AffectedCount :: integer(), RecCount :: integer()}.

recreate - The table is entirely replaced with the data from file. The data from the file is loaded atomically - i.e. either the whole file loading succeeds or fails. This is accomplished by first loading data to a temporary table, and then using the database's ACID properties to replace the target table with the temporary table.
replace - Use "REPLACE INTO" instead of "INSERT INTO" existing table
ignore_dups - The insert in the existing table is performed and the records with duplicate keys are ignored
update_dups - The insert in the existing table is performed and the records with duplicate keys are updated
upsert - The insert/update in the existing table is performed without creating a temporary table

NOTE: this function requires https://github.com/mysql-otp/mysql-otp.git

max_field_lengths(HasHeaderRow, Rows)

-spec max_field_lengths(HasHeaderRow :: boolean(), Rows :: [Fields :: list()]) -> [Len :: integer()].

Get max field lengths for a list obtained by parsing a CSV file with parse_csv_file(File,[fix_lengths]).

parse(File)

-spec parse(string()) -> [[binary()]].

Parse a CSV file using default options.

parse(File, Opts)

-spec parse(binary() | string(), parse_options()) -> [[string()]].

Parse a given CSV file.

parse_line(Line)

-spec parse_line(binary()) -> list().

Parse a CSV line

Settings View Source csv (util v1.1.5)

Link to this section Summary

Types

Functions

Link to this section Types

load_options/0

parse_options/0

Link to this section Functions

guess_data_type(S)

guess_data_types(HasHeaderRow, Rows)

load_to_mysql(File, Tab, MySqlPid, Opts)

max_field_lengths(HasHeaderRow, Rows)

parse(File)

parse(File, Opts)

parse_line(Line)

View Source csv (util v1.1.5)