csv (util v1.1.5)
Summary
Types
- load_options() - Options for loading data to a database.
- parse_options() - CSV parsing options.
Functions
- guess_data_type/1 - Guess the type of data by its value.
- guess_data_types/2 - Guess data types of fields in the given CSV list of rows obtained by parsing a CSV file with parse(File, [fix_lengths]).
- load_to_mysql/4 - Load CSV data from a File to a MySQL database.
- max_field_lengths/2 - Get max field lengths for a list obtained by parsing a CSV file with parse_csv_file(File, [fix_lengths]).
- parse/1 - Parse a CSV file using default options.
- parse/2 - Parse a given CSV file.
- parse_line/1 - Parse a CSV line.
Types
-type load_options() ::
[{load_type, recreate | replace | ignore_dups | update_dups} |
{col_types, #{binary() => ColType :: atom() | {ColType :: atom(), ColLen :: integer()}}} |
{batch_size, integer()} |
{blob_size, integer()} |
{create_table, boolean()} |
{save_create_sql_to_file, string()} |
{transforms, #{binary() => fun((term()) -> term())}} |
{guess_types, boolean()} |
{guess_limit_rows, integer()} |
{max_nulls_pcnt, float()} |
{primary_key, PKColumns :: binary() | [binary() | list()]} |
{drop_temp_table, boolean()} |
{encoding, string() | atom()} |
{verbose, boolean() | integer()}].
- {load_type, Type} - Type of loading to perform:
  - recreate - will replace the table by atomically dropping the old one, creating/loading the new one, and replacing the table.
  - replace - will do an insert using a REPLACE INTO statement.
  - ignore_dups - will use an INSERT IGNORE INTO statement to ignore records with duplicate keys.
  - update_dups - will do an INSERT INTO with ON DUPLICATE KEY UPDATE, so that old records are updated and new ones are inserted.
- {create_table, Allow} - Allow creating the table if it doesn't exist.
- {col_types, Map} - Types of data for all or some columns. The Map is in the format ColName::binary() => ColInfo, where ColInfo is ColType | {ColType, ColLen::integer()}, and ColType is: date | datetime | integer | float | blob | number.
- {transforms, Map} - Value transformation functions for columns. The Map is in the format ColName::binary() => fun((Value::term()) -> term()).
- {batch_size, Size} - Number of records per SQL insert/update/replace call.
- {blob_size, Size} - Threshold in number of bytes at which a VARCHAR field is defined as a BLOB.
- {save_create_sql_to_file, Filename::string()} - Save the CREATE TABLE SQL statement to a file.
- guess_types - When specified, the function will try to guess the type of data in columns instead of treating all data as string fields. The possible data types guessed: integer, float, date, datetime, number, string.
- {guess_limit_rows, Limit} - Limit the number of rows used for guessing the column data types.
- {max_nulls_pcnt, Percent} - A percentage threshold of permissible NULLs in a column (0-100), above which the column data type is forced to be treated as string.
- {primary_key, Fields} - Names of primary key fields in the created table.
- {drop_temp_table, boolean()} - When true (default), the temporary table is dropped.
- {encoding, Encoding} - The name of the encoding to use for storing data. For the list of permissible values [see this link](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html).
- verbose - Print additional details to stdout.
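For illustration, a minimal sketch of an options list combining several of the settings above; the column names, sizes, and transformation function are assumptions, not part of this module:

%% Hypothetical load_options() value (column names and values are made up)
Opts = [{load_type, recreate},
        {create_table, true},
        {col_types, #{<<"price">> => {float, 12}, <<"updated">> => datetime}},
        {transforms, #{<<"name">> => fun(V) -> string:trim(V) end}},
        {batch_size, 500},
        {guess_types, true},
        {guess_limit_rows, 1000},
        {max_nulls_pcnt, 50.0},
        {primary_key, [<<"id">>]},
        {encoding, "utf8mb4"},
        {verbose, true}].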
-type parse_options() ::
[fix_lengths | binary | list |
{open, list()} |
{columns, [binary() | string()]} |
{converters,
[{binary() | string() | all,
fun((binary(), binary()) -> binary()) | {rex, binary(), binary()}}]}].
- fix_lengths - If a record has a column count greater than what's found in the header row, the extra columns are dropped; if a row has fewer columns, empty columns are added.
- binary - Return fields as binaries (default).
- list - Return fields as lists.
- {open, list()} - Options given to file:open/2.
- {columns, Names} - Return data only in the given columns.
- {converters, [{Col, fun((ColName, Value) -> NewValue) | {rex, RegEx, Replace}}]} - Data format converter. If Col is all, the same converting function is used for all columns. If the converter is a {rex, RegEx, Replace}, the regular expression replacement is run on the value in the requested column.
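A minimal sketch of parsing with these options; the file name and column names are assumptions:

%% Normalize row lengths, return binaries, keep two columns,
%% and collapse whitespace in the "name" column via a regex converter
Rows = csv:parse("data.csv",
                 [fix_lengths, binary,
                  {columns, [<<"id">>, <<"name">>]},
                  {converters, [{<<"name">>, {rex, <<"\\s+">>, <<" ">>}}]}]).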
Functions
-spec guess_data_type(binary()) -> {null | date | datetime | integer | float | string, term(), string()}.
-spec guess_data_types(HasHeaderRow :: boolean(), Rows :: [Fields :: [binary()]]) ->
{Type :: string | integer | number | float | date | datetime,
MaxFieldLen :: integer(),
NumOfNulls :: integer()}.
Guess data types of fields in the given CSV list of rows obtained by parsing a CSV file with parse(File, [fix_lengths]). The function returns a list of tuples {Type, MaxFieldLen, NumOfNulls}, where Type is a field type, MaxFieldLen is the max length of data in this column, and NumOfNulls is the number of rows with empty values in this column.
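A minimal usage sketch; the file name is an assumption:

%% Parse first, then guess column types; the rows include a header row
Rows     = csv:parse("data.csv", [fix_lengths]),
ColTypes = csv:guess_data_types(true, Rows).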
-spec load_to_mysql(File :: string(), Tab :: string(), MySqlPid :: pid(), Opts :: load_options()) ->
          {Columns :: list(), AffectedCount :: integer(), RecCount :: integer()}.
Load CSV data from a File to a MySQL database. Tab is the name of the table where to load the data. MySqlPid is the pid of a MySQL database connection returned by mysql:start_link/1. The data in the table is replaced according to {import_type, Type}:
- recreate - The table is entirely replaced with the data from the file. The data from the file is loaded atomically - i.e. either the whole file loading succeeds or it fails. This is accomplished by first loading the data into a temporary table, and then using the database's ACID properties to replace the target table with the temporary table.
- replace - Use "REPLACE INTO" instead of "INSERT INTO" the existing table.
- ignore_dups - The insert into the existing table is performed and records with duplicate keys are ignored.
- update_dups - The insert into the existing table is performed and records with duplicate keys are updated.
- upsert - The insert/update in the existing table is performed without creating a temporary table.
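A minimal sketch of loading a file; the connection parameters, file name, and table name are assumptions (mysql:start_link/1 is the connection call mentioned above):

%% Open a MySQL connection, then atomically rebuild the target table from the CSV file
{ok, Pid} = mysql:start_link([{host, "localhost"}, {user, "user"},
                              {password, "secret"}, {database, "test"}]),
{Columns, Affected, Recs} =
    csv:load_to_mysql("data.csv", "my_table", Pid,
                      [{load_type, recreate}, {create_table, true}, {verbose, true}]).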
-spec max_field_lengths(HasHeaderRow :: boolean(), Rows :: [Fields :: list()]) -> [Len :: integer()].
Get max field lengths for a list obtained by parsing a CSV file with parse_csv_file(File, [fix_lengths]).
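For illustration (file name assumed):

%% MaxLens is a list with the max data length found in each column
Rows    = csv:parse("data.csv", [fix_lengths]),
MaxLens = csv:max_field_lengths(true, Rows).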
-spec parse(string()) -> [[binary()]].
-spec parse(binary() | string(), parse_options()) -> [[string()]].
-spec parse_line(binary()) -> list().
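A minimal sketch of parsing a single line; the input is made up:

%% Fields is a list of the individual field values from the line
Fields = csv:parse_line(<<"1,alice,2021-01-15">>).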