ETL and high performance computing

Midio has an experimental native package that uses the Polars data frame library to handle large amounts of data.

How to use it

Start by adding the polars package using the package manager.

Then use one of the functions in the Polars.Source module to import your data into the Polars format. Supported sources include CSV, JSON, newline-delimited JSON, and queries against a Postgres database.

These functions return a dataframe object, which can be operated on using either Polars.Execute Sql or Polars.Execute Dynamic Sql.

Execute Sql

Execute Sql takes a list of inputs and executes an SQL query over them. By default, it accepts a single input.

Execute Dynamic Sql

Execute Dynamic Sql works in a similar way, but expects an object in which each key is the name of a source and each value is a data frame.
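
The idea of named sources can be sketched outside Midio. The Python example below is only an analogy, not the Midio package: it swaps in SQLite from the Python standard library in place of Polars, and the function name execute_dynamic_sql, the sample tables, and the query are all hypothetical. It assumes every source has at least one row. Each key of the input object becomes a table name that the SQL query can refer to.

```python
import sqlite3


def execute_dynamic_sql(sources, query):
    """Run `query` over named in-memory tables and return column-oriented output.

    `sources` maps a source name to a non-empty list of row dictionaries.
    Hypothetical helper: SQLite stands in for the Polars SQL engine here.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in sources.items():
            # Use the keys of the first row as the column names of the table.
            columns = list(rows[0].keys())
            col_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f'CREATE TABLE "{name}" ({col_list})')
            conn.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})',
                [tuple(row[c] for c in columns) for row in rows],
            )
        cursor = conn.execute(query)
        names = [d[0] for d in cursor.description]
        fetched = cursor.fetchall()
    finally:
        conn.close()
    # One list of values per result column.
    return {n: [r[i] for r in fetched] for i, n in enumerate(names)}


# Hypothetical sample data: the keys "users" and "orders" are the source names.
sources = {
    "users": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}],
    "orders": [
        {"user_id": 1, "total": 30},
        {"user_id": 1, "total": 12},
        {"user_id": 2, "total": 5},
    ],
}

result = execute_dynamic_sql(
    sources,
    "SELECT u.name AS name, SUM(o.total) AS total "
    "FROM users u JOIN orders o ON o.user_id = u.id "
    "GROUP BY u.name ORDER BY u.name",
)
print(result)
```

Because the query refers to sources by key, adding another source only means adding another entry to the object.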

Getting the results

After executing one or more queries, the results can be collected using the Polars.Collect function.

About the output

By default, the output is in a column-oriented format: you get an object in which each field represents a column, and its value is a list containing that column's value for each row.

To convert this data back to a list of objects, which is often a more useful format, use the Transpose function.
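
To make the two shapes concrete, here is a minimal Python sketch of the conversion from column-oriented output to a list of objects. It is not Midio's Transpose function; the helper name columns_to_rows and the sample data are hypothetical, and it assumes every column holds the same number of values.

```python
def columns_to_rows(columns):
    """Convert {"col": [v1, v2, ...], ...} into [{"col": v1, ...}, {"col": v2, ...}].

    Hypothetical helper; assumes all columns have the same length.
    """
    names = list(columns)
    # zip(*values) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Hypothetical column-oriented output: one field per column.
column_oriented = {"name": ["Ada", "Linus"], "age": [36, 54]}

rows = columns_to_rows(column_oriented)
print(rows)
```

The row-oriented form is usually easier to loop over or pass to code that handles one record at a time, while the column-oriented form keeps all values of a column together.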
