If you’re writing a library which consumes dataframes, should you choose to support pandas, Polars, cuDF, Modin, Vaex, PySpark, Dask, or something else?
Don’t choose - learn how the DataFrame Interchange Protocol and/or Narwhals enable you to support them all!
In 2023, we saw several libraries - which had previously only supported pandas - add support for other dataframe libraries such as Polars, Modin, and cuDF. They typically did this by keeping their existing code, and converting non-pandas inputs to pandas. They’ve usually been smart about only converting the parts of the dataframe which they need, but nonetheless, this approach has limitations.
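As a rough illustration of the "convert only what you need to pandas" pattern, here is a minimal sketch. The helper name `to_pandas_subset` is hypothetical and not taken from any particular library; it assumes pandas 1.5 or later (where `pandas.api.interchange.from_dataframe` is available) and an input that implements the protocol's `__dataframe__` method.

```python
import pandas as pd


def to_pandas_subset(df, columns):
    """Return only `columns` of `df` as a pandas DataFrame.

    Hypothetical helper: pandas inputs are sliced directly, while any other
    object exposing `__dataframe__` goes through the interchange protocol.
    """
    if isinstance(df, pd.DataFrame):
        # Already pandas: no conversion needed, just select the columns.
        return df[list(columns)]
    if not hasattr(df, "__dataframe__"):
        raise TypeError(
            f"{type(df).__name__} does not support the DataFrame Interchange Protocol"
        )
    # Select the needed columns *before* converting, so that unused
    # columns are never copied into pandas.
    interchange_df = df.__dataframe__().select_columns_by_name(list(columns))
    return pd.api.interchange.from_dataframe(interchange_df)


if __name__ == "__main__":
    data = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "c": ["x", "y", "z"]})
    print(to_pandas_subset(data, ["a", "b"]))
```

A library's existing pandas code can then run unchanged on the returned object. Whatever comes back is always a pandas DataFrame, regardless of what the caller passed in.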
Downsides of the “just convert to pandas” approach are:
This talk will introduce you to the DataFrame Interchange Protocol and to Narwhals, which allow library developers to:
The format will roughly be:
By the end of the talk, attendees will have learned about the dataframe ecosystem, and those involved with dataframe-consuming libraries will know all they need in order to effectively support multiple dataframe libraries. Library maintainers and contributors will get the most out of the talk, but anyone regularly using dataframes will also learn a lot about the tools they use.
Marco is a Senior Software Engineer at Quansight Labs, where he works on pandas, Polars, the DataFrame Interchange Protocol, and assorted consulting and training activities. He holds an MSc in Mathematics and Foundations of Computer Science from the University of Oxford, and was one of the prize winners in the M6 Forecasting Competition.