#sklearn #dataframes
Оказывается, есть инициатива унификации библиотек работы с датафреймами. И её поддерживают в sklearn.
"Enhancement All estimators now recognizes the column names from any dataframe that adopts the DataFrame Interchange Protocol. Dataframes that return a correct representation through np.asarray(df) is expected to work with our estimators and functions."
"Python users today have a number of great choices for dataframe libraries. From Pandas and cuDF to Vaex, Koalas, Modin, Ibis, and more. Combining multiple types of dataframes in a larger application or analysis workflow, or developing a library which uses dataframes as a data structure, presents a challenge though. Those libraries all have different APIs, and there is no standard way of converting one type of dataframe into another."
Похожая идея с использованием массивов Array API:
"Python users have a wealth of choice for libraries and frameworks for numerical computing, data science, machine learning, and deep learning. New frameworks pushing forward the state of the art in these fields are appearing every year. One unintended consequence of all this activity and creativity has been fragmentation in multidimensional array (a.k.a. tensor) libraries - which are the fundamental data structure for these fields. Choices include NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, Xarray, and others.
The APIs of each of these libraries are largely similar, but with enough differences that it’s quite difficult to write code that works with multiple (or all) of these libraries. This array API standard aims to address that issue, by specifying an API for the most common ways arrays are constructed and used.
Why not simply pick an existing API and bless that as the standard?"
Оказывается, есть инициатива унификации библиотек работы с датафреймами. И её поддерживают в sklearn.
"Enhancement All estimators now recognizes the column names from any dataframe that adopts the DataFrame Interchange Protocol. Dataframes that return a correct representation through np.asarray(df) is expected to work with our estimators and functions."
"Python users today have a number of great choices for dataframe libraries. From Pandas and cuDF to Vaex, Koalas, Modin, Ibis, and more. Combining multiple types of dataframes in a larger application or analysis workflow, or developing a library which uses dataframes as a data structure, presents a challenge though. Those libraries all have different APIs, and there is no standard way of converting one type of dataframe into another."
Похожая идея с использованием массивов Array API:
"Python users have a wealth of choice for libraries and frameworks for numerical computing, data science, machine learning, and deep learning. New frameworks pushing forward the state of the art in these fields are appearing every year. One unintended consequence of all this activity and creativity has been fragmentation in multidimensional array (a.k.a. tensor) libraries - which are the fundamental data structure for these fields. Choices include NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, Xarray, and others.
The APIs of each of these libraries are largely similar, but with enough differences that it’s quite difficult to write code that works with multiple (or all) of these libraries. This array API standard aims to address that issue, by specifying an API for the most common ways arrays are constructed and used.
Why not simply pick an existing API and bless that as the standard?"
❤1⚡1