GraphQL

vaex-graphql is a plugin package that exposes a DataFrame through a GraphQL interface. This makes it easy to share data, aggregations and statistics, or machine learning models with frontends or other programs using a standard query language.

(Install with $ pip install vaex-graphql; there is no conda-forge package yet.)

[3]:
import vaex.ml
df = vaex.ml.datasets.load_titanic()
df
[3]:
#      pclass  survived  name                                             sex     age     sibsp  parch  ticket  fare      cabin    embarked  boat  body   home_dest
0      1       True      Allen, Miss. Elisabeth Walton                    female  29.0    0      0      24160   211.3375  B5       S         2     nan    St Louis, MO
1      1       True      Allison, Master. Hudson Trevor                   male    0.9167  1      2      113781  151.55    C22 C26  S         11    nan    Montreal, PQ / Chesterville, ON
2      1       False     Allison, Miss. Helen Loraine                     female  2.0     1      2      113781  151.55    C22 C26  S         None  nan    Montreal, PQ / Chesterville, ON
3      1       False     Allison, Mr. Hudson Joshua Creighton             male    30.0    1      2      113781  151.55    C22 C26  S         None  135.0  Montreal, PQ / Chesterville, ON
4      1       False     Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female  25.0    1      2      113781  151.55    C22 C26  S         None  nan    Montreal, PQ / Chesterville, ON
...    ...     ...       ...                                              ...     ...     ...    ...    ...     ...       ...      ...       ...   ...    ...
1,304  3       False     Zabour, Miss. Hileni                             female  14.5    1      0      2665    14.4542   None     C         None  328.0  None
1,305  3       False     Zabour, Miss. Thamine                            female  nan     1      0      2665    14.4542   None     C         None  nan    None
1,306  3       False     Zakarian, Mr. Mapriededer                        male    26.5    0      0      2656    7.225     None     C         None  304.0  None
1,307  3       False     Zakarian, Mr. Ortin                              male    27.0    0      0      2670    7.225     None     C         None  nan    None
1,308  3       False     Zimmerman, Mr. Leo                               male    29.0    0      0      315082  7.875     None     S         None  nan    None
[10]:
result = df.graphql.execute("""
    {
        df {
            min {
                age
                fare
            }
            mean {
                age
                fare
            }
            max {
                age
                fare
            }
            groupby {
                sex {
                    count
                    mean {
                        age
                    }
                }
            }
        }
    }
    """)
result.data
[10]:
OrderedDict([('df',
              OrderedDict([('min',
                            OrderedDict([('age', 0.1667), ('fare', 0.0)])),
                           ('mean',
                            OrderedDict([('age', 29.8811345124283),
                                         ('fare', 33.29547928134572)])),
                           ('max',
                            OrderedDict([('age', 80.0), ('fare', 512.3292)])),
                           ('groupby',
                            OrderedDict([('sex',
                                          OrderedDict([('count', [466, 843]),
                                                       ('mean',
                                                        OrderedDict([('age',
                                                                      [28.6870706185567,
                                                                       30.585232978723408])]))]))]))]))])
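
The result is a plain nested mapping that mirrors the shape of the query, so it can be navigated with ordinary key lookups. The sketch below copies the values from the output above into a literal so that it runs without vaex; in a live session the same lookups would apply to result.data. The variable names (stats, by_sex, groups) are illustrative only.

```python
from collections import OrderedDict

# Values copied from the output above; in a live session this is result.data.
data = OrderedDict([
    ("df", OrderedDict([
        ("min", OrderedDict([("age", 0.1667), ("fare", 0.0)])),
        ("mean", OrderedDict([("age", 29.8811345124283),
                              ("fare", 33.29547928134572)])),
        ("max", OrderedDict([("age", 80.0), ("fare", 512.3292)])),
        ("groupby", OrderedDict([
            ("sex", OrderedDict([
                ("count", [466, 843]),
                ("mean", OrderedDict([
                    ("age", [28.6870706185567, 30.585232978723408]),
                ])),
            ])),
        ])),
    ])),
])

# Each level of the query becomes one level of nesting in the result.
stats = data["df"]
age_range = stats["max"]["age"] - stats["min"]["age"]

# Grouped aggregations come back as parallel lists, one entry per group.
# The group labels themselves are not part of this particular output.
by_sex = stats["groupby"]["sex"]
groups = list(zip(by_sex["count"], by_sex["mean"]["age"]))
total_rows = sum(by_sex["count"])

print(age_range, groups, total_rows)
```

The two group counts add up to 1309, which matches the number of rows in the DataFrame shown earlier (indices 0 through 1,308).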

Pandas support

Importing vaex.graphql also registers a pandas accessor, so the same GraphQL interface is available on pandas DataFrames.

[11]:
df_pandas = df.to_pandas_df()
[20]:
df_pandas.graphql.execute("""
    {
        df(where: {age: {_gt: 20}}) {
            row(offset: 3, limit: 2) {
                name
                survived
            }
        }
    }
    """
).data
[20]:
OrderedDict([('df',
              OrderedDict([('row',
                            [OrderedDict([('name', 'Anderson, Mr. Harry'),
                                          ('survived', True)]),
                             OrderedDict([('name',
                                           'Andrews, Miss. Kornelia Theodosia'),
                                          ('survived', True)])])]))])
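
To make the where, offset, and limit arguments concrete, here is a rough pure-Python sketch of the selection. It assumes the filter is applied first and that offset and limit then act like a slice over the matching rows, which appears consistent with the output above. The helper select_rows is hypothetical and not part of vaex-graphql. The rows are the first five rows of the Titanic table shown earlier, so the sketch uses a smaller offset than the query above.

```python
# First five rows of the Titanic table shown earlier (three columns only).
rows = [
    {"name": "Allen, Miss. Elisabeth Walton", "age": 29.0, "survived": True},
    {"name": "Allison, Master. Hudson Trevor", "age": 0.9167, "survived": True},
    {"name": "Allison, Miss. Helen Loraine", "age": 2.0, "survived": False},
    {"name": "Allison, Mr. Hudson Joshua Creighton", "age": 30.0, "survived": False},
    {"name": "Allison, Mrs. Hudson J C (Bessie Waldo Daniels)", "age": 25.0,
     "survived": False},
]


def select_rows(rows, min_age, offset, limit, fields):
    """Hypothetical stand-in for df(where: {age: {_gt: ...}}) { row(...) {...} }."""
    # Step 1: keep only rows that satisfy the filter (age > min_age).
    matching = [row for row in rows if row["age"] > min_age]
    # Step 2: skip `offset` matching rows, then take at most `limit` rows.
    window = matching[offset:offset + limit]
    # Step 3: return only the requested fields.
    return [{field: row[field] for field in fields} for row in window]


selected = select_rows(rows, min_age=20, offset=1, limit=2,
                       fields=["name", "survived"])
for row in selected:
    print(row)
```

With these five rows, three passengers are older than 20 (rows 0, 3, and 4). Skipping one and taking two leaves rows 3 and 4.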

Server

The easiest way to learn the GraphQL language and the vaex interface is to launch a server and experiment with the GraphiQL graphical interface, its autocomplete, and its schema explorer.

We try to stay close to the Hasura API: https://docs.hasura.io/1.0/graphql/manual/api-reference/graphql-api/query.html

A server can be started from the command line:

$ python -m vaex.graphql myfile.hdf5

Or from within Python, using df.graphql.serve.
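
Once a server is running, other programs can query it over HTTP. The sketch below uses only the Python standard library and assumes the server follows the common GraphQL-over-HTTP convention of accepting a POST request with a JSON body containing a "query" key. The URL, including the port and path, is a placeholder assumption; check the address your server reports when it starts. The helpers build_request and run_query are illustrative names, not part of vaex-graphql.

```python
import json
import urllib.request

# Placeholder endpoint: host, port, and path are assumptions, not documented values.
GRAPHQL_URL = "http://localhost:9001/graphql"

QUERY = """
{
    df {
        mean {
            age
            fare
        }
    }
}
"""


def build_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query string in a JSON POST request."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=GRAPHQL_URL):
    """Send the query and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access; sending it does.
request = build_request(QUERY)
print(request.get_method(), request.full_url)

# With a server running at GRAPHQL_URL, the query could be sent with:
#     response = run_query(QUERY)
```

GraphQL servers conventionally place results under a top-level "data" key in the JSON response, so the nested values would likely be found there.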