Datasets to download

Here we list a few datasets that might be interesting to explore with vaex.

New York taxi dataset

See, for instance, "Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance" for some ideas.

import vaex
df = vaex.open("/Users/users/breddels/.vaex/data/nyc_taxi/nyc_taxi2015.hdf5")
df.plot(df.col.pickup_longitude, df.col.pickup_latitude, f="log1p", show=True, limits="96%");

SDSS - dereddened

Only the columns ra, dec, g, r and g_r are included (dereddened using Schlegel maps).

The original query at the SDSS archive was (although split into small parts):

SELECT ra, dec, g, r from PhotoObjAll WHERE type = 6 and clean = 1 and r>=10.0 and r<23.5;

sdss = vaex.open("/Users/maartenbreddels/vaex/data/sdss/sdss_dereddened.hdf5")
sdss.healpix_plot(sdss.col.healpix, show=True, f="log", healpix_max_level=9, healpix_level=9,
                  healpix_input='galactic', healpix_output='galactic', rotation=(0,45))
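
The SDSS query above was run in small parts, but the text does not say how it was split. As a minimal sketch, assuming the split is done on right ascension, the following generates one query per ra slice. The helper name split_query_by_ra and the choice of equal-width ra intervals are hypothetical, introduced only for illustration.

```python
# Hypothetical sketch: split the SDSS query into equal-width ra slices.
# Splitting on ra is an assumption; the original split is not documented.
BASE_QUERY = (
    "SELECT ra, dec, g, r from PhotoObjAll "
    "WHERE type = 6 and clean = 1 and r>=10.0 and r<23.5"
)


def split_query_by_ra(n_parts):
    """Return n_parts queries, each limited to a half-open ra interval in degrees."""
    queries = []
    for i in range(n_parts):
        ra_min = 360.0 * i / n_parts
        ra_max = 360.0 * (i + 1) / n_parts
        # Half-open intervals [ra_min, ra_max) so no row is selected twice.
        queries.append(f"{BASE_QUERY} and ra>={ra_min} and ra<{ra_max};")
    return queries


for query in split_query_by_ra(4):
    print(query)
```

Each generated query can be submitted separately and the results concatenated afterwards. Because the intervals are half-open and share their boundaries, the slices together cover the full 0 to 360 degree range without overlap.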


Gaia

See the Gaia Science Homepage for details. You may also want to try the Gaia Archive for ADQL (SQL-like) queries.

gaia = vaex.open("/data/users/gaia/gaia-dr2/gaia-dr2-sort-by-source_id.hdf5")
gaia.plot("ra", "dec", f="log", limits=[[360, 0], [-90, 90]], show=True);

Helmi & de Zeeuw 2000

Result of an N-body simulation of the accretion of 33 satellite galaxies into a Milky Way dark matter halo.

* 3 million rows - 252MB

hdz = vaex.datasets.helmi_de_zeeuw.fetch() # this will download it on the fly
hdz.plot([["x", "y"], ["Lz", "E"]], f="log", figsize=(12,5), show=True);