Binary data transfer
Motivation
Visualizations in genomics, of massive social networks, or of sensor data often benefit from plotting millions of points rather than hundreds of thousands.
By default, pydeck sends data from Jupyter to the frontend by serializing it to JSON. For massive data sets, however, the cost of serializing and deserializing this JSON can prevent a visualization from rendering.
To get around this, pydeck supports binary data transfer, which significantly reduces data size. Binary transfer relies on NumPy and its typed arrays, which are converted to JavaScript typed arrays and passed to deck.gl using precalculated binary attributes.
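To give a rough sense of the size difference, the sketch below compares the JSON-encoded size of an array of points with the size of the same values held in a NumPy float32 typed array. It only illustrates the idea and is not pydeck's internal serialization code; the array shape, dtype, and random data are assumptions chosen for the example.

```python
import json

import numpy as np

# Hypothetical data: 10,000 random 3D points stored as 32-bit floats.
rng = np.random.default_rng(seed=0)
positions = rng.random((10_000, 3), dtype=np.float32)

# JSON route: every number becomes a decimal string plus delimiters.
json_size = len(json.dumps(positions.tolist()).encode("utf-8"))

# Typed-array route: every float32 value occupies exactly 4 bytes.
binary_size = positions.nbytes

print(f"JSON: {json_size} bytes, binary: {binary_size} bytes")
```

The exact ratio depends on the data, but the typed array avoids both the text encoding and the parsing step on the receiving side.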
Usage
Binary transport will only work if the following requirements are met:

- use_binary_transport must be set to True explicitly on your Layer.
- Layer input data must be a pandas.DataFrame object. Data that is not intended to be rendered should not be passed into the layer.
- Accessor names must be strings representing column names within the data frame, e.g., get_position='position' is correct, not get_position=['x', 'y']. For example, this data format, where x and y represent a position and r, g, and b represent color values,

    x  y  r    g    b
    0  1  0    0    0
    0  5  255  0    0
    5  1  255  255  0

  should be converted to this format:

    position  color
    [0, 1]    [0, 0, 0]
    [0, 5]    [255, 0, 0]
    [5, 1]    [255, 255, 0]

- Binary transfer only works within Jupyter environments via pydeck.bindings.deck.Deck.show(). It relies on the socket-level communication built into the Jupyter environment.
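The column conversion described in the requirements can be sketched with pandas as follows. The variable names flat and nested are illustrative only, and this is one of several possible ways to build list-valued columns.

```python
import pandas as pd

# The flat format: one column per coordinate and per color channel.
flat = pd.DataFrame(
    {
        "x": [0, 0, 5],
        "y": [1, 5, 1],
        "r": [0, 255, 255],
        "g": [0, 0, 255],
        "b": [0, 0, 0],
    }
)

# The nested format: one list-valued column per accessor.
# Only the columns that will be rendered are kept.
nested = pd.DataFrame(
    {
        "position": flat[["x", "y"]].to_numpy().tolist(),
        "color": flat[["r", "g", "b"]].to_numpy().tolist(),
    }
)

print(nested)
```

The resulting data frame can then be passed to a layer with get_position="position" and get_color="color", as in the example that follows.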
Example
"""
Binary Transport
================

Example of binary transport in pydeck. This notebook renders 10k points via the web sockets within
a Jupyter notebook if you run with ``generate_vis(notebook_display=True)``.

Since binary transfer relies on Jupyter's kernel communication,
note that the .html in the pydeck documentation does not use binary transfer
and is just for illustration.
"""
import pydeck as pdk
import pandas as pd

NODES_URL = "https://raw.githubusercontent.com/ajduberstein/geo_datasets/master/social_nodes.csv"


def generate_graph_data(num_nodes, random_seed):
    """Generates a graph of 10k nodes with a 3D force layout

    This function is unused but serves as an example of how the data in
    this visualization was generated
    """
    import networkx as nx  # noqa

    g = nx.random_internet_as_graph(num_nodes, random_seed)
    node_positions = nx.fruchterman_reingold_layout(g, dim=3)

    force_layout_df = pd.DataFrame.from_records(node_positions).transpose()
    force_layout_df["group"] = [d[1]["type"] for d in g.nodes.data()]
    force_layout_df.columns = ["x", "y", "z", "group"]
    return force_layout_df


def make_renderer(nodes: pd.DataFrame, use_binary_transport: bool = False) -> pdk.Deck:
    """Creates the pydeck visualization for rendering"""
    view_state = pdk.ViewState(
        offset=[0, 0],
        target=[0, 0, 0],
        latitude=None,
        longitude=None,
        bearing=None,
        pitch=None,
        zoom=10,
    )

    views = [pdk.View(type="OrbitView", controller=True)]

    nodes_layer = pdk.Layer(
        "PointCloudLayer",
        nodes,
        get_position="position",
        get_normal=[10, 100, 10],
        get_color="color",
        pickable=True,
        # Set use_binary_transport to `True`
        use_binary_transport=use_binary_transport,
        auto_highlight=True,
        highlight_color=[255, 255, 0],
        radius=50,
    )

    return pdk.Deck(layers=[nodes_layer], initial_view_state=view_state, views=views, map_provider=None)


r = None


def generate_vis(notebook_display: bool = False):
    global r
    nodes = pd.read_csv(NODES_URL)

    colors = pdk.data_utils.assign_random_colors(nodes["group"])
    # Divide by 255 to normalize the colors
    # Specify positions and colors as columns of lists
    nodes["color"] = nodes.apply(
        lambda row: [c / 255 if notebook_display else c for c in colors.get(row["group"])], axis=1
    )
    nodes["position"] = nodes.apply(lambda row: [row["x"], row["y"], row["z"]], axis=1)

    # Remove all unused columns
    del nodes["x"]
    del nodes["y"]
    del nodes["z"]
    del nodes["group"]

    if not notebook_display:
        r = make_renderer(nodes, use_binary_transport=False)
        r.to_html("binary_transport.html", css_background_color="charcoal", notebook_display=notebook_display)
    else:
        r = make_renderer(nodes, use_binary_transport=True)
        display(r.show())  # noqa


if __name__ == "__main__":
    generate_vis()
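The example above divides each color channel by 255 when it renders with binary transport. A minimal vectorized sketch of that normalization step, using the three sample colors from the Usage section, might look like the following. The variable names and the choice of float32 are assumptions made for illustration, not something the example prescribes.

```python
import numpy as np

# Sample 0-255 colors, one row per point (illustrative values).
rgb = np.array([[0, 0, 0], [255, 0, 0], [255, 255, 0]], dtype=np.uint8)

# Convert to floats first so the division is not done in integer space,
# then scale every channel into the 0-1 range.
normalized = rgb.astype(np.float32) / 255.0

print(normalized.tolist())
```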