TableSet#

The TableSet class collects a set of related tables in a single data structure. The most common way of creating a TableSet is using the Table.group_by() method, which is similar to SQL’s GROUP BY keyword. The resulting tables all have an identical column structure.

TableSet functions as a dictionary. Individual tables in the set can be accessed by using their name as a key. If the table set was created using Table.group_by() then the names of the tables will be the grouping factors found in the original data.

TableSet replicates the majority of the features of Table. When methods such as TableSet.select(), TableSet.where() or TableSet.order_by() are used, the operation is applied to each table in the set and the result is a new TableSet instance made up of entirely new Table instances.

TableSet instances can also contain other TableSets. This means you can chain calls to Table.group_by() and TableSet.group_by() and end up with data grouped across multiple dimensions. Calling TableSet.aggregate() on nested TableSets then aggregates across all of those dimensions.

agate.TableSet

A group of named tables with identical column definitions.

Properties#

agate.TableSet.key_name

Get the name of the key this TableSet is grouped by.

agate.TableSet.key_type

Get the DataType this TableSet is grouped by.

agate.TableSet.column_types

Get an ordered list of this TableSet's column types.

agate.TableSet.column_names

Get an ordered list of this TableSet's column names.

Creating#

agate.TableSet.from_csv

Create a new TableSet from a directory of CSVs.

agate.TableSet.from_json

Create a new TableSet from a directory of JSON files or from a single JSON object that maps each Table key to a list of row objects.

Saving#

agate.TableSet.to_csv

Write each table in this set to a separate CSV in a given directory.

agate.TableSet.to_json

Write TableSet to either a set of JSON files for each table or a single nested JSON file.

Processing#

agate.TableSet.aggregate

Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.

agate.TableSet.having

Create a new TableSet with only those tables that pass a test.

agate.TableSet.merge

Convert this TableSet into a single table.

Previewing#

agate.TableSet.print_structure

Print the keys and row counts of each table in the tableset.

Charting#

agate.TableSet.bar_chart

Render a lattice/grid of bar charts using leather.Lattice.

agate.TableSet.column_chart

Render a lattice/grid of column charts using leather.Lattice.

agate.TableSet.line_chart

Render a lattice/grid of line charts using leather.Lattice.

agate.TableSet.scatterplot

Render a lattice/grid of scatterplots using leather.Lattice.

Table Proxy Methods#

agate.TableSet.bins

Calls Table.bins() on each table in the TableSet.

agate.TableSet.compute

Calls Table.compute() on each table in the TableSet.

agate.TableSet.denormalize

Calls Table.denormalize() on each table in the TableSet.

agate.TableSet.distinct

Calls Table.distinct() on each table in the TableSet.

agate.TableSet.exclude

Calls Table.exclude() on each table in the TableSet.

agate.TableSet.find

Calls Table.find() on each table in the TableSet.

agate.TableSet.group_by

Calls Table.group_by() on each table in the TableSet.

agate.TableSet.homogenize

Calls Table.homogenize() on each table in the TableSet.

agate.TableSet.join

Calls Table.join() on each table in the TableSet.

agate.TableSet.limit

Calls Table.limit() on each table in the TableSet.

agate.TableSet.normalize

Calls Table.normalize() on each table in the TableSet.

agate.TableSet.order_by

Calls Table.order_by() on each table in the TableSet.

agate.TableSet.pivot

Calls Table.pivot() on each table in the TableSet.

agate.TableSet.select

Calls Table.select() on each table in the TableSet.

agate.TableSet.where

Calls Table.where() on each table in the TableSet.

Detailed list#

class agate.TableSet(tables, keys, key_name='group', key_type=None, _is_fork=False)#

Bases: MappedSequence

A group of named tables with identical column definitions. Supports (almost) all the same operations as Table. When executed on a TableSet, any operation that would have returned a new Table instead returns a new TableSet. Any operation that would have returned a single value instead returns a dictionary of values.

TableSet is implemented as a subclass of MappedSequence.

Parameters:
  • tables – A sequence of Table instances.

  • keys – A sequence of keys corresponding to the tables. These may be any type except int.

  • key_name – A name that describes the grouping properties. Used as the column header when the groups are aggregated. Defaults to the column name that was grouped on.

  • key_type – An instance of some subclass of DataType. If not provided, it will default to Text.

  • _is_fork – Used internally to skip certain validation steps when data is propagated from an existing TableSet.

property key_name#

Get the name of the key this TableSet is grouped by. (If created using Table.group_by() then this is the original column name.)

property key_type#

Get the DataType this TableSet is grouped by. (If created using Table.group_by() then this is the original column type.)

property column_types#

Get an ordered list of this TableSet’s column types.

Returns:

A tuple of DataType instances.

property column_names#

Get an ordered list of this TableSet’s column names.

Returns:

A tuple of strings.

aggregate(aggregations)#

Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.

aggregations must be a sequence of tuples, where each has two parts: a new_column_name and an Aggregation instance.

The resulting table will have the keys from this TableSet (and any nested TableSets) set as its row_names. See Table.__init__() for more details.

Parameters:

aggregations – A list of tuples in the format (new_column_name, aggregation), where each aggregation is an instance of Aggregation.

Returns:

A new Table.

bar_chart(label=0, value=1, path=None, width=None, height=None)#

Render a lattice/grid of bar charts using leather.Lattice.

Parameters:
  • label – The name or index of a column to plot as the labels of the chart. Defaults to the first column in the table.

  • value – The name or index of a column to plot as the values of the chart. Defaults to the second column in the table.

  • path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.

  • width – The width of the output SVG.

  • height – The height of the output SVG.

bins(*args, **kwargs)#

Calls Table.bins() on each table in the TableSet.

column_chart(label=0, value=1, path=None, width=None, height=None)#

Render a lattice/grid of column charts using leather.Lattice.

Parameters:
  • label – The name or index of a column to plot as the labels of the chart. Defaults to the first column in the table.

  • value – The name or index of a column to plot as the values of the chart. Defaults to the second column in the table.

  • path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.

  • width – The width of the output SVG.

  • height – The height of the output SVG.

compute(*args, **kwargs)#

Calls Table.compute() on each table in the TableSet.

count(value)#

Return the number of occurrences of value.

denormalize(*args, **kwargs)#

Calls Table.denormalize() on each table in the TableSet.

dict()#

Retrieve the contents of this sequence as a collections.OrderedDict.

distinct(*args, **kwargs)#

Calls Table.distinct() on each table in the TableSet.

exclude(*args, **kwargs)#

Calls Table.exclude() on each table in the TableSet.

find(*args, **kwargs)#

Calls Table.find() on each table in the TableSet.

classmethod from_csv(dir_path, column_names=None, column_types=None, row_names=None, header=True, **kwargs)#

Create a new TableSet from a directory of CSVs.

See Table.from_csv() for additional details.

Parameters:
  • dir_path – Path to a directory full of CSV files. All CSV files in this directory will be loaded.

  • column_names – See Table.__init__().

  • column_types – See Table.__init__().

  • row_names – See Table.__init__().

  • header – See Table.from_csv().

classmethod from_json(path, column_names=None, column_types=None, keys=None, **kwargs)#

Create a new TableSet from a directory of JSON files or from a single JSON object that maps each Table key to a list of row objects.

See Table.from_json() for additional details.

Parameters:
  • path – Path to a directory containing JSON files or filepath/file-like object of nested JSON file.

  • keys – A list of keys of the top-level dictionaries for each file. If specified, length must be equal to number of JSON files in path.

  • column_types – See Table.__init__().

get(key, default=None)#

Equivalent to collections.OrderedDict.get().

group_by(*args, **kwargs)#

Calls Table.group_by() on each table in the TableSet.

having(aggregations, test)#

Create a new TableSet with only those tables that pass a test.

This works by applying a sequence of Aggregation instances to each table. The resulting dictionary of properties is then passed to the test function.

This method does not modify the underlying tables in any way.

Parameters:
  • aggregations – A list of tuples in the format (name, aggregation), where each aggregation is an instance of Aggregation.

  • test (function) – A function that takes a dictionary of aggregated properties and returns True if it should be included in the new TableSet.

Returns:

A new TableSet.

homogenize(*args, **kwargs)#

Calls Table.homogenize() on each table in the TableSet.

index(value[, start[, stop]])#

Return the first index of value.

Raises ValueError if the value is not present.

Supporting start and stop arguments is optional, but recommended.

items()#

Equivalent to collections.OrderedDict.items().

join(*args, **kwargs)#

Calls Table.join() on each table in the TableSet.

keys()#

Equivalent to collections.OrderedDict.keys().

limit(*args, **kwargs)#

Calls Table.limit() on each table in the TableSet.

line_chart(x=0, y=1, path=None, width=None, height=None)#

Render a lattice/grid of line charts using leather.Lattice.

Parameters:
  • x – The name or index of a column to plot as the x axis of the chart. Defaults to the first column in the table.

  • y – The name or index of a column to plot as the y axis of the chart. Defaults to the second column in the table.

  • path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.

  • width – The width of the output SVG.

  • height – The height of the output SVG.

merge(groups=None, group_name=None, group_type=None)#

Convert this TableSet into a single table. This is the inverse of Table.group_by().

Any row_names set on the merged tables will be lost in this process.

Parameters:
  • groups – A list of grouping factors to add to merged rows in a new column. If specified, it should have exactly one element per Table in the TableSet. If not specified or None, the grouping factor will be the name of the Row’s original Table.

  • group_name – This will be the column name of the grouping factors. If None, defaults to the TableSet.key_name.

  • group_type – This will be the column type of the grouping factors. If None, defaults to the TableSet.key_type.

Returns:

A new Table.

normalize(*args, **kwargs)#

Calls Table.normalize() on each table in the TableSet.

order_by(*args, **kwargs)#

Calls Table.order_by() on each table in the TableSet.

pivot(*args, **kwargs)#

Calls Table.pivot() on each table in the TableSet.

print_structure(max_rows=20, output=sys.stdout)#

Print the keys and row counts of each table in the tableset.

Parameters:
  • max_rows – The maximum number of rows to display before truncating the data. Defaults to 20.

  • output – The output used to print the structure of the Table.

Returns:

None

scatterplot(x=0, y=1, path=None, width=None, height=None)#

Render a lattice/grid of scatterplots using leather.Lattice.

Parameters:
  • x – The name or index of a column to plot as the x axis of the chart. Defaults to the first column in the table.

  • y – The name or index of a column to plot as the y axis of the chart. Defaults to the second column in the table.

  • path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.

  • width – The width of the output SVG.

  • height – The height of the output SVG.

select(*args, **kwargs)#

Calls Table.select() on each table in the TableSet.

to_csv(dir_path, **kwargs)#

Write each table in this set to a separate CSV in a given directory.

See Table.to_csv() for additional details.

Parameters:

dir_path – Path to the directory to write the CSV files to.

to_json(path, nested=False, indent=None, **kwargs)#

Write TableSet to either a set of JSON files for each table or a single nested JSON file.

See Table.to_json() for additional details.

Parameters:
  • path – Path to the directory to write the JSON file(s) to. If nested is True, this should be a file path or file-like object to write to.

  • nested – If True, the output will be a single nested JSON file with each Table’s key paired with a list of row objects. Otherwise, the output will be a set of files for each table. Defaults to False.

  • indent – See Table.to_json().

values()#

Equivalent to collections.OrderedDict.values().

where(*args, **kwargs)#

Calls Table.where() on each table in the TableSet.