TableSet
The TableSet class collects a set of related tables in a single data structure. The most common way of creating a TableSet is using the Table.group_by() method, which is similar to SQL’s GROUP BY keyword. All tables in the resulting set have an identical column structure.
TableSet functions as a dictionary. Individual tables in the set can be accessed by using their name as a key. If the table set was created using Table.group_by(), then the names of the tables will be the grouping factors found in the original data.
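To make the dictionary-like behavior concrete, here is a minimal plain-Python sketch, not agate's implementation: it models each table as a list of row dicts and a table set as an `OrderedDict`. The `group_by` helper and the sample rows are hypothetical. In agate itself the equivalent would roughly be `table.group_by('kind')` followed by `tableset['fruit']`.

```python
from collections import OrderedDict

# Hypothetical sample rows; a "table" here is just a list of dicts.
rows = [
    {"kind": "fruit", "name": "apple", "qty": 5},
    {"kind": "veg", "name": "kale", "qty": 4},
    {"kind": "fruit", "name": "pear", "qty": 2},
]


def group_by(rows, key):
    """Hypothetical helper: map each distinct value of `key` to its rows.

    Groups are kept in the order their value first appears, and every
    group keeps the same columns as the input rows.
    """
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


tableset = group_by(rows, "kind")
print(list(tableset.keys()))                   # ['fruit', 'veg']
print([r["name"] for r in tableset["fruit"]])  # ['apple', 'pear']
```

The grouping factor ("fruit", "veg") becomes the key used to fetch each table, which is the behavior described above.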
TableSet replicates the majority of the features of Table. When methods such as TableSet.select(), TableSet.where() or TableSet.order_by() are used, the operation is applied to each table in the set and the result is a new TableSet instance made up of entirely new Table instances.
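The proxy pattern described above can be sketched under the same simplified model (tables as lists of row dicts, a table set as an `OrderedDict`). `apply_to_each` is a hypothetical helper, not part of agate. Each call builds a new mapping of new lists and leaves the input untouched, roughly what `tableset.where(lambda row: row['qty'] > 1)` or `tableset.order_by('name')` do in agate.

```python
from collections import OrderedDict


def apply_to_each(tableset, operation):
    """Hypothetical helper: run `operation` on every table and return a new
    OrderedDict with the same keys. The input mapping is not modified;
    row dicts are shared, but each resulting table list is new."""
    return OrderedDict((key, operation(table)) for key, table in tableset.items())


tableset = OrderedDict([
    ("fruit", [{"name": "pear", "qty": 2}, {"name": "apple", "qty": 5}]),
    ("veg", [{"name": "leek", "qty": 1}, {"name": "kale", "qty": 4}]),
])

# Analogue of TableSet.where(): keep rows with qty > 1 in every table.
filtered = apply_to_each(tableset, lambda t: [r for r in t if r["qty"] > 1])

# Analogue of TableSet.order_by(): sort every table by name.
ordered = apply_to_each(tableset, lambda t: sorted(t, key=lambda r: r["name"]))

print([r["name"] for r in filtered["veg"]])   # ['kale']
print([r["name"] for r in ordered["fruit"]])  # ['apple', 'pear']
```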
TableSet instances can also contain other TableSets. This means you can chain calls to Table.group_by() and TableSet.group_by() and end up with data grouped across multiple dimensions. Calling TableSet.aggregate() on nested TableSets will then aggregate across all of those dimensions.
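Here is a plain-Python sketch of grouping on two columns and then aggregating, assuming tables are lists of row dicts and a table set is a (possibly nested) `OrderedDict`. `nested_group_by`, `aggregate_nested`, and the sample rows are hypothetical, and ordinary functions such as `len` stand in for agate `Aggregation` instances. In this sketch each innermost table yields one output row, with one column per grouping level followed by the aggregated values.

```python
from collections import OrderedDict

# Hypothetical sample rows.
rows = [
    {"region": "north", "kind": "fruit", "qty": 2},
    {"region": "north", "kind": "veg", "qty": 3},
    {"region": "south", "kind": "fruit", "qty": 5},
    {"region": "north", "kind": "fruit", "qty": 1},
]


def nested_group_by(rows, keys):
    """Group by keys[0], then recursively group each subset by keys[1:].

    Returns nested OrderedDicts whose innermost values are lists of rows.
    """
    if not keys:
        return rows
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[keys[0]], []).append(row)
    return OrderedDict(
        (value, nested_group_by(subset, keys[1:])) for value, subset in groups.items()
    )


def aggregate_nested(nested, key_names, aggregations, path=()):
    """Flatten a nested grouping into one output row per innermost table.

    Each output row has one entry per grouping level (named by key_names)
    followed by one entry per (name, function) pair in `aggregations`.
    """
    out = []
    for value, child in nested.items():
        here = path + (value,)
        if isinstance(child, OrderedDict):
            # Still a nested level: recurse, carrying the keys seen so far.
            out.extend(aggregate_nested(child, key_names, aggregations, here))
        else:
            row = dict(zip(key_names, here))
            for name, func in aggregations:
                row[name] = func(child)
            out.append(row)
    return out


nested = nested_group_by(rows, ["region", "kind"])
result = aggregate_nested(
    nested,
    ["region", "kind"],
    [("count", len), ("total", lambda t: sum(r["qty"] for r in t))],
)
for row in result:
    print(row)
```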
- TableSet: A group of named tables with identical column definitions.
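Properties
- TableSet.key_name: Get the name of the key this TableSet is grouped by.
- TableSet.key_type: Get the DataType this TableSet is grouped by.
- TableSet.column_names: Get an ordered list of this TableSet’s column names.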
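Creating
- TableSet.from_csv(): Create a new TableSet from a directory of CSVs.
- TableSet.from_json(): Create a new TableSet from a directory of JSON files or a single JSON object with key value (Table key and list of row objects) pairs for each Table.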
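Saving
- TableSet.to_csv(): Write each table in this set to a separate CSV in a given directory.
- TableSet.to_json(): Write TableSet to either a set of JSON files for each table or a single nested JSON file.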
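Processing
- TableSet.aggregate(): Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.
- TableSet.having(): Create a new TableSet with only those tables that pass a test.
- TableSet.merge(): Convert this TableSet into a single table.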
Previewing
- TableSet.print_structure(): Print the keys and row counts of each table in the tableset.
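Charting
- TableSet.bar_chart(): Render a lattice/grid of bar charts using leather.Lattice.
- TableSet.column_chart(): Render a lattice/grid of column charts using leather.Lattice.
- TableSet.line_chart(): Render a lattice/grid of line charts using leather.Lattice.
- TableSet.scatterplot(): Render a lattice/grid of scatterplots using leather.Lattice.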
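Table Proxy Methods
- TableSet.bins(): Calls Table.bins() on each table in the TableSet.
- TableSet.compute(): Calls Table.compute() on each table in the TableSet.
- TableSet.denormalize(): Calls Table.denormalize() on each table in the TableSet.
- TableSet.distinct(): Calls Table.distinct() on each table in the TableSet.
- TableSet.exclude(): Calls Table.exclude() on each table in the TableSet.
- TableSet.find(): Calls Table.find() on each table in the TableSet.
- TableSet.group_by(): Calls Table.group_by() on each table in the TableSet.
- TableSet.homogenize(): Calls Table.homogenize() on each table in the TableSet.
- TableSet.join(): Calls Table.join() on each table in the TableSet.
- TableSet.limit(): Calls Table.limit() on each table in the TableSet.
- TableSet.normalize(): Calls Table.normalize() on each table in the TableSet.
- TableSet.order_by(): Calls Table.order_by() on each table in the TableSet.
- TableSet.pivot(): Calls Table.pivot() on each table in the TableSet.
- TableSet.select(): Calls Table.select() on each table in the TableSet.
- TableSet.where(): Calls Table.where() on each table in the TableSet.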
Detailed list
- class agate.TableSet(tables, keys, key_name='group', key_type=None, _is_fork=False)
Bases: MappedSequence
A group of named tables with identical column definitions. Supports (almost) all the same operations as Table. When executed on a TableSet, any operation that would have returned a new Table instead returns a new TableSet. Any operation that would have returned a single value instead returns a dictionary of values.
TableSet is implemented as a subclass of MappedSequence.
- Parameters:
tables – A sequence of Table instances.
keys – A sequence of keys corresponding to the tables. These may be any type except int.
key_name – A name that describes the grouping properties. Used as the column header when the groups are aggregated. Defaults to the column name that was grouped on.
key_type – An instance of some subclass of DataType. If not provided, it will default to Text.
_is_fork – Used internally to skip certain validation steps when data is propagated from an existing tableset.
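The rule that keys may be any type except int is easiest to see with a toy mapped sequence. The hypothetical `MiniMappedSequence` below is a simplified stand-in, not agate's `MappedSequence`: integers (and slices) index by position and anything else is looked up as a key, so an int key would be ambiguous. Subclassing `collections.abc.Sequence` also supplies `count()` and `index()`; the `count()` and `index()` entries below appear to be inherited in a similar way.

```python
from collections import OrderedDict
from collections.abc import Sequence


class MiniMappedSequence(Sequence):
    """Hypothetical, simplified sequence whose items can also be fetched by key."""

    def __init__(self, values, keys):
        if len(values) != len(keys):
            raise ValueError("values and keys must have the same length")
        # Integers are reserved for positional access, so reject int keys.
        # (bool is a subclass of int, so it is rejected too.)
        if any(isinstance(k, int) for k in keys):
            raise ValueError("keys may be any type except int")
        self._values = tuple(values)
        self._keys = tuple(keys)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return self._values[key]  # positional access
        try:
            return self._values[self._keys.index(key)]  # keyed access
        except ValueError:
            raise KeyError(key) from None

    def keys(self):
        return self._keys

    def dict(self):
        return OrderedDict(zip(self._keys, self._values))


seq = MiniMappedSequence(["table_a", "table_b"], ["a", "b"])
print(seq[0], seq["b"])      # table_a table_b
print(seq.index("table_b"))  # 1 (inherited from Sequence)
```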
- property key_name
Get the name of the key this TableSet is grouped by. (If created using Table.group_by(), then this is the original column name.)
- property key_type
Get the DataType this TableSet is grouped by. (If created using Table.group_by(), then this is the original column type.)
- property column_names
Get an ordered list of this TableSet’s column names.
- Returns:
A tuple of strings.
- aggregate(aggregations)
Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.
aggregations must be a sequence of tuples, where each has two parts: a new_column_name and an Aggregation instance.
The resulting table will have the keys from this TableSet (and any nested TableSets) set as its row_names. See Table.__init__() for more details.
- Parameters:
aggregations – A list of tuples in the format (new_column_name, aggregation), where each aggregation is an instance of Aggregation.
- Returns:
A new Table.
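The shape of the aggregated result can be sketched in plain Python, with tables modeled as lists of row dicts: one output row per table, a leading key column (named by key_name), and the set's keys reused as row names. `aggregate` here is a hypothetical function whose `(new_column_name, function)` pairs stand in for `(new_column_name, Aggregation)` tuples; in agate the call would look like `tableset.aggregate([('count', agate.Count())])`.

```python
from collections import OrderedDict


def aggregate(tableset, aggregations, key_name="group"):
    """Hypothetical helper: summarize each table into one output row.

    `aggregations` is a list of (new_column_name, function) pairs; each
    function maps one table (a list of row dicts) to a single value.
    Returns (column_names, rows, row_names), where row_names are the keys.
    """
    column_names = [key_name] + [name for name, _ in aggregations]
    rows = [
        [key] + [func(table) for _, func in aggregations]
        for key, table in tableset.items()
    ]
    row_names = list(tableset.keys())
    return column_names, rows, row_names


tableset = OrderedDict([
    ("fruit", [{"qty": 2}, {"qty": 5}]),
    ("veg", [{"qty": 1}]),
])
columns, rows, row_names = aggregate(
    tableset,
    [("count", len), ("total_qty", lambda t: sum(r["qty"] for r in t))],
    key_name="kind",
)
# Row names let each result row be looked up by its group key.
by_key = dict(zip(row_names, rows))
print(columns)        # ['kind', 'count', 'total_qty']
print(by_key["veg"])  # ['veg', 1, 1]
```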
- bar_chart(label=0, value=1, path=None, width=None, height=None)
Render a lattice/grid of bar charts using leather.Lattice.
- Parameters:
label – The name or index of a column to plot as the labels of the chart. Defaults to the first column in the table.
value – The name or index of a column to plot as the values of the chart. Defaults to the second column in the table.
path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
width – The width of the output SVG.
height – The height of the output SVG.
- bins(*args, **kwargs)
Calls Table.bins() on each table in the TableSet.
- column_chart(label=0, value=1, path=None, width=None, height=None)
Render a lattice/grid of column charts using leather.Lattice.
- Parameters:
label – The name or index of a column to plot as the labels of the chart. Defaults to the first column in the table.
value – The name or index of a column to plot as the values of the chart. Defaults to the second column in the table.
path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
width – The width of the output SVG.
height – The height of the output SVG.
- compute(*args, **kwargs)
Calls Table.compute() on each table in the TableSet.
- count(value)
Return the number of occurrences of value, as an integer.
- denormalize(*args, **kwargs)
Calls Table.denormalize() on each table in the TableSet.
- dict()
Retrieve the contents of this sequence as a collections.OrderedDict.
- distinct(*args, **kwargs)
Calls Table.distinct() on each table in the TableSet.
- exclude(*args, **kwargs)
Calls Table.exclude() on each table in the TableSet.
- find(*args, **kwargs)
Calls Table.find() on each table in the TableSet.
- classmethod from_csv(dir_path, column_names=None, column_types=None, row_names=None, header=True, **kwargs)
Create a new TableSet from a directory of CSVs.
See Table.from_csv() for additional details.
- Parameters:
dir_path – Path to a directory full of CSV files. All CSV files in this directory will be loaded.
column_names – See Table.__init__().
column_types – See Table.__init__().
row_names – See Table.__init__().
header – See Table.from_csv().
- classmethod from_json(path, column_names=None, column_types=None, keys=None, **kwargs)
Create a new TableSet from a directory of JSON files or a single JSON object with key value (Table key and list of row objects) pairs for each Table.
See Table.from_json() for additional details.
- Parameters:
path – Path to a directory containing JSON files, or a file path or file-like object of a nested JSON file.
keys – A list of keys of the top-level dictionaries for each file. If specified, its length must equal the number of JSON files in path.
column_types – See Table.__init__().
- get(key, default=None)
Equivalent to collections.OrderedDict.get().
- group_by(*args, **kwargs)
Calls Table.group_by() on each table in the TableSet.
- having(aggregations, test)
Create a new TableSet with only those tables that pass a test.
This works by applying a sequence of Aggregation instances to each table. The resulting dictionary of properties is then passed to the test function.
This method does not modify the underlying tables in any way.
- Parameters:
aggregations – A list of tuples in the format (name, aggregation), where each aggregation is an instance of Aggregation.
test (function) – A function that takes a dictionary of aggregated properties and returns True if the table should be included in the new TableSet.
- Returns:
A new TableSet.
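Below is a plain-Python sketch of the same filtering logic, assuming tables are lists of row dicts. `having` is a hypothetical function, and `len` stands in for an aggregation such as `agate.Count()`. Following the parameter description above, the agate call would be along the lines of `tableset.having([('count', agate.Count())], lambda t: t['count'] > 1)`.

```python
from collections import OrderedDict


def having(tableset, aggregations, test):
    """Hypothetical helper: keep only tables whose aggregates pass `test`.

    Each table is reduced to a dict {name: func(table)} and that dict is
    passed to `test`. Kept tables are the original objects, unmodified.
    """
    kept = OrderedDict()
    for key, table in tableset.items():
        properties = {name: func(table) for name, func in aggregations}
        if test(properties):
            kept[key] = table
    return kept


tableset = OrderedDict([
    ("fruit", [{"qty": 2}, {"qty": 5}]),
    ("veg", [{"qty": 1}]),
])
big = having(tableset, [("count", len)], lambda p: p["count"] > 1)
print(list(big.keys()))  # ['fruit']
```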
- homogenize(*args, **kwargs)
Calls Table.homogenize() on each table in the TableSet.
- index(value[, start[, stop]])
Return the first index of value, as an integer. Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
- items()
Equivalent to collections.OrderedDict.items().
- join(*args, **kwargs)
Calls Table.join() on each table in the TableSet.
- keys()
Equivalent to collections.OrderedDict.keys().
- limit(*args, **kwargs)
Calls Table.limit() on each table in the TableSet.
- line_chart(x=0, y=1, path=None, width=None, height=None)
Render a lattice/grid of line charts using leather.Lattice.
- Parameters:
x – The name or index of a column to plot as the x axis of the chart. Defaults to the first column in the table.
y – The name or index of a column to plot as the y axis of the chart. Defaults to the second column in the table.
path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
width – The width of the output SVG.
height – The height of the output SVG.
- merge(groups=None, group_name=None, group_type=None)
Convert this TableSet into a single table. This is the inverse of Table.group_by().
Any row_names set on the merged tables will be lost in this process.
- Parameters:
groups – A list of grouping factors to add to merged rows in a new column. If specified, it should have exactly one element per Table in the TableSet. If not specified or None, the grouping factor will be the name of the Row’s original Table.
group_name – This will be the column name of the grouping factors. If None, defaults to the TableSet.key_name.
group_type – This will be the column type of the grouping factors. If None, defaults to the TableSet.key_type.
- Returns:
A new Table.
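A hypothetical plain-Python sketch of the merge behavior described above, with tables modeled as lists of row dicts: rows from every table are concatenated, and a new column (named by `group_name`) holds either the table's key or the matching element of `groups`. Placing the group column first, and the `merge` helper itself, are choices of this sketch rather than agate's code.

```python
from collections import OrderedDict


def merge(tableset, group_name="group", groups=None):
    """Hypothetical helper: concatenate every table into one list of rows.

    A new first column, `group_name`, records which group each row came
    from: the table's key by default, or the matching entry of `groups`.
    """
    labels = list(tableset.keys()) if groups is None else list(groups)
    if len(labels) != len(tableset):
        raise ValueError("groups must have exactly one element per table")
    merged = []
    for label, table in zip(labels, tableset.values()):
        for row in table:
            merged.append({group_name: label, **row})
    return merged


tableset = OrderedDict([
    ("fruit", [{"name": "apple"}, {"name": "pear"}]),
    ("veg", [{"name": "kale"}]),
])
merged = merge(tableset, group_name="kind")
relabeled = merge(tableset, group_name="kind", groups=["F", "V"])
print(merged[0])  # {'kind': 'fruit', 'name': 'apple'}
```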
- normalize(*args, **kwargs)
Calls Table.normalize() on each table in the TableSet.
- order_by(*args, **kwargs)
Calls Table.order_by() on each table in the TableSet.
- pivot(*args, **kwargs)
Calls Table.pivot() on each table in the TableSet.
- print_structure(max_rows=20, output=sys.stdout)
Print the keys and row counts of each table in the tableset.
- Parameters:
max_rows – The maximum number of rows to display before truncating the data. Defaults to 20.
output – The output stream used to print the structure. Defaults to sys.stdout.
- Returns:
None
- scatterplot(x=0, y=1, path=None, width=None, height=None)
Render a lattice/grid of scatterplots using leather.Lattice.
- Parameters:
x – The name or index of a column to plot as the x axis of the chart. Defaults to the first column in the table.
y – The name or index of a column to plot as the y axis of the chart. Defaults to the second column in the table.
path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
width – The width of the output SVG.
height – The height of the output SVG.
- select(*args, **kwargs)
Calls Table.select() on each table in the TableSet.
- to_csv(dir_path, **kwargs)
Write each table in this set to a separate CSV in a given directory.
See Table.to_csv() for additional details.
- Parameters:
dir_path – Path to the directory to write the CSV files to.
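To illustrate the one-file-per-table layout used by to_csv() and read back by from_csv(), here is a stdlib-only round-trip sketch. `tableset_to_csv` and `tableset_from_csv` are hypothetical helpers, and naming each file `<key>.csv` is an assumption of the sketch rather than a documented guarantee. Values come back as strings because the sketch does no type inference; the real from_csv() accepts column_types for that purpose.

```python
import csv
import os
import tempfile
from collections import OrderedDict


def tableset_to_csv(tableset, column_names, dir_path):
    """Hypothetical helper: write each table to <dir_path>/<key>.csv.

    All tables share `column_names`, matching a TableSet's identical columns.
    """
    os.makedirs(dir_path, exist_ok=True)
    for key, rows in tableset.items():
        path = os.path.join(dir_path, f"{key}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=column_names)
            writer.writeheader()
            writer.writerows(rows)


def tableset_from_csv(dir_path):
    """Hypothetical helper: load every .csv in dir_path, keyed by file stem.

    Values come back as strings; no type inference is attempted.
    """
    tableset = OrderedDict()
    for name in sorted(os.listdir(dir_path)):
        if name.endswith(".csv"):
            with open(os.path.join(dir_path, name), newline="") as f:
                tableset[name[: -len(".csv")]] = list(csv.DictReader(f))
    return tableset


tableset = OrderedDict([
    ("fruit", [{"name": "apple", "qty": 5}]),
    ("veg", [{"name": "kale", "qty": 4}]),
])
with tempfile.TemporaryDirectory() as tmp:
    tableset_to_csv(tableset, ["name", "qty"], tmp)
    files = sorted(os.listdir(tmp))
    loaded = tableset_from_csv(tmp)
```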
- to_json(path, nested=False, indent=None, **kwargs)
Write this TableSet to either a set of JSON files for each table or a single nested JSON file.
See Table.to_json() for additional details.
- Parameters:
path – Path to the directory to write the JSON file(s) to. If nested is True, this should be a file path or file-like object to write to.
nested – If True, the output will be a single nested JSON file with each Table’s key paired with a list of row objects. Otherwise, the output will be a set of files for each table. Defaults to False.
indent – See Table.to_json().
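The nested layout described for nested=True can be sketched with the standard `json` module: one object whose keys are the table keys and whose values are lists of row objects, which is also the single-object shape from_json() accepts. The helpers are hypothetical, and the sketch writes to an in-memory buffer instead of a file. Note that JSON object keys are always strings, so non-string table keys would not survive a round trip unchanged.

```python
import io
import json
from collections import OrderedDict


def tableset_to_nested_json(tableset, fp, indent=None):
    """Hypothetical helper: write {key: [row object, ...], ...} to `fp`."""
    payload = OrderedDict((key, list(rows)) for key, rows in tableset.items())
    json.dump(payload, fp, indent=indent)


def tableset_from_nested_json(fp):
    """Hypothetical helper: read the nested layout back, keeping key order."""
    return json.load(fp, object_pairs_hook=OrderedDict)


tableset = OrderedDict([
    ("fruit", [{"name": "apple", "qty": 5}]),
    ("veg", [{"name": "kale", "qty": 4}]),
])
buf = io.StringIO()
tableset_to_nested_json(tableset, buf)
print(buf.getvalue())
buf.seek(0)
loaded = tableset_from_nested_json(buf)
```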
- values()
Equivalent to collections.OrderedDict.values().
- where(*args, **kwargs)
Calls Table.where() on each table in the TableSet.