How to Easily Understand Your Python Objects

Austin McKay
Published in Insight
12 min read · Jul 23, 2019

Image used with permission from https://pixabay.com/photos/spacescape-telescope-stars-3583621/

Have you ever had a new Python object that you wanted to quickly familiarize yourself with? Or maybe you have a familiar object and you’re looking for that one particular method, but you don’t know how to describe it to Google. I frequently run into this issue in my data science workflow with complex objects in libraries like TensorFlow. I also find myself wishing there were a faster way to get to know simple objects in new libraries, as documentation can be unavailable, incorrect, or time-consuming to look up.

In this blog post, I’ll show you how to deeply inspect objects yourself, and introduce a pip-installable CLI tool I built called peep dis, which will do the work for you. If you want to jump straight to the tool, skip to the CLI Object Inspector: Peep Dis section.

Object Inspection

As a toy example, we’ll define a Rectangle class with a few simple methods and attributes.
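The class definition itself is not reproduced in this text, so here is a minimal sketch reconstructed from the method excerpts and outputs quoted elsewhere in the post. The constructor signature and the body of area are assumptions; bisect and scale follow the quoted excerpts.

```python
class Rectangle:
    """Toy rectangle with side lengths a and b (reconstructed sketch)."""

    def __init__(self, a: float, b: float):
        # Assumed constructor: the post only shows the call Rectangle(3., 4.)
        self.a = a
        self.b = b

    def area(self):
        """ calculate the area (assumed implementation) """
        return self.a * self.b

    def bisect(self):
        """ reduce a by a factor of 2 to "cut in half" """
        self.a /= 2

    def scale(self, factor: float):
        """ scale the side lengths by factor """
        self.a = factor * self.a
        self.b = factor * self.b


rect = Rectangle(3., 4.)
print(rect.area())  # 12.0
```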

The dir function is a simple built-in that lists all attributes and methods of an object, unless __dir__ has been overridden. Many interactive consoles and some editors rely on it for autocomplete.

>>> rect = Rectangle(3., 4.)
>>> dir(rect)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'a', 'area', 'b', 'bisect', 'scale']

The output is a list of strings representing the attributes and methods of the object, mostly consisting of built-ins. Usually, built-ins aren’t particularly useful and just add clutter.

Filtering Out Built-ins

Depending on our definition of built-ins, we can use either string filtering or type filtering to remove these.

String Filtering:

def dir_string_filter(obj):
    is_magic = lambda x: (x.startswith('__') and x.endswith('__'))
    return [x for x in dir(obj) if not is_magic(x)]

>>> dir_string_filter(rect)
['a', 'area', 'b', 'bisect', 'scale']

Type Filtering:

from types import BuiltinMethodType

def dir_type_filter(obj):
    is_builtin = lambda x: isinstance(getattr(obj, x), BuiltinMethodType)
    return [x for x in dir(obj) if not is_builtin(x)]

>>> dir_type_filter(rect)
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__repr__', '__setattr__', '__str__', '__weakref__', 'a', 'area', 'b', 'bisect', 'scale']

String filtering removes all “magic” methods and attributes, while filtering by BuiltinMethodType filters out built-in methods written in C, which leaves magic attributes and removes many non-magic methods, like string manipulations. In most cases, the magic attributes and methods are what we’d like to exclude, so we’ll use the string filtering method.
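To make the difference concrete, here is a small sketch that applies both filters to a plain string. The two helpers are restated so the snippet runs on its own, and the choice of 'abc' as the test object is just an illustration.

```python
from types import BuiltinMethodType


def dir_string_filter(obj):
    # Drop names that start and end with double underscores
    return [x for x in dir(obj) if not (x.startswith('__') and x.endswith('__'))]


def dir_type_filter(obj):
    # Drop attributes that are built-in (C-implemented) bound methods
    return [x for x in dir(obj)
            if not isinstance(getattr(obj, x), BuiltinMethodType)]


s = 'abc'
by_name = dir_string_filter(s)
by_type = dir_type_filter(s)

# str.upper is a useful non-magic method, but it is implemented in C
print('upper' in by_name, 'upper' in by_type)      # True False
# __add__ is a magic method, but its bound form is a method-wrapper
print('__add__' in by_name, '__add__' in by_type)  # False True
```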

>>> dir_filtered = dir_string_filter(rect)

Separating Methods from Attributes

Of the items returned after filtering, we still don’t know which are attributes and which are methods. We can use the built-in callable function to filter them.

Attributes:

>>> attrs = [x for x in dir_filtered if not callable(getattr(rect, x))]
>>> attrs
['a', 'b']

Methods:

>>> methods = [x for x in dir_filtered if callable(getattr(rect, x))]
>>> methods
['area', 'bisect', 'scale']

To see the values of the attributes:

>>> attr_outputs = {x: getattr(rect, x) for x in attrs}
>>> attr_outputs
{'a': 3.0, 'b': 4.0}

Calling Methods

For the methods, it’s not quite as simple to see the output values. One risk associated with indiscriminately calling a random method is that it could modify the original object state. For example, Rectangle.bisect will return None, but it reduces the size of the rectangle by a factor of 2 (copied below).

    ...
    def bisect(self):
        """ reduce a by a factor of 2 to "cut in half" """
        self.a /= 2

We can avoid modifications to the original object by making a copy.deepcopy of it before each method call, although this can be computationally intensive for large objects. Note that methods which modify class variables, global variables, or interact with their external environment may still have lasting effects.

The get_callable function defined below copies the original object and returns the method attached to that copy, which can be called independent of its parent object.

from copy import deepcopy

def get_callable(obj, name: str):
    return getattr(deepcopy(obj), name)

Methods that require positional arguments, like Rectangle.scale (copied below), pose an additional challenge.

    ...
    def scale(self, factor: float):
        """ scale the side lengths by factor """
        self.a = factor * self.a
        self.b = factor * self.b

We can get the outputs of the methods that don’t require positional arguments either by following a “leap before you look” policy, or by using getfullargspec from the built-in inspect module to determine which methods don’t require positional arguments and evaluating only those.

Calling Methods Technique 1: Leap Before You Look

def attempt_method_call(func):
    try:
        return str(func())
    except Exception:
        return '(failed to evaluate method)'

>>> outputs = {x: attempt_method_call(get_callable(rect, x)) for x in methods}
>>> outputs
{'area': '12.0', 'bisect': 'None', 'scale': '(failed to evaluate method)'}

As expected, area and bisect executed successfully, whereas scale, which requires positional arguments, did not.

Calling Methods Technique 2: Check for Positionals

First, let’s introduce getfullargspec:

from inspect import getfullargspec

>>> getfullargspec(rect.scale)
FullArgSpec(args=['self', 'factor'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={'factor': <class 'float'>})

It returns a FullArgSpec object. args contains the names of the positional arguments. varargs and varkw contain the names of the variable-length positional and keyword arguments, as specified by the * and ** operators, respectively (usually *args and **kwargs). defaults contains the default values for the trailing positional arguments that have them. kwonlyargs lists the names of keyword-only arguments. kwonlydefaults is a dictionary of keyword-only argument default values. annotations is a dictionary of any type annotations.
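As a quick illustration of each field, the sketch below inspects a made-up function (example is hypothetical and exists only to exercise every part of the signature). It also shows how defaults lines up with the end of args.

```python
from inspect import getfullargspec


def example(a, b=2, *args, c, d: int = 4, **kwargs):
    """Hypothetical function that uses every kind of parameter."""


spec = getfullargspec(example)
print(spec.args)            # ['a', 'b']
print(spec.varargs)         # args
print(spec.varkw)           # kwargs
print(spec.defaults)        # (2,)
print(spec.kwonlyargs)      # ['c', 'd']
print(spec.kwonlydefaults)  # {'d': 4}
print(spec.annotations)     # {'d': <class 'int'>}

# defaults belong to the LAST len(defaults) entries of args,
# so the arguments before them are the required positionals.
n_defaults = len(spec.defaults or ())
required = spec.args[:len(spec.args) - n_defaults]
print(required)             # ['a']
```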

We can use this information to check if a method has positional arguments and evaluate it only if it doesn’t. To start, we will attempt to get the FullArgSpec of the method, although not all callables are supported. Then, we’ll extract the args and define a utility function _remove_self to remove the self argument which is implicit to standard methods. Although it’s not done here, we could additionally avoid calling class methods by checking for the cls argument. Finally, if all args have defaults, then there are no positionals and the method can be called.

Using this method, we get the same results as the leap before you look method.

>>> method_outputs = {x: call_if_no_positionals(get_callable(rect, x)) for x in methods}
>>> method_outputs
{'area': 12.0, 'bisect': None, 'scale': '(requires positional args)'}

Inferring Argument Types

Next, we can attempt to infer the type of each argument from any type annotations or default values. We define infer_arg_types, which starts out similarly to call_if_no_positionals, but rather than calling the method, it populates an OrderedDict with the inferred types.
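The function body is not reproduced in this text. A minimal sketch of the idea, assuming type names are stored as strings (as the output in this section suggests) and that methods without arguments yield None, could look like this. The sample function is hypothetical.

```python
from collections import OrderedDict
from inspect import getfullargspec


def infer_arg_types(func):
    """Map each argument name to an inferred type name, or None if unknown."""
    try:
        spec = getfullargspec(func)
    except TypeError:
        return None  # assumed behavior for callables that cannot be introspected
    args = [a for a in spec.args if a != 'self']
    if not args:
        return None
    # defaults apply to the last len(defaults) positional arguments
    defaults = spec.defaults or ()
    with_default = dict(zip(spec.args[len(spec.args) - len(defaults):], defaults))
    types = OrderedDict()
    for name in args:
        if name in spec.annotations:
            ann = spec.annotations[name]
            # Plain classes have a __name__; other annotations fall back to str()
            types[name] = ann.__name__ if isinstance(ann, type) else str(ann)
        elif name in with_default:
            types[name] = type(with_default[name]).__name__
        else:
            types[name] = None
    return types


def sample(x, factor: float, n=3):
    """Hypothetical function mixing unannotated, annotated, and default args."""


print(infer_arg_types(sample))
# OrderedDict([('x', None), ('factor', 'float'), ('n', 'int')])
```

An annotation takes priority over a default value here; that ordering is a design choice of this sketch rather than something the post specifies.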

Calling this on our Rectangle instance, we get the types of all arguments for the methods that require them, since they were all type hinted. Note that without type hints, this would only work for arguments with default values.

>>> method_arg_types = {x: infer_arg_types(getattr(rect, x)) for x in methods}
>>> method_arg_types
{'area': None, 'bisect': None, 'scale': OrderedDict([('factor', 'float')])}

Forging Arguments

If we want to see example outputs for methods that require positional arguments, we can attempt to use the argument types we inferred above to forge them by looking up sample values for each type. We can even attempt to forge collections if the content type is in the annotation (e.g. List[int]).

from typing import List

_sample_args = {
    'float': 1.5,
    'int': 2,
    'str': 'abc',
    'List[int]': [1, 2, 3],
}

We will define a ForgeError so that any errors caused by attempting to forge arguments can be handled specifically. This will allow us to attempt to forge arguments for a collection of methods, even if some don’t work.

class ForgeError(ValueError):
    pass

The forging function will take a method and look up sample arguments from _sample_args by type from the infer_arg_types output, raising errors if any arguments lacked defaults and types couldn’t be inferred, or if any types are presented that aren’t in _sample_args.

Since this is a fairly complex function, this would be a good place for some unit testing.

Next, we can define a function that takes an object and iterates over all of its methods and uses our forge_args function to attempt to forge the arguments for each using the “leap before you look” approach and noting the reason for any failures.
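Neither function is reproduced in this text, so the following is a rough sketch of how the two pieces described above might fit together, not the peep dis implementation. It includes a compact type-inference helper and a trimmed Rectangle so that it runs on its own; the failure messages and the choice to wrap every output in str are assumptions.

```python
from collections import OrderedDict
from copy import deepcopy
from inspect import getfullargspec

_sample_args = {'float': 1.5, 'int': 2, 'str': 'abc'}


class ForgeError(ValueError):
    pass


def infer_arg_types(func):
    """Compact stand-in for the inference step: arg name -> type name or None."""
    spec = getfullargspec(func)
    defaults = spec.defaults or ()
    with_default = dict(zip(spec.args[len(spec.args) - len(defaults):], defaults))
    types = OrderedDict()
    for name in (a for a in spec.args if a != 'self'):
        if name in spec.annotations:
            ann = spec.annotations[name]
            types[name] = ann.__name__ if isinstance(ann, type) else str(ann)
        elif name in with_default:
            types[name] = type(with_default[name]).__name__
        else:
            types[name] = None
    return types


def forge_args(func):
    """Return a list of sample positional arguments for func."""
    forged = []
    for name, type_name in infer_arg_types(func).items():
        if type_name is None:
            raise ForgeError(f'could not infer a type for {name!r}')
        if type_name not in _sample_args:
            raise ForgeError(f'no sample value for type {type_name!r}')
        forged.append(_sample_args[type_name])
    return forged


def forge_and_eval_methods(obj):
    """Call each non-magic method on a fresh copy of obj with forged arguments."""
    results = {}
    for name in dir(obj):
        if name.startswith('__') and name.endswith('__'):
            continue
        if not callable(getattr(obj, name)):
            continue
        func = getattr(deepcopy(obj), name)  # the copy protects obj's state
        try:
            results[name] = str(func(*forge_args(func)))
        except ForgeError as err:
            results[name] = f'(could not forge args: {err})'
        except Exception as err:  # leap before you look: record any other failure
            results[name] = f'(call failed: {err})'
    return results


class Rectangle:
    """Trimmed stand-in for the post's toy class (assumed definition)."""

    def __init__(self, a: float, b: float):
        self.a, self.b = a, b

    def area(self):
        return self.a * self.b

    def bisect(self):
        self.a /= 2

    def scale(self, factor: float):
        self.a, self.b = factor * self.a, factor * self.b


rect = Rectangle(3., 4.)
print(forge_and_eval_methods(rect))
# {'area': '12.0', 'bisect': 'None', 'scale': 'None'}
```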

Let’s give this a try on our Rectangle instance:

>>> forged_outputs = forge_and_eval_methods(rect)
>>> forged_outputs
{'area': '12.0', 'bisect': None, 'scale': 'None'}

The difference between this result and our earlier result is subtle, but notice that scale now outputs ‘None’ rather than ‘requires positional args’. That’s because the method was called successfully with the forged arguments, but rather than returning anything, it modifies the state of rect by changing attributes a and b. It would be nice to track these modifications so that we can understand what methods do, even when they don’t return anything.

Tracking State Modification: Comparison Technique

In this toy example, Rectangle.scale modifies the dimensions a and b of the Rectangle, but it’s hard for us to tell what happened since the method doesn’t return anything. We can track these modifications by saving a copy of all the object’s attributes before the method call, then comparing them to the attributes after. We can define a StateComparator object that saves the current attributes using the __dict__ attribute, then checks for additions, deletions, and modifications of attributes after the method call.

The true implementation is a bit more complex because most built-ins don’t have a __dict__ attribute.
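As a rough illustration of the comparison idea (a simplified sketch, not the peep dis implementation), the class below snapshots __dict__ and reports what changed. The method name compare and the 'new' and 'deleted' labels are assumptions; 'modified' and the (before, after) tuples follow the output format shown in this section.

```python
from copy import deepcopy


class StateComparator:
    """Snapshot an object's attributes and report changes later (sketch)."""

    def __init__(self, obj):
        self._obj = obj
        # Objects without __dict__ (many built-ins) are treated as having no state
        self._before = deepcopy(getattr(obj, '__dict__', {}))

    def compare(self):
        """Return a dict of new, deleted, and modified attributes."""
        after = getattr(self._obj, '__dict__', {})
        added = {k: v for k, v in after.items() if k not in self._before}
        deleted = {k: v for k, v in self._before.items() if k not in after}
        # Plain != is used here; array-like values would need a custom comparison
        modified = {k: (self._before[k], after[k]) for k in after
                    if k in self._before and self._before[k] != after[k]}
        changes = {'new': added, 'deleted': deleted, 'modified': modified}
        return {label: found for label, found in changes.items() if found}


class Rectangle:
    """Trimmed stand-in for the post's toy class (assumed definition)."""

    def __init__(self, a: float, b: float):
        self.a, self.b = a, b

    def bisect(self):
        self.a /= 2


rect = Rectangle(3., 4.)
comparator = StateComparator(rect)
rect.bisect()
print(comparator.compare())  # {'modified': {'a': (3.0, 1.5)}}
```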

Using the forge_and_eval_methods function as a template, we can define a new function which includes state tracking and an option to turn argument forging on or off.

Testing this on our rect:

>>> call_all_tracked(rect)
{'area': '12.0', 'bisect': {'state changes': {'modified': {'a': (3.0, 1.5)}}}, 'scale': {'output': {'state changes': {'modified': {'a': (3.0, 4.5), 'b': (4.0, 6.0)}}}}}

Now any state changes are specified in dictionaries, where modifications are specified by a tuple where the first number represents the initial value, and the second represents the final value.

Unfortunately, forging arguments from default values and annotations is difficult because most Python code is not type-hinted, and much of it is unsupported by getfullargspec. In these cases, argument forgery could also be attempted by brute force or by extraction from docstrings, both of which are planned features for peep dis.

Simply printing out docstrings might be an easier way to understand methods that require arguments in most cases. They can be systematically printed out from the __doc__ attribute.

>>> for x in dir_filtered:
...     doc = getattr(getattr(rect, x), '__doc__', 'No docstring')
...     print(f'{x}: {doc}')

The output is too long to include here, and it’s difficult to decipher since it isn’t color-coded. The output can easily be colorized with termcolor, which is what was used for peep dis.

CLI Object Inspector: Peep Dis

We’ll take a quick look at how peep dis can be useful in two canonical cases.

I. The Mystery Object

We have a simple mystery_obj which contains an array of San Francisco temperatures somewhere within it, but we don’t know where. We could call dir, then iteratively check each method or attribute, or we could just peep the object. We can quickly identify stdtemp as the attribute we need.

Built-ins are filtered out, and outputs for the rest of the attributes and methods without positional arguments are printed. Methods are colored purple, and attributes are cyan. The outputs from methods requiring positional arguments are grayed out to allow us to skim others more quickly.

There are additional keyword arguments to include built-ins, include private methods, print docstrings, and truncate output lengths. Peep dis can also be used in a debugger, Jupyter Notebook, or IDE console.

II. What’s the name of that method?

We have a DataFrame with the columns temp and humidity for San Francisco, which we want to convert to a narrow data model for an API we are building. There’s a one-liner for this, but nothing stands out in dir, and nothing turns up on Stack Overflow. If we peep the DataFrame, we’ll quickly identify melt as the method we need.

To see what this process would have looked like the old-fashioned way, see the Appendix below.

Conclusion

Thanks for reading, and please feel free to send me feedback on peep dis. If you like the library, please star it as well so that I know people are interested in its continued development. If you want to contribute, I would love to facilitate that.

Appendix: Old Fashioned Way

I. The mystery object

>>> dir(mystery_obj)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'itemp', 'mtemp', 'stdtemp', 'temp']
>>> mystery_obj.mtemp
<bound method WeatherSeries.mtemp of <__main__.WeatherSeries object at 0x7fdb6ed32748>>
>>> mystery_obj.mtemp()
{'min': 67, 'max': 71, 'index min': 0, 'index max': 4, 'len': 6}
>>> mystery_obj.itemp
<bound method WeatherSeries.itemp of <__main__.WeatherSeries object at 0x7fdb6ed32780>>
>>> mystery_obj.itemp()
TypeError: itemp() missing 1 required positional argument: 'i'
>>> mystery_obj.itemp(0)
67
>>> mystery_obj.temp()
array([[ 0, 67],
       [ 1, 69],
       [ 2, 70],
       [ 3, 70],
       [ 4, 71],
       [ 5, 70]])
>>> mystery_obj.stdtemp()
TypeError: 'numpy.ndarray' object is not callable
>>> mystery_obj.stdtemp
array([67, 69, 70, 70, 71, 70])

II. What’s the name of that method?

>>> df
   humitidy  temp
0        65    67
1        65    68
2        60    68
3        60    69
4        55    70
>>> dir(df)
['T', '_AXIS_ALIASES', '_AXIS_IALIASES', '_AXIS_LEN', '_AXIS_NAMES', '_AXIS_NUMBERS', '_AXIS_ORDERS', '_AXIS_REVERSED', '_AXIS_SLICEMAP', '__abs__', '__add__', '__and__', '__array__', '__array_wrap__', '__bool__', '__bytes__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__div__', '__doc__', '__eq__', '__finalize__', '__floordiv__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__invert__', '__ipow__', '__isub__', '__iter__', '__itruediv__', '__le__', '__len__', '__lt__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__or__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__unicode__', '__weakref__', '__xor__', '_accessors', '_add_numeric_operations', '_add_series_only_operations', '_add_series_or_dataframe_operations', '_agg_by_level', '_agg_doc', '_aggregate', '_aggregate_multiple_funcs', '_align_frame', '_align_series', '_apply_broadcast', '_apply_empty_result', '_apply_raw', '_apply_standard', '_at', '_box_col_values', '_box_item_values', '_builtin_table', '_check_inplace_setting', '_check_is_chained_assignment_possible', '_check_percentile', '_check_setitem_copy', '_clear_item_cache', '_combine_const', '_combine_frame', '_combine_match_columns', '_combine_match_index', '_combine_series', '_combine_series_infer', '_compare_frame', '_compare_frame_evaluate', '_consolidate', '_consolidate_inplace', '_construct_axes_dict', '_construct_axes_dict_for_slice', '_construct_axes_dict_from', '_construct_axes_from_arguments', '_constructor', '_constructor_expanddim', 
'_constructor_sliced', '_convert', '_count_level', '_create_indexer', '_cython_table', '_dir_additions', '_dir_deletions', '_ensure_valid_index', '_expand_axes', '_flex_compare_frame', '_from_arrays', '_from_axes', '_get_agg_axis', '_get_axis', '_get_axis_name', '_get_axis_number', '_get_axis_resolvers', '_get_block_manager_axis', '_get_bool_data', '_get_cacher', '_get_index_resolvers', '_get_item_cache', '_get_numeric_data', '_get_values', '_getitem_array', '_getitem_column', '_getitem_frame', '_getitem_multilevel', '_getitem_slice', '_gotitem', '_iat', '_iget_item_cache', '_iloc', '_indexed_same', '_info_axis', '_info_axis_name', '_info_axis_number', '_info_repr', '_init_dict', '_init_mgr', '_init_ndarray', '_internal_names', '_internal_names_set', '_is_builtin_func', '_is_cached', '_is_cython_func', '_is_datelike_mixed_type', '_is_mixed_type', '_is_numeric_mixed_type', '_is_view', '_ix', '_ixs', '_join_compat', '_loc', '_maybe_cache_changed', '_maybe_update_cacher', '_metadata', '_needs_reindex_multi', '_obj_with_exclusions', '_protect_consolidate', '_reduce', '_reindex_axes', '_reindex_axis', '_reindex_columns', '_reindex_index', '_reindex_multi', '_reindex_with_indexers', '_repr_data_resource_', '_repr_fits_horizontal_', '_repr_fits_vertical_', '_repr_html_', '_repr_latex_', '_reset_cache', '_reset_cacher', '_sanitize_column', '_selected_obj', '_selection', '_selection_list', '_selection_name', '_series', '_set_as_cached', '_set_axis', '_set_axis_name', '_set_is_copy', '_set_item', '_setitem_array', '_setitem_frame', '_setitem_slice', '_setup_axes', '_shallow_copy', '_slice', '_stat_axis', '_stat_axis_name', '_stat_axis_number', '_try_aggregate_string_function', '_typ', '_unpickle_frame_compat', '_unpickle_matrix_compat', '_update_inplace', '_validate_dtype', '_values', '_where', '_xs', 'a', 'abs', 'add', 'add_prefix', 'add_suffix', 'agg', 'aggregate', 'align', 'all', 'any', 'append', 'apply', 'applymap', 'as_blocks', 'as_matrix', 'asfreq', 'asof', 'assign', 
'astype', 'at', 'at_time', 'axes', 'b', 'between_time', 'bfill', 'blocks', 'bool', 'boxplot', 'clip', 'clip_lower', 'clip_upper', 'columns', 'combine', 'combine_first', 'compound', 'consolidate', 'convert_objects', 'copy', 'corr', 'corrwith', 'count', 'cov', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'div', 'divide', 'dot', 'drop', 'drop_duplicates', 'dropna', 'dtypes', 'duplicated', 'empty', 'eq', 'equals', 'eval', 'ewm', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'first_valid_index', 'floordiv', 'from_csv', 'from_dict', 'from_items', 'from_records', 'ftypes', 'ge', 'get', 'get_dtype_counts', 'get_ftype_counts', 'get_value', 'get_values', 'groupby', 'gt', 'head', 'hist', 'iat', 'idxmax', 'idxmin', 'iloc', 'index', 'info', 'insert', 'interpolate', 'is_copy', 'isin', 'isnull', 'items', 'iteritems', 'iterrows', 'itertuples', 'ix', 'join', 'keys', 'kurt', 'kurtosis', 'last', 'last_valid_index', 'le', 'loc', 'lookup', 'lt', 'mad', 'mask', 'max', 'mean', 'median', 'melt', 'memory_usage', 'merge', 'min', 'mod', 'mode', 'mul', 'multiply', 'ndim', 'ne', 'nlargest', 'notnull', 'nsmallest', 'nunique', 'pct_change', 'pipe', 'pivot', 'pivot_table', 'plot', 'pop', 'pow', 'prod', 'product', 'quantile', 'query', 'radd', 'rank', 'rdiv', 'reindex', 'reindex_axis', 'reindex_like', 'rename', 'rename_axis', 'reorder_levels', 'replace', 'resample', 'reset_index', 'rfloordiv', 'rmod', 'rmul', 'rolling', 'round', 'rpow', 'rsub', 'rtruediv', 'sample', 'select', 'select_dtypes', 'sem', 'set_axis', 'set_index', 'set_value', 'shape', 'shift', 'size', 'skew', 'slice_shift', 'sort_index', 'sort_values', 'sortlevel', 'squeeze', 'stack', 'std', 'style', 'sub', 'subtract', 'sum', 'swapaxes', 'swaplevel', 'tail', 'take', 'to_clipboard', 'to_csv', 'to_dense', 'to_dict', 'to_excel', 'to_feather', 'to_gbq', 'to_hdf', 'to_html', 'to_json', 'to_latex', 'to_msgpack', 'to_panel', 'to_period', 'to_pickle', 'to_records', 'to_sparse', 'to_sql', 'to_stata', 'to_string', 
'to_timestamp', 'to_xarray', 'transform', 'transpose', 'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unstack', 'update', 'values', 'var', 'where', 'xs']

It could take you about ten minutes to figure out that all you needed was df.melt.

>>> df.melt()
    variable  value
0   humitidy     65
1   humitidy     65
2   humitidy     60
3   humitidy     60
4   humitidy     55
5   humitidy     55
6       temp     67
7       temp     69
8       temp     70
9       temp     70
10      temp     71
11      temp     70
