A container class for BIRCH cluster hierarchy
This is a dict-like container used to store each subcluster in the cluster hierarchy computed by
freediscovery.cluster.Birch. A given subcluster links to the parent / children subclusters in the hierarchy with the following attributes,
Each subcluster stores the following dictionary keys,
- document_id: list, a list of document / sample ids contained in this subcluster (excluding its children).
- document_id_accumulated: a list of document / sample ids
contained in this subcluster and its children. Only available when
this class was built using the
compute_document_id=True parameter. It can be re-computed with the
- cluster_size: int, the number of samples contained in this
subcluster and its children. This corresponds to the length of the
document_id_accumulated property. Only available when this class was built using
Other keys may be user-computed as necessary.
See User Manual for more details.
This class descends from
freediscovery.externals.jwzthreading.Container, originally used to represent e-mail threads obtained with the JWZ algorithm in jwzthreading, though it is general enough to represent other hierarchical structures, such as a BIRCH cluster hierarchy.
In FreeDiscovery this class is primarily used for documents. As a result the variables/methods containing the term “document” have the same meaning as “sample” in the general scikit-learn context.
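The relationship between the dictionary keys listed earlier can be illustrated with a minimal stand-in. The class name `SubclusterSketch`, its `accumulate` helper, and the attribute names `parent` and `children` are assumptions made for illustration; this is not the FreeDiscovery implementation, only a sketch of how `document_id_accumulated` and `cluster_size` could be derived from `document_id`.

```python
class SubclusterSketch(dict):
    """Hypothetical dict-like tree node (not the FreeDiscovery class)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.parent = None    # assumed attribute name
        self.children = []    # assumed attribute name

    def add_child(self, child):
        # Link the child to this node in both directions.
        child.parent = self
        self.children.append(child)

    def accumulate(self):
        """Store the ids of this node and all its descendants, plus their count."""
        ids = list(self.get("document_id", []))
        for child in self.children:
            ids.extend(child.accumulate())
        self["document_id_accumulated"] = ids
        self["cluster_size"] = len(ids)
        return ids


root = SubclusterSketch(document_id=[0, 1])
leaf_a = SubclusterSketch(document_id=[2])
leaf_b = SubclusterSketch(document_id=[3, 4])
root.add_child(leaf_a)
root.add_child(leaf_b)
root.accumulate()
```

In this sketch `cluster_size` is always the length of `document_id_accumulated`, which matches the description of the two keys.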
Add a child to the container
Parameters: child (Container) – Child to add.
clear() → None. Remove all items from D.
copy() → a shallow copy of D
Compute the depth in the hierarchy of the current container
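One way to compute such a depth is to walk the parent links up to the root. The sketch below assumes a `parent` attribute and assigns depth 0 to the root; both are illustrative assumptions, and the `Node` class and `depth_of` function are hypothetical names, not library API.

```python
class Node(dict):
    """Hypothetical dict-like node holding only a parent link."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.parent = None  # assumed attribute name


def depth_of(node):
    """Count the ancestors of `node`; the root is assumed to have depth 0."""
    depth = 0
    while node.parent is not None:
        node = node.parent
        depth += 1
    return depth


root = Node(cluster_id=0)
child = Node(cluster_id=1)
grandchild = Node(cluster_id=2)
child.parent = root
grandchild.parent = child
```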
Print the content of the hierarchical tree below this subcluster
Count of all documents in the children subclusters
Returns list of document / sample ids contained in this subcluster or any of its children.
Return a flattened version of the hierarchical tree
Returns: list – a flat list of containers Return type: Containers
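Flattening a tree into a list can be sketched as a recursive traversal. The pre-order used here (a node first, then each of its subtrees) is an assumption, since the ordering is not stated above; `Node` and `flatten` are hypothetical names used only for this illustration.

```python
class Node(dict):
    """Hypothetical dict-like node with a list of children."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.children = []  # assumed attribute name


def flatten(node):
    """Return `node` followed by all of its descendants, in pre-order."""
    flat = [node]
    for child in node.children:
        flat.extend(flatten(child))
    return flat


root = Node(cluster_id=0)
a = Node(cluster_id=1)
b = Node(cluster_id=3)
c = Node(cluster_id=2)
root.children.extend([a, b])
a.children.append(c)

flat_ids = [n["cluster_id"] for n in flatten(root)]
```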
Returns a new dict with keys from iterable and values equal to value.
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
Check if ctr is a descendant of this container.
Parameters: ctr (Container) – possible descendant container. Returns: True if ctr is a descendant of self, else False. Return type: bool
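A descendant check of this kind can be written as an iterative depth-first search. The sketch below compares nodes by identity (`is`), because dict subclasses compare equal whenever their contents match. It also assumes that a container is not counted as its own descendant, which may differ from the library's behaviour; `Node` and `has_descendant` are hypothetical names.

```python
class Node(dict):
    """Hypothetical dict-like node with a list of children."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.children = []  # assumed attribute name


def has_descendant(node, ctr):
    """Return True if `ctr` is found strictly below `node` in the tree."""
    stack = list(node.children)
    while stack:
        current = stack.pop()
        if current is ctr:          # identity, not dict equality
            return True
        stack.extend(current.children)
    return False


root = Node(cluster_id=0)
child = Node(cluster_id=1)
grandchild = Node(cluster_id=2)
unrelated = Node(cluster_id=2)      # same content as grandchild, different object
root.children.append(child)
child.children.append(grandchild)
```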
Increment the cluster_id of all children by the given value
Check if the container has some content.
items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
Truncate the tree to the provided maximum depth
Parameters: max_depth (int) – hierarchy depth to which the tree is truncated
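Truncation to a maximum depth can be sketched as follows, assuming that the root sits at depth 0 and that nodes at `max_depth` simply lose their children. What the library does with the documents of the removed subclusters is not described here, so this sketch does not attempt to merge them; `Node` and `limit_depth` are hypothetical names.

```python
class Node(dict):
    """Hypothetical dict-like node with a list of children."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.children = []  # assumed attribute name


def limit_depth(node, max_depth, current_depth=0):
    """Detach every subtree that lies below `max_depth` (root is depth 0)."""
    if current_depth >= max_depth:
        node.children = []
    else:
        for child in node.children:
            limit_depth(child, max_depth, current_depth + 1)


root = Node(cluster_id=0)
child = Node(cluster_id=1)
grandchild = Node(cluster_id=2)
root.children.append(child)
child.children.append(grandchild)

limit_depth(root, max_depth=1)
```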
pop(k[, d]) → v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised.
popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
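Since these methods are inherited from dict, their behaviour can be shown on a plain dict. The key names below are taken from the keys described for subclusters; the values are made up for illustration.

```python
d = {"cluster_id": 7}

missing = d.pop("cluster_size", None)  # key absent: the default is returned
value = d.pop("cluster_id")            # key present: value returned, key removed

try:
    d.pop("cluster_id")                # key absent and no default given
    raised = False
except KeyError:
    raised = True

d["document_id"] = [1, 2]
pair = d.popitem()                     # removes and returns a (key, value) 2-tuple
```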
Remove a child from the container
Parameters: child (Container) – Child to remove.
Get the root container
Returns: the topmost-level container. Return type: Container
setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
Recursively count the number of children containers. The current container is also included in the count.
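The recursive count described above can be sketched in a few lines. `Node` and `tree_size` are hypothetical names and the `children` attribute is an assumption; the only behaviour taken from the description is that the current container is included in the count.

```python
class Node(dict):
    """Hypothetical dict-like node with a list of children."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.children = []  # assumed attribute name


def tree_size(node):
    """Count `node` itself plus all containers below it."""
    return 1 + sum(tree_size(child) for child in node.children)


root = Node(cluster_id=0)
a = Node(cluster_id=1)
b = Node(cluster_id=2)
c = Node(cluster_id=3)
root.children.extend([a, b])
a.children.append(c)
```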
update([E, ]**F) → None. Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
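The three cases of update can be checked on a plain dict, since the method is inherited unchanged from dict. The key names and values below are illustrative only.

```python
d = {"cluster_id": 0}

# E has a .keys() method (a mapping): D[k] = E[k] for each key.
d.update({"cluster_size": 3})

# E lacks .keys() (an iterable of pairs): D[k] = v for each (k, v).
d.update([("document_id", [4, 5, 6])])

# E is applied first, then the keyword arguments F, so F wins on conflicts.
d.update({"cluster_id": 1, "label": "from E"}, label="from F")
```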
values() → an object providing a view on D's values