hypodisc package

hypodisc module

hypodisc.core module

hypodisc.core.sequential.explore(parent: hypodisc.core.structures.GraphPattern, candidates: set, max_length: int, max_width: int, min_support: int) set[source]

Explore all predicate-object pairs that were added by the previous iteration as possible endpoints to expand from.

Parameters
  • parent (GraphPattern) –

  • candidates (set) –

  • max_length (int) –

  • max_width (int) –

  • min_support (int) –

Return type

set

hypodisc.core.sequential.extend(pattern: hypodisc.core.structures.GraphPattern, endpoint: hypodisc.core.structures.Variable, extension: hypodisc.core.structures.Assertion) hypodisc.core.structures.GraphPattern[source]

Extend a graph_pattern from a given endpoint variable by evaluating all possible candidate extensions on whether they satisfy the minimal support and confidence.

Parameters
  • pattern (GraphPattern) –

  • endpoint (Variable) –

  • extension (Assertion) –

Return type

GraphPattern

hypodisc.core.sequential.generate(root_patterns: dict[str, list], depths: range, min_support: int, p_explore: float, p_extend: float, max_length: int, max_width: int, out_writer: Optional[rdf.formats.NTriples], out_prefix_map: Optional[dict[str, str]], out_ns: Optional[rdf.terms.IRIRef], strategy: Literal['BFS', 'DFS']) int[source]
Generate all patterns up to and including a maximum depth which satisfy a minimal support.

Parameters
  • depths (range) –

  • min_support (int) –

  • p_explore (float) –

  • p_extend (float) –

  • max_length (int) –

  • max_width (int) –

hypodisc.core.sequential.generate_bf(root_patterns: dict[str, list], depths: range, min_support: int, p_explore: float, p_extend: float, max_length: int, max_width: int, out_writer: Optional[rdf.formats.NTriples], out_prefix_map: Optional[dict[str, str]], out_ns: Optional[rdf.terms.IRIRef]) int[source]
Generate all patterns up to and including a maximum depth which satisfy a minimal support, using a breadth-first approach. This approach has the anytime property yet uses more memory.

Parameters
  • depths (range) –

  • min_support (int) –

  • p_explore (float) –

  • p_extend (float) –

  • max_length (int) –

  • max_width (int) –

hypodisc.core.sequential.generate_df(root_patterns: dict[str, list], depths: range, min_support: int, p_explore: float, p_extend: float, max_length: int, max_width: int, out_writer: Optional[rdf.formats.NTriples], out_prefix_map: Optional[dict[str, str]], out_ns: Optional[rdf.terms.IRIRef]) int[source]
Generate all patterns up to and including a maximum depth which satisfy a minimal support, using a depth-first approach. This approach uses less memory but does not have the anytime property.

Parameters
  • depths (range) –

  • min_support (int) –

  • p_explore (float) –

  • p_extend (float) –

  • max_length (int) –

  • max_width (int) –
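
The trade-off between the two strategies can be illustrated on a toy search tree. The sketch below is not hypodisc's implementation: CHILDREN, generate_bf_sketch, and generate_df_sketch are hypothetical names, and patterns are plain strings. It only shows why breadth-first order emits every shallow pattern before any deeper one (so an interrupted run still has complete results for the levels it finished) at the cost of queueing a whole level, while depth-first order keeps far fewer pending patterns.

```python
from collections import deque
from typing import Iterator

# Hypothetical toy search space: each pattern maps to its possible extensions.
CHILDREN = {
    "a": ["ab", "ac"],
    "ab": ["abd"],
    "ac": [],
    "abd": [],
}


def generate_bf_sketch(root: str, max_depth: int) -> Iterator[tuple[int, str]]:
    """Yield (depth, pattern) level by level using a FIFO queue."""
    queue = deque([(0, root)])
    while queue:
        depth, pattern = queue.popleft()
        yield depth, pattern
        if depth < max_depth:
            queue.extend((depth + 1, child) for child in CHILDREN[pattern])


def generate_df_sketch(root: str, max_depth: int) -> Iterator[tuple[int, str]]:
    """Yield (depth, pattern) branch by branch using a LIFO stack."""
    stack = [(0, root)]
    while stack:
        depth, pattern = stack.pop()
        yield depth, pattern
        if depth < max_depth:
            # Reverse so that children are visited in their listed order.
            stack.extend((depth + 1, child) for child in reversed(CHILDREN[pattern]))


bf_order = [pattern for _, pattern in generate_bf_sketch("a", max_depth=2)]
df_order = [pattern for _, pattern in generate_df_sketch("a", max_depth=2)]
print(bf_order)  # ['a', 'ab', 'ac', 'abd']
print(df_order)  # ['a', 'ab', 'abd', 'ac']
```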

hypodisc.core.sequential.infer_type(kg: hypodisc.data.graph.KnowledgeGraph, rdf_type_idx: int, node_idx: int) tuple[typing.Union[rdf.terms.IRIRef, str], bool][source]
Infer the (data) type or language tag of a resource. Defaults to rdfs:Class if none can be inferred.

Parameters
  • kg (KnowledgeGraph) –

  • rdf_type_idx (int) –

  • node_idx (int) –

Return type

tuple[Union[IRIRef, str], bool]

hypodisc.core.sequential.init_root_patterns(rng: numpy.random._generator.Generator, kg: hypodisc.data.graph.KnowledgeGraph, min_support: float, mode: Literal['A', 'AT', 'T'], textual_support: bool, numerical_support: bool, temporal_support: bool, exclude: list[str]) dict[str, list][source]

Create all patterns of types which satisfy the minimal support.

Parameters
  • rng (np.random.Generator) –

  • kg (KnowledgeGraph) –

  • min_support (float) –

  • mode (Literal["A", "AT", "T"]) –

  • textual_support (bool) –

  • numerical_support (bool) –

  • temporal_support (bool) –

  • exclude (list[str]) –

Returns

Return type

dict[str,list]

hypodisc.core.sequential.new_graph_pattern(root_var, p: rdf.terms.IRIRef, o_value: Union[rdf.terms.IRIRef, rdf.terms.Literal], o_type: rdf.terms.IRIRef, domain: set, inv_assertion_map: dict[int, set[int]]) hypodisc.core.structures.GraphPattern[source]

Create a new graph_pattern and compute members and metrics

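Parameters
  • root_var –

  • p (IRIRef) –

  • o_value (Union[IRIRef, Literal]) –

  • o_type (IRIRef) –

  • domain (set) –

  • inv_assertion_map (dict[int, set[int]]) –
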
Return type

GraphPattern

Returns

an instance of type GraphPattern

hypodisc.core.sequential.new_mm_graph_pattern(root_var, var_o: Union[hypodisc.core.structures.ObjectTypeVariable, hypodisc.core.structures.DataTypeVariable], domain: set, p: rdf.terms.IRIRef, inv_assertion_map: dict[int, set[int]]) hypodisc.core.structures.GraphPattern[source]

Create a new multimodal graph_pattern and compute members and metrics

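Parameters
  • root_var –

  • var_o (Union[ObjectTypeVariable, DataTypeVariable]) –

  • domain (set) –

  • p (IRIRef) –

  • inv_assertion_map (dict[int, set[int]]) –
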
Return type

GraphPattern

Returns

an instance of type GraphPattern

hypodisc.core.sequential.new_var_graph_pattern(root_var, var_o: Union[hypodisc.core.structures.ObjectTypeVariable, hypodisc.core.structures.DataTypeVariable], domain: set, p: rdf.terms.IRIRef, inv_assertion_map: dict[int, set[int]]) hypodisc.core.structures.GraphPattern[source]

Create a new variable graph_pattern and compute members and metrics

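Parameters
  • root_var –

  • var_o (Union[ObjectTypeVariable, DataTypeVariable]) –

  • domain (set) –

  • p (IRIRef) –

  • inv_assertion_map (dict[int, set[int]]) –
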
Return type

GraphPattern

Returns

an instance of type GraphPattern

hypodisc.core.sequential.random()

Return a random float x in the interval [0, 1).

class hypodisc.core.structures.Assertion(subject: Union[hypodisc.core.structures.ResourceWrapper, hypodisc.core.structures.Variable], predicate: rdf.terms.Resource, object: Union[hypodisc.core.structures.ResourceWrapper, hypodisc.core.structures.Variable])[source]

Bases: tuple

Assertion class

Wrapper around tuple (an assertion) that gives each instance a unique UUID, which allows for comparisons between assertions with the same values. This is needed when either lhs or rhs use TypeVariables.

Initialize a new instance of Assertion

Parameters
  • subject – the subject of an assertion

  • predicate – the predicate of an assertion

  • object – the object of an assertion

  • _uuid – identifier of this instance

Returns

None

copy(deep: bool = False) hypodisc.core.structures.Assertion[source]

Return a copy of this object

Parameters

deep – Copy the UUID and HASH as well

Returns

a copy of type Assertion

Return type

Assertion

equal(other: hypodisc.core.structures.Assertion) bool[source]

Return true if the assertions are equal content-wise

Parameters

other (Assertion) –

Return type

bool

equiv(other: hypodisc.core.structures.Assertion) bool[source]

Return true if the assertions are equivalent, which implies that they state the same or have an entity/attribute that is an instance of the type specified by the other.

Parameters

other (Assertion) –

Return type

bool
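
The idea of a tuple that carries a per-instance identifier can be sketched as follows. This is an illustration only, not the Assertion class itself: TaggedTriple is a hypothetical name, and the choice to make == compare identifiers while equal() compares content is an assumption made for the example.

```python
import uuid


class TaggedTriple(tuple):
    """Hypothetical stand-in: a (subject, predicate, object) tuple with its own id."""

    def __new__(cls, subject, predicate, obj):
        self = super().__new__(cls, (subject, predicate, obj))
        self._uuid = uuid.uuid4().hex  # unique per instance
        return self

    def __eq__(self, other):
        # Assumption for this sketch: identity is decided by the identifier.
        return isinstance(other, TaggedTriple) and self._uuid == other._uuid

    def __ne__(self, other):
        # Defined explicitly because tuple provides its own content-based __ne__.
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self._uuid)

    def equal(self, other) -> bool:
        """Compare content only, ignoring the identifier."""
        return tuple(self) == tuple(other)


a = TaggedTriple("?x", "ex:knows", "?y")
b = TaggedTriple("?x", "ex:knows", "?y")
print(a.equal(b))   # True: same content
print(a == b)       # False: different instances
print(len({a, b}))  # 2: both can live in the same set
```

Keeping the two notions of equality apart lets a set hold several assertions that look the same but refer to different variables.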

class hypodisc.core.structures.DataTypeVariable(resource: rdf.terms.IRIRef)[source]

Bases: hypodisc.core.structures.TypeVariable

Data Type Variable class

An unbound variable which can take on any value of a data type class (literal)

Initialize an instance of this class

Parameters

resource – the IRI of the datatype

Returns

None

class hypodisc.core.structures.GraphPattern(assertions: set[hypodisc.core.structures.Assertion] = {}, domain: set = {}, parent: Optional[hypodisc.core.structures.GraphPattern] = None)[source]

Bases: object

GraphPattern class

Holds all assertions of a graph pattern and keeps track of the connections and distances (from the root) of these assertions.

Initialize a new instance of GraphPattern

Parameters
  • assertions (set[Assertion]) –

  • domain (set) –

  • parent (Optional[GraphPattern]) –

Return type

None

as_dot(prefix_map: dict[str, str]) str[source]

Return a dot representation of this pattern.

Parameters

prefix_map (dict[str, str]) –

Return type

str

as_query(prefix_map: dict[str, str]) str[source]

Return a SPARQL query representation of this pattern.

Parameters

prefix_map (dict[str, str]) –

Return type

str

contains_at_depth(assertion: hypodisc.core.structures.Assertion, depth: int) bool[source]
copy() hypodisc.core.structures.GraphPattern[source]

Create a deep copy, except for the assertions, which remain as pointers.

Return type

GraphPattern

depth() int[source]

Return the length of the longest non-cyclic path

Return type

int

extend(endpoint: hypodisc.core.structures.Variable, extension: hypodisc.core.structures.Assertion) None[source]

Extend the body by appending a new assertion to an existing endpoint.

Parameters
  • endpoint (Variable) –

  • extension (Assertion) –

Return type

None

width(depth: Optional[int] = None) int[source]

Return the maximum number of assertions on a certain depth, or the most overall if depth is None.

Return type

int
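
The relation between depth() and width() can be sketched on a toy structure. PatternSketch is a hypothetical stand-in, not GraphPattern: it assumes that every assertion is stored under its 1-based distance from the root, so that the largest distance equals the length of the longest path.

```python
from typing import Optional


class PatternSketch:
    """Hypothetical stand-in that groups assertions by distance from the root."""

    def __init__(self) -> None:
        self.by_distance: dict[int, list[tuple]] = {}

    def add(self, distance: int, assertion: tuple) -> None:
        self.by_distance.setdefault(distance, []).append(assertion)

    def depth(self) -> int:
        # Longest path = largest distance at which an assertion is stored.
        return max(self.by_distance, default=0)

    def width(self, depth: Optional[int] = None) -> int:
        if depth is None:
            # Widest level overall.
            return max((len(v) for v in self.by_distance.values()), default=0)
        return len(self.by_distance.get(depth, []))


pattern = PatternSketch()
pattern.add(1, ("?x", "ex:p", "?y"))
pattern.add(1, ("?x", "ex:q", "?z"))
pattern.add(2, ("?y", "ex:r", "?w"))
print(pattern.depth())   # 2
print(pattern.width())   # 2
print(pattern.width(2))  # 1
```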

class hypodisc.core.structures.MultiModalNumericVariable(resource: rdf.terms.IRIRef, crange: tuple[float, float])[source]

Bases: hypodisc.core.structures.MultiModalVariable

Numeric Variable class

Initialize an instance of this class

Returns

None

equiv(other: hypodisc.core.structures.MultiModalNumericVariable) bool[source]
Return true if self and other represent the same datatype, and have the same mean and variance.

Parameters

other (MultiModalNumericVariable) –

Return type

bool

class hypodisc.core.structures.MultiModalStringVariable(resource, regex)[source]

Bases: hypodisc.core.structures.MultiModalVariable

String Variable class

Initialize an instance of this class

Returns

None

equiv(other: hypodisc.core.structures.MultiModalStringVariable) bool[source]
Return true if self and other represent the same datatype, and have the same regular expression.

Parameters

other (MultiModalStringVariable) –

Return type

bool

class hypodisc.core.structures.MultiModalVariable(resource: rdf.terms.IRIRef)[source]

Bases: hypodisc.core.structures.DataTypeVariable

Multimodal Variable class

An unbound variable which conveys a description of a cluster of values for node features.

Initialize an instance of this class

Parameters

resource – the IRI of the datatype

Returns

None

equiv(other: hypodisc.core.structures.MultiModalVariable) bool[source]
class hypodisc.core.structures.ObjectTypeVariable(resource: rdf.terms.IRIRef)[source]

Bases: hypodisc.core.structures.TypeVariable

Object Type Variable class

An unbound variable which can be any member of an object type class (entity)

Initialize an instance of this class

Parameters

resource – the IRI of the class

Returns

None

class hypodisc.core.structures.ResourceWrapper(resource: rdf.terms.Resource, type: Optional[rdf.terms.IRIRef] = None)[source]

Bases: rdf.terms.IRIRef

Resource Wrapper class

A wrapper which can take on any value of a certain resource, but which stores additional information

Initialize an instance of this class

Parameters

resource – the IRI or Literal value

Returns

None

class hypodisc.core.structures.TypeVariable(resource: rdf.terms.IRIRef)[source]

Bases: hypodisc.core.structures.Variable

Type Variable class

An unbound variable which can take on any value of a certain object or data type resource. Each instance has a unique ID to allow this object to be used as a variable shared between assertions.

Initialize an instance of this class

Parameters

resource – the IRI of the class or datatype

Returns

None

equiv(other: hypodisc.core.structures.TypeVariable) bool[source]
class hypodisc.core.structures.Variable(resource: rdf.terms.IRIRef)[source]

Bases: rdf.terms.IRIRef

Variable class

An unbound variable which can take on any value of a certain object or data type resource. Each instance has a unique ID to allow this object to be used as a variable shared between assertions.

Initialize an instance of this class

Parameters

resource – the IRI of the class or datatype

Returns

None

equiv(other: hypodisc.core.structures.Variable) bool[source]
hypodisc.core.utils.floatProbabilityArg(arg: str) float[source]

Custom argument type for probability

Parameters

arg (str) – user provided argument string containing a real-valued value

Return type

float

Returns

a probability in [0, 1]
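
A custom argument type of this kind is usually a callable passed to argparse via type=. The following is a minimal sketch under that assumption, not the function's actual body; probability_arg and the --p_explore flag name are chosen for illustration only.

```python
import argparse


def probability_arg(arg: str) -> float:
    """Hypothetical argparse type: parse a string into a probability in [0, 1]."""
    try:
        value = float(arg)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {arg!r}") from None
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"not in [0, 1]: {arg!r}")
    return value


parser = argparse.ArgumentParser()
# The flag name is an assumption for this example.
parser.add_argument("--p_explore", type=probability_arg, default=1.0)
args = parser.parse_args(["--p_explore", "0.25"])
print(args.p_explore)  # 0.25
```

Raising ArgumentTypeError lets argparse report the problem as a regular usage error instead of a traceback.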

hypodisc.core.utils.integerRangeArg(arg: str) range[source]

Custom argument type for range

Parameters

arg (str) – user provided argument string of form ‘from:to’, ‘:to’, or ‘to’, with ‘from’ and ‘to’ being positive integers.

Return type

range

Returns

range of values to explore
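
The three accepted forms can be parsed as sketched below. This is an illustration, not the function's actual body: integer_range_arg is a hypothetical name, a missing 'from' is assumed to mean 0, and the upper bound follows Python's half-open range convention, which may differ from how hypodisc treats it.

```python
import argparse


def integer_range_arg(arg: str) -> range:
    """Hypothetical parser for 'from:to', ':to', or 'to'."""
    # "3" -> ("", "", "3"); ":3" -> ("", ":", "3"); "1:3" -> ("1", ":", "3")
    start_str, _, stop_str = arg.rpartition(":")
    try:
        start = int(start_str) if start_str else 0  # assumption: default start is 0
        stop = int(stop_str)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected 'from:to', ':to', or 'to', got {arg!r}"
        ) from None
    if start < 0 or stop < start:
        raise argparse.ArgumentTypeError(f"invalid bounds: {arg!r}")
    return range(start, stop)


print(integer_range_arg("1:3"))       # range(1, 3)
print(list(integer_range_arg(":3")))  # [0, 1, 2]
print(list(integer_range_arg("3")))   # [0, 1, 2]
```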

hypodisc.core.utils.predict_hash(pattern: hypodisc.core.structures.GraphPattern, endpoint: hypodisc.core.structures.Variable, extension: hypodisc.core.structures.Assertion) int[source]
hypodisc.core.utils.read_version(filename: str) str[source]

Parse the project’s version

Parameters

filename (str) – path to ‘pyproject.toml’

Return type

str

Returns

the project’s version as a string
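
Reading a version string from a pyproject.toml file can be sketched with a simple line match. This is not necessarily how read_version works; read_version_sketch is a hypothetical name, the regular expression assumes a plain version = "..." line, and the version number written to the temporary file is made up for the demonstration.

```python
import re
import tempfile
from pathlib import Path


def read_version_sketch(filename: str) -> str:
    """Hypothetical helper: return the first version = "..." value in the file."""
    text = Path(filename).read_text(encoding="utf-8")
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no version field found in {filename}")
    return match.group(1)


# Demonstration on a throwaway file with a made-up version number.
with tempfile.TemporaryDirectory() as tmp:
    pyproject = Path(tmp) / "pyproject.toml"
    pyproject.write_text('[project]\nname = "demo"\nversion = "1.2.3"\n', encoding="utf-8")
    version = read_version_sketch(str(pyproject))

print(version)  # 1.2.3
```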

hypodisc.core.utils.rng_set_seed(seed: Optional[int] = None) numpy.random._generator.Generator[source]

Set seed of the random number generators.

Parameters

seed (Optional[int]) – a custom seed (optional)

Return type

np.random.Generator

Returns

a random number generator

hypodisc.core.utils.strNamespaceArg(arg: str) Tuple[str, str][source]

hypodisc.data module

class hypodisc.data.graph.KnowledgeGraph(rng: numpy.random._generator.Generator, paths: list[str])[source]

Bases: object

Knowledge Graph stored in vector representation plus query functions

Parameters

rng (np.random.Generator) –

Return type

None

parse() None[source]

Parse graph on file level.

Supports plain or gzipped N-Triples or N-Quads files

Parameters

path (list[str]) –

Return type

None

class hypodisc.data.graph.UniqueLiteral(value: str, datatype: Optional[rdf.terms.IRIRef] = None, language: Optional[str] = None)[source]

Bases: rdf.terms.Literal

An RDF Literal.

Parameters
  • value (str) – The value of the literal.

  • datatype (Optional[IRIRef]) – An optional datatype.

  • language (Optional[str]) – An optional language tag

Return type

None

hypodisc.data.graph.irisplit(e: rdf.terms.IRIRef) Tuple[str, str][source]
hypodisc.data.graph.mkprefixes(namespaces: Set[str], custom_prefix_map: Optional[dict[str, str]] = None) dict[str, str][source]
hypodisc.data.graph.ns2pf(prefix_map: dict[str, str], iri: rdf.terms.IRIRef) str[source]
exception hypodisc.data.utils.UnsupportedSerializationFormat[source]

Bases: Exception

hypodisc.data.utils.mkfile(directory: str, basename: str, extension: str) pathlib.Path[source]
Return path to a new file. Adds numerical suffix if the file already exists.

Parameters
  • directory (str) –

  • basename (str) –

  • extension (str) –

Return type

Path
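
The suffixing behaviour can be sketched with pathlib. This is an assumption-laden illustration rather than the actual mkfile: mkfile_sketch is a hypothetical name, the extension is assumed to include its leading dot, and the "_1", "_2", ... suffix format is invented for the example.

```python
import tempfile
from pathlib import Path


def mkfile_sketch(directory: str, basename: str, extension: str) -> Path:
    """Hypothetical helper: return a path that does not exist yet."""
    path = Path(directory) / f"{basename}{extension}"
    suffix = 1
    while path.exists():
        # Assumed suffix format; the real one may differ.
        path = Path(directory) / f"{basename}_{suffix}{extension}"
        suffix += 1
    return path


with tempfile.TemporaryDirectory() as tmp:
    first = mkfile_sketch(tmp, "out", ".nt")
    first.touch()  # make the first name taken
    second = mkfile_sketch(tmp, "out", ".nt")
    names = (first.name, second.name)

print(names)  # ('out.nt', 'out_1.nt')
```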

hypodisc.data.utils.write_query(f_out: rdf.formats.NTriples, pattern: hypodisc.core.structures.GraphPattern, num_patterns: int, base: rdf.terms.IRIRef, prefix_map: dict[str, str]) int[source]

hypodisc.multimodal module

hypodisc.multimodal.clustering.cast_values(dtype: rdf.terms.IRIRef, values: list) tuple[numpy.ndarray, numpy.ndarray][source]
Cast raw values to a datatype suitable for clustering. Defaults to string.

Parameters
  • dtype (IRIRef) –

  • values (list) –

Return type

tuple[np.ndarray, np.ndarray]

hypodisc.multimodal.clustering.cast_values_rev(dtype: rdf.terms.IRIRef, clusters: list[tuple]) list[tuple[set, typing.Any]][source]

Cast clusters to ranges of the relevant datatype

Parameters
  • dtype (IRIRef) –

  • clusters (list[tuple]) –

Return type

list[tuple[set, Any]]

hypodisc.multimodal.clustering.cast_values_rev_dist(dtype: rdf.terms.IRIRef, clusters: list[tuple]) list[tuple[set, typing.Any]][source]

Cast clusters to distributions of the relevant datatype

Parameters
  • dtype (IRIRef) –

  • clusters (list[tuple]) –

Return type

list[tuple[set, Any]]

hypodisc.multimodal.clustering.compute_GMM(rng: numpy.random._generator.Generator, X: numpy.ndarray, sample: numpy.ndarray, num_components: int, num_tries: int, eps: float) tuple[float, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]
Compute a GMM from different random states and return the best results. Trains the model on the sample but returns the predictions on X.

Parameters
  • rng (np.random.Generator) –

  • X (np.ndarray) –

  • num_components (int) –

  • num_tries (int) –

  • eps (float) –

Return type

tuple[float, np.ndarray, np.ndarray, np.ndarray]

hypodisc.multimodal.clustering.compute_clusters(rng: numpy.random._generator.Generator, dtype: rdf.terms.IRIRef, values: list, values_gidx: numpy.ndarray) list[tuple[set, typing.Any]][source]

Compute clusters from list of values.

Parameters
  • rng (np.random.Generator) –

  • dtype (IRIRef) –

  • values (list[Literal]) –

  • values_gidx (np.ndarray) –

Return type

list[tuple[set, Any]]

hypodisc.multimodal.clustering.compute_numeric_clusters(rng: numpy.random._generator.Generator, X: numpy.ndarray, num_components: range, num_tries: int = 3, eps: float = 0.001, standardize: bool = True, shuffle: bool = True) tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]
Compute numerical cluster means and stdevs for a range of possible number of components. Also return the cluster assignments.

Parameters
  • rng (np.random.Generator) –

  • X (np.ndarray) –

  • num_components (range) –

  • num_tries (int) –

  • eps (float) –

  • standardize (bool) –

  • shuffle (bool) –

Return type

tuple[np.ndarray, np.ndarray, np.ndarray]

hypodisc.multimodal.clustering.string_clusters(X: numpy.ndarray, X_gidx: numpy.ndarray, merge_charsets: bool = True, omit_empty: bool = True) list[tuple[hypodisc.multimodal.langutil.RegexPattern, set[int]]][source]
Generate clusters of strings by inferring regex patterns and by generalizing these on similarity.

Parameters
  • X (np.ndarray) –

  • X_gidx (np.ndarray) –

  • merge_charsets (bool) –

  • omit_empty (bool) –

Return type

list[tuple[RegexPattern, set[int]]]

class hypodisc.multimodal.langutil.RegexCharAtom(charset_str: str)[source]

Bases: hypodisc.multimodal.langutil.RegexCharSet

A regular expression character set, representing a solitary character (eg ‘s’).

Parameters

charset_str (str) –

Return type

None

add(char: Optional[str]) None[source]

Add character to character set count.

Parameters

char (str) –

Return type

None

copy() hypodisc.multimodal.langutil.RegexCharAtom[source]

Return a deep copy of this character set.

Return type

‘RegexCharAtom’

exact() str[source]
Return a reduced character set that exactly matches the input, but nothing beyond it.

Return type

str

class hypodisc.multimodal.langutil.RegexCharRange(charset: Optional[numpy.ndarray] = None, charset_str: Optional[str] = None, complement: bool = False)[source]

Bases: hypodisc.multimodal.langutil.RegexCharSet

A regular expression character set, representing a group with optional alternatives (eg ‘(a|b|c)’), or a range (eg ‘[a-z]’).

Parameters
  • charset (Optional[np.ndarray]) –

  • charset_str (Optional[str]) –

  • complement (bool) –

Return type

None

add(char: str) None[source]

Add character to character set count.

Parameters

char (str) –

Return type

None

copy() hypodisc.multimodal.langutil.RegexCharRange[source]

Return a deep copy of this object.

Return type

‘RegexCharRange’

exact() str[source]
Return a reduced character set that exactly matches the input, but nothing beyond it.

Return type

str

class hypodisc.multimodal.langutil.RegexCharSet(complement: bool = False)[source]

Bases: object

A regular expression character set, representing a solitary character (eg ‘s’), a group with optional alternatives (eg ‘(a|b|c)’), or a range (eg ‘[a-z]’).

Parameters

complement (bool) –

Return type

None

charset() str[source]

Return the full character set without quantifiers.

Return type

str

equiv(other: Union[hypodisc.multimodal.langutil.RegexCharAtom, hypodisc.multimodal.langutil.RegexCharRange]) bool[source]
Return true if self and other have the same full character set, ignoring any quantifiers.

Parameters

other (Union['RegexCharAtom', 'RegexCharRange']) –

Return type

bool

exact() str[source]
Return a reduced character set that exactly matches the input, but nothing beyond it.

Return type

str

merge(other: Union[hypodisc.multimodal.langutil.RegexCharRange, hypodisc.multimodal.langutil.RegexCharAtom]) Union[hypodisc.multimodal.langutil.RegexCharRange, hypodisc.multimodal.langutil.RegexCharAtom][source]
Return a new character set object which encompasses both self and other, by merging the sets and adjusting the quantifiers.

Parameters

other (Union['RegexCharRange', 'RegexCharAtom']) –

Return type

Union[‘RegexCharRange’, ‘RegexCharAtom’]

weak_match(other: Union[hypodisc.multimodal.langutil.RegexCharAtom, hypodisc.multimodal.langutil.RegexCharRange]) bool[source]
Return true if self and other have the same reduced character set and quantifiers.

Parameters

other (Union['RegexCharAtom', 'RegexCharRange']) –

Return type

bool

class hypodisc.multimodal.langutil.RegexPattern[source]

Bases: object

A regular expression consisting of solitary characters (eg ‘s’), groups with optional alternatives (eg ‘(a|b|c)’), and ranges (eg ‘[a-z]’).

Return type

None

add(charset: Union[hypodisc.multimodal.langutil.RegexCharAtom, hypodisc.multimodal.langutil.RegexCharRange]) None[source]
Add a character set to this regular expression. Character sets are assumed ordered sequentially.

Parameters

charset (Union[RegexCharAtom, RegexCharRange]) –

Return type

None

copy() hypodisc.multimodal.langutil.RegexPattern[source]

Return a deep copy of this object.

Return type

‘RegexPattern’

equiv(other: hypodisc.multimodal.langutil.RegexPattern) bool[source]
Return true if self has the same full expression as other, excluding quantifiers.

Parameters

other ('RegexPattern') –

Return type

bool

exact() str[source]
Return a reduced pattern that exactly matches the input, but nothing beyond it.

Return type

str

generalize() hypodisc.multimodal.langutil.RegexPattern[source]
Generalize character sets on word level. For example, ‘[A-Z][a-z]{2}s’ would yield ‘[A-Za-z]{3}’. The generalized expression is returned as a new instance.

Return type

‘RegexPattern’

weak_match(other: hypodisc.multimodal.langutil.RegexPattern) bool[source]
Return true if self and other have the same reduced regular expression and quantifiers.

Parameters

other ('RegexPattern') –

Return type

bool

hypodisc.multimodal.langutil.generalize_patterns(patterns: dict[hypodisc.multimodal.langutil.RegexPattern, set[int]], num_recursions: int = 1) dict[hypodisc.multimodal.langutil.RegexPattern, set[int]][source]
Generalize a dictionary of regular expressions in place, by merging similar patterns and by aligning quantifiers.

Parameters
  • patterns (dict[RegexPattern, set[int]]) –

  • num_recursions (int) –

Return type

dict[RegexPattern, set[int]]

hypodisc.multimodal.langutil.generate_regex(s: str, strip_punctuation: bool = True) hypodisc.multimodal.langutil.RegexPattern[source]

Generate a regular expression that fits the string.

Parameters
  • s (str) –

  • strip_punctuation (bool) –

Return type

RegexPattern
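
Fitting a regular expression to a string can be sketched by mapping each character to a character class and collapsing runs into quantifiers. The code below is a simplified illustration that returns a plain string instead of a RegexPattern; char_class and generate_regex_sketch are hypothetical names, and the three classes used are an assumption, not the library's actual character sets.

```python
import re
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to an assumed class, or to itself if none applies."""
    if ch.isascii() and ch.isupper():
        return "[A-Z]"
    if ch.isascii() and ch.islower():
        return "[a-z]"
    if ch.isascii() and ch.isdigit():
        return "[0-9]"
    return re.escape(ch)  # keep other characters as escaped literals


def generate_regex_sketch(s: str) -> str:
    """Build a regex string that matches s by run-length encoding its classes."""
    parts = []
    for cls, run in groupby(s, key=char_class):
        count = len(list(run))
        parts.append(cls if count == 1 else f"{cls}{{{count}}}")
    return "".join(parts)


pattern = generate_regex_sketch("Abc42")
print(pattern)                                # [A-Z][a-z]{2}[0-9]{2}
print(bool(re.fullmatch(pattern, "Xyz07")))   # True
print(bool(re.fullmatch(pattern, "xyz07")))   # False
```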

hypodisc.multimodal.timeutils.cast_datefrag(dtype: rdf.terms.IRIRef, v: rdf.terms.Literal) float[source]

Cast date fragments to days

Parameters
  • dtype (IRIRef) –

  • v (Literal) –

Return type

float

hypodisc.multimodal.timeutils.cast_datefrag_delta(days: float) str[source]

Cast datefrag delta to dayTimeDuration

Parameters

days (float) –

Return type

str
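
Formatting a number of days as an xsd:dayTimeDuration string can be sketched as follows. This is an illustration under assumptions, not the function's actual body: days_to_day_time_duration is a hypothetical name, fractions are rounded to whole seconds, and all four fields are always written out.

```python
def days_to_day_time_duration(days: float) -> str:
    """Hypothetical helper: format fractional days as PnDTnHnMnS."""
    total_seconds = round(abs(days) * 86400)  # assumption: whole-second precision
    d, rem = divmod(total_seconds, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    sign = "-" if days < 0 and total_seconds else ""
    return f"{sign}P{d}DT{h}H{m}M{s}S"


print(days_to_day_time_duration(1.5))    # P1DT12H0M0S
print(days_to_day_time_duration(0.25))   # P0DT6H0M0S
print(days_to_day_time_duration(-2.0))   # -P2DT0H0M0S
```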

hypodisc.multimodal.timeutils.cast_datefrag_rev(dtype: rdf.terms.IRIRef, days: float) str[source]

Cast days to months and days

Parameters
  • dtype (IRIRef) –

  • days (float) –

Return type

str

hypodisc.multimodal.timeutils.cast_datetime(dtype: rdf.terms.IRIRef, v: rdf.terms.Literal) float[source]

Cast dates to UNIX timestamps

Parameters
  • dtype (IRIRef) –

  • v (Literal) –

Return type

float

hypodisc.multimodal.timeutils.cast_datetime_delta(timestamp: float) str[source]

Cast datetime delta to duration

Parameters

timestamp (float) –

Return type

str

hypodisc.multimodal.timeutils.cast_datetime_rev(dtype: rdf.terms.IRIRef, timestamp: float) str[source]

Cast date and similar to iso format

Parameters
  • dtype (IRIRef) –

  • timestamp (float) –

Return type

str
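
The round trip between date strings and UNIX timestamps performed by cast_datetime and cast_datetime_rev can be sketched with the standard library. The helpers below are hypothetical (to_timestamp, from_timestamp), ignore the dtype argument, operate on plain strings rather than Literal objects, and assume that values without a time zone are in UTC.

```python
from datetime import datetime, timezone


def to_timestamp(lexical: str) -> float:
    """Hypothetical helper: ISO 8601 date-time string to UNIX timestamp."""
    dt = datetime.fromisoformat(lexical)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return dt.timestamp()


def from_timestamp(timestamp: float) -> str:
    """Hypothetical helper: UNIX timestamp back to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


ts = to_timestamp("2001-09-09T01:46:40")
print(ts)                  # 1000000000.0
print(from_timestamp(ts))  # 2001-09-09T01:46:40+00:00
```

Working in timestamps gives the clustering step a single numeric axis, and the reverse cast restores a readable value for output.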