hypodisc package
hypodisc module
hypodisc.core module
- hypodisc.core.sequential.explore(parent: hypodisc.core.structures.GraphPattern, candidates: set, max_length: int, max_width: int, min_support: int) set [source]
Explore all predicate-object pairs which were added by the previous iteration as possible endpoints to expand from.
- Parameters
parent (GraphPattern) –
candidates (set) –
max_length (int) –
max_width (int) –
min_support (int) –
- Return type
set
- hypodisc.core.sequential.extend(pattern: hypodisc.core.structures.GraphPattern, endpoint: hypodisc.core.structures.Variable, extension: hypodisc.core.structures.Assertion) hypodisc.core.structures.GraphPattern [source]
Extend a graph pattern from a given endpoint variable by evaluating all candidate extensions on whether they satisfy the minimal support and confidence.
- Parameters
pattern (GraphPattern) –
endpoint (Variable) –
extension (Assertion) –
- Return type
GraphPattern
- hypodisc.core.sequential.generate(root_patterns: dict[str, list], depths: range, min_support: int, p_explore: float, p_extend: float, max_length: int, max_width: int, out_writer: Optional[rdf.formats.NTriples], out_prefix_map: Optional[dict[str, str]], out_ns: Optional[rdf.terms.IRIRef], strategy: Literal['BFS', 'DFS']) int [source]
Generate all patterns up to and including a maximum depth which satisfy a minimal support.
- Parameters
depths (range) –
min_support (int) –
p_explore (float) –
p_extend (float) –
max_length (int) –
max_width (int) –
- hypodisc.core.sequential.generate_bf(root_patterns: dict[str, list], depths: range, min_support: int, p_explore: float, p_extend: float, max_length: int, max_width: int, out_writer: Optional[rdf.formats.NTriples], out_prefix_map: Optional[dict[str, str]], out_ns: Optional[rdf.terms.IRIRef]) int [source]
Generate all patterns up to and including a maximum depth which satisfy a minimal support, using a breadth-first approach. This approach has the anytime property but uses more memory.
- Parameters
depths (range) –
min_support (int) –
p_explore (float) –
p_extend (float) –
max_length (int) –
max_width (int) –
- hypodisc.core.sequential.generate_df(root_patterns: dict[str, list], depths: range, min_support: int, p_explore: float, p_extend: float, max_length: int, max_width: int, out_writer: Optional[rdf.formats.NTriples], out_prefix_map: Optional[dict[str, str]], out_ns: Optional[rdf.terms.IRIRef]) int [source]
Generate all patterns up to and including a maximum depth which satisfy a minimal support, using a depth-first approach. This approach uses less memory but does not have the anytime property.
- Parameters
depths (range) –
min_support (int) –
p_explore (float) –
p_extend (float) –
max_length (int) –
max_width (int) –
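The trade-off between the two strategies can be illustrated with a toy enumeration. This is a minimal sketch and not hypodisc's implementation: the string "patterns", the `expand` function, and the names `enumerate_bf` and `enumerate_df` are hypothetical stand-ins chosen for illustration.

```python
from collections import deque
from typing import Callable, Iterator


def expand(pattern: str) -> list[str]:
    """Hypothetical extension step: append one symbol to a pattern."""
    return [pattern + symbol for symbol in "ab"]


def enumerate_bf(roots: list[str], max_depth: int,
                 expand: Callable[[str], list[str]]) -> Iterator[str]:
    """Breadth first: emit every pattern of depth d before any of depth d + 1."""
    queue = deque((root, 0) for root in roots)
    while queue:
        pattern, depth = queue.popleft()
        yield pattern
        if depth < max_depth:
            # the queue holds a whole level at once, hence the higher memory use
            queue.extend((child, depth + 1) for child in expand(pattern))


def enumerate_df(roots: list[str], max_depth: int,
                 expand: Callable[[str], list[str]]) -> Iterator[str]:
    """Depth first: follow one branch to the maximum depth before backtracking."""
    stack = [(root, 0) for root in reversed(roots)]
    while stack:
        pattern, depth = stack.pop()
        yield pattern
        if depth < max_depth:
            # only the siblings along the current branch are kept on the stack
            stack.extend((child, depth + 1)
                         for child in reversed(expand(pattern)))
```

Stopping the breadth-first generator early still leaves a complete set of all shallower patterns, which is what the anytime property refers to. Stopping the depth-first generator early leaves some branches fully explored and others untouched.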
- hypodisc.core.sequential.infer_type(kg: hypodisc.data.graph.KnowledgeGraph, rdf_type_idx: int, node_idx: int) tuple[typing.Union[rdf.terms.IRIRef, str], bool] [source]
Infer the (data) type or language tag of a resource. Defaults to rdfs:Class if none can be inferred.
- Parameters
kg (KnowledgeGraph) –
rdf_type_idx (int) –
node_idx (int) –
- Return type
tuple[Union[IRIRef, str], bool]
- hypodisc.core.sequential.init_root_patterns(rng: numpy.random._generator.Generator, kg: hypodisc.data.graph.KnowledgeGraph, min_support: float, mode: Literal['A', 'AT', 'T'], textual_support: bool, numerical_support: bool, temporal_support: bool, exclude: list[str]) dict[str, list] [source]
Create all patterns of types which satisfy the minimal support.
- Parameters
rng (np.random.Generator) –
kg (KnowledgeGraph) –
min_support (float) –
mode (Literal["A", "AT", "T"]) –
multimodal (bool) –
- Returns
- Return type
dict[str,list]
- hypodisc.core.sequential.new_graph_pattern(root_var, p: rdf.terms.IRIRef, o_value: Union[rdf.terms.IRIRef, rdf.terms.Literal], o_type: rdf.terms.IRIRef, domain: set, inv_assertion_map: dict[int, set[int]]) hypodisc.core.structures.GraphPattern [source]
Create a new graph pattern and compute its members and metrics.
- Parameters
kg (KnowledgeGraph) –
parent (Optional[GraphPattern]) –
var (ObjectTypeVariable) –
p_idx (int) –
o_idx (int) –
class_members_idx (np.ndarray) –
- Return type
GraphPattern
- Returns
an instance of type GraphPattern
- hypodisc.core.sequential.new_mm_graph_pattern(root_var, var_o: Union[hypodisc.core.structures.ObjectTypeVariable, hypodisc.core.structures.DataTypeVariable], domain: set, p: rdf.terms.IRIRef, inv_assertion_map: dict[int, set[int]]) hypodisc.core.structures.GraphPattern [source]
Create a new multimodal graph pattern and compute its members and metrics.
- Parameters
kg –
parent (Optional[GraphPattern]) –
var (ObjectTypeVariable) –
var_o (Union[ObjectTypeVariable, DataTypeVariable]) –
members (set) –
class_members_idx (np.ndarray) –
p_idx (int) –
- Return type
GraphPattern
- Returns
an instance of type GraphPattern
- hypodisc.core.sequential.new_var_graph_pattern(root_var, var_o: Union[hypodisc.core.structures.ObjectTypeVariable, hypodisc.core.structures.DataTypeVariable], domain: set, p: rdf.terms.IRIRef, inv_assertion_map: dict[int, set[int]]) hypodisc.core.structures.GraphPattern [source]
Create a new variable graph pattern and compute its members and metrics.
- Parameters
kg –
parent (Optional[GraphPattern]) –
var (ObjectTypeVariable) –
var_o (Union[ObjectTypeVariable, DataTypeVariable]) –
class_members_idx (np.ndarray) –
s_idx_list (np.ndarray) –
p_idx (int) –
- Return type
GraphPattern
- Returns
an instance of type GraphPattern
- hypodisc.core.sequential.random() → x in the interval [0, 1).
- class hypodisc.core.structures.Assertion(subject: Union[hypodisc.core.structures.ResourceWrapper, hypodisc.core.structures.Variable], predicate: rdf.terms.Resource, object: Union[hypodisc.core.structures.ResourceWrapper, hypodisc.core.structures.Variable])[source]
Bases:
tuple
Assertion class
Wrapper around tuple (an assertion) that gives each instance a unique UUID, which allows for comparisons between assertions with the same values. This is needed when either lhs or rhs use TypeVariables.
Initialize a new instance of Assertion
- Parameters
subject – the subject of an assertion
predicate – the predicate of an assertion
object – the object of an assertion
_uuid – identifier of this instance
- Returns
None
- copy(deep: bool = False) hypodisc.core.structures.Assertion [source]
Return a copy of this object
- Parameters
deep – Copy the UUID and HASH as well
- Returns
a copy of type Assertion
- Return type
Assertion
- equal(other: hypodisc.core.structures.Assertion) bool [source]
Return true if the assertions are equal content-wise.
- Parameters
other (Assertion) –
- Return type
bool
- equiv(other: hypodisc.core.structures.Assertion) bool [source]
Return true if the assertions are equivalent, which implies that they state the same or have an entity/attribute that is an instance of the type specified by the other.
- Parameters
other (Assertion) –
- Return type
bool
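The distinction between a per-instance identifier and content-wise equality can be sketched as follows. This is an illustrative stand-in and not the library's `Assertion` class: the name `TripleSketch` is hypothetical, and the assumption that `==` compares identifiers while `equal` compares content is made only for this example.

```python
from uuid import uuid4


class TripleSketch(tuple):
    """Hypothetical (subject, predicate, object) tuple with a per-instance ID."""

    def __new__(cls, subject, predicate, obj):
        return super().__new__(cls, (subject, predicate, obj))

    def __init__(self, subject, predicate, obj):
        self._uuid = uuid4().hex  # unique for every instance

    def __eq__(self, other):
        # assumption: '==' distinguishes instances by their identifier
        return isinstance(other, TripleSketch) and self._uuid == other._uuid

    def __ne__(self, other):
        # tuple defines its own __ne__, so it has to be overridden as well
        return not self == other

    def __hash__(self):
        return hash(self._uuid)

    def equal(self, other) -> bool:
        # content-wise comparison that ignores the identifier
        return tuple(self) == tuple(other)


a = TripleSketch("?x", "ex:knows", "?y")
b = TripleSketch("?x", "ex:knows", "?y")
```

Here `a` and `b` hold the same values, so `a.equal(b)` is true, yet they remain distinguishable and can coexist in one set.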
- class hypodisc.core.structures.DataTypeVariable(resource: rdf.terms.IRIRef)[source]
Bases:
hypodisc.core.structures.TypeVariable
Data Type Variable class
An unbound variable which can take on any value of a data type class (literal)
Initialize an instance of this class
- Params resource
the IRI of the datatype
- Returns
None
- class hypodisc.core.structures.GraphPattern(assertions: set[hypodisc.core.structures.Assertion] = {}, domain: set = {}, parent: Optional[hypodisc.core.structures.GraphPattern] = None)[source]
Bases:
object
GraphPattern class
Holds all assertions of a graph pattern and keeps track of the connections and distances (from the root) of these assertions.
__init__.
- Parameters
identity (IdentityAssertion) –
- Return type
None
- as_dot(prefix_map: dict[str, str]) str [source]
Return a dot representation of this pattern.
- Parameters
prefix_map (dict[str, str]) –
- Return type
str
- as_query(prefix_map: dict[str, str]) str [source]
Return a SPARQL query representation of this pattern.
- Parameters
prefix_map (dict[str, str]) –
- Return type
str
- contains_at_depth(assertion: hypodisc.core.structures.Assertion, depth: int) bool [source]
- copy() hypodisc.core.structures.GraphPattern [source]
Create a deep copy, except for the assertions, which remain as pointers.
- Return type
GraphPattern
- extend(endpoint: hypodisc.core.structures.Variable, extension: hypodisc.core.structures.Assertion) None [source]
Extend the body by appending a new assertion to an existing endpoint.
- class hypodisc.core.structures.MultiModalNumericVariable(resource: rdf.terms.IRIRef, crange: tuple[float, float])[source]
Bases:
hypodisc.core.structures.MultiModalVariable
Numeric Variable class
Initialize an instance of this class
- Returns
None
- equiv(other: hypodisc.core.structures.MultiModalNumericVariable) bool [source]
Return true if self and other represent the same datatype, and have the same mean and variance.
- Parameters
other (MultiModalNumericVariable) –
- Return type
bool
- class hypodisc.core.structures.MultiModalStringVariable(resource, regex)[source]
Bases:
hypodisc.core.structures.MultiModalVariable
String Variable class
Initialize an instance of this class
- Returns
None
- equiv(other: hypodisc.core.structures.MultiModalStringVariable) bool [source]
Return true if self and other represent the same datatype, and have the same regular expression.
- Parameters
other (MultiModalStringVariable) –
- Return type
bool
- class hypodisc.core.structures.MultiModalVariable(resource: rdf.terms.IRIRef)[source]
Bases:
hypodisc.core.structures.DataTypeVariable
Multimodal Variable class
An unbound variable which conveys a description of a cluster of values for node features.
Initialize an instance of this class
- Params resource
the IRI of the datatype
- Returns
None
- equiv(other: hypodisc.core.structures.MultiModalVariable) bool [source]
- class hypodisc.core.structures.ObjectTypeVariable(resource: rdf.terms.IRIRef)[source]
Bases:
hypodisc.core.structures.TypeVariable
Object Type Variable class
An unbound variable which can be any member of an object type class (entity)
Initialize an instance of this class
- Params resource
the IRI of the class
- Returns
None
- class hypodisc.core.structures.ResourceWrapper(resource: rdf.terms.Resource, type: Optional[rdf.terms.IRIRef] = None)[source]
Bases:
rdf.terms.IRIRef
Resource Wrapper class
A wrapper which can take on any value of a certain resource, but which stores additional information
Initialize an instance of this class
- Params resource
the IRI or Literal value
- Returns
None
- class hypodisc.core.structures.TypeVariable(resource: rdf.terms.IRIRef)[source]
Bases:
hypodisc.core.structures.Variable
Type Variable class
An unbound variable which can take on any value of a certain object or data type resource. Each instance has a unique ID to allow this object to be used as a variable shared between assertions.
Initialize an instance of this class
- Params resource
the IRI of the class or datatype
- Returns
None
- equiv(other: hypodisc.core.structures.TypeVariable) bool [source]
- class hypodisc.core.structures.Variable(resource: rdf.terms.IRIRef)[source]
Bases:
rdf.terms.IRIRef
Variable class
An unbound variable which can take on any value of a certain object or data type resource. Each instance has a unique ID to allow this object to be used as a variable shared between assertions.
Initialize an instance of this class
- Params resource
the IRI of the class or datatype
- Returns
None
- equiv(other: hypodisc.core.structures.Variable) bool [source]
- hypodisc.core.utils.floatProbabilityArg(arg: str) float [source]
Custom argument type for probability
- Parameters
arg (str) – user provided argument string containing a real-valued value
- Return type
float
- Returns
a probability in [0, 1]
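A custom argument type of this kind is typically a function passed to argparse's `type=` parameter. The following is a minimal sketch under that assumption, not the library's code: the function name `probability_arg` and the flag `--p_explore` are hypothetical.

```python
import argparse


def probability_arg(arg: str) -> float:
    """Hypothetical argparse type: parse a string as a probability in [0, 1]."""
    try:
        value = float(arg)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{arg!r} is not a real-valued number")
    # NaN fails this comparison as well, so it is rejected too
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"{arg!r} is not in the interval [0, 1]")
    return value


parser = argparse.ArgumentParser()
parser.add_argument("--p_explore", type=probability_arg, default=1.0)
args = parser.parse_args(["--p_explore", "0.25"])
```

Raising `argparse.ArgumentTypeError` lets argparse print a usage message and exit instead of showing a traceback.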
- hypodisc.core.utils.integerRangeArg(arg: str) range [source]
Custom argument type for range
- Parameters
arg (str) – user provided argument string of form ‘from:to’, ‘:to’, or ‘to’, with ‘from’ and ‘to’ being positive integers.
- Return type
range
- Returns
range of values to explore
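The three accepted forms can be parsed along the following lines. This sketch makes assumptions the documentation does not state: a missing 'from' defaults to 0, a bare 'to' is treated like ':to', and the upper bound is exclusive as in Python's `range`. The name `parse_range` is hypothetical.

```python
def parse_range(arg: str) -> range:
    """Hypothetical parser for 'from:to', ':to', or 'to'."""
    # rpartition yields ("", "", arg) when the string contains no colon
    start_str, _, stop_str = arg.rpartition(":")
    try:
        start = int(start_str) if start_str else 0  # assumed default
        stop = int(stop_str)
    except ValueError:
        raise ValueError(f"{arg!r} is not of the form 'from:to', ':to', or 'to'")
    if start < 0 or stop < start:
        raise ValueError(f"{arg!r} does not describe a valid range")
    return range(start, stop)  # assumed: upper bound exclusive
```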
- hypodisc.core.utils.predict_hash(pattern: hypodisc.core.structures.GraphPattern, endpoint: hypodisc.core.structures.Variable, extension: hypodisc.core.structures.Assertion) int [source]
- hypodisc.core.utils.read_version(filename: str) str [source]
Parse the project’s version
- Parameters
filename (str) – path to ‘pyproject.toml’
- Return type
str
- Returns
the project’s version as a string
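Reading a version string from 'pyproject.toml' might look as follows. This is a sketch under assumptions, not the library's implementation: it supposes the file contains a line of the form `version = "..."` and uses a regular expression rather than a TOML parser. The name `read_version_sketch` is hypothetical.

```python
import re
from pathlib import Path


def read_version_sketch(filename: str) -> str:
    """Hypothetical helper: return the first 'version = "..."' value in a file."""
    text = Path(filename).read_text(encoding="utf-8")
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no version entry found in {filename}")
    return match.group(1)
```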
hypodisc.data module
- class hypodisc.data.graph.KnowledgeGraph(rng: numpy.random._generator.Generator, paths: list[str])[source]
Bases:
object
Knowledge Graph stored in vector representation plus query functions
- Parameters
rng (np.random.Generator) –
- Return type
None
- class hypodisc.data.graph.UniqueLiteral(value: str, datatype: Optional[rdf.terms.IRIRef] = None, language: Optional[str] = None)[source]
Bases:
rdf.terms.Literal
An RDF Literal.
- Parameters
value (str) – The value of the literal.
datatype (Optional[IRIRef]) – An optional datatype.
language (Optional[str]) – An optional language tag
- Return type
None
- hypodisc.data.graph.mkprefixes(namespaces: Set[str], custom_prefix_map: Optional[dict[str, str]] = None) dict[str, str] [source]
- hypodisc.data.utils.mkfile(directory: str, basename: str, extension: str) pathlib.Path [source]
Return the path to a new file. Adds a numerical suffix if the file already exists.
- Parameters
directory (str) –
basename (str) –
extension (str) –
- Return type
Path
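The behaviour described above can be sketched with `pathlib`. The exact suffix format used by the library is not documented here, so the `_1`, `_2`, ... scheme, the assumption that `extension` includes its leading dot, and the name `new_file_path` are all hypothetical.

```python
from pathlib import Path


def new_file_path(directory: str, basename: str, extension: str) -> Path:
    """Hypothetical helper: return a path that does not exist yet."""
    candidate = Path(directory) / f"{basename}{extension}"
    suffix = 1
    while candidate.exists():
        # assumed naming scheme: basename_1.ext, basename_2.ext, ...
        candidate = Path(directory) / f"{basename}_{suffix}{extension}"
        suffix += 1
    return candidate
```

The function only returns a path and does not create the file, so two callers checking at the same time could receive the same result.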
- hypodisc.data.utils.write_query(f_out: rdf.formats.NTriples, pattern: hypodisc.core.structures.GraphPattern, num_patterns: int, base: rdf.terms.IRIRef, prefix_map: dict[str, str]) int [source]
hypodisc.multimodal module
- hypodisc.multimodal.clustering.cast_values(dtype: rdf.terms.IRIRef, values: list) tuple[numpy.ndarray, numpy.ndarray] [source]
Cast raw values to a datatype suitable for clustering. Defaults to string.
- Parameters
dtype (IRIRef) –
values (list) –
- Return type
np.ndarray
- hypodisc.multimodal.clustering.cast_values_rev(dtype: rdf.terms.IRIRef, clusters: list[tuple]) list[tuple[set, typing.Any]] [source]
Cast clusters to relevant datatype ranges.
- Parameters
dtype (IRIRef) –
clusters (list[tuple]) –
- Return type
list[tuple[set, Any]]
- hypodisc.multimodal.clustering.cast_values_rev_dist(dtype: rdf.terms.IRIRef, clusters: list[tuple]) list[tuple[set, typing.Any]] [source]
Cast clusters to relevant datatype distributions.
- Parameters
dtype (IRIRef) –
clusters (list[tuple]) –
- Return type
list[tuple[set, Any]]
- hypodisc.multimodal.clustering.compute_GMM(rng: numpy.random._generator.Generator, X: numpy.ndarray, sample: numpy.ndarray, num_components: int, num_tries: int, eps: float) tuple[float, numpy.ndarray, numpy.ndarray, numpy.ndarray] [source]
Compute a GMM from different random states and return the best results. Trains the model on the sample but returns the predictions on X.
- Parameters
rng (np.random.Generator) –
X (np.ndarray) –
num_components (int) –
num_tries (int) –
eps (float) –
- Return type
list
- hypodisc.multimodal.clustering.compute_clusters(rng: numpy.random._generator.Generator, dtype: rdf.terms.IRIRef, values: list, values_gidx: numpy.ndarray) list[tuple[set, typing.Any]] [source]
Compute clusters from list of values.
- Parameters
rng (np.random.Generator) –
dtype (IRIRef) –
values (list[Literal]) –
values_gidx (np.ndarray) –
- Return type
list[tuple[set, Any]]
- hypodisc.multimodal.clustering.compute_numeric_clusters(rng: numpy.random._generator.Generator, X: numpy.ndarray, num_components: range, num_tries: int = 3, eps: float = 0.001, standardize: bool = True, shuffle: bool = True) tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] [source]
Compute numerical cluster means and stdevs for a range of possible numbers of components. Also returns the cluster assignments.
- Parameters
rng (np.random.Generator) –
X (np.ndarray) –
num_components (range) –
num_tries (int) –
eps (float) –
standardize (bool) –
shuffle (bool) –
- Return type
dict
- hypodisc.multimodal.clustering.string_clusters(X: numpy.ndarray, X_gidx: numpy.ndarray, merge_charsets: bool = True, omit_empty: bool = True) list[tuple[hypodisc.multimodal.langutil.RegexPattern, set[int]]] [source]
Generate clusters of strings by inferring regex patterns and by generalizing these on similarity.
- Parameters
object_list (np.ndarray) –
merge_charsets (bool) –
omit_empty (bool) –
- Return type
dict[RegexPattern,int]
- class hypodisc.multimodal.langutil.RegexCharAtom(charset_str: str)[source]
Bases:
hypodisc.multimodal.langutil.RegexCharSet
A regular expression character set, representing a solitary character (e.g. ‘s’).
- Parameters
value (str) –
- Return type
None
- add(char: Optional[str]) None [source]
Add character to character set count.
- Parameters
char (str) –
- Return type
None
- copy() hypodisc.multimodal.langutil.RegexCharAtom [source]
Return a deep copy of this character set.
- Return type
‘RegexCharAtom’
- class hypodisc.multimodal.langutil.RegexCharRange(charset: Optional[numpy.ndarray] = None, charset_str: Optional[str] = None, complement: bool = False)[source]
Bases:
hypodisc.multimodal.langutil.RegexCharSet
A regular expression character set, representing a group with optional alternatives (e.g. ‘(a|b|c)’), or a range (e.g. ‘[a-z]’).
- Parameters
charset (Optional[np.ndarray]) –
charset_str (Optional[str]) –
complement (bool) –
- Return type
None
- add(char: str) None [source]
Add character to character set count.
- Parameters
char (str) –
- Return type
None
- copy() hypodisc.multimodal.langutil.RegexCharRange [source]
Return a deep copy of this object.
- Return type
‘RegexCharRange’
- class hypodisc.multimodal.langutil.RegexCharSet(complement: bool = False)[source]
Bases:
object
A regular expression character set, representing a solitary character (e.g. ‘s’), a group with optional alternatives (e.g. ‘(a|b|c)’), or a range (e.g. ‘[a-z]’).
- Parameters
complement (bool) –
- Return type
None
- equiv(other: Union[hypodisc.multimodal.langutil.RegexCharAtom, hypodisc.multimodal.langutil.RegexCharRange]) bool [source]
Return true if self and other have the same full character set, ignoring any quantifiers.
- Parameters
other (Union['RegexCharAtom', 'RegexCharRange']) –
- Return type
bool
- exact() str [source]
Return a reduced character set that exactly matches the input, but nothing beyond it.
- Return type
str
- merge(other: Union[hypodisc.multimodal.langutil.RegexCharRange, hypodisc.multimodal.langutil.RegexCharAtom]) Union[hypodisc.multimodal.langutil.RegexCharRange, hypodisc.multimodal.langutil.RegexCharAtom] [source]
Return a new character set object which encompasses both self and other, by merging the sets and adjusting the quantifiers.
- Parameters
other (Union['RegexCharRange', 'RegexCharAtom']) –
- Return type
Union[‘RegexCharRange’,’RegexCharAtom’]
- weak_match(other: Union[hypodisc.multimodal.langutil.RegexCharAtom, hypodisc.multimodal.langutil.RegexCharRange]) bool [source]
Return true if self and other have the same reduced character set and quantifiers.
- Parameters
other (Union['RegexCharAtom', 'RegexCharRange']) –
- Return type
bool
- class hypodisc.multimodal.langutil.RegexPattern[source]
Bases:
object
A regular expression consisting of solitary characters (e.g. ‘s’), groups with optional alternatives (e.g. ‘(a|b|c)’), and ranges (e.g. ‘[a-z]’).
- Return type
None
- add(charset: Union[hypodisc.multimodal.langutil.RegexCharAtom, hypodisc.multimodal.langutil.RegexCharRange]) None [source]
Add a character set to this regular expression. Character sets are assumed ordered sequentially.
- Parameters
value (Union[RegexCharAtom, RegexCharRange]) –
- Return type
None
- copy() hypodisc.multimodal.langutil.RegexPattern [source]
Return a deep copy of this object.
- Return type
‘RegexPattern’
- equiv(other: hypodisc.multimodal.langutil.RegexPattern) bool [source]
Return true if self has the same full expression as other, excluding quantifiers.
- Parameters
other ('RegexPattern') –
- Return type
bool
- exact() str [source]
Return a reduced pattern that exactly matches the input, but nothing beyond it.
- Return type
str
- generalize() hypodisc.multimodal.langutil.RegexPattern [source]
Generalize character sets on word level. For example, ‘[A-Z][a-z]{2}s’ would yield ‘[A-Za-z]{3}’. The generalized expression is returned as a new instance.
- Return type
‘RegexPattern’
- weak_match(other: hypodisc.multimodal.langutil.RegexPattern) bool [source]
Return a reduced regular expression that exactly matches the input, but nothing beyond it.
- Parameters
other ('RegexPattern') –
- Return type
bool
- hypodisc.multimodal.langutil.generalize_patterns(patterns: dict[hypodisc.multimodal.langutil.RegexPattern, set[int]], num_recursions: int = 1) dict[hypodisc.multimodal.langutil.RegexPattern, set[int]] [source]
Generalize a dictionary of regular expressions in place, by merging similar patterns and by aligning quantifiers.
- Parameters
patterns (dict[RegexPattern,int]) –
num_recursions (int) –
- Return type
dict[RegexPattern,int]
- hypodisc.multimodal.langutil.generate_regex(s: str, strip_punctuation: bool = True) hypodisc.multimodal.langutil.RegexPattern [source]
Generate a regular expression that fits the string.
- Parameters
s (str) –
strip_punctuation (bool) –
- Return type
- hypodisc.multimodal.timeutils.cast_datefrag(dtype: rdf.terms.IRIRef, v: rdf.terms.Literal) float [source]
Cast date fragments to days
- Parameters
dtype (IRIRef) –
v (Literal) –
- Return type
float
- hypodisc.multimodal.timeutils.cast_datefrag_delta(days: float) str [source]
Cast datefrag delta to dayTimeDuration
- Parameters
days (float) –
- Return type
str
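Converting a number of days into an xsd:dayTimeDuration lexical form can be sketched as below. The rounding to whole seconds and the choice to always print every component are assumptions made for this example, and the name `days_to_duration` is hypothetical.

```python
def days_to_duration(days: float) -> str:
    """Hypothetical helper: format days as an xsd:dayTimeDuration string."""
    sign = "-" if days < 0 else ""
    total_seconds = round(abs(days) * 86400)  # assumed: whole seconds
    d, remainder = divmod(total_seconds, 86400)
    h, remainder = divmod(remainder, 3600)
    m, s = divmod(remainder, 60)
    return f"{sign}P{d}DT{h}H{m}M{s}S"
```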
- hypodisc.multimodal.timeutils.cast_datefrag_rev(dtype: rdf.terms.IRIRef, days: float) str [source]
Cast days to months and days
- Parameters
dtype (IRIRef) –
days (float) –
- Return type
str
- hypodisc.multimodal.timeutils.cast_datetime(dtype: rdf.terms.IRIRef, v: rdf.terms.Literal) float [source]
Cast dates to UNIX timestamps
- Parameters
dtype (IRIRef) –
v (Literal) –
- Return type
float
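Casting a date or dateTime lexical value to a UNIX timestamp can be done with the standard library. This sketch assumes ISO 8601 input and treats values without a timezone as UTC, which the documentation above does not specify. The name `to_unix_timestamp` is hypothetical.

```python
from datetime import datetime, timezone


def to_unix_timestamp(lexical: str) -> float:
    """Hypothetical helper: seconds since 1970-01-01T00:00:00Z."""
    moment = datetime.fromisoformat(lexical)
    if moment.tzinfo is None:
        # assumption: naive values are interpreted as UTC
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.timestamp()
```

Mapping dates onto a single numeric axis like this is what makes them usable as input for numerical clustering.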