Triangulation API

pen_compare.triangulation

Triangulator: compares claims across the four PEN-STACK packages.

Each discrepancy category corresponds to a rule in config/triangulation_rules_v3.yaml. Rules are applied to every row in the unified editor universe; results are returned as DiscrepancyRecord objects that can be serialised to a Parquet file.

class pen_compare.triangulation.triangulator.DiscrepancyRecord(entity_id: str, source: str, category: str, severity: str, sources_involved: str, details: str)

Bases: object

Parameters:
  • entity_id (str)

  • source (str)

  • category (str)

  • severity (str)

  • sources_involved (str)

  • details (str)

entity_id: str
source: str
category: str
severity: str
sources_involved: str
details: str
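
The sketch below shows how records with these six fields might be turned into plain rows ready for tabular serialisation. It uses a local stand-in dataclass rather than the real class. That DiscrepancyRecord behaves like a dataclass is an assumption based on its signature, and every field value shown is hypothetical.

```python
from dataclasses import dataclass, asdict


# Stand-in mirroring the documented fields; not the real pen_compare class.
@dataclass
class DiscrepancyRecord:
    entity_id: str
    source: str
    category: str
    severity: str
    sources_involved: str
    details: str


# Hypothetical values, for illustration only.
record = DiscrepancyRecord(
    entity_id="E-001",
    source="package_a",
    category="example_category",
    severity="high",
    sources_involved="package_a,package_b",
    details="Claimed values differ between the two sources.",
)

# One dict per record, keyed by field name in declaration order.
rows = [asdict(record)]
print(list(rows[0]))
```

A list of such dicts can be passed to pandas.DataFrame and written with DataFrame.to_parquet, which requires a Parquet engine such as pyarrow to be installed.
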
class pen_compare.triangulation.triangulator.Triangulator(rules_path=PosixPath('config/triangulation_rules_v3.yaml'))

Bases: object

Parameters:

rules_path (Path)

audit(entity_id, universe)

Return all discrepancy records for one entity.

Parameters:
  • entity_id (str)

  • universe (DataFrame)

Return type:

list[DiscrepancyRecord]
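
One way to consume the list returned by audit is to bucket the records by severity. The following is a minimal sketch under stated assumptions: Record is a local stand-in for DiscrepancyRecord, group_by_severity is a hypothetical helper that is not part of pen_compare, and the severity and category strings are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


# Stand-in for DiscrepancyRecord with the same six documented fields.
class Record(NamedTuple):
    entity_id: str
    source: str
    category: str
    severity: str
    sources_involved: str
    details: str


def group_by_severity(records):
    """Bucket records by their severity string, preserving input order."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.severity].append(rec)
    return dict(grouped)


# Hypothetical stand-in for the list that audit("E-001", universe) returns.
records = [
    Record("E-001", "package_a", "cat_x", "high", "package_a,package_b", "first"),
    Record("E-001", "package_c", "cat_y", "low", "package_c,package_d", "second"),
    Record("E-001", "package_b", "cat_x", "high", "package_a,package_b", "third"),
]

grouped = group_by_severity(records)
print({severity: len(recs) for severity, recs in grouped.items()})
```
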

run_full(universe)

Apply all rules to the full universe; return a flat DataFrame.

Parameters:

universe (DataFrame)

Return type:

DataFrame
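
The flat result lends itself to ordinary pandas operations. The sketch below builds a hypothetical stand-in for the returned DataFrame instead of calling run_full. The column names are an assumption (one column per DiscrepancyRecord field, one row per discrepancy) that this page does not confirm, and all values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by run_full(universe).
# Assumed layout: one row per discrepancy, one column per record field.
flat = pd.DataFrame(
    [
        {"entity_id": "E-001", "source": "package_a", "category": "cat_x",
         "severity": "high", "sources_involved": "package_a,package_b",
         "details": "first"},
        {"entity_id": "E-001", "source": "package_c", "category": "cat_y",
         "severity": "low", "sources_involved": "package_c,package_d",
         "details": "second"},
        {"entity_id": "E-002", "source": "package_b", "category": "cat_x",
         "severity": "high", "sources_involved": "package_a,package_b",
         "details": "third"},
    ]
)

# Count discrepancies per entity and severity.
counts = flat.groupby(["entity_id", "severity"]).size()

# Keep only the rows for one entity, comparable to a single audit() call.
one_entity = flat[flat["entity_id"] == "E-001"]

print(counts)
print(one_entity[["category", "severity"]])
```
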