Importing data (deprecated)

Table Data Import – deprecated

Warning: The approach described here is unnecessarily complex, because it distributes one task over several classes. Simply use the TableReader and then import by iterating over the DataRow array in a POST service. This creates one linear piece of code that is easier to write and easier to read. Subject identification can be controlled with PHP functions.

API reference

DataRow

This section describes how to import data organized in columns and rows. Topincs currently supports CSV, Excel and Microsoft Access as source formats. We use the term table for the data held in a CSV file or Excel file. If you export an Excel sheet into a CSV file, both hold the same table.

Fundamental terms and assumptions

Hint: It is possible to create invalid instance data in the topic map. It is the responsibility of the administrator to be aware of this and take care of it.

We consider a table to be composed of named columns and rows holding statements about any number of subjects. We do not make any assumptions about how the statements about one subject are distributed over the table(s). In most cases statements about one subject are within one row, but sometimes they are distributed over more than one row. When the data comes from a relational database system, statements about one subject are usually distributed over more than one table (otherwise there would be no need for foreign keys).

Experience has shown that the most common use case is a table held in an Excel sheet. We make the following helpful distinction:

A row holds statements about one or more primary subjects. These are usually not referred to in other rows.
Secondary subjects are referred to in more than one row.

Primary subjects are generally easier to identify than secondary subjects. They often come with a usable identifier. In data administered by end users, secondary subjects are often referred to by their name in natural language. In data that stems from other computational systems, most subjects have some sort of strong identification. Primary subjects resemble movement data (high frequency, little to no referrals), while secondary subjects are similar to master data (low frequency, many referrals).
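The difference in referrals can be made visible by counting in how many rows a candidate key occurs. The following is a minimal sketch in plain PHP and not part of the Topincs API: the helper reference_counts is a hypothetical name, and the rows are taken from the employment example used further below.

```php
<?php

// Rows from the employment example as plain PHP arrays (header omitted).
$rows = [
  ["first name" => "Joel", "dob" => "1980-01-01", "company" => "Cola Coca, Inc"],
  ["first name" => "Mary", "dob" => "1990-01-01", "company" => "Orange, Inc"],
  ["first name" => "Huey", "dob" => "1990-03-01", "company" => "Orange, Inc"],
  ["first name" => "Mary", "dob" => "1990-01-01", "company" => "Nova, Inc"],
];

// Hypothetical helper, not part of Topincs: counts in how many rows
// each distinct combination of the given columns occurs.
function reference_counts(array $rows, array $columns): array {
  $counts = [];
  foreach ($rows as $row) {
    $parts = [];
    foreach ($columns as $column) {
      $parts[] = strtolower(trim($row[$column]));
    }
    $key = implode("|", $parts);
    $counts[$key] = ($counts[$key] ?? 0) + 1;
  }
  return $counts;
}

$people    = reference_counts($rows, ["first name", "dob"]);
$companies = reference_counts($rows, ["company"]);

// Keys referred to by more than one row are candidates for secondary subjects.
$repeated_people    = array_keys(array_filter($people, fn($n) => $n > 1));
$repeated_companies = array_keys(array_filter($companies, fn($n) => $n > 1));

print_r($repeated_people);
print_r($repeated_companies);
```

A count like this is only a heuristic. Whether a repeated value really denotes the same subject still depends on how strong the chosen key is.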

Implementation outline

In order to actually import a table you need to do the following:

Create a file topic type in order to upload the source data files.
Create a POST service to perform the import.
If you want users to be able to import, you will need a topic menu for the data file topic type.
A topic type to hold the import report has proven useful. It should hold the time when the import started and finished, the result report and the errors.

Data file

At this point we have uploaded a data file into the store and we want to read the data. In our simple example we import the employment information of a few people. Our topic types are Person, Company, and Employment. Our schema in this example connects the Employment to exactly one person (employee) and one company (employer). Companies and people can be associated with any number of employments. Our CSV import looks like this:

Exemplary employee CSV

"First name", "Date of birth", "Company", "Position", "Entry date", "Exit date"
"Joel", "1980-01-01", "Cola Coca, Inc", "Manager", "2000-01-01", "2003-05-31"
"Mary", "1990-01-01", "Orange, Inc", "President", "2002-01-01", "2009-01-31"
"Huey", "1990-03-01", "Orange, Inc", "Vice-President", "2005-01-01", "2012-01-31"
"Mary", "1990-01-01", "Nova, Inc", "Vice-President", "1999-05-01", "2001-12-31"

Service and Domain class

Best practice is to create a domain class for the data file topic type and do everything in a method import that returns the report tobject. Persisting the import results in the database has proven very useful, in particular if users are performing imports independently. The computational component of the service is then as simple as the following code example.

stores/hr/php/services/import/POST.php

<?php
require_once("domain/EmploymentFile.php");

$file   = $p->get("file");
$report = $file->import();

redirect($report->href());

The code in the domain class is still quite generic. It reads the data, extracts the subjects and then imports them. The main work is done in the Extractors described in the next section.

stores/hr/php/domain/EmploymentFile.php

<?php

require_once("api/import/TableReader.php");
require_once("api/import/Importer.php");
require_once("EmploymentExtractor.php");
require_once("CompanyExtractor.php");
require_once("PersonExtractor.php");

class EmploymentFile extends Tobject {

  private function read() {
    $options = ["rename" => ["Date of birth" => "dob"]];

    return TableReader::read_csv($this->file()->path, $options);
  }

  function import() {
    $start   = new DateTime();
    $results = [];
    $errors  = [];


    $table = $this->read();
    // $table holds an array of PHP objects of type DataRow.
    // The data is conceptually a m x n matrix of strings.
    // m is the number of rows, n the number of columns.


    // Extracting is first. Order might be significant.
    $companies   = CompanyExtractor::extract($table, $errors);
    $people      = PersonExtractor::extract($table, $errors);
    $employments = EmploymentExtractor::extract($table, $errors);

    // Importing is second. Order is significant!
    Importer::import($companies, $results, $errors, ["duplicates" => "ignore"]);
    Importer::import($people, $results, $errors, ["duplicates" => "ignore"]);
    Importer::import($employments, $results, $errors, ["duplicates" => "ignore"]);

    // This is optional, but recommended when users import.
    // Very useful for the admin in case of problems.
    // You need to model the report topic type yourself.
    return Tobject::make("id:9284")
      ->set_start($start)
      ->set_end(new DateTime())
      ->set_import_file($this)
      ->add_all_errors($errors)
      ->add_all_results($results);
  }
}

Tobject::register("EmploymentFile", "id:2288");

Extracting

API reference: Extractor, DataRow, ImportSubject

The extractors play the biggest role in the Topincs import framework. Their main purpose is to identify subjects and collect the statements that should be added to the topic. Additionally, they clean up strings and convert them to match the datatype of the occurrence type. They are the guardians of form, which is indispensable if you want to perform computations on the data.

An extractor implements two methods: si and convert. Both are given a PHP object of type DataRow.

si computes a subject identifier for the import subject. If it starts with a ~, it will not be persisted. If si returns null, no subject of this type will be created for this data row. Implementing si is optional, but it is necessary if an import is repeated with a different data set.
convert transfers information from the data row to the import subject.
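To illustrate the role of si: two rows that describe the same person must yield the same subject identifier, otherwise a repeated import creates duplicates. The sketch below is plain PHP and independent of the Topincs classes. person_si is a hypothetical stand-in for an extractor's si method that works on an array instead of a DataRow, and returning null for incomplete rows is an assumption that mirrors the rule stated above.

```php
<?php

// Hypothetical stand-in for an extractor's si method, working on a plain
// array instead of a DataRow. Returns null when the row lacks the values
// needed to identify a person, so that no subject is created for it.
function person_si(array $row): ?string {
  $name = trim($row["first name"] ?? "");
  $dob  = trim($row["dob"] ?? "");
  if ($name === "" || $dob === "") {
    return null;
  }
  // Normalize case so that "Mary" and "MARY" identify the same subject.
  return sha1(strtolower($name . "|" . $dob));
}

$a = person_si(["first name" => "Mary",  "dob" => "1990-01-01"]);
$b = person_si(["first name" => "MARY ", "dob" => "1990-01-01"]);
$c = person_si(["first name" => "Huey",  "dob" => "1990-03-01"]);
$d = person_si(["first name" => "",      "dob" => "1990-03-01"]);

var_dump($a === $b); // same person, same identifier
var_dump($a === $c); // different people, different identifiers
var_dump($d);        // incomplete row, no subject
```

The identifier is only as stable as its normalization. Any cleanup applied here (trimming, lower-casing) has to stay the same across imports, or previously imported subjects will no longer be recognized.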

stores/hr/php/PersonExtractor.php

<?php

require_once("api/import/Extractor.php");

abstract class PersonExtractor extends Extractor {

  // The serialization name of the topic type.
  const TYPE = "person";

  static function si(DataRow $row) : ?string {
    $key = sprintf("%s|%s", $row->mand("first name"), $row->mand("dob"));
    return sha1(strtolower($key));
  }

  static function convert(DataRow $row, ImportSubject $subject) : void {
    $subject
      ->set("name", $row->mand("first name"))
      ->set("date-of-birth", $row->mand("dob", "PersonExtractor::date"));
  }

  static function date($value) {
    return DateTime::createFromFormat("Y-m-d", $value);
  }
}
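A note on the date conversion: DateTime::createFromFormat returns false for unparsable input, fills fields missing from the format with the current time unless the format starts with "!", and rolls over impossible dates such as 1990-02-31. A stricter converter could look like the following sketch. strict_date is a hypothetical name, and how an extractor should react to a null result is left open here.

```php
<?php

// Hypothetical stricter variant of a date converter. The "!" resets all
// fields not present in the format, so the time of day is 00:00:00
// instead of the current time. The round trip rejects overflowing dates
// such as 1990-02-31, which createFromFormat would otherwise roll over.
function strict_date(string $value): ?DateTime {
  $value = trim($value);
  $date  = DateTime::createFromFormat("!Y-m-d", $value);
  if ($date === false || $date->format("Y-m-d") !== $value) {
    return null;
  }
  return $date;
}

$ok       = strict_date("1990-01-01");
$overflow = strict_date("1990-02-31");
$garbage  = strict_date("01.01.1990");

var_dump($ok !== null);      // valid date
var_dump($overflow === null); // impossible date is rejected
var_dump($garbage === null);  // wrong format is rejected
```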

stores/hr/php/EmploymentExtractor.php

<?php

require_once("api/import/Extractor.php");
require_once("PersonExtractor.php");

abstract class EmploymentExtractor extends Extractor {

  const TYPE = "employment";

  static function convert(DataRow $row, ImportSubject $subject) : void {
    $subject
      ->set("employee", $row->get_import_subject("person"))
      ->set("employer", $row->get_import_subject("company"))
      ->set("position", $row->mand("position"))
      ->set("start", $row->mand("entry date", "PersonExtractor::date"))
      ->set("end", $row->opt("exit date", "PersonExtractor::date"));
  }
}

Importing

After extracting we have all data assembled in a way that is in line with our schema. So far it is only in memory. Now we need to persist it in the store. For this we need to hand over the import subjects to the importer. Before the import subjects are actually imported, some checks are performed:

It is checked whether the import subject is already represented by a topic in the store.
If yes, it is checked whether the topic is frozen. If it is, this is an error, since frozen topics cannot be modified.

Errors related to one import subject affect the import of other subjects only if there is an association between them.
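The checks can be pictured as follows. This is a simplified sketch in plain PHP with arrays standing in for the store and the import subjects. check_subjects and the array layout are hypothetical and do not reflect how the Importer is actually implemented. The sketch also shows how an error on one subject propagates only to subjects that are associated with it.

```php
<?php

// Hypothetical stand-in for the store: subject identifier => topic state.
$store = [
  "si:orange" => ["frozen" => true],
  "si:nova"   => ["frozen" => false],
];

// Import subjects with the identifiers of the subjects they refer to.
$subjects = [
  ["si" => "si:orange", "refers_to" => []],
  ["si" => "si:nova",   "refers_to" => []],
  ["si" => "si:emp-1",  "refers_to" => ["si:orange"]],
  ["si" => "si:emp-2",  "refers_to" => ["si:nova"]],
];

// Returns a map from subject identifier to an error message for every
// subject that cannot be imported. This sketch expects subjects to be
// listed after the subjects they refer to.
function check_subjects(array $subjects, array $store): array {
  $errors = [];
  foreach ($subjects as $subject) {
    $si = $subject["si"];
    // A subject that is already in the store must not be frozen.
    if (isset($store[$si]) && $store[$si]["frozen"]) {
      $errors[$si] = "topic is frozen";
      continue;
    }
    // An error on an associated subject also blocks this subject.
    foreach ($subject["refers_to"] as $other) {
      if (isset($errors[$other])) {
        $errors[$si] = "associated subject $other failed";
        break;
      }
    }
  }
  return $errors;
}

$errors = check_subjects($subjects, $store);
print_r($errors);
```

In this sketch the frozen company and the employment that refers to it are reported, while the second company and its employment pass unaffected.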

Matching

Generally it is good practice to use one or more subject identifiers per topic. If a mapping of import subjects onto the persisted set of subject identifiers is not possible, using a Matcher avoids duplication. Duplication means creating a new topic for an import subject that is already represented in the store.

A matcher performs the task of aligning existing information in the store with the intermediate representation of import subjects in memory.

The method hash computes a string value from the relevant properties.
The method tobject fetches the relevant information from all existing topics of the respective topic type.

API reference: Matcher

stores/hr/php/PersonMatcher.php

<?php

require_once("api/import/Matcher.php");

class PersonMatcher extends Matcher {

  // The serialization name of the topic type.
  const TYPE = "person";

  static function hash(array $data) : string {
    return strtolower($data["name"] . $data["date-of-birth"]->format("Ymd"));
  }

  static function tobject(Tobject $tobject) : array {
    return [
      "name" => $tobject->get_name(),
      "date-of-birth"  => $tobject->get_date_of_birth()
    ];
  }
}
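How hash and tobject could work together is sketched below in plain PHP. The arrays stand in for topics and import subjects, and person_hash and find_match are hypothetical names. The actual Matcher base class may proceed differently.

```php
<?php

// Hypothetical counterpart of a matcher's hash method: reduces the
// relevant properties to one comparable string.
function person_hash(array $data): string {
  return strtolower($data["name"] . $data["date-of-birth"]->format("Ymd"));
}

// Existing topics, reduced to the relevant properties (the role of tobject).
$existing = [
  "topic-1" => ["name" => "Mary", "date-of-birth" => new DateTime("1990-01-01")],
  "topic-2" => ["name" => "Huey", "date-of-birth" => new DateTime("1990-03-01")],
];

// Build an index from hash to topic once, before looking up import subjects.
$index = [];
foreach ($existing as $id => $data) {
  $index[person_hash($data)] = $id;
}

// Returns the id of the matching topic, or null if a new topic is needed.
function find_match(array $index, array $import_subject): ?string {
  return $index[person_hash($import_subject)] ?? null;
}

$known   = find_match($index, ["name" => "MARY", "date-of-birth" => new DateTime("1990-01-01")]);
$unknown = find_match($index, ["name" => "Joel", "date-of-birth" => new DateTime("1980-01-01")]);

var_dump($known);   // the id of the existing topic for Mary
var_dump($unknown); // no match, a new topic would be created
```

Because both sides pass through the same hash function, differences in case do not prevent a match. Differences the hash does not normalize, such as an alternative spelling of a name, still lead to a new topic.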
