Tobjects should be cached in the file system. It should be possible to fill this cache in a flexible manner. This is particularly important, since it is unclear which tobjects are accessed and how they are accessed.
The main motivation for this is that in the current statement-based caching system, loading the index takes longer and longer, while much of that information is never used by the service. With this approach the caching becomes more targeted, because it is topic-oriented.
Once a tobject is cached in the file system, it must be possible to opcache it. This way all frequently accessed information resides in memory after a while. When a tobject is modified, its corresponding cache file is deleted from the file system and the opcached product of the file is invalidated. At all times the tobject cache file must be in sync with the information in the database. This has the consequence that, given a set of tobjects, it is unclear which ones have to be built and which ones are already cached. Currently iteration only accesses tobjects one by one. If they are all loaded into memory at once, this could bring the advantage that the tobjects can be rebuilt all together.
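The write/invalidate cycle described above can be sketched roughly as follows. This is a minimal illustration, not the Topincs implementation: the function names and the path layout are assumptions.

```php
<?php
// Illustrative path layout for per-topic cache files (assumption).
function dobject_path(int $topicId): string {
    return sys_get_temp_dir() . "/dobjects/$topicId.php";
}

// Write the dobject as a PHP script, so the opcache can compile it once
// and serve it from shared memory on subsequent reads.
function write_dobject(int $topicId, array $dobject): void {
    $path = dobject_path($topicId);
    @mkdir(dirname($path), 0777, true);
    file_put_contents($path, '<?php return ' . var_export($dobject, true) . ';');
    if (function_exists('opcache_invalidate')) {
        opcache_invalidate($path, true); // drop any stale compiled version
    }
}

// Read side: include() returns the array. After the first hit the
// compiled script comes from the opcache, skipping the parse step.
function read_dobject(int $topicId): ?array {
    $path = dobject_path($topicId);
    return is_file($path) ? include $path : null;
}

// On modification: delete the file and invalidate the opcache entry,
// keeping the cache in sync with the database.
function invalidate_dobject(int $topicId): void {
    $path = dobject_path($topicId);
    @unlink($path);
    if (function_exists('opcache_invalidate')) {
        opcache_invalidate($path, true);
    }
}
```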
It should be an advantage that the information is held in very few tables. We can load any amount of heterogeneous data in a few reads, even across topics of different types. I think a good implementation of these ideas can have a dramatic impact on performance. Additionally it would no longer be necessary to use Tobject::accelerate to preload statement information for topic types.
Instead of deleting the cache file of the tobject, it could be better to write the changes and incorporate them on the next read. Not sure about this, since it is quite complicated (but well understood, as it already works in the current statement-based indices). It needs to be compared with bulk-loading heterogeneous tobjects into memory with a minimal amount of database reads. See previous comment.
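The bulk-loading idea can be sketched as a single `IN`-query that is then grouped per topic. The `statement` table and its columns here are illustrative assumptions, not the actual Topincs schema.

```php
<?php
// Sketch: load heterogeneous tobjects in one database read, assuming a
// hypothetical statement table (subject, property, value).
function load_tobjects(PDO $db, array $topicIds): array {
    $in = implode(',', array_fill(0, count($topicIds), '?'));
    $stmt = $db->prepare(
        "SELECT subject, property, value FROM statement WHERE subject IN ($in)"
    );
    $stmt->execute($topicIds);

    // Group the flat result set into one associative array per topic,
    // regardless of the topics' types.
    $tobjects = [];
    foreach ($stmt as $row) {
        $tobjects[$row['subject']][$row['property']][] = $row['value'];
    }
    return $tobjects;
}
```

Rebuilding a whole batch of missing dobjects would then cost a single read instead of one read per topic.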
This approach is now in production. It can be enabled on a per-store or per-service basis. There is now a pure data representation for the topic called _dobject_. It holds *all the information for one topic in an associative array*. Dobjects are currently written as PHP scripts (opcache) into %index/dobjects%. This speeds up a certain class of services by a great deal: services where *some* instances of topic types are processed, but not every instance, and therefore the instance *as a whole* is of interest. So lots of statements are processed for some topics. Usually there is a single entry topic from which all topics of actual interest are reached. The processing time is fairly independent of the source location of the dobjects (memory or file system). In the case at hand it reduced the response time from 3.5 seconds to less than 0.15 seconds.
Dobjects per se currently do not help with another class of services: search services. In this case all instances of a type are iterated over. A significant reduction of the response time can only be observed here once all dobjects have found their place in memory.
This issue means a big step towards the in-memory representation of the topic map. Currently using the PHP opcache for this seems too good to be true, because the expensive data structure parsing step can be circumvented. All other in-memory approaches would still incur this cost. There remain a few doubts whether this will be the ultimate solution to speed up the Topincs API:
* The opcache never releases memory until it is full. Thus there will be numerous stale versions of dobjects for altered topics. There are solutions to this problem:
** An opcache restart means the system will run slower for a few requests, but ultimately the gain will make up for this. And it can be toned down by supporting the opcache with an additional file cache.
** Is it really necessary to dobject everything? Maybe limit it to certain topic types. Topics on which no API calls are made do not have to be pulled into memory.
* The opcache hash table currently has an upper limit on entries (file paths): 1,000,000. This means that on one server only 1,000,000 topics can reside in memory across all stores at any one time. Currently this suffices, but it is not ideal.
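For reference, the opcache directives that matter here could look roughly like this in php.ini; the values are illustrative, not a recommendation:

```ini
; Sketch of the relevant opcache settings (values illustrative)
opcache.max_accelerated_files = 1000000  ; hash-table capacity for file paths
opcache.memory_consumption    = 1024     ; shared memory in MB; stale dobject
                                         ; versions accumulate here until a restart
opcache.validate_timestamps   = 0        ; rely on explicit opcache_invalidate()
```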