Bug
Character encoding of text files not present
Issue description
Although Topincs files can have an encoding specified, this information is never actually present. Upload is the most common way of storing a file in Topincs, and during upload the encoding is neither inspected nor persisted. It is always null. It should be set for text files.
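A minimal sketch of how an encoding could be inspected at upload time. It assumes only the raw uploaded bytes are available and that returning None ("unknown") is better than guessing wrong. The function name guess_encoding is hypothetical and not part of Topincs. Only byte order marks, US-ASCII and strict UTF-8 are checked; UTF-32 is not considered, and single-byte encodings such as ISO-8859-1 cannot be detected this way because every byte sequence decodes in them.

```python
import codecs


def guess_encoding(data: bytes):
    """Hypothetical upload helper: return an encoding name or None if unknown."""
    # A byte order mark is the most reliable signal when present.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16"
    # Pure 7-bit data is valid US-ASCII (and therefore also valid UTF-8).
    try:
        data.decode("ascii")
        return "US-ASCII"
    except UnicodeDecodeError:
        pass
    # Strict UTF-8 decoding rejects most non-UTF-8 byte sequences.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Could be ISO-8859-x, Windows-125x, ...: leave undecided.
        return None
```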
Developer comments
All files are binary files. Some of them are also text files. Text files predominantly contain bytes or byte sequences that represent characters in one or more human scripts. Their content may be organized in lines, in which case each line ending is encoded by a common byte or byte sequence used for this purpose (for example LF, CR LF or CR).
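A small sketch of how the line ending convention of a text file might be detected by counting the candidate byte sequences. It assumes an ASCII-compatible encoding (so it does not apply to UTF-16 or UTF-32), and the function name detect_line_ending is hypothetical.

```python
def detect_line_ending(data: bytes):
    """Return the most frequent line ending ("\\r\\n", "\\n" or "\\r"), or None."""
    crlf = data.count(b"\r\n")
    # Subtract CR LF pairs so lone LF and lone CR are counted separately.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    counts = {"\r\n": crlf, "\n": lf, "\r": cr}
    ending, n = max(counts.items(), key=lambda kv: kv[1])
    # A file without any line ending consists of a single line.
    return ending if n > 0 else None
```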
The best way to distinguish text files from binary (non-text) files is most likely a (partial) distribution analysis of the bytes, since text files use only a limited subset of the byte domain, whereas binary files use the whole domain. In any case, very short files may be hard to classify.
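One possible form of such a partial distribution analysis, sketched under assumptions that are not part of Topincs: only the first sample_size bytes are inspected, the "limited subset" is approximated by excluding C0 control bytes other than TAB, LF, FF and CR (plus DEL), and the thresholds are illustrative. Files shorter than min_length are reported as "unknown", reflecting the difficulty with very short files. Note that UTF-16 text contains many NUL bytes and would be classified as binary by this heuristic.

```python
from collections import Counter

# Control bytes that legitimately occur in text: TAB, LF, FF, CR.
TEXT_CONTROL_BYTES = {0x09, 0x0A, 0x0C, 0x0D}


def control_byte_ratio(sample: bytes) -> float:
    """Share of bytes in the sample that rarely occur in text."""
    histogram = Counter(sample)  # byte value -> frequency
    suspicious = sum(
        n for b, n in histogram.items()
        if (b < 0x20 and b not in TEXT_CONTROL_BYTES) or b == 0x7F
    )
    return suspicious / len(sample)


def classify(data: bytes, sample_size=4096, min_length=16, max_control_ratio=0.05):
    """Hypothetical classifier returning "text", "binary" or "unknown"."""
    sample = data[:sample_size]  # partial analysis: only the head of the file
    if len(sample) < min_length:
        return "unknown"  # too short to classify reliably
    if control_byte_ratio(sample) > max_control_ratio:
        return "binary"
    return "text"
```

The thresholds trade false positives against false negatives and would need tuning against real uploads.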
Also: why has this never been a problem?
In text files, byte sequence frequencies are determined by an external (non-digital) rule system. In binary files, the byte sequences are determined by the software that creates and parses the data.