UCSUR Charts File Format

Revision 1.0
Authors Ian Jacobi
Date 2013-02-05
Latest Version http://www.kreativekorp.com/ucsur/UNIDATA/Charts.html

 

Summary

This file describes the format and contents of Charts.txt.

Status

This file and the files described herein are part of the Under-ConScript Unicode Registry (UCSUR).


Section Headers

Charts.txt contains a series of sections, each introduced by a header line like:

@@	E000	Tengwar	E07F

The fields are separated by tabs. This line marks a Unicode block from U+E000 to U+E07F inclusive, named Tengwar. (If I recall correctly, the name has to match the name of the block in Blocks.txt.)
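
As a rough illustration, a header line of this shape could be split into its parts as follows. This is only a sketch based on the single example above: the names SectionHeader and parse_section_header are hypothetical, and the assumption that a header always has exactly four tab-separated fields is mine, not something the format guarantees.

```python
from typing import NamedTuple, Optional


class SectionHeader(NamedTuple):
    """One '@@' line: first codepoint, block name, last codepoint (inclusive)."""
    first: int
    name: str
    last: int


def parse_section_header(line: str) -> Optional[SectionHeader]:
    """Return the parsed header, or None if the line is not a section header.

    Assumes exactly four tab-separated fields: '@@', start, name, end.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4 or fields[0] != "@@":
        return None
    # The start and end fields are hexadecimal codepoints without a 'U+' prefix.
    return SectionHeader(int(fields[1], 16), fields[2], int(fields[3], 16))


header = parse_section_header("@@\tE000\tTengwar\tE07F")
```

A line that does not begin with the "@@" field, such as a directive or a codepoint line, yields None, so the same function can double as a test for where a new section begins.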

Directives

The header is followed by a set of directives such as:

@font=Tengwar Parmaite, 24pt

This directive sets the default font for the characters. The size given is the one used in the main table; the character lists are scaled down accordingly.

@font:Alt=Tengwar Parmaite Alt, 24pt

This directive declares an alternate font that individual characters may select by name (in this case "Alt").

@replacement-position=-10%

This directive changes the vertical offset of the replacement character (the dotted circle) on the off chance that the diacritics of the font are too high or too low.

@style:Special Punctuation Character=<style:text-properties fo:letter-spacing="-2pt"/>

This directive adds a special text style according to the ODF style:text-properties element, which resembles a CSS style definition in some ways. These styles can then be referenced in the codepoint lines.

@margin-top=0pt
@margin-bottom=0pt

These two directives are helpful if the font has sufficient margins built in (i.e. its baselines are far enough apart that the "E000" codepoint ends up being pushed off the page if the font is sized appropriately).

If you play around with these directives on your own, you should get a good idea how they affect rendering.
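
The directives shown above all seem to share the shape "@name=value" or "@name:qualifier=value". A minimal sketch of a reader for that shape, assuming nothing beyond what the examples show, might look like this. The function name parse_directive and the returned tuple layout are hypothetical.

```python
from typing import Optional, Tuple


def parse_directive(line: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Split '@name[:qualifier]=value' into (name, qualifier, value).

    Returns None for lines that are not directives, including '@@' headers.
    """
    line = line.rstrip("\r\n")
    if not line.startswith("@") or line.startswith("@@"):
        return None
    # Split on the FIRST '=' only: a @style value contains further '=' signs.
    key, sep, value = line[1:].partition("=")
    if not sep:
        return None
    # Only the key is split on ':'; the value may contain ':' as well.
    name, _, qualifier = key.partition(":")
    return name, (qualifier or None), value


examples = [
    "@font=Tengwar Parmaite, 24pt",
    "@font:Alt=Tengwar Parmaite Alt, 24pt",
    "@replacement-position=-10%",
    '@style:Special Punctuation Character=<style:text-properties fo:letter-spacing="-2pt"/>',
    "@margin-top=0pt",
]
parsed = [parse_directive(e) for e in examples]
```

Splitting at the first "=" before looking for ":" matters for the @style directive, whose value is an ODF element that itself contains both characters.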

Codepoints

The mapping between the characters in a font and the codepoints in the tables is the last part of a section. These are simple semicolon-delimited lines of the form "codepoint;character".

For example:

E000;1

maps the codepoint U+E000 to the character "1" in the default font (Tengwar Parmaite). Alternate fonts can be specified like:

E037;&#196;@Alt

which maps the codepoint U+E037 to the character 0xC4 (Ä) in the font declared in the directive @font:Alt (i.e. Tengwar Parmaite Alt). Keep in mind that any alternate font name may be given: @Alt selects the font declared as @font:Alt and does not mean that "Alt" is added to the end of the default font's name. See the Engsvanyali section for an example.
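
To make the two examples concrete, here is a small sketch that splits a codepoint line into its codepoint, text, and optional font name. The function name parse_codepoint_line is hypothetical, and treating the last "@" as the font separator is a heuristic of mine: it would misread a line whose text legitimately ends in "@" followed by other characters.

```python
import html
from typing import Optional, Tuple


def parse_codepoint_line(line: str) -> Tuple[int, str, Optional[str]]:
    """Split 'codepoint;text[@Font]' into (codepoint, text, font name or None)."""
    # Split on the FIRST ';' only: character references such as '&#196;'
    # contain a semicolon of their own.
    code, sep, rest = line.rstrip("\r\n").partition(";")
    if not sep:
        raise ValueError(f"not a codepoint line: {line!r}")
    # Heuristic (assumption): a trailing '@Name' selects an alternate font.
    text, at, font = rest.rpartition("@")
    if not at or not text or not font:
        text, font = rest, None
    return int(code, 16), text, font


default_font = parse_codepoint_line("E000;1")
alt_font = parse_codepoint_line("E037;&#196;@Alt")

# Numeric character references can be decoded with the standard library.
decoded = html.unescape(alt_font[1])
```

Here html.unescape is used only to show which character the reference "&#196;" stands for; whether the real chart generator decodes references at this stage is not something this document states.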

The text portion of the codepoint map can include some ODF elements. For example, to place a replacement character, the text "<text:span text:style-name="Replacement_Character">&#x25CC;</text:span>" is used so as to pick a consistent font (Arial) for the replacement character. Likewise any of the custom styles defined by the @style directive may be referenced (e.g. "<text:span text:style-name="Special_Punctuation_Character">&#x2005;=-=</text:span>"). If you do a search for <text:span text:style-name in the file, you'll see more examples.
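
When generating such lines with a script, the span markup can be built by a small helper. The sketch below assumes that the style name used in text:style-name is the name given to @style with spaces replaced by underscores, which is what the single example above suggests; that rule and the helper name styled_span are assumptions, not documented behavior.

```python
def styled_span(style_name: str, text: str) -> str:
    """Wrap text in a text:span that references a named style.

    Assumption: spaces in the @style name become underscores in the
    text:style-name attribute, as in the example in this document.
    """
    attr = style_name.replace(" ", "_")
    return f'<text:span text:style-name="{attr}">{text}</text:span>'


replacement = styled_span("Replacement Character", "&#x25CC;")
punctuation = styled_span("Special Punctuation Character", "&#x2005;=-=")
```

The text argument is passed through unchanged, so character references such as "&#x25CC;" stay in the form the file uses.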

Different spaces may be of help in positioning diacritics more appropriately over the dotted circle based on the negative offset of the diacritic; U+200A HAIR SPACE, U+2009 THIN SPACE, and U+2006 SIX-PER-EM SPACE are among the best options. It's also a good idea to place the same space on either side of the replacement character, so that the dotted circle remains centered (which is why you can sometimes see "&#x200A;" on either side of "&#x25CC;").
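
The symmetric padding described above can be produced mechanically. The following sketch only builds the padded character references; the names SPACES, char_ref, and padded_dotted_circle are hypothetical, and where the result goes relative to the Replacement_Character span and the diacritic itself is left to the reader, since this document does not specify it.

```python
# Spaces recommended in this document, keyed by short hypothetical labels.
SPACES = {
    "hair": 0x200A,        # U+200A HAIR SPACE
    "thin": 0x2009,        # U+2009 THIN SPACE
    "six-per-em": 0x2006,  # U+2006 SIX-PER-EM SPACE
}
DOTTED_CIRCLE = 0x25CC     # U+25CC DOTTED CIRCLE


def char_ref(codepoint: int) -> str:
    """Format a codepoint as a hexadecimal character reference."""
    return f"&#x{codepoint:04X};"


def padded_dotted_circle(space: str = "hair") -> str:
    """Return the dotted circle with the same space on both sides."""
    pad = char_ref(SPACES[space])
    return pad + char_ref(DOTTED_CIRCLE) + pad


hair_padded = padded_dotted_circle("hair")
thin_padded = padded_dotted_circle("thin")
```

Using one lookup for both sides guarantees the padding stays symmetric, which is the property that keeps the dotted circle centered.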