Friday, February 11, 2011

Wide-character API of glibc

I've had to write an application that deals with text in virtually any language, at least, any latin-based language. The source of the text is in files that are encoded in UTF-8, so the question was "how do I deal with this new type of text data?".


Glibc (and other C libraries) provides the wide char (wchar_t) data type, which can represent any character using a fixed-width representation; on my platform it is 32 bits wide.


There are functions used to manipulate text data in wide char format, and most of them are equivalent to the classic 8-bit string functions: fwprintf, swprintf, wcslen, fputwc, fgetwc, etc...


However, there are some tricks that must be taken into account. The "L" prefix is used to write wide-char text constants, so the statement:


char   mystring[] = "hello";


in the wide-char form must be expressed as:


wchar_t  mystring[] = L"hello";


Note that the "format" argument of all the wprintf functions is also wide-char text, so it must be written with the "L" prefix as well.
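As a small illustration of the wide format string, here is a minimal sketch using swprintf. The function name format_greeting and the greeting text are made up for this example:

```c
#include <stddef.h>
#include <wchar.h>

/* Format a greeting into a wide-char buffer of `len` wide chars.
 * Returns the number of wide chars written (not counting the
 * terminator), or a negative value if the buffer is too small.
 * format_greeting is a hypothetical helper for this example. */
int format_greeting(wchar_t *buf, size_t len, const wchar_t *name)
{
    /* The format string is itself a wide string, hence the L prefix.
     * %ls prints a wide-char string argument. */
    return swprintf(buf, len, L"hello, %ls!", name);
}
```

Unlike snprintf, swprintf reports an error when the output does not fit, instead of returning the length that would have been needed.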


Once the data is in wchar format, all operations are similar to their classic equivalents; the remaining question is how to convert to and from the classic string format. Since a wchar character has more bits than the standard 8-bit representation, some kind of encoding must be used. For example, UTF-8 represents wide chars as sequences of 8-bit octets (UTF-16 does the same with 16-bit units). It is important to note that the encoding is independent of the code and of the functions it calls: all the code needs is a pair of functions that convert byte sequences to wide chars and back. The particular encoding is not chosen in the code; it depends on the operating system's resources and their configuration, usually the internationalization settings.

For example, the mbsrtowcs and wcsrtombs functions are used to convert from multibyte sequences to wide char strings, and from wide char strings to multibyte sequences, respectively. 

The setlocale function must be used to select the internationalization setting that governs the conversion between wide-char and multibyte strings, for example:


setlocale(LC_CTYPE, "en_US.UTF-8");
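A minimal sketch of how the locale selection and the conversion fit together, using mbstowcs (the simpler, non-restartable relative of mbsrtowcs). The helper name locale_to_wide is made up for this example, and the sketch assumes the caller knows which locale name matches the encoding of the input:

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert the multibyte string `src` to wide chars using the conversion
 * tables of `locale` (for example "en_US.UTF-8").
 * Returns the number of wide chars stored in dst, or (size_t)-1 if the
 * locale is not installed or src is not valid in that encoding.
 * locale_to_wide is a hypothetical helper for this example. */
size_t locale_to_wide(const char *locale, const char *src,
                      wchar_t *dst, size_t dst_len)
{
    if (setlocale(LC_CTYPE, locale) == NULL)
        return (size_t)-1;      /* locale files are missing */
    return mbstowcs(dst, src, dst_len);
}
```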


This setting is used by glibc to do the conversion in the mbs/wc functions. Note that the proper conversion files are required. In Linux, these are under the /usr/lib/locale directory, organized by locale name. In the example above, the files searched would be, in precedence order:


/usr/lib/locale/en_US.UTF-8/LC_CTYPE
/usr/lib/locale/en_US.utf8/LC_CTYPE
/usr/lib/locale/en_US/LC_CTYPE


The file must be generated with the 'localedef' utility, provided by the glibc installation:


localedef -i en_US -f UTF-8 /tmp


generates the internationalization files for the locale in the example under the /tmp directory. Embedded system developers should note that the LC_CTYPE file is about 256 KB in size, which is too much for some systems. The problem is that, without these files, the mbs/wc and fput/fget functions simply don't work and report that "an invalid byte sequence cannot be converted".


In this case, one has to write one's own conversion functions, at the price of losing some portability. For example, writing our own UTF-8 to/from wide-char conversion functions is cheaper in terms of disk space, but it limits us to this one encoding.
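The following is a sketch of what such hand-written conversion functions might look like; it is not the code I ended up using. The names utf8_decode and utf8_encode are made up for this example, and the sketch assumes a 32-bit wchar_t that holds Unicode code points, as in glibc:

```c
#include <stddef.h>
#include <wchar.h>

/* Decode one UTF-8 sequence from s (at most n bytes) into *out.
 * Returns the number of bytes consumed, or 0 on invalid or truncated
 * input. */
size_t utf8_decode(const unsigned char *s, size_t n, wchar_t *out)
{
    unsigned long cp, min;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { cp = s[0];        len = 1; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; min = 0x10000; }
    else
        return 0;               /* stray continuation or invalid lead byte */
    if (len > n)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    *out = (wchar_t)cp;
    return len;
}

/* Encode one code point as UTF-8 into buf (room for 4 bytes).
 * Returns the number of bytes written, or 0 if wc is not encodable. */
size_t utf8_encode(wchar_t wc, unsigned char *buf)
{
    unsigned long cp = (unsigned long)wc;

    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not valid code points */
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Converting a whole string is then a loop that calls utf8_decode repeatedly and advances by the returned byte count, stopping when it returns 0. None of this needs the locale files.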




Generation of simple PDF files

The goal is to produce a very simple PDF file that contains unformatted text without using very heavy resources, such as PDF converters, graphics libraries, etc, and match the limited resources available on an embedded platform.

PDF generation in this simple way is quite straightforward. Adobe's PDF specification provides a couple of examples of simple PDF files that can be used as a template for a very simple PDF rendering library.

Basically, a PDF file is a collection of objects linked from a global object table (xref) that lists the offsets where each object can be located in the file. Some objects have references to other objects as well.
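To make the object/xref structure concrete, here is a minimal sketch of a writer for a one-page PDF with a single line of text. It is an illustration under stated assumptions, not a rendering library: the function name write_simple_pdf, the page size, the text position and the choice of Helvetica with WinAnsiEncoding are all picked for this example, and the output stream is assumed to be a freshly opened, seekable file so that ftell gives the object offsets.

```c
#include <stdio.h>

/* Write a one-page PDF that shows `text` (8-bit codes) in Helvetica.
 * `out` must be a freshly opened, seekable binary stream at offset 0.
 * Returns the byte offset of the xref table, or -1 on error.
 * write_simple_pdf is a hypothetical helper for this example. */
long write_simple_pdf(FILE *out, const char *text)
{
    char stream[1024];
    long offsets[6];            /* offsets[1..5]: where each object starts */
    long xref;
    size_t n;
    int i;

    /* Content stream: font F1 at 12 pt, move to (72,720), show the text. */
    n = (size_t)snprintf(stream, sizeof stream, "BT /F1 12 Tf 72 720 Td (");
    for (; *text != '\0'; text++) {
        if (n + 16 >= sizeof stream)
            return -1;          /* text too long for this sketch */
        if (*text == '(' || *text == ')' || *text == '\\')
            stream[n++] = '\\'; /* escape PDF string delimiters */
        stream[n++] = *text;
    }
    n += (size_t)snprintf(stream + n, sizeof stream - n, ") Tj ET");

    fprintf(out, "%%PDF-1.4\n");
    offsets[1] = ftell(out);
    fprintf(out, "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n");
    offsets[2] = ftell(out);
    fprintf(out, "2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n");
    offsets[3] = ftell(out);
    fprintf(out, "3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]\n"
                 "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>\nendobj\n");
    offsets[4] = ftell(out);
    fprintf(out, "4 0 obj\n<< /Length %lu >>\nstream\n%s\nendstream\nendobj\n",
            (unsigned long)n, stream);
    offsets[5] = ftell(out);
    fprintf(out, "5 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica\n"
                 "/Encoding /WinAnsiEncoding >>\nendobj\n");

    /* The xref table: one 20-byte entry per object, giving its offset. */
    xref = ftell(out);
    fprintf(out, "xref\n0 6\n0000000000 65535 f \n");
    for (i = 1; i <= 5; i++)
        fprintf(out, "%010ld 00000 n \n", offsets[i]);
    fprintf(out, "trailer\n<< /Size 6 /Root 1 0 R >>\nstartxref\n%ld\n%%%%EOF\n",
            xref);
    return ferror(out) ? -1 : xref;
}
```

The trailer points at the Catalog (object 1), which references the page tree (2), which references the page (3), which in turn references its content stream (4) and font (5). The reader finds each of them through the offsets in the xref table.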


Problems come when the text to be converted to PDF contains special characters, such as those used in Eastern European languages. With the simple fonts used here, every character in the text is a single 8-bit code, and the text stream knows nothing about UTF or Unicode encodings. For each 8-bit character in the text, a reader fetches the corresponding glyph from the font selected in the Font PDF object. The fetch uses the 8-bit value of the character and takes the encoding of the font into account. Only a few predefined font encodings are available, mainly WinAnsiEncoding and MacRomanEncoding, which correspond to Windows and Mac respectively (MacExpertEncoding is meant for "expert" fonts). The encoding is defined in a "Font" PDF object, or in an "Encoding" object referenced by a "Font" object. However, the Windows font encoding is always fixed to the CP-1252 code table. Unfortunately, this code page covers only some non-ASCII characters; most are missing. For example, most of the characters used in Eastern European languages (Polish, Hungarian, etc.) are not defined in CP-1252 but in the CP-1250 code page.

With these limitations, it is not possible to use non-CP1252 chars in a text stream of a PDF file. There is, however, a workaround that may work in some cases. PDF provides the so-called "font encoding differences", which is an optional entry of the font encoding dictionary that allows the writer to map a given set of character codes in the text to a set of glyphs in the selected font. Glyphs are defined by a standard name, and a list of official glyph names can be retrieved from the Adobe Glyph name list. However, not all glyphs are available in all fonts, of course.


This is an example of this mapping:

10 0 obj
<<
/Type /Font
/Subtype /Type1
/Name /F1
/BaseFont /MyriadPro
/Encoding 11 0 R
>>
endobj
11 0 obj
<<
/Type /Encoding
/BaseEncoding /WinAnsiEncoding
/Differences [ 156 /sacute 230 /cacute
241 /nacute 179 /lslash
]
>>
endobj


In this example, there are two objects: a Font object (10) and an Encoding object (11). The Font object uses the Encoding object to define the encoding of the font, which in turn includes the encoding differences. Here, char code 156 is mapped to the "ś" character, code 230 to "ć", code 241 to "ń" and code 179 to "ł", which are some of the characters of the Polish alphabet. The codes are taken from the CP-1250 code table, which is NOT supported by PDF.
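On the writer side, the text stream must then contain exactly these 8-bit codes. A minimal sketch of the matching conversion from wide chars, assuming wchar_t holds Unicode code points (as in glibc); the function name pdf_code_for and the '?' fallback are choices made for this example:

```c
#include <wchar.h>

/* Map a Unicode code point to the 8-bit code used in the PDF text
 * stream, following the /Differences array of the Encoding object in
 * the example. Returns '?' for characters this sketch cannot
 * represent. pdf_code_for is a hypothetical helper. */
unsigned char pdf_code_for(wchar_t wc)
{
    switch (wc) {
    case 0x015B: return 156;    /* sacute */
    case 0x0107: return 230;    /* cacute */
    case 0x0144: return 241;    /* nacute */
    case 0x0142: return 179;    /* lslash */
    default:
        /* Printable ASCII has the same codes in WinAnsiEncoding. */
        return (wc >= 0x20 && wc < 0x7F) ? (unsigned char)wc : '?';
    }
}
```

The table has to be kept in sync with the Differences array by hand: any code listed there replaces whatever WinAnsiEncoding originally assigned to that position.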

Note that the font used (MyriadPro in this case) must contain the glyphs named in the Encoding object: sacute, cacute, nacute and lslash. Otherwise, the reader displays the .notdef glyph. For example, the basic "Courier", "Arial" and "Helvetica" fonts I tried did NOT contain these glyphs. In particular, I couldn't find any standard proportional font containing these glyphs.


So, for this method to work, the operating system where the PDF reader runs must provide the font named in the PDF file, which is a non-standard font. A better solution would be to embed the font in the PDF file, but this is not an easy task.
Unfortunately, this seems to be the only safe solution, one that is guaranteed to work on any system or environment.


The next step is to understand and learn how to embed font programs in PDF files, in an easy way.

USB File Storage Gadget

Today I have enabled the USB gadget support for file storage. The intention is to be able to export files via the USB device interface to a PC.

The file storage gadget must be enabled at the kernel config menu:
USB support -> Support for USB gadgets -> File-backed storage gadget

Note that only one USB gadget may be enabled at the same time. If multiple gadgets must be supported, all of them must be configured as modules, so I had to remove built-in support for ethernet gadget from the kernel. Switching the USB function requires removing and installing the proper modules.

The module for file storage is g_file_storage.o and is installed this way:
insmod g_file_storage.o file=/results.bin stall=0

The 'stall' argument is necessary for the USB disk to be properly detected by Windows. Linux does not require this argument and the drive can be mounted without problems. If 'stall' is not set to zero and the gadget is connected to a Windows PC, the following messages appear:
g_file_storage pxa2xx_udc: full speed config #1
udc: pxa2xx_ep_disable, ep1in-bulk not enabled
udc: pxa2xx_ep_disable, ep2out-bulk not enabled
udc: USB reset
udc: USB reset

repeating every few seconds.

For the 'stall' option to be available, it is necessary to enable the 'file-backed storage gadget in test mode' option in the kernel configuration.

Multiple files may be specified when the gadget module is installed, thus creating multiple drives visible to the remote host.


Any volume size may be created but it seems that Windows assigns floppy drive letters if the volume size is similar to a floppy device size. I have tested 720KB and 1440KB only.

The volume may be declared as read-only by passing the "ro=1" parameter to insmod.

The backend file may be either a disk partition or a file image.
An initial filesystem image can be created this way:

# dd if=/dev/zero of=results.bin bs=512 count=2880
# mkdosfs results.bin

Then, loop-mount the image file and populate the filesystem. This is where the problems start: if a process writes a new file to the loop filesystem, the host side of the USB connection (where the file browser runs) does not see the new file, even if the file browser is refreshed. The only workaround is to unplug and replug the USB cable. This happens even if a 'sync' command is run on the tester device.



Also, some inconsistencies appear if the USB host side writes to the device: the device doesn't see the new files, and vice versa.


In conclusion, it is quite a good method to export files from a Linux device, but with some limitations on "live" filesystems.




Wednesday, February 3, 2010

Multiple gateways on the same host

Having two gateways on the same host, where some processes send outgoing traffic over one gateway while the rest use the other, requires setting up a virtual network interface with a separate routing table, so that all traffic to/from this virtual interface uses the secondary routing table, whose default route points to the other gateway.

First, I'll describe the steps one by one and later I'll explain how to make this setup persistent so that the system boots correctly the next time. Let's assume we have two gateways:

gw1 : 172.26.2.100
gw2 : 172.26.3.100

Create a virtual interface which will be used by processes that need to send traffic to gateway gw2:


$ ifconfig eth0:1 172.26.3.209

Create a definition and give a name to the new routing table (index 1, name 'test'):
$ echo "1 test" >> /etc/iproute2/rt_tables

Show the main routing table:

$ ip route show table main
172.26.0.0/16 dev eth0 proto kernel scope link src 172.26.3.206
169.254.0.0/16 dev eth0 scope link metric 1002
default via 172.26.2.100 dev eth0

Clear the secondary routing table:

$ ip route flush table test

Copy all routes from the main table to the secondary table, except the default route:

$ ip route show table main | grep -Ev "^default" | while read route; do
    ip route add table test $route
done

Add the gateway for the secondary routing table:

$ ip route add table test default via 172.26.3.100

List the secondary routing table:

$ ip route show table test
172.26.0.0/16 dev eth0 proto kernel scope link src 172.26.3.206
169.254.0.0/16 dev eth0 scope link metric 1002
default via 172.26.3.100 dev eth0

Add a rule so that for any packet to/from the virtual interface, the secondary routing table is applied:

$ ip rule add from 172.26.3.209 lookup test
$ ip rule add to 172.26.3.209 lookup test

At this point, traffic originating from interface eth0:1 uses gateway gw2 (172.26.3.100) and traffic from interface eth0 is routed via the default gateway gw1 (172.26.2.100). Try and compare:

$ traceroute -s 172.26.3.206 www.google.com
$ traceroute -s 172.26.3.209 www.google.com

In order to make the changes above persistent, edit/create the following files:

1. Add a description for the 'test' routing table (already done):

$ echo "1 test" >> /etc/iproute2/rt_tables

2. Create file '/etc/sysconfig/network-scripts/ifcfg-eth0:1' containing the
configuration of the virtual interface:

DEVICE=eth0:1
ONBOOT=yes
SEARCH="mydomain.biz"
DOMAIN="mydomain.biz"
DNS1=172.26.2.200
DNS2=172.26.2.201
BOOTPROTO=none
NETMASK=255.255.0.0
IPADDR=172.26.3.209
TYPE=Ethernet
USERCTL=no
PEERDNS=yes
IPV6INIT=no
NM_CONTROLLED=no

3. Create file '/etc/sysconfig/network-scripts/route-eth0:1' containing the 'test' routing table:

table test 172.26.0.0/16 dev eth0 proto kernel scope link src 172.26.3.206
table test 169.254.0.0/16 dev eth0 scope link metric 1002
table test default via 172.26.3.100

4. Create file '/etc/sysconfig/network-scripts/rule-eth0:1' containing the rules for the virtual interface:

from 172.26.3.209 lookup test
to 172.26.3.209 lookup test

The above explanations have been tested on a Fedora 10 distribution.

Friday, January 15, 2010

Extracting the contents of a ramdisk image

Sometimes it is necessary to examine the contents of a ramdisk image (initrd). An initrd image is basically a gzip-compressed cpio archive.

Here are the steps used to extract the files contained in a ramdisk:

mkdir tmp
cd tmp
gunzip -c < ../initrd | cpio -i -d

Similarly, to build an initial ramdisk image from a directory:

cd tmp
find . | cpio -o -H newc | gzip -9 > ../initrd.img

Where 'newc' is the name of the format used in the cpio archive.

In a Fedora distribution, the ramdisk init script is a nash shell script. Nash is a shell with a very small footprint and built-in commands targeted at ramdisk operations such as device node creation, module loading, root device creation, root pivot, etc.

One of the most important nash commands is 'mkrootdev', which creates the root device specified as an argument. After this command is run, the "mount /sysroot" command is executed, which mounts the root filesystem on the root device.

If the root device is not correctly specified here, the following error will show up:

"mount: error mounting /dev/root on /sysroot as ext3: No such file or
directory"

mkinitrd is responsible for creating the init script contained in an initrd image, and the name of the root device is obtained from the /etc/fstab file when the kernel rpm is installed (not when it is built!). So, it is important to have a correct fstab file in the filesystem before the kernel is installed; otherwise mkinitrd won't be able to figure out the root device and pass the correct arguments to the mkrootdev command.


Thursday, January 14, 2010

Add a new config patch to a Linux kernel RPM

Sometimes it is necessary to patch a kernel and rebuild the original rpm so that the installation of the new kernel and its modules is easier. I will not explain here how to patch a kernel, so I assume that we already have a patch available that has been tested and applies correctly.

We must have the source rpm of the kernel we intend to patch and rebuild. The following explanation has been tested on a kernel rpm from a Fedora 10 distro and, in particular, I will focus on patching the kernel config files rather than patching the source files.

First of all, an explanation of how kernel configuration is achieved by the spec file of a Fedora kernel rpm:

1. Some extra config files are provided along with the kernel source tarball. These files are named 'config-*' and are declared in the spec file using the 'sourceNN:' directives. The config files are hierarchical, that is, the definitions are layered one on top of the other. For example, the configuration file for the i686 architecture is produced by adding the config-generic, config-x86-generic and config-i686 config files. The merge.pl perl script, also provided as an extra source file in the rpm, is responsible for merging the config files.

2. The config-* files are copied into the buildroot directory of the RPM, that is under BUILD/kernel- directory.

3. A 'make configs' rule is executed so that the config-* files are merged and produce a set of kernel-*.config files also in the build root directory. All configs are generated, even if they are not intended for the architecture that is going to be built.

4. At this point, all patches are applied using the 'ApplyPatch' macro in the spec file. Note that any patch that changes the kernel configuration must change the kernel*.config files. Patching the config-* files would not work, as these are not taken into account by the build process after the patches are applied. Similarly, patching the .config file is not an option, since this file will be overwritten by each kernel-*.config file during the build process.

5. The config files not intended for the architecture that is being built are deleted.

6. 'make oldconfig' is run on all the remaining kernel-*.config files, and the resulting config file is saved in the configs/ directory and removed from the root build directory. This is why the kernel-*.config files are not present even if we only run the patch stage of rpmbuild (-bp).

So, to have a patch that changes kernel config we have two options:

A. Edit the SOURCE/config-* files provided by the kernel source RPM and rebuild the source rpm with the new ones.

B. Generate a patch that modifies the kernel*.config files.

There are some pros and cons to each option. Option B is more modular: if we choose to remove some kernel configuration in the future, all we have to do is remove the patch from the spec file. With option A, the original source file is modified, which makes it more difficult to revert config changes.

On the other hand, option A is easier than option B because the kernel*.config files that must be patched are autogenerated during the rpm build process and are not available before then.

In order to apply option B, we should have a pristine kernel build root containing the patched kernel and the kernel*.config files so that these can be patched. Then, make a copy of the entire root and edit the kernel config files for the intended architectures, generate a patch by diff'ing both directories, copy it into the SOURCE directory and declare it in the SPEC file. Here is an example:

1. Generate a pristine build root of the linux kernel rpm

$ rpm -i kernel-2.6.27.5.src.rpm
$ cd ~/rpmbuild

Edit the SPEC file, adding an 'exit 1' statement after the last patch is applied (the last call to the 'ApplyPatch' macro). This interrupts the patch process after the kernel*.config files are merged but before they are oldconfig'ed and moved to the configs/ directory. Perhaps there is a more elegant way to do this, but I don't know of it.

Extract the sources and apply the patches:

$ rpmbuild --target=i686 -bp SPECS/kernel.spec

Note the -bp option tells rpmbuild to stop after the source tarball is extracted and all patches are applied. Because we added the 'exit 1' statement, rpmbuild will return an error which of course can be ignored.

2. Make the changes in the kernel config files

$ cd BUILD/kernel-2.6.27.5
$ cp -a linux-2.6.27.5.i686 linux-2.6.27.5.i686.new
$ cd linux-2.6.27.5.i686.new

Edit each kernel-*.config file that applies to the intended architectures.

Changing the config files may be a bit painful, depending on the dependencies of the config items that we change. To make sure that the changes are ok, at the end we must run 'make nonint_oldconfig', which does some sort of check on our config file. Another option is to copy each kernel*.config file to .config, configure the kernel manually using 'make menuconfig', and then copy the result back to the original file.

3. Generate the patch

$ cd ..
$ diff -urN linux-2.6.27.5.i686/ linux-2.6.27.5.i686.new/ > newpatch

Edit newpatch and make sure that only the changes to the kernel-*.config files are included. Then, rename and move the new patch to the SOURCES directory:

$ mv newpatch ../../SOURCES/linux-2.6.27.5-newconfig.patch

And declare the new patch in the SPEC file, by adding this line after the last call to the ApplyPatch macro:

ApplyPatch linux-2.6.27.5-newconfig.patch

# END OF PATCH APPLICATIONS

And, of course, remove the 'exit 1' statement.

4. Rebuild the RPM

As usual, run:

$ rpmbuild -ba --target=i686 SPECS/kernel.spec

After these steps, both a binary RPM and a source RPM will be present under the RPMS/i686 and SRPMS directories, respectively.

Of course, if we already have a patch for the kernel, all we have to do is copy it to the SOURCES directory with a proper name, add it to the spec file using the ApplyPatch macro, and rebuild the rpm (rpmbuild -ba).

Note that the Fedora kernel spec file builds several flavours of the same kernel architecture, for instance with or without PAE, debug, SMP, and several combinations of these. Building so many combinations may take a lot of time. If we are interested only in the base kernel without PAE, debug, etc., all we have to do is add the "--with baseonly" option to the rpmbuild command. Other variants can be selected with --with / --without pae, debug, etc.



Monday, July 6, 2009

Some problems with GPS and NTPD

I found some problems when trying to set up the ntpd server connected to a GPS.

First, I had to solve how to create the devices required by ntpd: /dev/gps0 and /dev/gpspps0. This is easy: just create a udev rules file, for example /etc/udev/rules.d/90-gps.rules:

SUBSYSTEM=="pps", MODE="0660", GROUP="uucp", SYMLINK="gps%k"
KERNEL=="ttyS0", SYMLINK="gps0"

the above creates the following links:
gps0 -> ttyS0
gpspps0 -> pps0

and the target devices are accessible to members of the uucp group. Therefore, the 'ntp' user must belong to the uucp group:

# usermod -G ntp,uucp ntp

Then ntpd experienced some serious jitter problems in the system clock, and I found out that the ntpd driver was not using the PPS signal, but only the NMEA output. Running ntpd in debug mode (-d flag) showed a 'permission denied' error when opening /dev/gpspps0. The file permissions were ok; the problem was in the SELinux layer, which I had inadvertently left enabled. Disabling it fixed the problem (set SELINUX=disabled in /etc/selinux/config).

Also, the serial port had to be carefully set up before ntpd was started, otherwise, some interactions in the tty driver would be executed, and the DCD interrupt detection would be lost, as explained in the previous posting. The minimum settings of the serial port at start-up are:

# stty -F /dev/gps0 raw ispeed 4800 ospeed 4800 -hupcl
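If the port has to be prepared from a program rather than from a start-up script, roughly the same settings can be applied through the termios interface. This is only a sketch that approximates the stty line above (it covers the POSIX flags that "raw" clears, not every flag stty touches), and the function name gps_port_settings is made up for this example:

```c
#include <termios.h>

/* Adjust *t to approximate "stty raw ispeed 4800 ospeed 4800 -hupcl".
 * Returns 0 on success, -1 if the speed cannot be set.
 * gps_port_settings is a hypothetical helper for this example. */
int gps_port_settings(struct termios *t)
{
    /* Raw mode: no input translation, no flow control, no output
     * processing, no line editing and no signal characters. */
    t->c_iflag &= ~(IGNBRK | BRKINT | IGNPAR | PARMRK | INPCK | ISTRIP |
                    INLCR | IGNCR | ICRNL | IXON | IXOFF);
    t->c_oflag &= ~OPOST;
    t->c_lflag &= ~(ICANON | ISIG);
    t->c_cc[VMIN] = 1;          /* block until at least one byte arrives */
    t->c_cc[VTIME] = 0;

    /* -hupcl: do not drop the modem control lines on last close. */
    t->c_cflag &= ~HUPCL;

    if (cfsetispeed(t, B4800) != 0 || cfsetospeed(t, B4800) != 0)
        return -1;
    return 0;
}
```

The usual sequence would be tcgetattr() on the open /dev/gps0 descriptor, then this function, then tcsetattr() with TCSANOW.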

In particular, finding the selinux problem was quite confusing sometimes. If ntpd is started from the command-line as root, the "permission denied" error does not show up. However, if started using the 'service ntpd start' command, the error appears.