mirror of
https://github.com/openbsd/ports.git
synced 2026-06-18 07:24:23 +02:00
459a30b174
XML is slow to parse and strings inside the document cannot be memory mapped as they do not have a trailing NUL char. The libxmlb library takes XML source, and converts it to a structured binary representation with a deduplicated string table -- where the strings have the NULs included. <...> ok robert@
21 lines
1.2 KiB
Plaintext
21 lines
1.2 KiB
Plaintext
XML is slow to parse and strings inside the document cannot be memory mapped as
|
|
they do not have a trailing NUL char. The libxmlb library takes XML source, and
|
|
converts it to a structured binary representation with a deduplicated string
|
|
table -- where the strings have the NULs included.
|
|
|
|
This allows an application to mmap the binary XML file, do an XPath query and
|
|
return some strings without actually parsing the entire document. This is all
|
|
done using (almost) zero allocations and no actual copying of the binary data.
|
|
|
|
As each node in the binary XML file encodes the 'next' node at the same level
|
|
it makes skipping whole subtrees trivial. A 10Mb binary XML file can be loaded
|
|
from disk **and** queried in less than a few milliseconds.
|
|
|
|
The binary XML is not supposed to be small. It's usually about half the size of
|
|
the text XML data where a lot of the tag content is duplicated, but can actually
|
|
be larger than the original XML file. This isn't important; the fast query speed
|
|
and the ability to mmap strings without copies more than makes up for the larger
|
|
on-disk size. If you want to compress your XML, this library probably isn't for
|
|
you -- just use gzip -- it gives you an almost a perfect compression ratio for
|
|
data like this.
|