<!DOCTYPE html>
|
|
<html><head>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
|
|
<link href="sqlite.css" rel="stylesheet">
|
|
<title>What If OpenDocument Used SQLite?</title>
|
|
<!-- path= -->
|
|
</head>
|
|
<body>
|
|
<div class=nosearch>
|
|
<a href="index.html">
|
|
<img class="logo" src="images/sqlite370_banner.gif" alt="SQLite" border="0">
|
|
</a>
|
|
<div><!-- IE hack to prevent disappearing logo --></div>
|
|
<div class="tagline desktoponly">
|
|
Small. Fast. Reliable.<br>Choose any three.
|
|
</div>
|
|
<div class="menu mainmenu">
|
|
<ul>
|
|
<li><a href="index.html">Home</a>
|
|
<li class='mobileonly'><a href="javascript:void(0)" onclick='toggle_div("submenu")'>Menu</a>
|
|
<li class='wideonly'><a href='about.html'>About</a>
|
|
<li class='desktoponly'><a href="docs.html">Documentation</a>
|
|
<li class='desktoponly'><a href="download.html">Download</a>
|
|
<li class='wideonly'><a href='copyright.html'>License</a>
|
|
<li class='desktoponly'><a href="support.html">Support</a>
|
|
<li class='desktoponly'><a href="prosupport.html">Purchase</a>
|
|
<li class='search' id='search_menubutton'>
|
|
<a href="javascript:void(0)" onclick='toggle_search()'>Search</a>
|
|
</ul>
|
|
</div>
|
|
<div class="menu submenu" id="submenu">
|
|
<ul>
|
|
<li><a href='about.html'>About</a>
|
|
<li><a href='docs.html'>Documentation</a>
|
|
<li><a href='download.html'>Download</a>
|
|
<li><a href='support.html'>Support</a>
|
|
<li><a href='prosupport.html'>Purchase</a>
|
|
</ul>
|
|
</div>
|
|
<div class="searchmenu" id="searchmenu">
|
|
<form method="GET" action="search">
|
|
<select name="s" id="searchtype">
|
|
<option value="d">Search Documentation</option>
|
|
<option value="c">Search Changelog</option>
|
|
</select>
|
|
<input type="text" name="q" id="searchbox" value="">
|
|
<input type="submit" value="Go">
|
|
</form>
|
|
</div>
|
|
</div>
|
|
<script>
|
|
function toggle_div(nm) {
|
|
var w = document.getElementById(nm);
|
|
if( w.style.display=="block" ){
|
|
w.style.display = "none";
|
|
}else{
|
|
w.style.display = "block";
|
|
}
|
|
}
|
|
function toggle_search() {
|
|
var w = document.getElementById("searchmenu");
|
|
if( w.style.display=="block" ){
|
|
w.style.display = "none";
|
|
} else {
|
|
w.style.display = "block";
|
|
setTimeout(function(){
|
|
document.getElementById("searchbox").focus()
|
|
}, 30);
|
|
}
|
|
}
|
|
function div_off(nm){document.getElementById(nm).style.display="none";}
|
|
window.onbeforeunload = function(e){div_off("submenu");}
|
|
/* Disable the Search feature if we are not operating from CGI, since */
|
|
/* Search is accomplished using CGI and will not work without it. */
|
|
if( !location.origin || !location.origin.match || !location.origin.match(/http/) ){
|
|
document.getElementById("search_menubutton").style.display = "none";
|
|
}
|
|
/* Used by the Hide/Show button beside syntax diagrams, to toggle the */
|
|
function hideorshow(btn,obj){
|
|
var x = document.getElementById(obj);
|
|
var b = document.getElementById(btn);
|
|
if( x.style.display!='none' ){
|
|
x.style.display = 'none';
|
|
b.innerHTML='show';
|
|
}else{
|
|
x.style.display = '';
|
|
b.innerHTML='hide';
|
|
}
|
|
return false;
|
|
}
|
|
</script>
|
|
</div>
|
|
|
|
|
|
|
|
<h1 align="center">
|
|
What If OpenDocument Used SQLite?</h1>
|
|
|
|
<h2>Introduction</h2>
|
|
|
|
<p>Suppose the
|
|
<a href="http://en.wikipedia.org/wiki/OpenDocument">OpenDocument</a> file format,
|
|
and specifically the "ODP" OpenDocument Presentation format, were
|
|
built around SQLite. Benefits would include:
|
|
<ul>
|
|
<li>Smaller documents
|
|
<li>Faster File/Save times
|
|
<li>Faster startup times
|
|
<li>Less memory used
|
|
<li>Document versioning
|
|
<li>A better user experience
|
|
</ul>
|
|
|
|
<p>
|
|
Note that this is only a thought experiment.
|
|
We are not suggesting that OpenDocument be changed.
|
|
Nor is this article a criticism of the current OpenDocument
|
|
design. The point of this essay is to suggest ways to improve
|
|
future file format designs.
|
|
|
|
<h2>About OpenDocument And OpenDocument Presentation</h2>
|
|
|
|
<p>
|
|
The OpenDocument file format is used for office applications:
|
|
word processors, spreadsheets, and presentations. It was originally
|
|
designed for the OpenOffice suite but has since been incorporated into
|
|
other desktop application suites. The OpenOffice application has been
|
|
forked and renamed a few times. This author's primary use for OpenDocument is
|
|
building slide presentations with either
|
|
<a href="https://www.neooffice.org/neojava/en/index.php">NeoOffice</a> on Mac, or
|
|
<a href="http://www.libreoffice.org/">LibreOffice</a> on Linux and Windows.
|
|
|
|
<p>
|
|
An OpenDocument Presentation or "ODP" file is a
|
|
<a href="http://en.wikipedia.org/wiki/Zip_%28file_format%29">ZIP archive</a> containing
|
|
XML files describing presentation slides and separate image files for the
|
|
various images that are included as part of the presentation.
|
|
(OpenDocument word processor and spreadsheet files are similarly
|
|
structured but are not considered by this article.) The reader can
|
|
easily see the content of an ODP file by using the "unzip -l" command.
For example, the following is the "unzip -l" output from a 49-slide presentation
|
|
about SQLite from the 2014
|
|
<a href="http://southeastlinuxfest.org/">SouthEast LinuxFest</a>
|
|
conference:
|
|
|
|
<blockquote><pre>
Archive:  self2014.odp
  Length      Date    Time    Name
---------  ---------- -----   ----
       47  2014-06-21 12:34   mimetype
        0  2014-06-21 12:34   Configurations2/statusbar/
        0  2014-06-21 12:34   Configurations2/accelerator/current.xml
        0  2014-06-21 12:34   Configurations2/floater/
        0  2014-06-21 12:34   Configurations2/popupmenu/
        0  2014-06-21 12:34   Configurations2/progressbar/
        0  2014-06-21 12:34   Configurations2/menubar/
        0  2014-06-21 12:34   Configurations2/toolbar/
        0  2014-06-21 12:34   Configurations2/images/Bitmaps/
    54702  2014-06-21 12:34   Pictures/10000000000001F40000018C595A5A3D.png
    46269  2014-06-21 12:34   Pictures/100000000000012C000000A8ED96BFD9.png
<i>... 58 other pictures omitted...</i>
    13013  2014-06-21 12:34   Pictures/10000000000000EE0000004765E03BA8.png
  1005059  2014-06-21 12:34   Pictures/10000000000004760000034223EACEFD.png
   211831  2014-06-21 12:34   content.xml
    46169  2014-06-21 12:34   styles.xml
     1001  2014-06-21 12:34   meta.xml
     9291  2014-06-21 12:34   Thumbnails/thumbnail.png
    38705  2014-06-21 12:34   Thumbnails/thumbnail.pdf
     9664  2014-06-21 12:34   settings.xml
     9704  2014-06-21 12:34   META-INF/manifest.xml
---------                     -------
 10961006                     78 files
</pre></blockquote>
|
|
|
|
<p>
|
|
The ODP ZIP archive contains four different XML files:
|
|
content.xml, styles.xml, meta.xml, and settings.xml. Those four files
|
|
define the slide layout, text content, and styling. This particular
|
|
presentation contains 62 images, ranging from full-screen pictures to
|
|
tiny icons, each stored as a separate file in the Pictures
|
|
folder. The "mimetype" file contains a single line of text that says:
|
|
|
|
<blockquote><pre>
application/vnd.oasis.opendocument.presentation
</pre></blockquote>
|
|
|
|
<p>The purpose of the other files and folders is presently
|
|
unknown to the author but is probably not difficult to figure out.
|
|
|
|
<h2>Limitations Of The OpenDocument Presentation Format</h2>
|
|
|
|
<p>
|
|
The use of a ZIP archive to encapsulate XML files plus resources is an
|
|
elegant approach to an application file format.
|
|
It is clearly superior to a custom binary file format.
|
|
But using an SQLite database as the
|
|
container, instead of ZIP, would be more elegant still.
|
|
|
|
<p>A ZIP archive is basically a key/value database, optimized for
|
|
the case of write-once/read-many and for a relatively small number
|
|
of distinct keys (a few hundred to a few thousand) each with a large BLOB
|
|
as its value. A ZIP archive can be viewed as a "pile-of-files"
|
|
database. This works, but it has some shortcomings relative to an
|
|
SQLite database, as follows:
|
|
|
|
<ol>
|
|
<li><p><b>Incremental update is hard.</b>
|
|
<p>
|
|
It is difficult to update individual entries in a ZIP archive.
|
|
It is especially difficult to update individual entries in a ZIP
|
|
archive in a way that does not destroy
|
|
the entire document if the computer loses power and/or crashes
|
|
in the middle of the update. It is not impossible to do this, but
|
|
it is sufficiently difficult that nobody actually does it. Instead, whenever
|
|
the user selects "File/Save", the entire ZIP archive is rewritten.
|
|
Hence, "File/Save" takes longer than it ought, especially on
|
|
older hardware. Newer machines are faster, but it is still bothersome
|
|
that changing a single character in a 50 megabyte presentation causes one
|
|
to burn through 50 megabytes of the finite write life on the SSD.
|
|
|
|
<li><p><b>Startup is slow.</b>
|
|
<p>
|
|
In keeping with the pile-of-files theme, OpenDocument stores all slide
|
|
content in a single big XML file named "content.xml".
|
|
LibreOffice reads and parses this entire file just to display
|
|
the first slide.
|
|
LibreOffice also seems to
|
|
read all images into memory as well, which makes sense seeing as when
|
|
the user does "File/Save" it is going to have to write them all back out
|
|
again, even though none of them changed. The net effect is that
|
|
start-up is slow. Double-clicking an OpenDocument file brings up a
|
|
progress bar rather than the first slide.
|
|
This results in a bad user experience.
|
|
The situation grows ever more annoying as
|
|
the document size increases.
|
|
|
|
<li><p><b>More memory is required.</b>
|
|
<p>
|
|
Because ZIP archives are optimized for storing big chunks of content, they
|
|
encourage a style of programming where the entire document is read into
|
|
memory at startup, all editing occurs in memory, then the entire document
|
|
is written to disk during "File/Save". OpenOffice and its descendants
|
|
embrace that pattern.
|
|
|
|
<p>
|
|
One might argue that it is ok, in this era of multi-gigabyte desktops, to
|
|
read the entire document into memory.
|
|
But it is not ok.
|
|
For one, the amount of memory used far exceeds the (compressed) file size
|
|
on disk. So a 50MB presentation might take 200MB or more of RAM.
|
|
That still is not a problem if one only edits a single document at a time.
|
|
But when working on a talk, this author will typically have 10 or 15 different
|
|
presentations up all at the same
|
|
time (to facilitate copy/paste of slides from past presentations) and so
|
|
gigabytes of memory are required.
|
|
Add in an open web browser or two and a few other
|
|
desktop apps, and suddenly the disk is whirling and the machine is swapping.
|
|
And even having just a single document is a problem when working
|
|
on an inexpensive Chromebook retrofitted with Ubuntu.
|
|
Using less memory is always better.
|
|
</p>
|
|
|
|
<li><p><b>Crash recovery is difficult.</b>
|
|
<p>
|
|
The descendants of OpenOffice tend to segfault more often than commercial
|
|
competitors. Perhaps for this reason, the OpenOffice forks make
|
|
periodic backups of their in-memory documents so that users do not lose
|
|
all pending edits when the inevitable application crash does occur.
|
|
This causes frustrating pauses in the application for the few seconds
|
|
while each backup is being made.
|
|
After restarting from a crash, the user is presented with a dialog box
|
|
that walks them through the recovery process. Managing the crash
|
|
recovery this way involves lots of extra application logic and is
|
|
generally an annoyance to the user.
|
|
|
|
<li><p><b>Content is inaccessible.</b>
|
|
<p>
|
|
One cannot easily view, change, or extract the content of an
|
|
OpenDocument presentation using generic tools.
|
|
The only reasonable way to view or edit an OpenDocument document is to open
|
|
it up using an application that is specifically designed to read or write
|
|
OpenDocument (read: LibreOffice or one of its cousins). The situation
|
|
could be worse. One can extract and view individual images (say) from
|
|
a presentation using just the "unzip" archiver tool. But it is not reasonable
to try to extract the text from a slide. Remember that all content is stored
in a single "content.xml" file. That file is XML, so it is a text file.
But it is not a text file that can be managed with an ordinary text
editor. For the example presentation above, the content.xml file
consists of exactly two lines. The first line of the file is just:
|
|
|
|
<blockquote><pre>
<?xml version="1.0" encoding="UTF-8"?>
</pre></blockquote>
|
|
|
|
<p>The second line of the file contains 211792 characters of
|
|
impenetrable XML. Yes, 211792 characters all on one line.
|
|
This file is a good stress-test for a text editor.
|
|
Thankfully, the file is not some obscure
|
|
binary format, but in terms of accessibility, it might as well be
|
|
written in Klingon.
|
|
</ol>
|
|
|
|
<h2>First Improvement: Replace ZIP with SQLite</h2>
|
|
|
|
<p>
|
|
Let us suppose that instead of using a ZIP archive to store its files,
|
|
OpenDocument used a very simple SQLite database with the following
|
|
single-table schema:
|
|
|
|
<blockquote><pre>
CREATE TABLE OpenDocTree(
  filename TEXT PRIMARY KEY,  -- Name of file
  filesize BIGINT,            -- Size of file after decompression
  content BLOB                -- Compressed file content
);
</pre></blockquote>
|
|
|
|
<p>
|
|
For this first experiment, nothing else about the file format is changed.
|
|
The OpenDocument is still a pile-of-files, only now each file is a row
|
|
in an SQLite database rather than an entry in a ZIP archive.
|
|
This simple change does not use the power of a relational
|
|
database. Even so, this simple change shows some improvements.
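<p>As a rough illustration of this first experiment, the following Python
sketch copies every entry of a ZIP archive into the OpenDocTree table shown
above. The helper names "repack_zip_to_sqlite" and "read_file" are
hypothetical, zlib is assumed as the compression method, and nothing here is
meant to describe how the SQLAR utility itself is implemented.

```python
import sqlite3
import zipfile
import zlib

SCHEMA = """
CREATE TABLE IF NOT EXISTS OpenDocTree(
  filename TEXT PRIMARY KEY,  -- Name of file
  filesize BIGINT,            -- Size of file after decompression
  content BLOB                -- Compressed file content
);
"""

def repack_zip_to_sqlite(zip_source, db):
    """Copy each file of a ZIP archive into one row of OpenDocTree.

    zip_source is a path or file-like object; db is an sqlite3 connection.
    """
    db.executescript(SCHEMA)
    with zipfile.ZipFile(zip_source) as archive, db:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content of their own
            raw = archive.read(info.filename)
            db.execute(
                "INSERT OR REPLACE INTO OpenDocTree VALUES (?, ?, ?)",
                (info.filename, len(raw), zlib.compress(raw)),
            )

def read_file(db, filename):
    """Return the decompressed content of one stored file."""
    row = db.execute(
        "SELECT content FROM OpenDocTree WHERE filename = ?", (filename,)
    ).fetchone()
    if row is None:
        raise KeyError(filename)
    return zlib.decompress(row[0])
```

<p>With a connection opened by sqlite3.connect(), calling
repack_zip_to_sqlite("self2014.odp", db) would produce one row per file, and
read_file(db, "content.xml") would return the same bytes that the ZIP
archive held.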
|
|
|
|
<a name="smaller"></a>
|
|
|
|
<p>
|
|
Surprisingly, using SQLite in place of ZIP makes the presentation
|
|
file smaller. Really. One would think that a relational database file
|
|
would be larger than a ZIP archive, but at least in the case of NeoOffice
|
|
that is not so. The following is an actual screen-scrape showing
|
|
the sizes of the same NeoOffice presentation, both in its original
|
|
ZIP archive format as generated by NeoOffice (self2014.odp), and
|
|
as repacked as an SQLite database using the
|
|
<a href="http://www.sqlite.org/sqlar/doc/trunk/README.md">SQLAR</a> utility:
|
|
|
|
<blockquote><pre>
-rw-r--r--  1 drh  staff  10514994 Jun  8 14:32 self2014.odp
-rw-r--r--  1 drh  staff  10464256 Jun  8 14:37 self2014.sqlar
-rw-r--r--  1 drh  staff  10416644 Jun  8 14:40 zip.odp
</pre></blockquote>
|
|
|
|
<p>
|
|
The SQLite database file ("self2014.sqlar") is about a
|
|
half percent smaller than the equivalent ODP file! How can this be?
|
|
Apparently the ZIP archive generator logic in NeoOffice
|
|
is not as efficient as it could be, because when the same pile-of-files
|
|
is recompressed using the command-line "zip" utility, one gets a file
|
|
("zip.odp") that is smaller still, by another half percent, as seen
|
|
in the third line above. So, a well-written ZIP archive
|
|
can be slightly smaller than the equivalent SQLite database, as one would
|
|
expect. But the difference is slight. The key take-away is that an
|
|
SQLite database is size-competitive with a ZIP archive.
|
|
|
|
<p>
|
|
The other advantage to using SQLite in place of
|
|
ZIP is that the document can now be updated incrementally, without risk
|
|
of corrupting the document if a power loss or other crash occurs in the
|
|
middle of the update. (Remember that writes to
|
|
<a href="atomiccommit.html">SQLite databases are atomic</a>.) True, all the
|
|
content is still kept in a single big XML file ("content.xml") which must
|
|
be completely rewritten if so much as a single character changes. But
|
|
with SQLite, only that one file needs to change. The other 77 files in the
|
|
repository can remain unaltered. They do not all have to be rewritten,
|
|
which in turn makes "File/Save" run much faster and saves wear on SSDs.
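<p>The incremental save described above might be sketched in Python as
follows. The function names "open_document", "save_files" and "load_file"
are hypothetical, and zlib compression is an assumption. The point of the
sketch is only that a save touches the rows that changed and does so inside
a single transaction.

```python
import sqlite3
import zlib

SCHEMA = """
CREATE TABLE IF NOT EXISTS OpenDocTree(
  filename TEXT PRIMARY KEY,
  filesize BIGINT,
  content BLOB
);
"""

def open_document(path):
    """Open (or create) a document database using the single-table schema."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def save_files(db, changed):
    """Write only the changed files; rows for all other files are not touched.

    changed maps a filename to its new uncompressed bytes.
    """
    with db:  # a single transaction: all of the changes land, or none do
        for name, data in changed.items():
            db.execute(
                "INSERT OR REPLACE INTO OpenDocTree(filename, filesize, content) "
                "VALUES (?, ?, ?)",
                (name, len(data), zlib.compress(data)),
            )

def load_file(db, name):
    """Return the uncompressed bytes of one file, or None if it is absent."""
    row = db.execute(
        "SELECT content FROM OpenDocTree WHERE filename = ?", (name,)
    ).fetchone()
    return None if row is None else zlib.decompress(row[0])
```

<p>Saving an edit to the text would then be a call such as
save_files(db, {"content.xml": new_bytes}), which leaves the image rows
alone. If an error interrupts the loop, the "with db" block rolls the
transaction back and the previously saved content remains in place.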
|
|
|
|
<h2>Second Improvement: Split content into smaller pieces</h2>
|
|
|
|
<p>
|
|
A pile-of-files encourages content to be stored in a few large chunks.
|
|
In the case of ODP, there are just four XML files that define the layout
|
|
of all slides in a presentation. An SQLite database allows storing
|
|
information in a few large chunks, but SQLite is also adept and efficient
|
|
at storing information in numerous smaller pieces.
|
|
|
|
<p>
|
|
So then, instead of storing all content for all slides in a single
|
|
oversized XML file ("content.xml"), suppose there was a separate table
|
|
for storing the content of each slide separately. The table schema
|
|
might look something like this:
|
|
|
|
<blockquote><pre>
CREATE TABLE slide(
  pageNumber INTEGER,   -- The slide page number
  slideContent TEXT     -- Slide content as XML or JSON
);
CREATE INDEX slide_pgnum ON slide(pageNumber);  -- Optional
</pre></blockquote>
|
|
|
|
<p>The content of each slide could still be stored as compressed XML.
|
|
But now each page is stored separately. So when opening a new document,
|
|
the application could simply run:
|
|
|
|
<blockquote><pre>
SELECT slideContent FROM slide WHERE pageNumber=1;
</pre></blockquote>
|
|
|
|
<p>This query will quickly and efficiently return the content of the first
|
|
slide, which could then be speedily parsed and displayed to the user.
|
|
Only one page needs to be read and parsed in order to render the first screen,
|
|
which means that the first screen appears much faster and
|
|
there is no longer a need for an annoying progress bar.
|
|
|
|
<p>If the application wanted
|
|
to keep all content in memory, it could continue reading and parsing the
|
|
other pages using a background thread after drawing the first page. Or,
|
|
since reading from SQLite is so efficient, the application might
|
|
instead choose to reduce its memory footprint and only keep a single
|
|
slide in memory at a time. Or maybe it keeps the current slide and the
|
|
next slide in memory, to facilitate rapid transitions to the next slide.
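<p>One way to picture the "current slide plus next slide" policy is the
small Python sketch below. The class name "SlideReader" and its "show"
method are hypothetical, and slide content is treated as plain text rather
than compressed XML to keep the example short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS slide(
  pageNumber INTEGER,   -- The slide page number
  slideContent TEXT     -- Slide content as XML or JSON
);
CREATE INDEX IF NOT EXISTS slide_pgnum ON slide(pageNumber);
"""

class SlideReader:
    """Keep only the current slide and the one after it in memory."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # pageNumber -> slideContent (None if no such page)

    def _load(self, page_number):
        row = self.db.execute(
            "SELECT slideContent FROM slide WHERE pageNumber = ?",
            (page_number,),
        ).fetchone()
        return None if row is None else row[0]

    def show(self, page_number):
        """Return one slide, prefetch its successor, and drop everything else."""
        wanted = (page_number, page_number + 1)
        for n in wanted:
            if n not in self.cache:
                self.cache[n] = self._load(n)
        self.cache = {n: self.cache[n] for n in wanted}
        return self.cache[page_number]
```

<p>Moving forward one slide then costs a single indexed lookup, because the
slide being shown was already fetched during the previous call.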
|
|
|
|
<p>
|
|
Notice that dividing up the content into smaller pieces using an SQLite
|
|
table gives flexibility to the implementation. The application can choose
|
|
to read all content into memory at startup. Or it can read just a
|
|
few pages into memory and keep the rest on disk. Or it can read just
|
|
a single page into memory at a time. And different versions of the application
|
|
can make different choices without having to make any changes to the
|
|
file format. Such options are not available when all content is in
|
|
a single big XML file in a ZIP archive.
|
|
|
|
<p>
|
|
Splitting content into smaller pieces also helps File/Save operations
|
|
to go faster. Instead of having to write back the content of all pages
|
|
when doing a File/Save, the application only has to write back those
|
|
pages that have actually changed.
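<p>A File/Save that writes back only the modified pages could look roughly
like this Python sketch. It assumes the application tracks a set of
"dirty" page numbers; that bookkeeping, and the function name
"save_changed_slides", are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS slide(
  pageNumber INTEGER,   -- The slide page number
  slideContent TEXT     -- Slide content as XML or JSON
);
CREATE INDEX IF NOT EXISTS slide_pgnum ON slide(pageNumber);
"""

def save_changed_slides(db, slides, dirty_pages):
    """Write back only the pages listed in dirty_pages, then clear that set.

    slides maps pageNumber to the current in-memory content.
    """
    with db:  # one transaction covers the whole save
        db.executemany(
            "UPDATE slide SET slideContent = ? WHERE pageNumber = ?",
            [(slides[n], n) for n in sorted(dirty_pages)],
        )
    dirty_pages.clear()
```

<p>After editing one slide of a 49-slide deck, dirty_pages holds a single
page number and the save issues a single UPDATE.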
|
|
|
|
<p>
|
|
One minor downside of splitting content into smaller pieces is that
|
|
compression does not work as well on shorter texts and so the size of
|
|
the document might increase. But as the bulk of the document space
|
|
is used to store images, a small reduction in the compression efficiency
|
|
of the text content will hardly be noticeable, and is a small price
|
|
to pay for an improved user experience.
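<p>The compression trade-off is easy to observe with a few lines of Python.
The slide markup below is a made-up stand-in, not real ODP content, and zlib
is assumed as the compressor. The sketch compresses the same text once as a
single blob and once slide by slide, then prints both totals.

```python
import zlib

def compressed_sizes(pieces):
    """Return (bytes when compressed together, total bytes when compressed apart)."""
    together = len(zlib.compress("".join(pieces).encode("utf-8")))
    apart = sum(len(zlib.compress(p.encode("utf-8"))) for p in pieces)
    return together, apart

# 49 synthetic slides that differ only in their page number.
slides = [
    f'<draw:page draw:name="page{n}"><text:p>Slide number {n}</text:p></draw:page>'
    for n in range(1, 50)
]
together, apart = compressed_sizes(slides)
print("compressed together:", together, "bytes")
print("compressed apart:   ", apart, "bytes")
```

<p>Compressing each slide separately loses the redundancy shared between
slides, so the second total is larger. Either total is still small next to
the megabytes of image data in a typical presentation.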
|
|
|
|
<h2>Third Improvement: Versioning</h2>
|
|
|
|
<p>
|
|
Once one is comfortable with the concept of storing each slide separately,
|
|
it is a small step to support versioning of the presentation. Consider
|
|
the following schema:
|
|
|
|
<blockquote><pre>
CREATE TABLE slide(
  slideId INTEGER PRIMARY KEY,
  derivedFrom INTEGER REFERENCES slide,
  content TEXT     -- XML or JSON or whatever
);
CREATE TABLE version(
  versionId INTEGER PRIMARY KEY,
  priorVersion INTEGER REFERENCES version,
  checkinTime DATETIME,   -- When this version was saved
  comment TEXT,           -- Description of this version
  manifest TEXT           -- List of integer slideIds
);
</pre></blockquote>
|
|
|
|
<p>
|
|
In this schema, instead of each slide having a page number that determines
|
|
its order within the presentation, each slide has a unique
|
|
integer identifier that is unrelated to where it occurs in sequence.
|
|
The order of slides in the presentation is determined by a list of
|
|
slideIds, stored as a text string in the MANIFEST column of the VERSION
|
|
table.
|
|
Since multiple entries are allowed in the VERSION table, that means that
|
|
multiple presentations can be stored in the same document.
|
|
|
|
<p>
|
|
On startup, the application first decides which version it
|
|
wants to display. Since the versionId will naturally increase in time
|
|
and one would normally want to see the latest version, an appropriate
|
|
query might be:
|
|
|
|
<blockquote><pre>
SELECT manifest, versionId FROM version ORDER BY versionId DESC LIMIT 1;
</pre></blockquote>
|
|
|
|
<p>
|
|
Or perhaps the application would rather use the
|
|
most recent checkinTime:
|
|
|
|
<blockquote><pre>
SELECT manifest, versionId, max(checkinTime) FROM version;
</pre></blockquote>
|
|
|
|
<p>
|
|
Using a single query such as the above, the application obtains a list
|
|
of the slideIds for all slides in the presentation. The application then
|
|
queries for the content of the first slide, and parses and displays that
|
|
content, as before.
|
|
|
|
<p>(Aside: Yes, that second query above that uses "max(checkinTime)"
|
|
really does work and really does return a well-defined answer in SQLite.
|
|
Such a query either returns an undefined answer or generates an error
|
|
in many other SQL database engines, but in SQLite it does what you would
|
|
expect: it returns the manifest and versionId of the entry that has the
|
|
maximum checkinTime.)
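<p>The startup sequence just described, using the "max(checkinTime)" form of
the query, might be sketched in Python like this. The essay does not say how
the list of slideIds is encoded in the manifest column, so a JSON array is
assumed here; the function names "latest_manifest" and "first_slide" are
likewise hypothetical.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS slide(
  slideId INTEGER PRIMARY KEY,
  derivedFrom INTEGER REFERENCES slide,
  content TEXT
);
CREATE TABLE IF NOT EXISTS version(
  versionId INTEGER PRIMARY KEY,
  priorVersion INTEGER REFERENCES version,
  checkinTime DATETIME,
  comment TEXT,
  manifest TEXT   -- assumed here to be a JSON array of slideIds
);
"""

def latest_manifest(db):
    """Return (versionId, [slideIds]) for the newest check-in, or None."""
    row = db.execute(
        "SELECT manifest, versionId, max(checkinTime) FROM version"
    ).fetchone()
    # On an empty table the aggregate still yields one row, filled with NULLs.
    if row is None or row[1] is None:
        return None
    return row[1], json.loads(row[0])

def first_slide(db):
    """Return the content of the first slide of the newest version, or None."""
    latest = latest_manifest(db)
    if latest is None or not latest[1]:
        return None
    row = db.execute(
        "SELECT content FROM slide WHERE slideId = ?", (latest[1][0],)
    ).fetchone()
    return None if row is None else row[0]
```

<p>Only two small queries run before the first slide can be drawn: one
against the VERSION table and one primary-key lookup in the SLIDE table.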
|
|
|
|
<p>When the user does a "File/Save", instead of overwriting the modified
|
|
slides, the application can now make new entries in the SLIDE table for
|
|
just those slides that have been added or altered. Then it creates a
|
|
new entry in the VERSION table containing the revised manifest.
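<p>A versioned File/Save along these lines could be sketched in Python as
shown below. As before, the JSON encoding of the manifest is an assumption
and "save_version" is a hypothetical name. To keep the example short it
handles only altered slides; newly added slides would be inserted the same
way with a NULL derivedFrom.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS slide(
  slideId INTEGER PRIMARY KEY,
  derivedFrom INTEGER REFERENCES slide,
  content TEXT
);
CREATE TABLE IF NOT EXISTS version(
  versionId INTEGER PRIMARY KEY,
  priorVersion INTEGER REFERENCES version,
  checkinTime DATETIME,
  comment TEXT,
  manifest TEXT   -- assumed here to be a JSON array of slideIds
);
"""

def save_version(db, prior_version, manifest, edits, comment):
    """Record a new version without overwriting any existing slide.

    manifest is the list of slideIds of the version being edited.
    edits maps a position in that list to the new content for that slide.
    Returns (new versionId, new manifest).
    """
    new_manifest = list(manifest)
    with db:  # new slide rows and the new version row commit together
        for position, content in edits.items():
            cur = db.execute(
                "INSERT INTO slide(derivedFrom, content) VALUES (?, ?)",
                (manifest[position], content),
            )
            new_manifest[position] = cur.lastrowid
        cur = db.execute(
            "INSERT INTO version(priorVersion, checkinTime, comment, manifest) "
            "VALUES (?, datetime('now'), ?, ?)",
            (prior_version, comment, json.dumps(new_manifest)),
        )
    return cur.lastrowid, new_manifest
```

<p>Unchanged slides are shared between versions because the new manifest
simply repeats their slideIds, so each save adds only the edited slides plus
one short VERSION row.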
|
|
|
|
<p>The VERSION table shown above has columns to record a check-in comment
|
|
(presumably supplied by the user) and the time and date at which the File/Save
|
|
action occurred. It also records the parent version to record the history
|
|
of changes. Perhaps the manifest could be stored as a delta from the
|
|
parent version, though typically the manifest will be small enough that
|
|
storing a delta might be more trouble than it is worth. The SLIDE table
|
|
also contains a derivedFrom column which could be used for delta encoding
|
|
if it is determined that saving the slide content as a delta from its
|
|
previous version is a worthwhile optimization.
|
|
|
|
<p>So with this simple change, the ODP file now stores not just the most
|
|
recent edit to the presentation, but a history of all prior edits. The
|
|
user would normally want to see just the most recent edition of the
|
|
presentation, but if desired, the user can now go backwards in time to
|
|
see historical versions of the same presentation.
|
|
|
|
<p>Or, multiple presentations could be stored within the same document.
|
|
|
|
<p>With such a schema, the application would no longer need to make
|
|
periodic backups of the unsaved changes to a separate file to avoid lost
|
|
work in the event of a crash. Instead, a special "pending" version could
|
|
be allocated and unsaved changes could be written into the pending version.
|
|
Because only changes would need to be written, not the entire document,
|
|
saving the pending changes would only involve writing a few kilobytes of
|
|
content, not multiple megabytes, and would take milliseconds instead of
|
|
seconds, and so it could be done frequently and silently in the background.
|
|
Then when a crash occurs and the user reboots, all (or almost all)
|
|
of their work is retained. If the user decides to discard unsaved changes,
|
|
they simply go back to the previous version.
|
|
|
|
<p>
|
|
There are details to fill in here.
|
|
Perhaps a screen can be provided that displays a history of changes
|
|
(perhaps with a graph) allowing the user to select which version they
|
|
want to view or edit. Perhaps some facility can be provided to merge
|
|
forks that might occur in the version history. And perhaps the
|
|
application should provide a means to purge old and unwanted versions.
|
|
The key point is that using an SQLite database to store the content,
|
|
rather than a ZIP archive, makes all of these features much, much easier
|
|
to implement, which increases the possibility that they will eventually
|
|
get implemented.
|
|
|
|
<h2>And So Forth...</h2>
|
|
|
|
<p>
|
|
In the previous sections, we have seen how moving from a key/value
|
|
store implemented as a ZIP archive to a simple SQLite database
|
|
with just three tables can add significant capabilities to an application
|
|
file format.
|
|
We could continue to enhance the schema with new tables, with indexes
|
|
added for performance, with triggers and views for programming convenience,
|
|
and constraints to enforce consistency of content even in the face of
|
|
programming errors. Further enhancement ideas include:
|
|
<ul>
|
|
<li> Store an <a href="undoredo.html">automated undo/redo stack</a> in a database table so that
|
|
Undo could go back into prior edit sessions.
|
|
<li> Add <a href="fts3.html#fts4">full text search</a> capabilities to the slide deck, or across
|
|
multiple slide decks.
|
|
<li> Decompose the "settings.xml" file into an SQL table that
|
|
is more easily viewed and edited by separate applications.
|
|
<li> Break out the "Presenter Notes" from each slide into a separate
|
|
table, for easier access from third-party applications and/or scripts.
|
|
<li> Enhance the presentation concept beyond the simple linear sequence of
|
|
slides to allow for side-tracks and excursions to be taken depending on
|
|
how the audience is responding.
|
|
</ul>
|
|
|
|
<p>
|
|
An SQLite database has a lot of capability, which
|
|
this essay has only begun to touch upon. But hopefully this quick glimpse
|
|
has convinced some readers that using an SQL database as an application
|
|
file format is worth a second look.
|
|
|
|
<p>
|
|
Some readers might resist using SQLite as an application
|
|
file format due to prior exposure to enterprise SQL databases and
|
|
the caveats and limitations of those other systems.
|
|
For example, many enterprise database
|
|
engines advise against storing large strings or BLOBs in the database
|
|
and instead suggest that large strings and BLOBs be stored as separate
|
|
files and the filename stored in the database. But SQLite
|
|
is not like that. Any column of an SQLite database can hold
|
|
a string or BLOB up to about a gigabyte in size. And for strings and
|
|
BLOBs of 100 kilobytes or less,
|
|
<a href="intern-v-extern-blob.html">I/O performance is better</a> than using separate
|
|
files.
|
|
|
|
<p>
|
|
Some readers might be reluctant to consider SQLite as an application
|
|
file format because they have been inculcated with the idea that all
|
|
SQL database schemas must be factored into third normal form and store
|
|
only small primitive data types such as strings and integers. Certainly
|
|
relational theory is important and designers should strive to understand
|
|
it. But, as demonstrated above, it is often quite acceptable to store
|
|
complex information as XML or JSON in text fields of a database.
|
|
Do what works, not what your database professor said you ought to do.
|
|
|
|
<h2>Review Of The Benefits Of Using SQLite</h2>
|
|
|
|
<p>
|
|
In summary,
|
|
the claim of this essay is that using SQLite as a container for an application
|
|
file format like OpenDocument
|
|
and storing lots of smaller objects in that container
|
|
works out much better than using a ZIP archive holding a few larger objects.
|
|
To wit:
|
|
|
|
<ol>
|
|
<li><p>
|
|
An SQLite database file is approximately the same size as, and in some cases
smaller than, a ZIP archive holding the same information.
|
|
|
|
<li><p>
|
|
The <a href="atomiccommit.html">atomic update capabilities</a>
|
|
of SQLite allow small incremental changes
|
|
to be safely written into the document. This reduces total disk I/O
|
|
and improves File/Save performance, enhancing the user experience.
|
|
|
|
<li><p>
|
|
Startup time is reduced by allowing the application to read in only the
|
|
content shown for the initial screen. This largely eliminates the
|
|
need to show a progress bar when opening a new document. The document
|
|
just pops up immediately, further enhancing the user experience.
|
|
|
|
<li><p>
|
|
The memory footprint of the application can be dramatically reduced by
|
|
only loading content that is relevant to the current display and keeping
|
|
the bulk of the content on disk. The fast query capability of SQLite
|
|
makes this a viable alternative to keeping all content in memory at all times.
|
|
And when applications use less memory, it makes the entire computer more
|
|
responsive, further enhancing the user experience.
|
|
|
|
<li><p>
|
|
The schema of an SQL database is able to represent information more directly
|
|
and succinctly than a key/value database such as a ZIP archive. This makes
|
|
the document content more accessible to third-party applications and scripts
|
|
and facilitates advanced features such as built-in document versioning, and
|
|
incremental saving of work in progress for recovery after a crash.
|
|
</ol>
|
|
|
|
<p>
|
|
These are just a few of the benefits of using SQLite as an application file
|
|
format — the benefits that seem most likely to improve the user
|
|
experience for applications like OpenOffice. Other applications might
|
|
benefit from SQLite in different ways. See the <a href="appfileformat.html">Application File Format</a>
|
|
document for additional ideas.
|
|
|
|
<p>
|
|
Finally, let us reiterate that this essay is a thought experiment.
|
|
The OpenDocument format is well-established and already well-designed.
|
|
Nobody really believes that OpenDocument should be changed to use SQLite
|
|
as its container instead of ZIP. Nor is this article a criticism of
|
|
OpenDocument for not choosing SQLite as its container since OpenDocument
|
|
predates SQLite. Rather, the point of this article is to use OpenDocument
|
|
as a concrete example of how SQLite can be used to build better
|
|
application file formats for future projects.
|
|
|