662 lines
		
	
	
		
			28 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			662 lines
		
	
	
		
			28 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
<!DOCTYPE html>
 | 
						|
<html><head>
 | 
						|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
 | 
						|
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
 | 
						|
<link href="sqlite.css" rel="stylesheet">
 | 
						|
<title>What If OpenDocument Used SQLite?</title>
 | 
						|
<!-- path= -->
 | 
						|
</head>
 | 
						|
<body>
 | 
						|
<div class=nosearch>
 | 
						|
<a href="index.html">
 | 
						|
<img class="logo" src="images/sqlite370_banner.gif" alt="SQLite" border="0">
 | 
						|
</a>
 | 
						|
<div><!-- IE hack to prevent disappearing logo --></div>
 | 
						|
<div class="tagline desktoponly">
 | 
						|
Small. Fast. Reliable.<br>Choose any three.
 | 
						|
</div>
 | 
						|
<div class="menu mainmenu">
 | 
						|
<ul>
 | 
						|
<li><a href="index.html">Home</a>
 | 
						|
<li class='mobileonly'><a href="javascript:void(0)" onclick='toggle_div("submenu")'>Menu</a>
 | 
						|
<li class='wideonly'><a href='about.html'>About</a>
 | 
						|
<li class='desktoponly'><a href="docs.html">Documentation</a>
 | 
						|
<li class='desktoponly'><a href="download.html">Download</a>
 | 
						|
<li class='wideonly'><a href='copyright.html'>License</a>
 | 
						|
<li class='desktoponly'><a href="support.html">Support</a>
 | 
						|
<li class='desktoponly'><a href="prosupport.html">Purchase</a>
 | 
						|
<li class='search' id='search_menubutton'>
 | 
						|
<a href="javascript:void(0)" onclick='toggle_search()'>Search</a>
 | 
						|
</ul>
 | 
						|
</div>
 | 
						|
<div class="menu submenu" id="submenu">
 | 
						|
<ul>
 | 
						|
<li><a href='about.html'>About</a>
 | 
						|
<li><a href='docs.html'>Documentation</a>
 | 
						|
<li><a href='download.html'>Download</a>
 | 
						|
<li><a href='support.html'>Support</a>
 | 
						|
<li><a href='prosupport.html'>Purchase</a>
 | 
						|
</ul>
 | 
						|
</div>
 | 
						|
<div class="searchmenu" id="searchmenu">
 | 
						|
<form method="GET" action="search">
 | 
						|
<select name="s" id="searchtype">
 | 
						|
<option value="d">Search Documentation</option>
 | 
						|
<option value="c">Search Changelog</option>
 | 
						|
</select>
 | 
						|
<input type="text" name="q" id="searchbox" value="">
 | 
						|
<input type="submit" value="Go">
 | 
						|
</form>
 | 
						|
</div>
 | 
						|
</div>
 | 
						|
<script>
 | 
						|
function toggle_div(nm) {
 | 
						|
var w = document.getElementById(nm);
 | 
						|
if( w.style.display=="block" ){
 | 
						|
w.style.display = "none";
 | 
						|
}else{
 | 
						|
w.style.display = "block";
 | 
						|
}
 | 
						|
}
 | 
						|
function toggle_search() {
 | 
						|
var w = document.getElementById("searchmenu");
 | 
						|
if( w.style.display=="block" ){
 | 
						|
w.style.display = "none";
 | 
						|
} else {
 | 
						|
w.style.display = "block";
 | 
						|
setTimeout(function(){
 | 
						|
document.getElementById("searchbox").focus()
 | 
						|
}, 30);
 | 
						|
}
 | 
						|
}
 | 
						|
function div_off(nm){document.getElementById(nm).style.display="none";}
 | 
						|
window.onbeforeunload = function(e){div_off("submenu");}
 | 
						|
/* Disable the Search feature if we are not operating from CGI, since */
 | 
						|
/* Search is accomplished using CGI and will not work without it. */
 | 
						|
if( !location.origin || !location.origin.match || !location.origin.match(/http/) ){
 | 
						|
document.getElementById("search_menubutton").style.display = "none";
 | 
						|
}
 | 
						|
/* Used by the Hide/Show button beside syntax diagrams, to toggle the */
 | 
						|
function hideorshow(btn,obj){
 | 
						|
var x = document.getElementById(obj);
 | 
						|
var b = document.getElementById(btn);
 | 
						|
if( x.style.display!='none' ){
 | 
						|
x.style.display = 'none';
 | 
						|
b.innerHTML='show';
 | 
						|
}else{
 | 
						|
x.style.display = '';
 | 
						|
b.innerHTML='hide';
 | 
						|
}
 | 
						|
return false;
 | 
						|
}
 | 
						|
</script>
 | 
						|
</div>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
<h1 align="center">
 | 
						|
What If OpenDocument Used SQLite?</h1>
 | 
						|
 | 
						|
<h2>Introduction</h2>
 | 
						|
 | 
						|
<p>Suppose the
 | 
						|
<a href="http://en.wikipedia.org/wiki/OpenDocument">OpenDocument</a> file format,
 | 
						|
and specifically the "ODP" OpenDocument Presentation format, were
 | 
						|
built around SQLite.  Benefits would include:
 | 
						|
<ul>
 | 
						|
<li>Smaller documents
 | 
						|
<li>Faster File/Save times
 | 
						|
<li>Faster startup times
 | 
						|
<li>Less memory used
 | 
						|
<li>Document versioning
 | 
						|
<li>A better user experience
 | 
						|
</ul>
 | 
						|
 | 
						|
<p>
 | 
						|
Note that this is only a thought experiment.
 | 
						|
We are not suggesting that OpenDocument be changed.
 | 
						|
Nor is this article a criticism of the current OpenDocument
 | 
						|
design.  The point of this essay is to suggest ways to improve
 | 
						|
future file format designs.
 | 
						|
 | 
						|
<h2>About OpenDocument And OpenDocument Presentation</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
The OpenDocument file format is used for office applications:
 | 
						|
word processors, spreadsheets, and presentations.  It was originally
 | 
						|
designed for the OpenOffice suite but has since been incorporated into
 | 
						|
other desktop application suites.  The OpenOffice application has been
 | 
						|
forked and renamed a few times.  This author's primary use for OpenDocument is 
 | 
						|
building slide presentations with either 
 | 
						|
<a href="https://www.neooffice.org/neojava/en/index.php">NeoOffice</a> on Mac, or
 | 
						|
<a href="http://www.libreoffice.org/">LibreOffice</a> on Linux and Windows.
 | 
						|
 | 
						|
<p>
 | 
						|
An OpenDocument Presentation or "ODP" file is a
 | 
						|
<a href="http://en.wikipedia.org/wiki/Zip_%28file_format%29">ZIP archive</a> containing
 | 
						|
XML files describing presentation slides and separate image files for the
 | 
						|
various images that are included as part of the presentation.
 | 
						|
(OpenDocument word processor and spreadsheet files are similarly
 | 
						|
structured but are not considered by this article.) The reader can
 | 
						|
easily see the content of an ODP file by using the "zip -l" command.
 | 
						|
For example, the following is the "zip -l" output from a 49-slide presentation
 | 
						|
about SQLite from the 2014
 | 
						|
<a href="http://southeastlinuxfest.org/">SouthEast LinuxFest</a>
 | 
						|
conference:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
Archive:  self2014.odp
 | 
						|
  Length      Date    Time    Name
 | 
						|
---------  ---------- -----   ----
 | 
						|
       47  2014-06-21 12:34   mimetype
 | 
						|
        0  2014-06-21 12:34   Configurations2/statusbar/
 | 
						|
        0  2014-06-21 12:34   Configurations2/accelerator/current.xml
 | 
						|
        0  2014-06-21 12:34   Configurations2/floater/
 | 
						|
        0  2014-06-21 12:34   Configurations2/popupmenu/
 | 
						|
        0  2014-06-21 12:34   Configurations2/progressbar/
 | 
						|
        0  2014-06-21 12:34   Configurations2/menubar/
 | 
						|
        0  2014-06-21 12:34   Configurations2/toolbar/
 | 
						|
        0  2014-06-21 12:34   Configurations2/images/Bitmaps/
 | 
						|
    54702  2014-06-21 12:34   Pictures/10000000000001F40000018C595A5A3D.png
 | 
						|
    46269  2014-06-21 12:34   Pictures/100000000000012C000000A8ED96BFD9.png
 | 
						|
<i>... 58 other pictures omitted...</i>
 | 
						|
    13013  2014-06-21 12:34   Pictures/10000000000000EE0000004765E03BA8.png
 | 
						|
  1005059  2014-06-21 12:34   Pictures/10000000000004760000034223EACEFD.png
 | 
						|
   211831  2014-06-21 12:34   content.xml
 | 
						|
    46169  2014-06-21 12:34   styles.xml
 | 
						|
     1001  2014-06-21 12:34   meta.xml
 | 
						|
     9291  2014-06-21 12:34   Thumbnails/thumbnail.png
 | 
						|
    38705  2014-06-21 12:34   Thumbnails/thumbnail.pdf
 | 
						|
     9664  2014-06-21 12:34   settings.xml
 | 
						|
     9704  2014-06-21 12:34   META-INF/manifest.xml
 | 
						|
---------                     -------
 | 
						|
 10961006                     78 files
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>
 | 
						|
The ODP ZIP archive contains four different XML files:
 | 
						|
content.xml, styles.xml, meta.xml, and settings.xml.  Those four files
 | 
						|
define the slide layout, text content, and styling.  This particular
 | 
						|
presentation contains 62 images, ranging from full-screen pictures to
 | 
						|
tiny icons, each stored as a separate file in the Pictures
 | 
						|
folder.  The "mimetype" file contains a single line of text that says:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
application/vnd.oasis.opendocument.presentation
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>The purpose of the other files and folders is presently 
 | 
						|
unknown to the author but is probably not difficult to figure out.
 | 
						|
 | 
						|
<h2>Limitations Of The OpenDocument Presentation Format</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
The use of a ZIP archive to encapsulate XML files plus resources is an
 | 
						|
elegant approach to an application file format.
 | 
						|
It is clearly superior to a custom binary file format.
 | 
						|
But using an SQLite database as the
 | 
						|
container, instead of ZIP, would be more elegant still.
 | 
						|
 | 
						|
<p>A ZIP archive is basically a key/value database, optimized for
 | 
						|
the case of write-once/read-many and for a relatively small number
 | 
						|
of distinct keys (a few hundred to a few thousand) each with a large BLOB
 | 
						|
as its value.  A ZIP archive can be viewed as a "pile-of-files"
 | 
						|
database.  This works, but it has some shortcomings relative to an
 | 
						|
SQLite database, as follows:
 | 
						|
 | 
						|
<ol>
 | 
						|
<li><p><b>Incremental update is hard.</b>
 | 
						|
<p>
 | 
						|
It is difficult to update individual entries in a ZIP archive.
 | 
						|
It is especially difficult to update individual entries in a ZIP
 | 
						|
archive in a way that does not destroy
 | 
						|
the entire document if the computer loses power and/or crashes
 | 
						|
in the middle of the update.  It is not impossible to do this, but
 | 
						|
it is sufficiently difficult that nobody actually does it.  Instead, whenever
 | 
						|
the user selects "File/Save", the entire ZIP archive is rewritten.  
 | 
						|
Hence, "File/Save" takes longer than it ought, especially on
 | 
						|
older hardware.  Newer machines are faster, but it is still bothersome
 | 
						|
that changing a single character in a 50 megabyte presentation causes one
 | 
						|
to burn through 50 megabytes of the finite write life on the SSD.
 | 
						|
 | 
						|
<li><p><b>Startup is slow.</b>
 | 
						|
<p>
 | 
						|
In keeping with the pile-of-files theme, OpenDocument stores all slide 
 | 
						|
content in a single big XML file named "content.xml".  
 | 
						|
LibreOffice reads and parses this entire file just to display
 | 
						|
the first slide.
 | 
						|
LibreOffice also seems to
 | 
						|
read all images into memory as well, which makes sense seeing as when
 | 
						|
the user does "File/Save" it is going to have to write them all back out
 | 
						|
again, even though none of them changed.  The net effect is that
 | 
						|
start-up is slow.  Double-clicking an OpenDocument file brings up a
 | 
						|
progress bar rather than the first slide.
 | 
						|
This results in a bad user experience.
 | 
						|
The situation grows ever more annoying as
 | 
						|
the document size increases.
 | 
						|
 | 
						|
<li><p><b>More memory is required.</b>
 | 
						|
<p>
 | 
						|
Because ZIP archives are optimized for storing big chunks of content, they
 | 
						|
encourage a style of programming where the entire document is read into
 | 
						|
memory at startup, all editing occurs in memory, then the entire document
 | 
						|
is written to disk during "File/Save".  OpenOffice and its descendants
 | 
						|
embrace that pattern.
 | 
						|
 | 
						|
<p>
 | 
						|
One might argue that it is ok, in this era of multi-gigabyte desktops, to
 | 
						|
read the entire document into memory.
 | 
						|
But it is not ok.
 | 
						|
For one, the amount of memory used far exceeds the (compressed) file size
 | 
						|
on disk.  So a 50MB presentation might take 200MB or more RAM.  
 | 
						|
That still is not a problem if one only edits a single document at a time.  
 | 
						|
But when working on a talk, this author will typically have 10 or 15 different 
 | 
						|
presentations up all at the same
 | 
						|
time (to facilitate copy/paste of slides from past presentation) and so
 | 
						|
gigabytes of memory are required.
 | 
						|
Add in an open web browser or two and a few other 
 | 
						|
desktop apps, and suddenly the disk is whirling and the machine is swapping.
 | 
						|
And even having just a single document is a problem when working
 | 
						|
on an inexpensive Chromebook retrofitted with Ubuntu.
 | 
						|
Using less memory is always better.
 | 
						|
</p>
 | 
						|
 | 
						|
<li><p><b>Crash recovery is difficult.</b>
 | 
						|
<p>
 | 
						|
The descendants of OpenOffice tend to segfault more often than commercial
 | 
						|
competitors.  Perhaps for this reason, the OpenOffice forks make
 | 
						|
periodic backups of their in-memory documents so that users do not lose
 | 
						|
all pending edits when the inevitable application crash does occur.
 | 
						|
This causes frustrating pauses in the application for the few seconds
 | 
						|
while each backup is being made.
 | 
						|
After restarting from a crash, the user is presented with a dialog box
 | 
						|
that walks them through the recovery process.  Managing the crash
 | 
						|
recovery this way involves lots of extra application logic and is
 | 
						|
generally an annoyance to the user.
 | 
						|
 | 
						|
<li><p><b>Content is inaccessible.</b>
 | 
						|
<p>
 | 
						|
One cannot easily view, change, or extract the content of an 
 | 
						|
OpenDocument presentation using generic tools.
 | 
						|
The only reasonable way to view or edit an OpenDocument document is to open
 | 
						|
it up using an application that is specifically designed to read or write
 | 
						|
OpenDocument (read: LibreOffice or one of its cousins).  The situation
 | 
						|
could be worse.  One can extract and view individual images (say) from
 | 
						|
a presentation using just the "zip" archiver tool.  But it is not reasonable
 | 
						|
try to extract the text from a slide.  Remember that all content is stored
 | 
						|
in a single "context.xml" file.  That file is XML, so it is a text file.
 | 
						|
But it is not a text file that can be managed with an ordinary text
 | 
						|
editor.  For the example presentation above, the content.xml file
 | 
						|
consist of exactly two lines. The first line of the file is just:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
<?xml version="1.0" encoding="UTF-8"?>
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>The second line of the file contains 211792 characters of
 | 
						|
impenetrable XML.  Yes, 211792 characters all on one line.
 | 
						|
This file is a good stress-test for a text editor.
 | 
						|
Thankfully, the file is not some obscure
 | 
						|
binary format, but in terms of accessibility, it might as well be
 | 
						|
written in Klingon.
 | 
						|
</ol>
 | 
						|
 | 
						|
<h2>First Improvement:  Replace ZIP with SQLite</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
Let us suppose that instead of using a ZIP archive to store its files,
 | 
						|
OpenDocument used a very simple SQLite database with the following
 | 
						|
single-table schema:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
CREATE TABLE OpenDocTree(
 | 
						|
  filename TEXT PRIMARY KEY,  -- Name of file
 | 
						|
  filesize BIGINT,            -- Size of file after decompression
 | 
						|
  content BLOB                -- Compressed file content
 | 
						|
);
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>
 | 
						|
For this first experiment, nothing else about the file format is changed.
 | 
						|
The OpenDocument is still a pile-of-files, only now each file is a row
 | 
						|
in an SQLite database rather than an entry in a ZIP archive.
 | 
						|
This simple change does not use the power of a relational
 | 
						|
database.  Even so, this simple change shows some improvements.
 | 
						|
 | 
						|
<a name="smaller"></a>
 | 
						|
 | 
						|
<p>
 | 
						|
Surprisingly, using SQLite in place of ZIP makes the presentation
 | 
						|
file smaller.  Really.  One would think that a relational database file
 | 
						|
would be larger than a ZIP archive, but at least in the case of NeoOffice
 | 
						|
that is not so.  The following is an actual screen-scrape showing
 | 
						|
the sizes of the same NeoOffice presentation, both in its original 
 | 
						|
ZIP archive format as generated by NeoOffice (self2014.odp), and 
 | 
						|
as repacked as an SQLite database using the 
 | 
						|
<a href="http://www.sqlite.org/sqlar/doc/trunk/README.md">SQLAR</a> utility:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
-rw-r--r--  1 drh  staff  10514994 Jun  8 14:32 self2014.odp
 | 
						|
-rw-r--r--  1 drh  staff  10464256 Jun  8 14:37 self2014.sqlar
 | 
						|
-rw-r--r--  1 drh  staff  10416644 Jun  8 14:40 zip.odp
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>
 | 
						|
The SQLite database file ("self2014.sqlar") is about a
 | 
						|
half percent smaller than the equivalent ODP file!  How can this be?
 | 
						|
Apparently the ZIP archive generator logic in NeoOffice
 | 
						|
is not as efficient as it could be, because when the same pile-of-files
 | 
						|
is recompressed using the command-line "zip" utility, one gets a file
 | 
						|
("zip.odp") that is smaller still, by another half percent, as seen
 | 
						|
in the third line above.  So, a well-written ZIP archive
 | 
						|
can be slightly smaller than the equivalent SQLite database, as one would
 | 
						|
expect.  But the difference is slight.  The key take-away is that an
 | 
						|
SQLite database is size-competitive with a ZIP archive.
 | 
						|
 | 
						|
<p>
 | 
						|
The other advantage to using SQLite in place of
 | 
						|
ZIP is that the document can now be updated incrementally, without risk
 | 
						|
of corrupting the document if a power loss or other crash occurs in the
 | 
						|
middle of the update.  (Remember that writes to 
 | 
						|
<a href="atomiccommit.html">SQLite databases are atomic</a>.)   True, all the
 | 
						|
content is still kept in a single big XML file ("content.xml") which must
 | 
						|
be completely rewritten if so much as a single character changes.  But
 | 
						|
with SQLite, only that one file needs to change.  The other 77 files in the
 | 
						|
repository can remain unaltered.  They do not all have to be rewritten,
 | 
						|
which in turn makes "File/Save" run much faster and saves wear on SSDs.
 | 
						|
 | 
						|
<h2>Second Improvement:  Split content into smaller pieces</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
A pile-of-files encourages content to be stored in a few large chunks.
 | 
						|
In the case of ODP, there are just four XML files that define the layout
 | 
						|
off all slides in a presentation.  An SQLite database allows storing
 | 
						|
information in a few large chunks, but SQLite is also adept and efficient
 | 
						|
at storing information in numerous smaller pieces.
 | 
						|
 | 
						|
<p>
 | 
						|
So then, instead of storing all content for all slides in a single
 | 
						|
oversized XML file ("content.xml"), suppose there was a separate table
 | 
						|
for storing the content of each slide separately.  The table schema
 | 
						|
might look something like this:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
CREATE TABLE slide(
 | 
						|
  pageNumber INTEGER,   -- The slide page number
 | 
						|
  slideContent TEXT     -- Slide content as XML or JSON
 | 
						|
);
 | 
						|
CREATE INDEX slide_pgnum ON slide(pageNumber); -- Optional
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>The content of each slide could still be stored as compressed XML.
 | 
						|
But now each page is stored separately.  So when opening a new document,
 | 
						|
the application could simply run:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
SELECT slideContent FROM slide WHERE pageNumber=1;
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>This query will quickly and efficiently return the content of the first
 | 
						|
slide, which could then be speedily parsed and displayed to the user.
 | 
						|
Only one page needs to be read and parsed in order render the first screen,
 | 
						|
which means that the first screen appears much faster and
 | 
						|
there is no longer a need for an annoying progress bar.
 | 
						|
 | 
						|
<p>If the application wanted
 | 
						|
to keep all content in memory, it could continue reading and parsing the
 | 
						|
other pages using a background thread after drawing the first page.  Or,
 | 
						|
since reading from SQLite is so efficient, the application might 
 | 
						|
instead choose to reduce its memory footprint and only keep a single
 | 
						|
slide in memory at a time.  Or maybe it keeps the current slide and the
 | 
						|
next slide in memory, to facility rapid transitions to the next slide.
 | 
						|
 | 
						|
<p>
 | 
						|
Notice that dividing up the content into smaller pieces using an SQLite
 | 
						|
table gives flexibility to the implementation.  The application can choose
 | 
						|
to read all content into memory at startup.  Or it can read just a
 | 
						|
few pages into memory and keep the rest on disk.  Or it can read just
 | 
						|
single page into memory at a time.  And different versions of the application
 | 
						|
can make different choices without having to make any changes to the
 | 
						|
file format.  Such options are not available when all content is in
 | 
						|
a single big XML file in a ZIP archive.
 | 
						|
 | 
						|
<p>
 | 
						|
Splitting content into smaller pieces also helps File/Save operations
 | 
						|
to go faster.  Instead of having to write back the content of all pages
 | 
						|
when doing a File/Save, the application only has to write back those
 | 
						|
pages that have actually changed.
 | 
						|
 | 
						|
<p>
 | 
						|
One minor downside of splitting content into smaller pieces is that
 | 
						|
compression does not work as well on shorter texts and so the size of
 | 
						|
the document might increase.  But as the bulk of the document space 
 | 
						|
is used to store images, a small reduction in the compression efficiency 
 | 
						|
of the text content will hardly be noticeable, and is a small price 
 | 
						|
to pay for an improved user experience.
 | 
						|
 | 
						|
<h2>Third Improvement:  Versioning</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
Once one is comfortable with the concept of storing each slide separately,
 | 
						|
it is a small step to support versioning of the presentation.  Consider
 | 
						|
the following schema:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
CREATE TABLE slide(
 | 
						|
  slideId INTEGER PRIMARY KEY,
 | 
						|
  derivedFrom INTEGER REFERENCES slide,
 | 
						|
  content TEXT     -- XML or JSON or whatever
 | 
						|
);
 | 
						|
CREATE TABLE version(
 | 
						|
  versionId INTEGER PRIMARY KEY,
 | 
						|
  priorVersion INTEGER REFERENCES version,
 | 
						|
  checkinTime DATETIME,   -- When this version was saved
 | 
						|
  comment TEXT,           -- Description of this version
 | 
						|
  manifest TEXT           -- List of integer slideIds
 | 
						|
);
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>
 | 
						|
In this schema, instead of each slide having a page number that determines
 | 
						|
its order within the presentation, each slide has a unique
 | 
						|
integer identifier that is unrelated to where it occurs in sequence.
 | 
						|
The order of slides in the presentation is determined by a list of
 | 
						|
slideIds, stored as a text string in the MANIFEST column of the VERSION
 | 
						|
table.
 | 
						|
Since multiple entries are allowed in the VERSION table, that means that
 | 
						|
multiple presentations can be stored in the same document.
 | 
						|
 | 
						|
<p>
 | 
						|
On startup, the application first decides which version it
 | 
						|
wants to display.  Since the versionId will naturally increase in time
 | 
						|
and one would normally want to see the latest version, an appropriate
 | 
						|
query might be:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
SELECT manifest, versionId FROM version ORDER BY versionId DESC LIMIT 1;
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>
 | 
						|
Or perhaps the application would rather use the
 | 
						|
most recent checkinTime:
 | 
						|
 | 
						|
<blockquote><pre>
 | 
						|
SELECT manifest, versionId, max(checkinTime) FROM version;
 | 
						|
</pre></blockquote>
 | 
						|
 | 
						|
<p>
 | 
						|
Using a single query such as the above, the application obtains a list
 | 
						|
of the slideIds for all slides in the presentation.  The application then
 | 
						|
queries for the content of the first slide, and parses and displays that
 | 
						|
content, as before.
 | 
						|
 | 
						|
<p>(Aside:  Yes, that second query above that uses "max(checkinTime)"
 | 
						|
really does work and really does return a well-defined answer in SQLite.
 | 
						|
Such a query either returns an undefined answer or generates an error
 | 
						|
in many other SQL database engines, but in SQLite it does what you would 
 | 
						|
expect: it returns the manifest and versionId of the entry that has the
 | 
						|
maximum checkinTime.)
 | 
						|
 | 
						|
<p>When the user does a "File/Save", instead of overwriting the modified
 | 
						|
slides, the application can now make new entries in the SLIDE table for
 | 
						|
just those slides that have been added or altered.  Then it creates a
 | 
						|
new entry in the VERSION table containing the revised manifest.
 | 
						|
 | 
						|
<p>The VERSION table shown above has columns to record a check-in comment
 | 
						|
(presumably supplied by the user) and the time and date at which the File/Save
 | 
						|
action occurred.  It also records the parent version to record the history
 | 
						|
of changes.  Perhaps the manifest could be stored as a delta from the
 | 
						|
parent version, though typically the manifest will be small enough that
 | 
						|
storing a delta might be more trouble than it is worth.  The SLIDE table
 | 
						|
also contains a derivedFrom column which could be used for delta encoding
 | 
						|
if it is determined that saving the slide content as a delta from its
 | 
						|
previous version is a worthwhile optimization.
 | 
						|
 | 
						|
<p>So with this simple change, the ODP file now stores not just the most
 | 
						|
recent edit to the presentation, but a history of all historic edits.  The
 | 
						|
user would normally want to see just the most recent edition of the
 | 
						|
presentation, but if desired, the user can now go backwards in time to 
 | 
						|
see historical versions of the same presentation.
 | 
						|
 | 
						|
<p>Or, multiple presentations could be stored within the same document.
 | 
						|
 | 
						|
<p>With such a schema, the application would no longer need to make
 | 
						|
periodic backups of the unsaved changes to a separate file to avoid lost
 | 
						|
work in the event of a crash.  Instead, a special "pending" version could
 | 
						|
be allocated and unsaved changes could be written into the pending version.
 | 
						|
Because only changes would need to be written, not the entire document,
 | 
						|
saving the pending changes would only involve writing a few kilobytes of
 | 
						|
content, not multiple megabytes, and would take milliseconds instead of
 | 
						|
seconds, and so it could be done frequently and silently in the background.
 | 
						|
Then when a crash occurs and the user reboots, all (or almost all)
 | 
						|
of their work is retained.  If the user decides to discard unsaved changes, 
 | 
						|
they simply go back to the previous version.
 | 
						|
 | 
						|
<p>
 | 
						|
There are details to fill in here.
 | 
						|
Perhaps a screen can be provided that displays a history changes
 | 
						|
(perhaps with a graph) allowing the user to select which version they
 | 
						|
want to view or edit.  Perhaps some facility can be provided to merge
 | 
						|
forks that might occur in the version history.  And perhaps the
 | 
						|
application should provide a means to purge old and unwanted versions.
 | 
						|
The key point is that using an SQLite database to store the content,
 | 
						|
rather than a ZIP archive, makes all of these features much, much easier
 | 
						|
to implement, which increases the possibility that they will eventually
 | 
						|
get implemented.
 | 
						|
 | 
						|
<h2>And So Forth...</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
In the previous sections, we have seen how moving from a key/value
 | 
						|
store implemented as a ZIP archive to a simple SQLite database
 | 
						|
with just three tables can add significant capabilities to an application
 | 
						|
file format.
 | 
						|
We could continue to enhance the schema with new tables, with indexes
 | 
						|
added for performance, with triggers and views for programming convenience,
 | 
						|
and constraints to enforce consistency of content even in the face of
 | 
						|
programming errors.  Further enhancement ideas include:
 | 
						|
<ul>
 | 
						|
<li> Store an <a href="undoredo.html">automated undo/redo stack</a> in a database table so that
 | 
						|
     Undo could go back into prior edit sessions.
 | 
						|
<li> Add <a href="fts3.html#fts4">full text search</a> capabilities to the slide deck, or across
 | 
						|
     multiple slide decks.
 | 
						|
<li> Decompose the "settings.xml" file into an SQL table that
 | 
						|
     is more easily viewed and edited by separate applications.
 | 
						|
<li> Break out the "Presentor Notes" from each slide into a separate
 | 
						|
     table, for easier access from third-party applications and/or scripts.
 | 
						|
<li> Enhance the presentation concept beyond the simple linear sequence of
 | 
						|
     slides to allow for side-tracks and excursions to be taken depending on
 | 
						|
     how the audience is responding.
 | 
						|
</ul>
 | 
						|
 | 
						|
<p>
 | 
						|
An SQLite database has a lot of capability, which
 | 
						|
this essay has only begun to touch upon.  But hopefully this quick glimpse
 | 
						|
has convinced some readers that using an SQL database as an application
 | 
						|
file format is worth a second look.
 | 
						|
 | 
						|
<p>
 | 
						|
Some readers might resist using SQLite as an application
 | 
						|
file format due to prior exposure to enterprise SQL databases and
 | 
						|
the caveats and limitations of those other systems.  
 | 
						|
For example, many enterprise database
 | 
						|
engines advise against storing large strings or BLOBs in the database
 | 
						|
and instead suggest that large strings and BLOBs be stored as separate
 | 
						|
files and the filename stored in the database.  But SQLite 
 | 
						|
is not like that.  Any column of an SQLite database can hold
 | 
						|
a string or BLOB up to about a gigabyte in size.  And for strings and
 | 
						|
BLOBs of 100 kilobytes or less, 
 | 
						|
<a href="intern-v-extern-blob.html">I/O performance is better</a> than using separate
 | 
						|
files.
 | 
						|
 | 
						|
<p>
 | 
						|
Some readers might be reluctant to consider SQLite as an application
 | 
						|
file format because they have been inculcated with the idea that all
 | 
						|
SQL database schemas must be factored into third normal form and store
 | 
						|
only small primitive data types such as strings and integers.  Certainly
 | 
						|
relational theory is important and designers should strive to understand
 | 
						|
it.  But, as demonstrated above, it is often quite acceptable to store
 | 
						|
complex information as XML or JSON in text fields of a database.
 | 
						|
Do what works, not what your database professor said you ought to do.
 | 
						|
 | 
						|
<h2>Review Of The Benefits Of Using SQLite</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
In summary,
 | 
						|
the claim of this essay is that using SQLite as a container for an application
 | 
						|
file format like OpenDocument
 | 
						|
and storing lots of smaller objects in that container
 | 
						|
works out much better than using a ZIP archive holding a few larger objects.
 | 
						|
To wit:
 | 
						|
 | 
						|
<ol>
 | 
						|
<li><p>
 | 
						|
An SQLite database file is approximately the same size, and in some cases
 | 
						|
smaller, than a ZIP archive holding the same information.
 | 
						|
 | 
						|
<li><p>
 | 
						|
The <a href="atomiccommit.html">atomic update capabilities</a>
 | 
						|
of SQLite allow small incremental changes
 | 
						|
to be safely written into the document.  This reduces total disk I/O
 | 
						|
and improves File/Save performance, enhancing the user experience.
 | 
						|
 | 
						|
<li><p>
 | 
						|
Startup time is reduced by allowing the application to read in only the
 | 
						|
content shown for the initial screen.  This largely eliminates the
 | 
						|
need to show a progress bar when opening a new document.  The document
 | 
						|
just pops up immediately, further enhancing the user experience.
 | 
						|
 | 
						|
<li><p>
 | 
						|
The memory footprint of the application can be dramatically reduced by
 | 
						|
only loading content that is relevant to the current display and keeping
 | 
						|
the bulk of the content on disk.  The fast query capability of SQLite
 | 
						|
make this a viable alternative to keeping all content in memory at all times.
 | 
						|
And when applications use less memory, it makes the entire computer more
 | 
						|
responsive, further enhancing the user experience.
 | 
						|
 | 
						|
<li><p>
 | 
						|
The schema of an SQL database is able to represent information more directly
 | 
						|
and succinctly than a key/value database such as a ZIP archive.  This makes
 | 
						|
the document content more accessible to third-party applications and scripts
 | 
						|
and facilitates advanced features such as built-in document versioning, and
 | 
						|
incremental saving of work in progress for recovery after a crash.
 | 
						|
</ol>
 | 
						|
 | 
						|
<p>
 | 
						|
These are just a few of the benefits of using SQLite as an application file
 | 
						|
format — the benefits that seem most likely to improve the user
 | 
						|
experience for applications like OpenOffice.  Other applications might
 | 
						|
benefit from SQLite in different ways. See the <a href="appfileformat.html">Application File Format</a>
 | 
						|
document for additional ideas.
 | 
						|
 | 
						|
<p>
 | 
						|
Finally, let us reiterate that this essay is a thought experiment.
 | 
						|
The OpenDocument format is well-established and already well-designed.
 | 
						|
Nobody really believes that OpenDocument should be changed to use SQLite
 | 
						|
as its container instead of ZIP.  Nor is this article a criticism of
 | 
						|
OpenDocument for not choosing SQLite as its container since OpenDocument
 | 
						|
predates SQLite.  Rather, the point of this article is to use OpenDocument
 | 
						|
as a concrete example of how SQLite can be used to build better 
 | 
						|
application file formats for future projects.
 | 
						|
 |