| <!DOCTYPE html>
 | |
| <html><head>
 | |
| <meta name="viewport" content="width=device-width, initial-scale=1.0">
 | |
| <meta http-equiv="content-type" content="text/html; charset=UTF-8">
 | |
| <link href="sqlite.css" rel="stylesheet">
 | |
| <title>What If OpenDocument Used SQLite?</title>
 | |
| <!-- path= -->
 | |
| </head>
 | |
| <body>
 | |
| <div class=nosearch>
 | |
| <a href="index.html">
 | |
| <img class="logo" src="images/sqlite370_banner.gif" alt="SQLite" border="0">
 | |
| </a>
 | |
| <div><!-- IE hack to prevent disappearing logo --></div>
 | |
| <div class="tagline desktoponly">
 | |
| Small. Fast. Reliable.<br>Choose any three.
 | |
| </div>
 | |
| <div class="menu mainmenu">
 | |
| <ul>
 | |
| <li><a href="index.html">Home</a>
 | |
| <li class='mobileonly'><a href="javascript:void(0)" onclick='toggle_div("submenu")'>Menu</a>
 | |
| <li class='wideonly'><a href='about.html'>About</a>
 | |
| <li class='desktoponly'><a href="docs.html">Documentation</a>
 | |
| <li class='desktoponly'><a href="download.html">Download</a>
 | |
| <li class='wideonly'><a href='copyright.html'>License</a>
 | |
| <li class='desktoponly'><a href="support.html">Support</a>
 | |
| <li class='desktoponly'><a href="prosupport.html">Purchase</a>
 | |
| <li class='search' id='search_menubutton'>
 | |
| <a href="javascript:void(0)" onclick='toggle_search()'>Search</a>
 | |
| </ul>
 | |
| </div>
 | |
| <div class="menu submenu" id="submenu">
 | |
| <ul>
 | |
| <li><a href='about.html'>About</a>
 | |
| <li><a href='docs.html'>Documentation</a>
 | |
| <li><a href='download.html'>Download</a>
 | |
| <li><a href='support.html'>Support</a>
 | |
| <li><a href='prosupport.html'>Purchase</a>
 | |
| </ul>
 | |
| </div>
 | |
| <div class="searchmenu" id="searchmenu">
 | |
| <form method="GET" action="search">
 | |
| <select name="s" id="searchtype">
 | |
| <option value="d">Search Documentation</option>
 | |
| <option value="c">Search Changelog</option>
 | |
| </select>
 | |
| <input type="text" name="q" id="searchbox" value="">
 | |
| <input type="submit" value="Go">
 | |
| </form>
 | |
| </div>
 | |
| </div>
 | |
| <script>
 | |
| function toggle_div(nm) {
 | |
| var w = document.getElementById(nm);
 | |
| if( w.style.display=="block" ){
 | |
| w.style.display = "none";
 | |
| }else{
 | |
| w.style.display = "block";
 | |
| }
 | |
| }
 | |
| function toggle_search() {
 | |
| var w = document.getElementById("searchmenu");
 | |
| if( w.style.display=="block" ){
 | |
| w.style.display = "none";
 | |
| } else {
 | |
| w.style.display = "block";
 | |
| setTimeout(function(){
 | |
| document.getElementById("searchbox").focus()
 | |
| }, 30);
 | |
| }
 | |
| }
 | |
| function div_off(nm){document.getElementById(nm).style.display="none";}
 | |
| window.onbeforeunload = function(e){div_off("submenu");}
 | |
| /* Disable the Search feature if we are not operating from CGI, since */
 | |
| /* Search is accomplished using CGI and will not work without it. */
 | |
| if( !location.origin || !location.origin.match || !location.origin.match(/http/) ){
 | |
| document.getElementById("search_menubutton").style.display = "none";
 | |
| }
 | |
| /* Used by the Hide/Show button beside syntax diagrams, to toggle the */
 | |
| function hideorshow(btn,obj){
 | |
| var x = document.getElementById(obj);
 | |
| var b = document.getElementById(btn);
 | |
| if( x.style.display!='none' ){
 | |
| x.style.display = 'none';
 | |
| b.innerHTML='show';
 | |
| }else{
 | |
| x.style.display = '';
 | |
| b.innerHTML='hide';
 | |
| }
 | |
| return false;
 | |
| }
 | |
| </script>
 | |
| </div>
 | |
| 
 | |
| 
 | |
| 
 | |
| <h1 align="center">
 | |
| What If OpenDocument Used SQLite?</h1>
 | |
| 
 | |
| <h2>Introduction</h2>
 | |
| 
 | |
| <p>Suppose the
 | |
| <a href="http://en.wikipedia.org/wiki/OpenDocument">OpenDocument</a> file format,
 | |
| and specifically the "ODP" OpenDocument Presentation format, were
 | |
| built around SQLite.  Benefits would include:
 | |
| <ul>
 | |
| <li>Smaller documents
 | |
| <li>Faster File/Save times
 | |
| <li>Faster startup times
 | |
| <li>Less memory used
 | |
| <li>Document versioning
 | |
| <li>A better user experience
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| Note that this is only a thought experiment.
 | |
| We are not suggesting that OpenDocument be changed.
 | |
| Nor is this article a criticism of the current OpenDocument
 | |
| design.  The point of this essay is to suggest ways to improve
 | |
| future file format designs.
 | |
| 
 | |
| <h2>About OpenDocument And OpenDocument Presentation</h2>
 | |
| 
 | |
| <p>
 | |
| The OpenDocument file format is used for office applications:
 | |
| word processors, spreadsheets, and presentations.  It was originally
 | |
| designed for the OpenOffice suite but has since been incorporated into
 | |
| other desktop application suites.  The OpenOffice application has been
 | |
| forked and renamed a few times.  This author's primary use for OpenDocument is 
 | |
| building slide presentations with either 
 | |
| <a href="https://www.neooffice.org/neojava/en/index.php">NeoOffice</a> on Mac, or
 | |
| <a href="http://www.libreoffice.org/">LibreOffice</a> on Linux and Windows.
 | |
| 
 | |
| <p>
 | |
| An OpenDocument Presentation or "ODP" file is a
 | |
| <a href="http://en.wikipedia.org/wiki/Zip_%28file_format%29">ZIP archive</a> containing
 | |
| XML files describing presentation slides and separate image files for the
 | |
| various images that are included as part of the presentation.
 | |
| (OpenDocument word processor and spreadsheet files are similarly
 | |
| structured but are not considered by this article.) The reader can
 | |
| easily see the content of an ODP file by using the "unzip -l" command.
 | |
| For example, the following is the "unzip -l" output from a 49-slide presentation
 | |
| about SQLite from the 2014
 | |
| <a href="http://southeastlinuxfest.org/">SouthEast LinuxFest</a>
 | |
| conference:
 | |
| 
 | |
| <blockquote><pre>
 | |
| Archive:  self2014.odp
 | |
|   Length      Date    Time    Name
 | |
| ---------  ---------- -----   ----
 | |
|        47  2014-06-21 12:34   mimetype
 | |
|         0  2014-06-21 12:34   Configurations2/statusbar/
 | |
|         0  2014-06-21 12:34   Configurations2/accelerator/current.xml
 | |
|         0  2014-06-21 12:34   Configurations2/floater/
 | |
|         0  2014-06-21 12:34   Configurations2/popupmenu/
 | |
|         0  2014-06-21 12:34   Configurations2/progressbar/
 | |
|         0  2014-06-21 12:34   Configurations2/menubar/
 | |
|         0  2014-06-21 12:34   Configurations2/toolbar/
 | |
|         0  2014-06-21 12:34   Configurations2/images/Bitmaps/
 | |
|     54702  2014-06-21 12:34   Pictures/10000000000001F40000018C595A5A3D.png
 | |
|     46269  2014-06-21 12:34   Pictures/100000000000012C000000A8ED96BFD9.png
 | |
| <i>... 58 other pictures omitted...</i>
 | |
|     13013  2014-06-21 12:34   Pictures/10000000000000EE0000004765E03BA8.png
 | |
|   1005059  2014-06-21 12:34   Pictures/10000000000004760000034223EACEFD.png
 | |
|    211831  2014-06-21 12:34   content.xml
 | |
|     46169  2014-06-21 12:34   styles.xml
 | |
|      1001  2014-06-21 12:34   meta.xml
 | |
|      9291  2014-06-21 12:34   Thumbnails/thumbnail.png
 | |
|     38705  2014-06-21 12:34   Thumbnails/thumbnail.pdf
 | |
|      9664  2014-06-21 12:34   settings.xml
 | |
|      9704  2014-06-21 12:34   META-INF/manifest.xml
 | |
| ---------                     -------
 | |
|  10961006                     78 files
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>
 | |
| The ODP ZIP archive contains four different XML files:
 | |
| content.xml, styles.xml, meta.xml, and settings.xml.  Those four files
 | |
| define the slide layout, text content, and styling.  This particular
 | |
| presentation contains 62 images, ranging from full-screen pictures to
 | |
| tiny icons, each stored as a separate file in the Pictures
 | |
| folder.  The "mimetype" file contains a single line of text that says:
 | |
| 
 | |
| <blockquote><pre>
 | |
| application/vnd.oasis.opendocument.presentation
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>The purpose of the other files and folders is presently 
 | |
| unknown to the author but is probably not difficult to figure out.
 | |
| 
 | |
| <h2>Limitations Of The OpenDocument Presentation Format</h2>
 | |
| 
 | |
| <p>
 | |
| The use of a ZIP archive to encapsulate XML files plus resources is an
 | |
| elegant approach to an application file format.
 | |
| It is clearly superior to a custom binary file format.
 | |
| But using an SQLite database as the
 | |
| container, instead of ZIP, would be more elegant still.
 | |
| 
 | |
| <p>A ZIP archive is basically a key/value database, optimized for
 | |
| the case of write-once/read-many and for a relatively small number
 | |
| of distinct keys (a few hundred to a few thousand) each with a large BLOB
 | |
| as its value.  A ZIP archive can be viewed as a "pile-of-files"
 | |
| database.  This works, but it has some shortcomings relative to an
 | |
| SQLite database, as follows:
 | |
| 
 | |
| <ol>
 | |
| <li><p><b>Incremental update is hard.</b>
 | |
| <p>
 | |
| It is difficult to update individual entries in a ZIP archive.
 | |
| It is especially difficult to update individual entries in a ZIP
 | |
| archive in a way that does not destroy
 | |
| the entire document if the computer loses power and/or crashes
 | |
| in the middle of the update.  It is not impossible to do this, but
 | |
| it is sufficiently difficult that nobody actually does it.  Instead, whenever
 | |
| the user selects "File/Save", the entire ZIP archive is rewritten.  
 | |
| Hence, "File/Save" takes longer than it ought, especially on
 | |
| older hardware.  Newer machines are faster, but it is still bothersome
 | |
| that changing a single character in a 50 megabyte presentation causes one
 | |
| to burn through 50 megabytes of the finite write life on the SSD.
 | |
| 
 | |
| <li><p><b>Startup is slow.</b>
 | |
| <p>
 | |
| In keeping with the pile-of-files theme, OpenDocument stores all slide 
 | |
| content in a single big XML file named "content.xml".  
 | |
| LibreOffice reads and parses this entire file just to display
 | |
| the first slide.
 | |
| LibreOffice also seems to
 | |
| read all images into memory as well, which makes sense seeing as when
 | |
| the user does "File/Save" it is going to have to write them all back out
 | |
| again, even though none of them changed.  The net effect is that
 | |
| start-up is slow.  Double-clicking an OpenDocument file brings up a
 | |
| progress bar rather than the first slide.
 | |
| This results in a bad user experience.
 | |
| The situation grows ever more annoying as
 | |
| the document size increases.
 | |
| 
 | |
| <li><p><b>More memory is required.</b>
 | |
| <p>
 | |
| Because ZIP archives are optimized for storing big chunks of content, they
 | |
| encourage a style of programming where the entire document is read into
 | |
| memory at startup, all editing occurs in memory, then the entire document
 | |
| is written to disk during "File/Save".  OpenOffice and its descendants
 | |
| embrace that pattern.
 | |
| 
 | |
| <p>
 | |
| One might argue that it is ok, in this era of multi-gigabyte desktops, to
 | |
| read the entire document into memory.
 | |
| But it is not ok.
 | |
| For one, the amount of memory used far exceeds the (compressed) file size
 | |
| on disk.  So a 50MB presentation might take 200MB or more RAM.  
 | |
| That still is not a problem if one only edits a single document at a time.  
 | |
| But when working on a talk, this author will typically have 10 or 15 different 
 | |
| presentations up all at the same
 | |
| time (to facilitate copy/paste of slides from past presentations) and so
 | |
| gigabytes of memory are required.
 | |
| Add in an open web browser or two and a few other 
 | |
| desktop apps, and suddenly the disk is whirling and the machine is swapping.
 | |
| And even having just a single document is a problem when working
 | |
| on an inexpensive Chromebook retrofitted with Ubuntu.
 | |
| Using less memory is always better.
 | |
| </p>
 | |
| 
 | |
| <li><p><b>Crash recovery is difficult.</b>
 | |
| <p>
 | |
| The descendants of OpenOffice tend to segfault more often than commercial
 | |
| competitors.  Perhaps for this reason, the OpenOffice forks make
 | |
| periodic backups of their in-memory documents so that users do not lose
 | |
| all pending edits when the inevitable application crash does occur.
 | |
| This causes frustrating pauses in the application for the few seconds
 | |
| while each backup is being made.
 | |
| After restarting from a crash, the user is presented with a dialog box
 | |
| that walks them through the recovery process.  Managing the crash
 | |
| recovery this way involves lots of extra application logic and is
 | |
| generally an annoyance to the user.
 | |
| 
 | |
| <li><p><b>Content is inaccessible.</b>
 | |
| <p>
 | |
| One cannot easily view, change, or extract the content of an 
 | |
| OpenDocument presentation using generic tools.
 | |
| The only reasonable way to view or edit an OpenDocument document is to open
 | |
| it up using an application that is specifically designed to read or write
 | |
| OpenDocument (read: LibreOffice or one of its cousins).  The situation
 | |
| could be worse.  One can extract and view individual images (say) from
 | |
| a presentation using just the "zip" archiver tool.  But it is not reasonable
 | |
| to try to extract the text from a slide.  Remember that all content is stored
 | |
| in a single "content.xml" file.  That file is XML, so it is a text file.
 | |
| But it is not a text file that can be managed with an ordinary text
 | |
| editor.  For the example presentation above, the content.xml file
 | |
| consists of exactly two lines. The first line of the file is just:
 | |
| 
 | |
| <blockquote><pre>
 | |
| &lt;?xml version="1.0" encoding="UTF-8"?&gt;
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>The second line of the file contains 211792 characters of
 | |
| impenetrable XML.  Yes, 211792 characters all on one line.
 | |
| This file is a good stress-test for a text editor.
 | |
| Thankfully, the file is not some obscure
 | |
| binary format, but in terms of accessibility, it might as well be
 | |
| written in Klingon.
 | |
| </ol>
 | |
| 
 | |
| <h2>First Improvement:  Replace ZIP with SQLite</h2>
 | |
| 
 | |
| <p>
 | |
| Let us suppose that instead of using a ZIP archive to store its files,
 | |
| OpenDocument used a very simple SQLite database with the following
 | |
| single-table schema:
 | |
| 
 | |
| <blockquote><pre>
 | |
| CREATE TABLE OpenDocTree(
 | |
|   filename TEXT PRIMARY KEY,  -- Name of file
 | |
|   filesize BIGINT,            -- Size of file after decompression
 | |
|   content BLOB                -- Compressed file content
 | |
| );
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>
 | |
| For this first experiment, nothing else about the file format is changed.
 | |
| The OpenDocument is still a pile-of-files, only now each file is a row
 | |
| in an SQLite database rather than an entry in a ZIP archive.
 | |
| This simple change does not use the power of a relational
 | |
| database.  Even so, this simple change shows some improvements.
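<p>The repacking step can be sketched in a few lines of Python.  This is only
an illustration under stated assumptions, not the SQLAR utility itself: the
helper names repack_zip() and read_file() are made up for this example, zlib
is assumed as the compressor, and the tiny archive built in memory stands in
for a real ODP file.

```python
import io
import sqlite3
import zipfile
import zlib

SCHEMA = """
CREATE TABLE IF NOT EXISTS OpenDocTree(
  filename TEXT PRIMARY KEY,  -- Name of file
  filesize BIGINT,            -- Size of file after decompression
  content BLOB                -- Compressed file content
);
"""

def repack_zip(zip_file, db):
    """Copy every regular file of a ZIP archive into OpenDocTree."""
    db.executescript(SCHEMA)
    with zipfile.ZipFile(zip_file) as zf, db:  # "db" commits on success
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            raw = zf.read(info)
            db.execute(
                "INSERT OR REPLACE INTO OpenDocTree VALUES (?, ?, ?)",
                (info.filename, len(raw), zlib.compress(raw, 9)),
            )

def read_file(db, filename):
    """Return the decompressed content of one stored file."""
    (blob,) = db.execute(
        "SELECT content FROM OpenDocTree WHERE filename = ?", (filename,)
    ).fetchone()
    return zlib.decompress(blob)

# Build a small stand-in archive in memory (hypothetical content).
content_xml = "<doc>" + "<p>hello</p>" * 100 + "</doc>"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.presentation")
    zf.writestr("content.xml", content_xml)

db = sqlite3.connect(":memory:")
buf.seek(0)
repack_zip(buf, db)
```

<p>Any file can then be pulled out by name with read_file(db, "content.xml"),
without scanning the rest of the container.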
 | |
| 
 | |
| <a name="smaller"></a>
 | |
| 
 | |
| <p>
 | |
| Surprisingly, using SQLite in place of ZIP makes the presentation
 | |
| file smaller.  Really.  One would think that a relational database file
 | |
| would be larger than a ZIP archive, but at least in the case of NeoOffice
 | |
| that is not so.  The following is an actual screen-scrape showing
 | |
| the sizes of the same NeoOffice presentation, both in its original 
 | |
| ZIP archive format as generated by NeoOffice (self2014.odp), and 
 | |
| as repacked as an SQLite database using the 
 | |
| <a href="http://www.sqlite.org/sqlar/doc/trunk/README.md">SQLAR</a> utility:
 | |
| 
 | |
| <blockquote><pre>
 | |
| -rw-r--r--  1 drh  staff  10514994 Jun  8 14:32 self2014.odp
 | |
| -rw-r--r--  1 drh  staff  10464256 Jun  8 14:37 self2014.sqlar
 | |
| -rw-r--r--  1 drh  staff  10416644 Jun  8 14:40 zip.odp
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>
 | |
| The SQLite database file ("self2014.sqlar") is about a
 | |
| half percent smaller than the equivalent ODP file!  How can this be?
 | |
| Apparently the ZIP archive generator logic in NeoOffice
 | |
| is not as efficient as it could be, because when the same pile-of-files
 | |
| is recompressed using the command-line "zip" utility, one gets a file
 | |
| ("zip.odp") that is smaller still, by another half percent, as seen
 | |
| in the third line above.  So, a well-written ZIP archive
 | |
| can be slightly smaller than the equivalent SQLite database, as one would
 | |
| expect.  But the difference is slight.  The key take-away is that an
 | |
| SQLite database is size-competitive with a ZIP archive.
 | |
| 
 | |
| <p>
 | |
| The other advantage to using SQLite in place of
 | |
| ZIP is that the document can now be updated incrementally, without risk
 | |
| of corrupting the document if a power loss or other crash occurs in the
 | |
| middle of the update.  (Remember that writes to 
 | |
| <a href="atomiccommit.html">SQLite databases are atomic</a>.)   True, all the
 | |
| content is still kept in a single big XML file ("content.xml") which must
 | |
| be completely rewritten if so much as a single character changes.  But
 | |
| with SQLite, only that one file needs to change.  The other 77 files in the
 | |
| repository can remain unaltered.  They do not all have to be rewritten,
 | |
| which in turn makes "File/Save" run much faster and saves wear on SSDs.
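<p>A minimal sketch of such an incremental save, again in Python, might look
like the following.  The store() helper and the sample file contents are
assumptions made for illustration; the point is only that the second save
writes a single row inside one transaction.

```python
import sqlite3
import zlib

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE OpenDocTree("
    "filename TEXT PRIMARY KEY, filesize BIGINT, content BLOB)"
)

def store(db, filename, raw):
    """Insert or overwrite one file; no other row is touched."""
    db.execute(
        "INSERT OR REPLACE INTO OpenDocTree VALUES (?, ?, ?)",
        (filename, len(raw), zlib.compress(raw)),
    )

def fetch(db, filename):
    """Return the decompressed content of one file."""
    (blob,) = db.execute(
        "SELECT content FROM OpenDocTree WHERE filename = ?", (filename,)
    ).fetchone()
    return zlib.decompress(blob)

# Initial save: a stand-in picture plus the slide content (made-up data).
picture = b"\x89PNG" + bytes(1000)
with db:
    store(db, "Pictures/logo.png", picture)
    store(db, "content.xml", b"<doc>draft</doc>")

# Later "File/Save": only content.xml changed, so only that row is written.
# The "with db" block commits on success and rolls back on an exception,
# so the document is never left half-updated.
with db:
    store(db, "content.xml", b"<doc>final</doc>")
```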
 | |
| 
 | |
| <h2>Second Improvement:  Split content into smaller pieces</h2>
 | |
| 
 | |
| <p>
 | |
| A pile-of-files encourages content to be stored in a few large chunks.
 | |
| In the case of ODP, there are just four XML files that define the layout
 | |
| of all slides in a presentation.  An SQLite database allows storing
 | |
| information in a few large chunks, but SQLite is also adept and efficient
 | |
| at storing information in numerous smaller pieces.
 | |
| 
 | |
| <p>
 | |
| So then, instead of storing all content for all slides in a single
 | |
| oversized XML file ("content.xml"), suppose there was a separate table
 | |
| for storing the content of each slide separately.  The table schema
 | |
| might look something like this:
 | |
| 
 | |
| <blockquote><pre>
 | |
| CREATE TABLE slide(
 | |
|   pageNumber INTEGER,   -- The slide page number
 | |
|   slideContent TEXT     -- Slide content as XML or JSON
 | |
| );
 | |
| CREATE INDEX slide_pgnum ON slide(pageNumber); -- Optional
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>The content of each slide could still be stored as compressed XML.
 | |
| But now each page is stored separately.  So when opening a new document,
 | |
| the application could simply run:
 | |
| 
 | |
| <blockquote><pre>
 | |
| SELECT slideContent FROM slide WHERE pageNumber=1;
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>This query will quickly and efficiently return the content of the first
 | |
| slide, which could then be speedily parsed and displayed to the user.
 | |
| Only one page needs to be read and parsed in order to render the first screen,
 | |
| which means that the first screen appears much faster and
 | |
| there is no longer a need for an annoying progress bar.
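<p>From the application side, the startup path could be as small as the
following Python sketch.  The load_slide() helper and the 49 placeholder
slides are hypothetical; a real application would parse the returned XML or
JSON rather than just holding the string.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE slide(
  pageNumber INTEGER,   -- The slide page number
  slideContent TEXT     -- Slide content as XML or JSON
);
CREATE INDEX slide_pgnum ON slide(pageNumber); -- Optional
""")

# Populate a 49-slide deck with placeholder content.
with db:
    db.executemany(
        "INSERT INTO slide VALUES (?, ?)",
        [(n, f"<slide>Page {n}</slide>") for n in range(1, 50)],
    )

def load_slide(db, page_number):
    """Fetch one slide's content, or None if there is no such page."""
    row = db.execute(
        "SELECT slideContent FROM slide WHERE pageNumber = ?",
        (page_number,),
    ).fetchone()
    return row[0] if row else None

# Only the first slide is read at startup; the other 48 stay on disk.
first = load_slide(db, 1)
```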
 | |
| 
 | |
| <p>If the application wanted
 | |
| to keep all content in memory, it could continue reading and parsing the
 | |
| other pages using a background thread after drawing the first page.  Or,
 | |
| since reading from SQLite is so efficient, the application might 
 | |
| instead choose to reduce its memory footprint and only keep a single
 | |
| slide in memory at a time.  Or maybe it keeps the current slide and the
 | |
| next slide in memory, to facilitate rapid transitions to the next slide.
 | |
| 
 | |
| <p>
 | |
| Notice that dividing up the content into smaller pieces using an SQLite
 | |
| table gives flexibility to the implementation.  The application can choose
 | |
| to read all content into memory at startup.  Or it can read just a
 | |
| few pages into memory and keep the rest on disk.  Or it can read just
 | |
| a single page into memory at a time.  And different versions of the application
 | |
| can make different choices without having to make any changes to the
 | |
| file format.  Such options are not available when all content is in
 | |
| a single big XML file in a ZIP archive.
 | |
| 
 | |
| <p>
 | |
| Splitting content into smaller pieces also helps File/Save operations
 | |
| to go faster.  Instead of having to write back the content of all pages
 | |
| when doing a File/Save, the application only has to write back those
 | |
| pages that have actually changed.
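<p>One way to picture this is an application that tracks which pages were
edited and writes back only those.  In the Python sketch below, the
save_dirty() helper and its "dirty" dictionary are assumptions for
illustration, not part of any existing application.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE slide(pageNumber INTEGER, slideContent TEXT)")
with db:
    db.executemany(
        "INSERT INTO slide VALUES (?, ?)",
        [(n, f"<slide>Page {n}</slide>") for n in range(1, 4)],
    )

def save_dirty(db, dirty):
    """Write back only edited slides; `dirty` maps pageNumber to new content."""
    with db:  # one atomic transaction for the whole File/Save
        db.executemany(
            "UPDATE slide SET slideContent = ? WHERE pageNumber = ?",
            [(content, page) for page, content in dirty.items()],
        )

before = db.total_changes
save_dirty(db, {2: "<slide>Page 2, revised</slide>"})
rows_written = db.total_changes - before  # rows touched by this save
```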
 | |
| 
 | |
| <p>
 | |
| One minor downside of splitting content into smaller pieces is that
 | |
| compression does not work as well on shorter texts and so the size of
 | |
| the document might increase.  But as the bulk of the document space 
 | |
| is used to store images, a small reduction in the compression efficiency 
 | |
| of the text content will hardly be noticeable, and is a small price 
 | |
| to pay for an improved user experience.
 | |
| 
 | |
| <h2>Third Improvement:  Versioning</h2>
 | |
| 
 | |
| <p>
 | |
| Once one is comfortable with the concept of storing each slide separately,
 | |
| it is a small step to support versioning of the presentation.  Consider
 | |
| the following schema:
 | |
| 
 | |
| <blockquote><pre>
 | |
| CREATE TABLE slide(
 | |
|   slideId INTEGER PRIMARY KEY,
 | |
|   derivedFrom INTEGER REFERENCES slide,
 | |
|   content TEXT     -- XML or JSON or whatever
 | |
| );
 | |
| CREATE TABLE version(
 | |
|   versionId INTEGER PRIMARY KEY,
 | |
|   priorVersion INTEGER REFERENCES version,
 | |
|   checkinTime DATETIME,   -- When this version was saved
 | |
|   comment TEXT,           -- Description of this version
 | |
|   manifest TEXT           -- List of integer slideIds
 | |
| );
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>
 | |
| In this schema, instead of each slide having a page number that determines
 | |
| its order within the presentation, each slide has a unique
 | |
| integer identifier that is unrelated to where it occurs in sequence.
 | |
| The order of slides in the presentation is determined by a list of
 | |
| slideIds, stored as a text string in the MANIFEST column of the VERSION
 | |
| table.
 | |
| Since multiple entries are allowed in the VERSION table, that means that
 | |
| multiple presentations can be stored in the same document.
 | |
| 
 | |
| <p>
 | |
| On startup, the application first decides which version it
 | |
| wants to display.  Since the versionId will naturally increase in time
 | |
| and one would normally want to see the latest version, an appropriate
 | |
| query might be:
 | |
| 
 | |
| <blockquote><pre>
 | |
| SELECT manifest, versionId FROM version ORDER BY versionId DESC LIMIT 1;
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>
 | |
| Or perhaps the application would rather use the
 | |
| most recent checkinTime:
 | |
| 
 | |
| <blockquote><pre>
 | |
| SELECT manifest, versionId, max(checkinTime) FROM version;
 | |
| </pre></blockquote>
 | |
| 
 | |
| <p>
 | |
| Using a single query such as the above, the application obtains a list
 | |
| of the slideIds for all slides in the presentation.  The application then
 | |
| queries for the content of the first slide, and parses and displays that
 | |
| content, as before.
 | |
| 
 | |
| <p>(Aside:  Yes, that second query above that uses "max(checkinTime)"
 | |
| really does work and really does return a well-defined answer in SQLite.
 | |
| Such a query either returns an undefined answer or generates an error
 | |
| in many other SQL database engines, but in SQLite it does what you would 
 | |
| expect: it returns the manifest and versionId of the entry that has the
 | |
| maximum checkinTime.)
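<p>The behavior described in the aside can be checked with a short Python
script.  The three version rows below are invented for the demonstration,
and the timestamps are assumed to be ISO-8601 text so that they compare in
chronological order.  Version 3 has the largest versionId but not the latest
checkinTime, so the two queries give different answers.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE version(
  versionId INTEGER PRIMARY KEY,
  priorVersion INTEGER REFERENCES version,
  checkinTime DATETIME,   -- When this version was saved
  comment TEXT,           -- Description of this version
  manifest TEXT           -- List of integer slideIds
);
INSERT INTO version VALUES
  (1, NULL, '2014-06-01 09:00:00', 'first draft', '1,2,3'),
  (2, 1,    '2014-06-21 12:34:00', 'final',       '1,4,3'),
  (3, 1,    '2014-06-10 08:00:00', 'side branch', '1,2,5');
""")

# The bare columns take their values from the row holding the maximum.
manifest, version_id, newest = db.execute(
    "SELECT manifest, versionId, max(checkinTime) FROM version"
).fetchone()

# Ordering by versionId instead picks the most recently created row.
by_id = db.execute(
    "SELECT manifest, versionId FROM version ORDER BY versionId DESC LIMIT 1"
).fetchone()
```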
 | |
| 
 | |
| <p>When the user does a "File/Save", instead of overwriting the modified
 | |
| slides, the application can now make new entries in the SLIDE table for
 | |
| just those slides that have been added or altered.  Then it creates a
 | |
| new entry in the VERSION table containing the revised manifest.
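<p>A versioned save along these lines might be sketched as follows.  The
schema is the one shown above; everything else is an assumption made for the
example, in particular the save_version() and load_manifest() helpers and the
choice to encode the manifest as comma-separated slideIds.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE slide(
  slideId INTEGER PRIMARY KEY,
  derivedFrom INTEGER REFERENCES slide,
  content TEXT
);
CREATE TABLE version(
  versionId INTEGER PRIMARY KEY,
  priorVersion INTEGER REFERENCES version,
  checkinTime DATETIME,
  comment TEXT,
  manifest TEXT
);
""")

def save_version(db, prior, slides, comment):
    """Save one version.  `slides` lists, in presentation order, either an
    existing slideId (unchanged slide) or a (derivedFrom, content) tuple
    (new or edited slide).  Returns the new versionId."""
    with db:
        ids = []
        for item in slides:
            if isinstance(item, int):
                ids.append(item)  # unchanged: reuse the existing row
            else:
                derived_from, content = item
                cur = db.execute(
                    "INSERT INTO slide(derivedFrom, content) VALUES (?, ?)",
                    (derived_from, content),
                )
                ids.append(cur.lastrowid)
        cur = db.execute(
            "INSERT INTO version(priorVersion, checkinTime, comment, manifest)"
            " VALUES (?, datetime('now'), ?, ?)",
            (prior, comment, ",".join(map(str, ids))),
        )
        return cur.lastrowid

def load_manifest(db, version_id):
    """Return the ordered list of slideIds for one version."""
    (text,) = db.execute(
        "SELECT manifest FROM version WHERE versionId = ?", (version_id,)
    ).fetchone()
    return [int(s) for s in text.split(",")] if text else []

v1 = save_version(
    db, None, [(None, "<slide>A</slide>"), (None, "<slide>B</slide>")], "first draft"
)
a, b = load_manifest(db, v1)
# Second save: slide A is untouched, slide B is edited.
v2 = save_version(db, v1, [a, (b, "<slide>B, revised</slide>")], "edit slide 2")
```

<p>After the second save the old content of slide B is still present in the
SLIDE table, and only one new slide row and one new version row were written.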
 | |
| 
 | |
| <p>The VERSION table shown above has columns to record a check-in comment
 | |
| (presumably supplied by the user) and the time and date at which the File/Save
 | |
| action occurred.  It also records the parent version to record the history
 | |
| of changes.  Perhaps the manifest could be stored as a delta from the
 | |
| parent version, though typically the manifest will be small enough that
 | |
| storing a delta might be more trouble than it is worth.  The SLIDE table
 | |
| also contains a derivedFrom column which could be used for delta encoding
 | |
| if it is determined that saving the slide content as a delta from its
 | |
| previous version is a worthwhile optimization.
 | |
| 
 | |
| <p>So with this simple change, the ODP file now stores not just the most
 | |
| recent edit to the presentation, but a history of all historic edits.  The
 | |
| user would normally want to see just the most recent edition of the
 | |
| presentation, but if desired, the user can now go backwards in time to 
 | |
| see historical versions of the same presentation.
 | |
| 
 | |
| <p>Or, multiple presentations could be stored within the same document.
 | |
| 
 | |
| <p>With such a schema, the application would no longer need to make
 | |
| periodic backups of the unsaved changes to a separate file to avoid lost
 | |
| work in the event of a crash.  Instead, a special "pending" version could
 | |
| be allocated and unsaved changes could be written into the pending version.
 | |
| Because only changes would need to be written, not the entire document,
 | |
| saving the pending changes would only involve writing a few kilobytes of
 | |
| content, not multiple megabytes, and would take milliseconds instead of
 | |
| seconds, and so it could be done frequently and silently in the background.
 | |
| Then when a crash occurs and the user reboots, all (or almost all)
 | |
| of their work is retained.  If the user decides to discard unsaved changes, 
 | |
| they simply go back to the previous version.
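<p>The essay does not say how a "pending" version would be marked, so the
following Python sketch simply assumes that a VERSION row with a NULL
checkinTime is the pending one.  The autosave(), pending(), and
discard_pending() helpers are hypothetical, and cleanup of slide rows that
become unreferenced after a discard is left out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE slide(
  slideId INTEGER PRIMARY KEY,
  derivedFrom INTEGER REFERENCES slide,
  content TEXT
);
CREATE TABLE version(
  versionId INTEGER PRIMARY KEY,
  priorVersion INTEGER REFERENCES version,
  checkinTime DATETIME,
  comment TEXT,
  manifest TEXT
);
INSERT INTO slide VALUES (1, NULL, '<slide>A</slide>');
INSERT INTO version VALUES (1, NULL, '2014-06-21 12:34:00', 'saved', '1');
""")

def autosave(db, prior, manifest):
    """Replace the pending version (checkinTime IS NULL) with the current state."""
    with db:
        db.execute("DELETE FROM version WHERE checkinTime IS NULL")
        db.execute(
            "INSERT INTO version(priorVersion, comment, manifest)"
            " VALUES (?, 'pending', ?)",
            (prior, manifest),
        )

def pending(db):
    """Return (versionId, manifest) of unsaved work, or None if there is none."""
    return db.execute(
        "SELECT versionId, manifest FROM version WHERE checkinTime IS NULL"
    ).fetchone()

def discard_pending(db):
    """Drop unsaved work, leaving the last saved version as the newest."""
    with db:
        db.execute("DELETE FROM version WHERE checkinTime IS NULL")

# An edit produces slide 2; the background autosave records it.
with db:
    db.execute("INSERT INTO slide VALUES (2, 1, '<slide>A, revised</slide>')")
autosave(db, 1, "2")

# After a crash and restart, the application finds the pending work.
recovered = pending(db)

# If the user chooses to discard it, only the saved version remains.
discard_pending(db)
```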
 | |
| 
 | |
| <p>
 | |
| There are details to fill in here.
 | |
| Perhaps a screen can be provided that displays a history of changes
 | |
| (perhaps with a graph) allowing the user to select which version they
 | |
| want to view or edit.  Perhaps some facility can be provided to merge
 | |
| forks that might occur in the version history.  And perhaps the
 | |
| application should provide a means to purge old and unwanted versions.
 | |
| The key point is that using an SQLite database to store the content,
 | |
| rather than a ZIP archive, makes all of these features much, much easier
 | |
| to implement, which increases the possibility that they will eventually
 | |
| get implemented.
 | |
| 
 | |
| <h2>And So Forth...</h2>
 | |
| 
 | |
| <p>
 | |
| In the previous sections, we have seen how moving from a key/value
 | |
| store implemented as a ZIP archive to a simple SQLite database
 | |
| with just three tables can add significant capabilities to an application
 | |
| file format.
 | |
| We could continue to enhance the schema with new tables, with indexes
 | |
| added for performance, with triggers and views for programming convenience,
 | |
| and constraints to enforce consistency of content even in the face of
 | |
| programming errors.  Further enhancement ideas include:
 | |
| <ul>
 | |
| <li> Store an <a href="undoredo.html">automated undo/redo stack</a> in a database table so that
 | |
|      Undo could go back into prior edit sessions.
 | |
| <li> Add <a href="fts3.html#fts4">full text search</a> capabilities to the slide deck, or across
 | |
|      multiple slide decks.
 | |
| <li> Decompose the "settings.xml" file into an SQL table that
 | |
|      is more easily viewed and edited by separate applications.
 | |
| <li> Break out the "Presenter Notes" from each slide into a separate
 | |
|      table, for easier access from third-party applications and/or scripts.
 | |
| <li> Enhance the presentation concept beyond the simple linear sequence of
 | |
|      slides to allow for side-tracks and excursions to be taken depending on
 | |
|      how the audience is responding.
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| An SQLite database has a lot of capability, which
 | |
| this essay has only begun to touch upon.  But hopefully this quick glimpse
 | |
| has convinced some readers that using an SQL database as an application
 | |
| file format is worth a second look.
 | |
| 
 | |
| <p>
 | |
| Some readers might resist using SQLite as an application
 | |
| file format due to prior exposure to enterprise SQL databases and
 | |
| the caveats and limitations of those other systems.  
 | |
| For example, many enterprise database
 | |
| engines advise against storing large strings or BLOBs in the database
 | |
| and instead suggest that large strings and BLOBs be stored as separate
 | |
| files and the filename stored in the database.  But SQLite 
 | |
| is not like that.  Any column of an SQLite database can hold
 | |
| a string or BLOB up to about a gigabyte in size.  And for strings and
 | |
| BLOBs of 100 kilobytes or less, 
 | |
| <a href="intern-v-extern-blob.html">I/O performance is better</a> than using separate
 | |
| files.
 | |
| 
 | |
| <p>
 | |
| Some readers might be reluctant to consider SQLite as an application
 | |
| file format because they have been inculcated with the idea that all
 | |
| SQL database schemas must be factored into third normal form and store
 | |
| only small primitive data types such as strings and integers.  Certainly
 | |
| relational theory is important and designers should strive to understand
 | |
| it.  But, as demonstrated above, it is often quite acceptable to store
 | |
| complex information as XML or JSON in text fields of a database.
 | |
| Do what works, not what your database professor said you ought to do.
 | |
| 
 | |
| <h2>Review Of The Benefits Of Using SQLite</h2>
 | |
| 
 | |
| <p>
 | |
| In summary,
 | |
| the claim of this essay is that using SQLite as a container for an application
 | |
| file format like OpenDocument
 | |
| and storing lots of smaller objects in that container
 | |
| works out much better than using a ZIP archive holding a few larger objects.
 | |
| To wit:
 | |
| 
 | |
| <ol>
 | |
| <li><p>
 | |
| An SQLite database file is approximately the same size as, and in some cases
 | |
| smaller than, a ZIP archive holding the same information.
 | |
| 
 | |
| <li><p>
 | |
| The <a href="atomiccommit.html">atomic update capabilities</a>
 | |
| of SQLite allow small incremental changes
 | |
| to be safely written into the document.  This reduces total disk I/O
 | |
| and improves File/Save performance, enhancing the user experience.
 | |
| 
 | |
| <li><p>
 | |
| Startup time is reduced by allowing the application to read in only the
 | |
| content shown for the initial screen.  This largely eliminates the
 | |
| need to show a progress bar when opening a new document.  The document
 | |
| just pops up immediately, further enhancing the user experience.
 | |
| 
 | |
| <li><p>
 | |
| The memory footprint of the application can be dramatically reduced by
 | |
| only loading content that is relevant to the current display and keeping
 | |
| the bulk of the content on disk.  The fast query capability of SQLite
 | |
| makes this a viable alternative to keeping all content in memory at all times.
 | |
| And when applications use less memory, it makes the entire computer more
 | |
| responsive, further enhancing the user experience.
 | |
| 
 | |
| <li><p>
 | |
| The schema of an SQL database is able to represent information more directly
 | |
| and succinctly than a key/value database such as a ZIP archive.  This makes
 | |
| the document content more accessible to third-party applications and scripts
 | |
| and facilitates advanced features such as built-in document versioning, and
 | |
| incremental saving of work in progress for recovery after a crash.
 | |
| </ol>
 | |
| 
 | |
| <p>
 | |
| These are just a few of the benefits of using SQLite as an application file
 | |
| format — the benefits that seem most likely to improve the user
 | |
| experience for applications like OpenOffice.  Other applications might
 | |
| benefit from SQLite in different ways. See the <a href="appfileformat.html">Application File Format</a>
 | |
| document for additional ideas.
 | |
| 
 | |
| <p>
 | |
| Finally, let us reiterate that this essay is a thought experiment.
 | |
| The OpenDocument format is well-established and already well-designed.
 | |
| Nobody really believes that OpenDocument should be changed to use SQLite
 | |
| as its container instead of ZIP.  Nor is this article a criticism of
 | |
| OpenDocument for not choosing SQLite as its container since OpenDocument
 | |
| predates SQLite.  Rather, the point of this article is to use OpenDocument
 | |
| as a concrete example of how SQLite can be used to build better 
 | |
| application file formats for future projects.
 | |
| 
 |