673 lines
		
	
	
		
			29 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			673 lines
		
	
	
		
			29 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
<!DOCTYPE html>
 | 
						|
<html><head>
 | 
						|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
 | 
						|
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
 | 
						|
<link href="sqlite.css" rel="stylesheet">
 | 
						|
<title>File Locking And Concurrency In SQLite Version 3</title>
 | 
						|
<!-- path= -->
 | 
						|
</head>
 | 
						|
<body>
 | 
						|
<div class=nosearch>
 | 
						|
<a href="index.html">
 | 
						|
<img class="logo" src="images/sqlite370_banner.gif" alt="SQLite" border="0">
 | 
						|
</a>
 | 
						|
<div><!-- IE hack to prevent disappearing logo --></div>
 | 
						|
<div class="tagline desktoponly">
 | 
						|
Small. Fast. Reliable.<br>Choose any three.
 | 
						|
</div>
 | 
						|
<div class="menu mainmenu">
 | 
						|
<ul>
 | 
						|
<li><a href="index.html">Home</a>
 | 
						|
<li class='mobileonly'><a href="javascript:void(0)" onclick='toggle_div("submenu")'>Menu</a>
 | 
						|
<li class='wideonly'><a href='about.html'>About</a>
 | 
						|
<li class='desktoponly'><a href="docs.html">Documentation</a>
 | 
						|
<li class='desktoponly'><a href="download.html">Download</a>
 | 
						|
<li class='wideonly'><a href='copyright.html'>License</a>
 | 
						|
<li class='desktoponly'><a href="support.html">Support</a>
 | 
						|
<li class='desktoponly'><a href="prosupport.html">Purchase</a>
 | 
						|
<li class='search' id='search_menubutton'>
 | 
						|
<a href="javascript:void(0)" onclick='toggle_search()'>Search</a>
 | 
						|
</ul>
 | 
						|
</div>
 | 
						|
<div class="menu submenu" id="submenu">
 | 
						|
<ul>
 | 
						|
<li><a href='about.html'>About</a>
 | 
						|
<li><a href='docs.html'>Documentation</a>
 | 
						|
<li><a href='download.html'>Download</a>
 | 
						|
<li><a href='support.html'>Support</a>
 | 
						|
<li><a href='prosupport.html'>Purchase</a>
 | 
						|
</ul>
 | 
						|
</div>
 | 
						|
<div class="searchmenu" id="searchmenu">
 | 
						|
<form method="GET" action="search">
 | 
						|
<select name="s" id="searchtype">
 | 
						|
<option value="d">Search Documentation</option>
 | 
						|
<option value="c">Search Changelog</option>
 | 
						|
</select>
 | 
						|
<input type="text" name="q" id="searchbox" value="">
 | 
						|
<input type="submit" value="Go">
 | 
						|
</form>
 | 
						|
</div>
 | 
						|
</div>
 | 
						|
<script>
 | 
						|
function toggle_div(nm) {
 | 
						|
var w = document.getElementById(nm);
 | 
						|
if( w.style.display=="block" ){
 | 
						|
w.style.display = "none";
 | 
						|
}else{
 | 
						|
w.style.display = "block";
 | 
						|
}
 | 
						|
}
 | 
						|
function toggle_search() {
 | 
						|
var w = document.getElementById("searchmenu");
 | 
						|
if( w.style.display=="block" ){
 | 
						|
w.style.display = "none";
 | 
						|
} else {
 | 
						|
w.style.display = "block";
 | 
						|
setTimeout(function(){
 | 
						|
document.getElementById("searchbox").focus()
 | 
						|
}, 30);
 | 
						|
}
 | 
						|
}
 | 
						|
function div_off(nm){document.getElementById(nm).style.display="none";}
 | 
						|
window.onbeforeunload = function(e){div_off("submenu");}
 | 
						|
/* Disable the Search feature if we are not operating from CGI, since */
 | 
						|
/* Search is accomplished using CGI and will not work without it. */
 | 
						|
if( !location.origin || !location.origin.match || !location.origin.match(/http/) ){
 | 
						|
document.getElementById("search_menubutton").style.display = "none";
 | 
						|
}
 | 
						|
/* Used by the Hide/Show button beside syntax diagrams, to toggle the */
 | 
						|
function hideorshow(btn,obj){
 | 
						|
var x = document.getElementById(obj);
 | 
						|
var b = document.getElementById(btn);
 | 
						|
if( x.style.display!='none' ){
 | 
						|
x.style.display = 'none';
 | 
						|
b.innerHTML='show';
 | 
						|
}else{
 | 
						|
x.style.display = '';
 | 
						|
b.innerHTML='hide';
 | 
						|
}
 | 
						|
return false;
 | 
						|
}
 | 
						|
</script>
 | 
						|
</div>
 | 
						|
 | 
						|
 | 
						|
<font color="#7f001f"><p>
 | 
						|
<p>This document was originally created in early 2004 when SQLite version 2
 | 
						|
was still in widespread use and was written to introduce
 | 
						|
the new concepts of SQLite version 3 to readers who were already familiar
 | 
						|
with SQLite version 2.  But these days, most readers of this document have
 | 
						|
probably never seen SQLite version 2 and are only familiar with SQLite
 | 
						|
version 3.  Nevertheless, this document continues to serve as an
 | 
						|
authoritative reference to how database file locking works in SQLite 
 | 
						|
version 3.</p>
 | 
						|
 | 
						|
<p>The document only describes locking for the older rollback-mode
 | 
						|
transaction mechanism.  Locking for the newer <a href="wal.html">write-ahead log</a> or <a href="wal.html">WAL mode</a>
 | 
						|
is described separately.</p>
 | 
						|
</font>
 | 
						|
 | 
						|
<h2>1.0 File Locking And Concurrency In SQLite Version 3</h2>
 | 
						|
 | 
						|
<p>SQLite <a href="releaselog/3_0_0.html">Version 3.0.0</a> introduced a new locking and journaling 
 | 
						|
mechanism designed to improve concurrency over SQLite version 2
 | 
						|
and to reduce the writer starvation 
 | 
						|
problem.  The new mechanism also allows atomic commits of transactions
 | 
						|
involving multiple database files.
 | 
						|
This document describes the new locking mechanism.
 | 
						|
The intended audience is programmers who want to understand and/or modify
 | 
						|
the pager code and reviewers working to verify the design
 | 
						|
of SQLite version 3.
 | 
						|
</p>
 | 
						|
 | 
						|
<a name="overview"></a>
 | 
						|
<h2>2.0 Overview</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
Locking and concurrency control are handled by the 
 | 
						|
<a href="http://www.sqlite.org/src/finfo?name=src/pager.c">
 | 
						|
pager module</a>.
 | 
						|
The pager module is responsible for making SQLite "ACID" (Atomic,
 | 
						|
Consistent, Isolated, and Durable).  The pager module makes sure changes
 | 
						|
happen all at once, that either all changes occur or none of them do,
 | 
						|
that two or more processes do not try to access the database
 | 
						|
in incompatible ways at the same time, and that once changes have been
 | 
						|
written they persist until explicitly deleted.  The pager also provides
 | 
						|
a memory cache of some of the contents of the disk file.</p>
 | 
						|
 | 
						|
<p>The pager is unconcerned
 | 
						|
with the details of B-Trees, text encodings, indices, and so forth.
 | 
						|
From the point of view of the pager the database consists of
 | 
						|
a single file of uniform-sized blocks.  Each block is called a
 | 
						|
"page" and is usually 1024 bytes in size.   The pages are numbered
 | 
						|
beginning with 1.  So the first 1024 bytes of the database are called
 | 
						|
"page 1" and the second 1024 bytes are call "page 2" and so forth. All 
 | 
						|
other encoding details are handled by higher layers of the library.  
 | 
						|
The pager communicates with the operating system using one of several
 | 
						|
modules 
 | 
						|
(Examples:
 | 
						|
<a href="http://www.sqlite.org/src/finfo?name=src/os_unix.c">
 | 
						|
os_unix.c</a>,
 | 
						|
<a href="http://www.sqlite.org/src/finfo?name=src/os_win.c">
 | 
						|
os_win.c</a>)
 | 
						|
that provides a uniform abstraction for operating system services.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>The pager module effectively controls access for separate threads, or
 | 
						|
separate processes, or both.  Throughout this document whenever the
 | 
						|
word "process" is written you may substitute the word "thread" without
 | 
						|
changing the truth of the statement.</p>
 | 
						|
 | 
						|
<a name="locking"></a>
 | 
						|
<h2>3.0 Locking</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
From the point of view of a single process, a database file
 | 
						|
can be in one of five locking states:
 | 
						|
</p>
 | 
						|
 | 
						|
<p>
 | 
						|
<table cellpadding="20">
 | 
						|
<tr><td valign="top">UNLOCKED</td>
 | 
						|
<td valign="top">
 | 
						|
No locks are held on the database.  The database may be neither read nor
 | 
						|
written.  Any internally cached data is considered suspect and subject to
 | 
						|
verification against the database file before being used.  Other 
 | 
						|
processes can read or write the database as their own locking states
 | 
						|
permit.  This is the default state.
 | 
						|
</td></tr>
 | 
						|
 | 
						|
<tr><td valign="top">SHARED</td>
 | 
						|
<td valign="top">
 | 
						|
<a name="shared_lock"></a>
 | 
						|
 | 
						|
The database may be read but not written.  Any number of 
 | 
						|
processes can hold SHARED locks at the same time, hence there can be
 | 
						|
many simultaneous readers.  But no other thread or process is allowed
 | 
						|
to write to the database file while one or more SHARED locks are active.
 | 
						|
</td></tr>
 | 
						|
 | 
						|
<tr><td valign="top">RESERVED</td>
 | 
						|
<td valign="top">
 | 
						|
<a name="reserved_lock"></a>
 | 
						|
 | 
						|
A RESERVED lock means that the process is planning on writing to the
 | 
						|
database file at some point in the future but that it is currently just
 | 
						|
reading from the file.  Only a single RESERVED lock may be active at one
 | 
						|
time, though multiple SHARED locks can coexist with a single RESERVED lock.
 | 
						|
RESERVED differs from PENDING in that new SHARED locks can be acquired
 | 
						|
while there is a RESERVED lock.
 | 
						|
</td></tr>
 | 
						|
 | 
						|
<tr><td valign="top">PENDING</td>
 | 
						|
<td valign="top">
 | 
						|
<a name="pending_lock"></a>
 | 
						|
 | 
						|
A PENDING lock means that the process holding the lock wants to write
 | 
						|
to the database as soon as possible and is just waiting on all current
 | 
						|
SHARED locks to clear so that it can get an EXCLUSIVE lock.  No new 
 | 
						|
SHARED locks are permitted against the database if
 | 
						|
a PENDING lock is active, though existing SHARED locks are allowed to
 | 
						|
continue.
 | 
						|
</td></tr>
 | 
						|
 | 
						|
<tr><td valign="top">EXCLUSIVE</td>
 | 
						|
<td valign="top">
 | 
						|
<a name="excl_lock"></a>
 | 
						|
 | 
						|
An EXCLUSIVE lock is needed in order to write to the database file.
 | 
						|
Only one EXCLUSIVE lock is allowed on the file and no other locks of
 | 
						|
any kind are allowed to coexist with an EXCLUSIVE lock.  In order to
 | 
						|
maximize concurrency, SQLite works to minimize the amount of time that
 | 
						|
EXCLUSIVE locks are held.
 | 
						|
</td></tr>
 | 
						|
</table>
 | 
						|
</p>
 | 
						|
 | 
						|
<p>
 | 
						|
The operating system interface layer understands and tracks all five
 | 
						|
locking states described above.  
 | 
						|
The pager module only tracks four of the five locking states.
 | 
						|
A PENDING lock is always just a temporary
 | 
						|
stepping stone on the path to an EXCLUSIVE lock and so the pager module
 | 
						|
does not track PENDING locks.
 | 
						|
</p>
 | 
						|
 | 
						|
<a name="rollback"></a>
 | 
						|
<h2>4.0 The Rollback Journal</h2>
 | 
						|
 | 
						|
<p>When a process wants to change a database file (and it is not
 | 
						|
in <a href="wal.html">WAL</a> mode), it
 | 
						|
first records the original unchanged database content
 | 
						|
in a <em>rollback journal</em>.  The rollback journal is an ordinary
 | 
						|
disk file that is always located
 | 
						|
in the same directory or folder as the database file and has the
 | 
						|
same name as the database file with the addition of a <tt>-journal</tt>
 | 
						|
suffix. The rollback journal also records the initial
 | 
						|
size of the database so that if the database file grows it can be truncated
 | 
						|
back to its original size on a rollback.</p>
 | 
						|
 | 
						|
<p>If SQLite is working with multiple databases at the same time
 | 
						|
(using the <a href="lang_attach.html">ATTACH</a> command) then each database has its own rollback journal.
 | 
						|
But there is also a separate aggregate journal
 | 
						|
called the <em>super-journal</em>.
 | 
						|
The super-journal does not contain page data used for rolling back
 | 
						|
changes.  Instead the super-journal contains the names of the
 | 
						|
individual database rollback journals for each of the ATTACHed databases. 
 | 
						|
Each of the individual database rollback journals also contain the name 
 | 
						|
of the super-journal.
 | 
						|
If there are no ATTACHed databases (or if none of the ATTACHed database
 | 
						|
is participating in the current transaction) no super-journal is
 | 
						|
created and the normal rollback journal contains an empty string
 | 
						|
in the place normally reserved for recording the name of the
 | 
						|
super-journal.</p>
 | 
						|
 | 
						|
<p>A rollback journal is said to be <a href="fileformat2.html#hotjrnl">hot</a>
 | 
						|
if it needs to be rolled back
 | 
						|
in order to restore the integrity of its database.  
 | 
						|
A hot journal is created when a process is in the middle of a database
 | 
						|
update and a program or operating system crash or power failure prevents 
 | 
						|
the update from completing.
 | 
						|
Hot journals are an exception condition. 
 | 
						|
Hot journals exist to recover from crashes and power failures.
 | 
						|
If everything is working correctly 
 | 
						|
(that is, if there are no crashes or power failures)
 | 
						|
you will never get a hot journal.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>
 | 
						|
If no super-journal is involved, then
 | 
						|
a journal is hot if it exists and has a non-zero header
 | 
						|
and its corresponding database file
 | 
						|
does not have a RESERVED lock.
 | 
						|
If a super-journal is named in the file journal, then the file journal
 | 
						|
is hot if its super-journal exists and there is no RESERVED
 | 
						|
lock on the corresponding database file.
 | 
						|
It is important to understand when a journal is hot so the
 | 
						|
preceding rules will be repeated in bullets:
 | 
						|
</p>
 | 
						|
 | 
						|
<ul>
 | 
						|
<li>A journal is hot if...
 | 
						|
    <ul>
 | 
						|
    <li>It exists, and</li>
 | 
						|
    <li>Its size is greater than 512 bytes, and</li>
 | 
						|
    <li>The journal header is non-zero and well-formed, and</li>
 | 
						|
    <li>Its super-journal exists or the super-journal name is an
 | 
						|
        empty string, and</li>
 | 
						|
    <li>There is no RESERVED lock on the corresponding database file.</li>
 | 
						|
    </ul>
 | 
						|
</li>
 | 
						|
</ul>
 | 
						|
 | 
						|
<a name="hot_journals"></a>
 | 
						|
<h3>4.1 Dealing with hot journals</h3>
 | 
						|
 | 
						|
<p>
 | 
						|
Before reading from a database file, SQLite always checks to see if that
 | 
						|
database file has a hot journal.  If the file does have a hot journal, then
 | 
						|
the journal is rolled back before the file is read.  In this way, we ensure
 | 
						|
that the database file is in a consistent state before it is read.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>When a process wants to read from a database file, it followed
 | 
						|
the following sequence of steps:
 | 
						|
</p>
 | 
						|
 | 
						|
<ol>
 | 
						|
<li>Open the database file and obtain a SHARED lock.  If the SHARED lock
 | 
						|
    cannot be obtained, fail immediately and return SQLITE_BUSY.</li>
 | 
						|
<li>Check to see if the database file has a hot journal.   If the file
 | 
						|
    does not have a hot journal, we are done.  Return immediately.
 | 
						|
    If there is a hot journal, that journal must be rolled back by
 | 
						|
    the subsequent steps of this algorithm.</li>
 | 
						|
<li>Acquire a PENDING lock then an EXCLUSIVE lock on the database file.
 | 
						|
    (Note: Do not acquire a RESERVED lock because that would make
 | 
						|
    other processes think the journal was no longer hot.)  If we
 | 
						|
    fail to acquire these locks it means another process
 | 
						|
    is already trying to do the rollback.  In that case,
 | 
						|
    drop all locks, close the database, and return SQLITE_BUSY. </li>
 | 
						|
<li>Read the journal file and roll back the changes.</li>
 | 
						|
<li>Wait for the rolled back changes to be written onto 
 | 
						|
    persistent storage.  This protects the integrity of the database
 | 
						|
    in case another power failure or crash occurs.</li>
 | 
						|
<li>Delete the journal file (or truncate the journal to zero bytes in
 | 
						|
    length if <a href="pragma.html#pragma_journal_mode">PRAGMA journal_mode=TRUNCATE</a> is
 | 
						|
    set, or zero the journal header if
 | 
						|
    <a href="pragma.html#pragma_journal_mode">PRAGMA journal_mode=PERSIST</a> is set).</li>
 | 
						|
<li>Delete the super-journal file if it is safe to do so.
 | 
						|
    This step is optional.  It is here only to prevent stale
 | 
						|
    super-journals from cluttering up the disk drive.
 | 
						|
    See the discussion below for details.</li>
 | 
						|
<li>Drop the EXCLUSIVE and PENDING locks but retain the SHARED lock.</li>
 | 
						|
</ol>
 | 
						|
 | 
						|
<p>After the algorithm above completes successfully, it is safe to 
 | 
						|
read from the database file.  Once all reading has completed, the
 | 
						|
SHARED lock is dropped.</p>
 | 
						|
 | 
						|
<a name="stale_super_journals"></a>
 | 
						|
<h3>4.2 Deleting stale super-journals</h3>
 | 
						|
 | 
						|
<p>A stale super-journal is a super-journal that is no longer being
 | 
						|
used for anything.  There is no requirement that stale super-journals
 | 
						|
be deleted.  The only reason for doing so is to free up disk space.</p>
 | 
						|
 | 
						|
<p>A super-journal is stale if no individual file journals are pointing
 | 
						|
to it.  To figure out if a super-journal is stale, we first read the
 | 
						|
super-journal to obtain the names of all of its file journals.  Then
 | 
						|
we check each of those file journals.  If any of the file journals named
 | 
						|
in the super-journal exists and points back to the super-journal, then
 | 
						|
the super-journal is not stale.  If all file journals are either missing
 | 
						|
or refer to other super-journals or no super-journal at all, then the
 | 
						|
super-journal we are testing is stale and can be safely deleted.</p>
 | 
						|
 | 
						|
<a name="writing"></a>
 | 
						|
<h2>5.0 Writing to a database file</h2>
 | 
						|
 | 
						|
<p>To write to a database, a process must first acquire a SHARED lock
 | 
						|
as described above (possibly rolling back incomplete changes if there
 | 
						|
is a hot journal). 
 | 
						|
After a SHARED lock is obtained, a RESERVED lock must be acquired.
 | 
						|
The RESERVED lock signals that the process intends to write to the
 | 
						|
database at some point in the future.  Only one process at a time
 | 
						|
can hold a RESERVED lock.  But other processes can continue to read
 | 
						|
the database while the RESERVED lock is held.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>If the process that wants to write is unable to obtain a RESERVED
 | 
						|
lock, it must mean that another process already has a RESERVED lock.
 | 
						|
In that case, the write attempt fails and returns SQLITE_BUSY.</p>
 | 
						|
 | 
						|
<p>After obtaining a RESERVED lock, the process that wants to write
 | 
						|
creates a rollback journal.  The header of the journal is initialized
 | 
						|
with the original size of the database file.  Space in the journal header
 | 
						|
is also reserved for a super-journal name, though the super-journal
 | 
						|
name is initially empty.</p>
 | 
						|
 | 
						|
<p>Before making changes to any page of the database, the process writes
 | 
						|
the original content of that page into the rollback journal.  Changes
 | 
						|
to pages are held in memory at first and are not written to the disk.
 | 
						|
The original database file remains unaltered, which means that other
 | 
						|
processes can continue to read the database.</p>
 | 
						|
 | 
						|
<p>Eventually, the writing process will want to update the database
 | 
						|
file, either because its memory cache has filled up or because it is
 | 
						|
ready to commit its changes.  Before this happens, the writer must
 | 
						|
make sure no other process is reading the database and that the rollback
 | 
						|
journal data is safely on the disk surface so that it can be used to
 | 
						|
rollback incomplete changes in the event of a power failure.
 | 
						|
The steps are as follows:</p>
 | 
						|
 | 
						|
<ol>
 | 
						|
<li>Make sure all rollback journal data has actually been written to
 | 
						|
    the surface of the disk (and is not just being held in the operating
 | 
						|
    system's  or disk controllers cache) so that if a power failure occurs
 | 
						|
    the data will still be there after power is restored.</li>
 | 
						|
<li>Obtain a PENDING lock and then an EXCLUSIVE lock on the database file.
 | 
						|
    If other processes still have SHARED locks, the writer might have
 | 
						|
    to wait until those SHARED locks clear before it is able to obtain
 | 
						|
    an EXCLUSIVE lock.</li>
 | 
						|
<li>Write all page modifications currently held in memory out to the
 | 
						|
    original database disk file.</li>
 | 
						|
</ol>
 | 
						|
 | 
						|
<p>
 | 
						|
If the reason for writing to the database file is because the memory
 | 
						|
cache was full, then the writer will not commit right away.  Instead,
 | 
						|
the writer might continue to make changes to other pages.  Before 
 | 
						|
subsequent changes are written to the database file, the rollback
 | 
						|
journal must be flushed to disk again.  Note also that the EXCLUSIVE
 | 
						|
lock that the writer obtained in order to write to the database initially
 | 
						|
must be held until all changes are committed.  That means that no other
 | 
						|
processes are able to access the database from the
 | 
						|
time the memory cache first spills to disk until the transaction
 | 
						|
commits.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>
 | 
						|
When a writer is ready to commit its changes, it executes the following
 | 
						|
steps:
 | 
						|
</p>
 | 
						|
 | 
						|
<ol>
 | 
						|
<li value="4">
 | 
						|
   Obtain an EXCLUSIVE lock on the database file and
 | 
						|
   make sure all memory changes have been written to the database file
 | 
						|
   using the algorithm of steps 1-3 above.</li>
 | 
						|
<li>Flush all database file changes to the disk.  Wait for those changes
 | 
						|
    to actually be written onto the disk surface.</li>
 | 
						|
<li>Delete the journal file.  (Or if the <a href="pragma.html#pragma_journal_mode">PRAGMA journal_mode</a> is TRUNCATE or
 | 
						|
    PERSIST, truncate the journal file or zero the header of the journal file,
 | 
						|
    respectively.)  This is the instant when the changes are
 | 
						|
    committed.  Prior to deleting the journal file, if a power failure
 | 
						|
    or crash occurs, the next process to open the database will see that
 | 
						|
    it has a hot journal and will roll the changes back.
 | 
						|
    After the journal is deleted, there will no longer be a hot journal
 | 
						|
    and the changes will persist.
 | 
						|
    </li>
 | 
						|
<li>Drop the EXCLUSIVE and PENDING locks from the database file.
 | 
						|
    </li>
 | 
						|
</ol>
 | 
						|
 | 
						|
<p>As soon as the PENDING lock is released from the database file, other
 | 
						|
processes can begin reading the database again.  In the current implementation,
 | 
						|
the RESERVED lock is also released, but that is not essential for
 | 
						|
correct operation.</p>
 | 
						|
 | 
						|
<p>If a transaction involves multiple databases, then a more complex
 | 
						|
commit sequence is used, as follows:</p>
 | 
						|
 | 
						|
<ol>
 | 
						|
<li value="4">
 | 
						|
   Make sure all individual database files have an EXCLUSIVE lock and a
 | 
						|
   valid journal.
 | 
						|
<li>Create a super-journal.  The name of the super-journal is arbitrary.
 | 
						|
    (The current implementation appends random suffixes to the name of the
 | 
						|
    main database file until it finds a name that does not previously exist.)
 | 
						|
    Fill the super-journal with the names of all the individual journals
 | 
						|
    and flush its contents to disk.
 | 
						|
<li>Write the name of the super-journal into
 | 
						|
    all individual journals (in space set aside for that purpose in the
 | 
						|
    headers of the individual journals) and flush the contents of the
 | 
						|
    individual journals to disk and wait for those changes to reach the
 | 
						|
    disk surface.
 | 
						|
<li>Flush all database file changes to the disk.  Wait for those changes
 | 
						|
    to actually be written onto the disk surface.</li>
 | 
						|
<li>Delete the super-journal file.  This is the instant when the changes are
 | 
						|
    committed.  Prior to deleting the super-journal file, if a power failure
 | 
						|
    or crash occurs, the individual file journals will be considered hot
 | 
						|
    and will be rolled back by the next process that
 | 
						|
    attempts to read them.  After the super-journal has been deleted,
 | 
						|
    the file journals will no longer be considered hot and the changes
 | 
						|
    will persist.
 | 
						|
    </li>
 | 
						|
<li>Delete all individual journal files.
 | 
						|
<li>Drop the EXCLUSIVE and PENDING locks from all database files.
 | 
						|
    </li>
 | 
						|
</ol>
 | 
						|
 | 
						|
<a name="writer_starvation"></a>
 | 
						|
<h3>5.1 Writer starvation</h3>
 | 
						|
 | 
						|
<p>In SQLite version 2, if many processes are reading from the database,
 | 
						|
it might be the case that there is never a time when there are
 | 
						|
no active readers.  And if there is always at least one read lock on the
 | 
						|
database, no process would ever be able to make changes to the database
 | 
						|
because it would be impossible to acquire a write lock.  This situation
 | 
						|
is called <em>writer starvation</em>.</p>
 | 
						|
 | 
						|
<p>SQLite version 3 seeks to avoid writer starvation through the use of
 | 
						|
the PENDING lock.  The PENDING lock allows existing readers to continue
 | 
						|
but prevents new readers from connecting to the database.  So when a
 | 
						|
process wants to write a busy database, it can set a PENDING lock which
 | 
						|
will prevent new readers from coming in.  Assuming existing readers do
 | 
						|
eventually complete, all SHARED locks will eventually clear and the
 | 
						|
writer will be given a chance to make its changes.</p>
 | 
						|
 | 
						|
<a name="how_to_corrupt"></a>
 | 
						|
<h2>6.0 How To Corrupt Your Database Files</h2>
 | 
						|
 | 
						|
<p>The pager module is very robust but it can be subverted.  
 | 
						|
This section attempts to identify and explain the risks.
 | 
						|
(See also the <a href="atomiccommit.html#sect_9_0">Things That Can Go Wrong</a> section of the article
 | 
						|
on <a href="atomiccommit.html">Atomic Commit</a>.</p>
 | 
						|
 | 
						|
<p>
 | 
						|
Clearly, a hardware or operating system fault that introduces incorrect data
 | 
						|
into the middle of the database file or journal will cause problems.
 | 
						|
Likewise, 
 | 
						|
if a rogue process opens a database file or journal and writes malformed
 | 
						|
data into the middle of it, then the database will become corrupt.
 | 
						|
There is not much that can be done about these kinds of problems
 | 
						|
so they are given no further attention.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>
 | 
						|
SQLite uses POSIX advisory locks to implement locking on Unix.  On
 | 
						|
Windows it uses the LockFile(), LockFileEx(), and UnlockFile() system
 | 
						|
calls.  SQLite assumes that these system calls all work as advertised.  If
 | 
						|
that is not the case, then database corruption can result.  One should
 | 
						|
note that POSIX advisory locking is known to be buggy or even unimplemented
 | 
						|
on many NFS implementations (including recent versions of Mac OS X)
 | 
						|
and that there are reports of locking problems
 | 
						|
for network filesystems under Windows.  Your best defense is to not
 | 
						|
use SQLite for files on a network filesystem.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>
 | 
						|
SQLite uses the fsync() system call to flush data to the disk under Unix and
 | 
						|
it uses the FlushFileBuffers() to do the same under Windows.  Once again,
 | 
						|
SQLite assumes that these operating system services function as advertised.
 | 
						|
But it has been reported that fsync() and FlushFileBuffers() do not always
 | 
						|
work correctly, especially with inexpensive IDE disks.  Apparently some
 | 
						|
manufactures of IDE disks have controller chips that report
 | 
						|
that data has reached the disk surface when in fact the data is still
 | 
						|
in volatile cache memory in the disk drive electronics.  There are also
 | 
						|
reports that Windows sometimes chooses to ignore FlushFileBuffers() for
 | 
						|
unspecified reasons.  The author cannot verify any of these reports.
 | 
						|
But if they are true, it means that database corruption is a possibility
 | 
						|
following an unexpected power loss.  These are hardware and/or operating
 | 
						|
system bugs that SQLite is unable to defend against.
 | 
						|
</p>
 | 
						|
 | 
						|
<a name="ext3-barrier-problem"></a>
 | 
						|
 | 
						|
<p>If a Linux <a href="http://en.wikipedia.org/wiki/Ext3">ext3</a>
 | 
						|
filesystem is mounted without the "barrier=1" option
 | 
						|
in the <a href="http://en.wikipedia.org/wiki/fstab">/etc/fstab</a>
 | 
						|
and the disk drive write cache is enabled
 | 
						|
then filesystem corruption can occur following a power loss or OS crash.
 | 
						|
Whether or not corruption can occur depends on the details of the disk control
 | 
						|
hardware; corruption is more likely with inexpensive consumer-grade disks
 | 
						|
and less of a problem for enterprise-class storage devices with advanced
 | 
						|
features such as non-volatile write caches.
 | 
						|
Various ext3 experts
 | 
						|
<a href="http://www.redhat.com/archives/ext3-users/2010-July/msg00001.html">
 | 
						|
confirm this behavior</a>.
 | 
						|
We are told that most Linux distributions do not use barrier=1 and do
 | 
						|
not disable the write cache so most
 | 
						|
Linux distributions are vulnerable to this problem.  Note that this is an
 | 
						|
operating system and hardware issue and that there is nothing that SQLite
 | 
						|
can do to work around it.  
 | 
						|
<a href="http://ozlabs.org/~rusty/index.cgi/tech/2009-10-20.html">
 | 
						|
Other database engines</a> have also run into this same problem.</p>
 | 
						|
 | 
						|
<p>
 | 
						|
If a crash or power failure occurs and results in a hot journal but that
 | 
						|
journal is deleted, the next process to open the database will not
 | 
						|
know that it contains changes that need to be rolled back.  The rollback
 | 
						|
will not occur and the database will be left in an inconsistent state.
 | 
						|
Rollback journals might be deleted for any number of reasons:
 | 
						|
</p>
 | 
						|
 | 
						|
<ul>
 | 
						|
<li>An administrator might be cleaning up after an OS crash or power failure,
 | 
						|
    see the journal file, think it is junk, and delete it.</li>
 | 
						|
<li>Someone (or some process) might rename the database file but fail to
 | 
						|
    also rename its associated journal.</li>
 | 
						|
<li>If the database file has aliases (hard or soft links) and the file
 | 
						|
    is opened by a different alias than the one used to create the journal,
 | 
						|
    then the journal will not be found.  To avoid this problem, you should
 | 
						|
    not create links to SQLite database files.</li>
 | 
						|
<li>Filesystem corruption following a power failure might cause the
 | 
						|
    journal to be renamed or deleted.</li>
 | 
						|
</ul>
 | 
						|
 | 
						|
<p>
 | 
						|
The last (fourth) bullet above merits additional comment.  When SQLite creates
 | 
						|
a journal file on Unix, it opens the directory that contains that file and
 | 
						|
calls fsync() on the directory, in an effort to push the directory information
 | 
						|
to disk.  But suppose some other process is adding or removing unrelated
 | 
						|
files to the directory that contains the database and journal at the
 | 
						|
moment of a power failure.  The supposedly unrelated actions of this other
 | 
						|
process might result in the journal file being dropped from the directory and
 | 
						|
moved into "lost+found".  This is an unlikely scenario, but it could happen.
 | 
						|
The best defenses are to use a journaling filesystem or to keep the
 | 
						|
database and journal in a directory by themselves.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>
 | 
						|
For a commit involving multiple databases and a super-journal, if the
 | 
						|
various databases were on different disk volumes and a power failure occurs
 | 
						|
during the commit, then when the machine comes back up the disks might
 | 
						|
be remounted with different names.  Or some disks might not be mounted
 | 
						|
at all.   When this happens the individual file journals and the
 | 
						|
super-journal might not be able to find each other. The worst outcome from
 | 
						|
this scenario is that the commit ceases to be atomic.  
 | 
						|
Some databases might be rolled back and others might not. 
 | 
						|
All databases will continue to be self-consistent.
 | 
						|
To defend against this problem, keep all databases
 | 
						|
on the same disk volume and/or remount disks using exactly the same names
 | 
						|
after a power failure.
 | 
						|
</p>
 | 
						|
 | 
						|
<a name="transaction_control"></a>
 | 
						|
<h2>7.0 Transaction Control At The SQL Level</h2>
 | 
						|
 | 
						|
<p>
 | 
						|
The changes to locking and concurrency control in SQLite version 3 also
 | 
						|
introduce some subtle changes in the way transactions work at the SQL
 | 
						|
language level.
 | 
						|
By default, SQLite version 3 operates in <em>autocommit</em> mode.
 | 
						|
In autocommit mode,
 | 
						|
all changes to the database are committed as soon as all operations associated
 | 
						|
with the current database connection complete.</p>
 | 
						|
 | 
						|
<p>The SQL command "BEGIN TRANSACTION" (the TRANSACTION keyword
 | 
						|
is optional) is used to take SQLite out of autocommit mode.
 | 
						|
Note that the BEGIN command does not acquire any locks on the database.
 | 
						|
After a BEGIN command, a SHARED lock will be acquired when the first
 | 
						|
SELECT statement is executed.  A RESERVED lock will be acquired when
 | 
						|
the first INSERT, UPDATE, or DELETE statement is executed.  No EXCLUSIVE
 | 
						|
lock is acquired until either the memory cache fills up and must
 | 
						|
be spilled to disk or until the transaction commits.  In this way,
 | 
						|
the system delays blocking read access to the file file until the
 | 
						|
last possible moment.
 | 
						|
</p>
 | 
						|
 | 
						|
<p>The SQL command "COMMIT"  does not actually commit the changes to
 | 
						|
disk.  It just turns autocommit back on.  Then, at the conclusion of
 | 
						|
the command, the regular autocommit logic takes over and causes the
 | 
						|
actual commit to disk to occur.
 | 
						|
The SQL command "ROLLBACK" also operates by turning autocommit back on,
 | 
						|
but it also sets a flag that tells the autocommit logic to rollback rather
 | 
						|
than commit.</p>
 | 
						|
 | 
						|
<p>If the SQL COMMIT command turns autocommit on and the autocommit logic
 | 
						|
then tries to commit change but fails because some other process is holding
 | 
						|
a SHARED lock, then autocommit is turned back off automatically.  This
 | 
						|
allows the user to retry the COMMIT at a later time after the SHARED lock
 | 
						|
has had an opportunity to clear.</p>
 | 
						|
 | 
						|
<p>If multiple commands are being executed against the same SQLite database
 | 
						|
connection at the same time, the autocommit is deferred until the very
 | 
						|
last command completes.  For example, if a SELECT statement is being
 | 
						|
executed, the execution of the command will pause as each row of the
 | 
						|
result is returned.  During this pause other INSERT, UPDATE, or DELETE
 | 
						|
commands can be executed against other tables in the database.  But none
 | 
						|
of these changes will commit until the original SELECT statement finishes.
 | 
						|
</p>
 | 
						|
 |