<!DOCTYPE html>
<html><head>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<link href="sqlite.css" rel="stylesheet">
<title>Powersafe Overwrite</title>
<!-- path= -->
</head>
<body>
<div class=nosearch>
<a href="index.html">
<img class="logo" src="images/sqlite370_banner.gif" alt="SQLite" border="0">
</a>
<div><!-- IE hack to prevent disappearing logo --></div>
<div class="tagline desktoponly">
Small. Fast. Reliable.<br>Choose any three.
</div>
<div class="menu mainmenu">
<ul>
<li><a href="index.html">Home</a>
<li class='mobileonly'><a href="javascript:void(0)" onclick='toggle_div("submenu")'>Menu</a>
<li class='wideonly'><a href='about.html'>About</a>
<li class='desktoponly'><a href="docs.html">Documentation</a>
<li class='desktoponly'><a href="download.html">Download</a>
<li class='wideonly'><a href='copyright.html'>License</a>
<li class='desktoponly'><a href="support.html">Support</a>
<li class='desktoponly'><a href="prosupport.html">Purchase</a>
<li class='search' id='search_menubutton'>
<a href="javascript:void(0)" onclick='toggle_search()'>Search</a>
</ul>
</div>
<div class="menu submenu" id="submenu">
<ul>
<li><a href='about.html'>About</a>
<li><a href='docs.html'>Documentation</a>
<li><a href='download.html'>Download</a>
<li><a href='support.html'>Support</a>
<li><a href='prosupport.html'>Purchase</a>
</ul>
</div>
<div class="searchmenu" id="searchmenu">
<form method="GET" action="search">
<select name="s" id="searchtype">
<option value="d">Search Documentation</option>
<option value="c">Search Changelog</option>
</select>
<input type="text" name="q" id="searchbox" value="">
<input type="submit" value="Go">
</form>
</div>
</div>
<script>
function toggle_div(nm) {
  var w = document.getElementById(nm);
  if( w.style.display=="block" ){
    w.style.display = "none";
  }else{
    w.style.display = "block";
  }
}
function toggle_search() {
  var w = document.getElementById("searchmenu");
  if( w.style.display=="block" ){
    w.style.display = "none";
  } else {
    w.style.display = "block";
    setTimeout(function(){
      document.getElementById("searchbox").focus()
    }, 30);
  }
}
function div_off(nm){document.getElementById(nm).style.display="none";}
window.onbeforeunload = function(e){div_off("submenu");}
/* Disable the Search feature if we are not operating from CGI, since */
/* Search is accomplished using CGI and will not work without it. */
if( !location.origin || !location.origin.match || !location.origin.match(/http/) ){
  document.getElementById("search_menubutton").style.display = "none";
}
/* Used by the Hide/Show button beside syntax diagrams, to toggle the */
function hideorshow(btn,obj){
  var x = document.getElementById(obj);
  var b = document.getElementById(btn);
  if( x.style.display!='none' ){
    x.style.display = 'none';
    b.innerHTML='show';
  }else{
    x.style.display = '';
    b.innerHTML='hide';
  }
  return false;
}
</script>
</div>

<h1 align="center">Powersafe Overwrite</h1>

<p>"Powersafe overwrite" is a term used by the SQLite team to describe
a behavior of some filesystems and disk controllers related to
data preservation during a power loss. Powersafe overwrite
is a boolean property: either the storage system has it or it does not.

<p>We say that a system has the powersafe overwrite property if the following
statement is true:

<blockquote>
<b>When an application writes a range of bytes in a file, no
bytes outside of that range will change, even if the write occurs
just before a crash or power failure.</b>
</blockquote>

<p>The powersafe overwrite property says nothing about the state of the
bytes that were written. Those bytes might contain their old values,
their new values, random values, or some combination of these. The powersafe
overwrite property merely states that writes cannot change bytes outside
of the range of bytes written.

<p>In other words, powersafe overwrite means that there is no "collateral
damage" when a power loss occurs while writing. Only those bytes actually
being written might be damaged.
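
<p>The definition can be restated as a small check over "before" and "after"
images of a file. The following is only an illustrative sketch: the function
name <tt>psow_holds</tt> and the byte values are hypothetical and are not part
of SQLite. It treats a change in file length as a violation, which is a
simplifying assumption of this sketch.

```python
def psow_holds(before: bytes, after: bytes, offset: int, length: int) -> bool:
    """Return True if no byte outside [offset, offset+length) changed.

    Bytes inside the written range are deliberately ignored: they may hold
    old values, new values, or garbage without violating the property.
    """
    if len(before) != len(after):
        # Simplifying assumption of this sketch: a size change counts as damage.
        return False
    end = offset + length
    return before[:offset] == after[:offset] and before[end:] == after[end:]


before = bytes(16)  # a 16-byte "file" of zeros

# A write to bytes 4..7 was interrupted and left garbage there,
# but the neighboring bytes are intact: no collateral damage.
torn_write = before[:4] + b"\xde\xad\xbe\xef" + before[8:]

# The same write, but byte 10 (outside the written range) also changed.
collateral = torn_write[:10] + b"\xff" + torn_write[11:]

print(psow_holds(before, torn_write, 4, 4))  # True
print(psow_holds(before, collateral, 4, 4))  # False
```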

<p>In practical terms, what the powersafe overwrite property means is that when
the disk controller detects an impending power loss, it finishes writing
whatever sector it is working on prior to parking the heads. It means that
individual sector writes will complete once started, even if
there is a power loss.

<p>Consider what would happen if disk sector writes were interrupted
by a power loss. If an application writes two or three bytes in the middle
of some file, the operating system will implement this by first reading
the entire sector containing those bytes, making the change to the
sector in memory, then writing the entire sector back to the disk. If a power
loss occurs during the writeback and the sector was not completely written,
then on the next read after reboot, error correcting codes
in the sector will probably detect irreparable damage and the disk
controller will read out the sector as all zeros or all ones. Thus
values will have changed outside of the range of the two or three bytes
that were written at the application level, which is a violation of the
powersafe overwrite property.
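
<p>The read-modify-write scenario above can be modeled in a few lines. This is
a toy simulation, not a description of any real operating system or drive:
the 512-byte sector size, the <tt>write_bytes</tt> helper, and the
"reads back as all zeros" failure mode are assumptions chosen to mirror the
paragraph above.

```python
SECTOR_SIZE = 512  # assumed sector size for this sketch


def write_bytes(disk: bytearray, offset: int, data: bytes, power_fails: bool) -> None:
    """Model a small write as a read-modify-write of one whole sector."""
    start = (offset // SECTOR_SIZE) * SECTOR_SIZE
    if offset + len(data) > start + SECTOR_SIZE:
        raise ValueError("this sketch only handles writes within one sector")
    # Read the entire sector and apply the change in memory.
    sector = bytearray(disk[start:start + SECTOR_SIZE])
    rel = offset - start
    sector[rel:rel + len(data)] = data
    if power_fails:
        # Assumed failure model: the partially written sector fails its
        # error-correcting code check and reads back as all zeros.
        disk[start:start + SECTOR_SIZE] = bytes(SECTOR_SIZE)
    else:
        disk[start:start + SECTOR_SIZE] = sector


disk = bytearray(b"A" * (2 * SECTOR_SIZE))
write_bytes(disk, 100, b"xyz", power_fails=True)

# Bytes outside the 3-byte range [100, 103) changed: powersafe overwrite is violated.
damaged_outside = any(
    disk[i] != ord("A") for i in range(SECTOR_SIZE) if not 100 <= i < 103
)
print(damaged_outside)                           # True
# The neighboring sector was never touched.
print(disk[SECTOR_SIZE:] == b"A" * SECTOR_SIZE)  # True
```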

<h2>SQLite Assumptions About Powersafe Overwrite</h2>

<p>All versions of SQLite up to and including <a href="releaselog/3_7_9.html">version 3.7.9</a>
(2011-11-01) assume that
the filesystem does <u>not</u> provide powersafe overwrite. SQLite
has traditionally assumed that when any one byte of a file changes, all
other bytes in the same sector as that byte have the potential of
being corrupted on a power loss. When writing, SQLite has made sure
to journal all bytes in the same sector as any modification,
and it pads journal files out to the next sector boundary so that
subsequent appends to that journal cannot damage prior records.
SQLite understands the sector size to be the value returned by the
xSectorSize method in the <a href="vfs.html">VFS</a>. The SQLite team has often referred
to the value returned by xSectorSize as the "blast radius" of a write,
since it expresses the range of bytes that might be damaged if a power
loss occurs during the write.
The default <a href="vfs.html">VFSes</a> for unix and windows have always returned 512 as
the sector size (or blast radius) for all versions of SQLite up to
and including version 3.7.9.

<p>However, newer disk drives have begun using 4096-byte sectors. Beginning
with SQLite <a href="releaselog/3_7_10.html">version 3.7.10</a> (2012-01-16),
the SQLite development team experimented with
changing xSectorSize to report 4096 bytes as the blast radius.
This had the effect of increasing write overhead on
many databases. For a database with a <a href="pragma.html#pragma_page_size">PRAGMA page_size</a> of 1024
(a very common choice), making a change to a single page in the database
required SQLite to back up three other adjacent pages to the rollback
journal, whereas formerly it only had to back up the one page that was
changing. In <a href="wal.html">WAL mode</a>, each transaction had to be padded out to the
next 4096-byte boundary in the WAL file, rather than the next 512-byte
boundary, resulting in thousands of extra bytes being written
per transaction.
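
<p>The overhead described above is simple arithmetic, sketched here under
stated assumptions. The helper names are hypothetical, and the padding
calculation is a simplified model: it assumes the transaction starts on a
sector boundary and consists of a single 1024-byte page plus a 24-byte WAL
frame header, and it ignores the WAL file header.

```python
def extra_pages_journaled(page_size: int, sector_size: int) -> int:
    """Adjacent pages that share a sector with one changed page."""
    if page_size >= sector_size:
        return 0  # a page covers whole sectors; no neighbors are at risk
    return sector_size // page_size - 1


def padding_to_boundary(nbytes: int, sector_size: int) -> int:
    """Bytes needed to round nbytes up to the next multiple of sector_size."""
    return -nbytes % sector_size


# Rollback journal: a 1024-byte page with a 512- versus 4096-byte blast radius.
print(extra_pages_journaled(1024, 512))   # 0
print(extra_pages_journaled(1024, 4096))  # 3

# WAL mode (simplified model): one 1024-byte page plus a 24-byte frame header.
one_frame = 24 + 1024
print(padding_to_boundary(one_frame, 512))   # 488
print(padding_to_boundary(one_frame, 4096))  # 3048
```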

<p>The extra write overhead prompted a reexamination of assumptions about
powersafe overwrite. With modern disk drives, the capacity has become
so large and the data density so great that a single sector is very
small and writing a single sector takes very little time. We know that
disk drives can detect an impending power loss and continue
to operate for some small amount of time on residual energy because those
drives are able to park their heads before spinning down. And
so if an impending power loss is detectable by the disk controller, it
seems reasonable that the controller will finish writing
whatever sector it is currently working on when the imminent power loss
is first detected, prior to parking the heads, as long as doing so
does not take too long, which it should not with
small and dense sectors. Hence it seems reasonable
to assume powersafe overwrite for modern disks. Indeed, BerkeleyDB has
made this assumption for decades, we are told. Caution is advised
though. As Roger Binns noted on the SQLite developers mailing list:
"'poorly written' should be the main assumption about drive firmware."

<a name="tornpage"></a>

<h2>Torn Pages</h2>

<p>A torn page occurs when a database page is larger than a disk sector,
the database page is written to disk, but a power loss occurs prior to
all sectors of the database page being written. Then, upon recovery, part of
the database page will have the old content while some other parts of the
page will have the new content. Some database engines assume that
page writes are atomic and hence a torn page is an unrecoverable error.
</p>

<p>SQLite never assumes that database page writes are atomic,
regardless of the PSOW setting.<sup>(1)</sup>
And hence SQLite is always able to automatically recover from torn pages
induced by a crash. Enabling PSOW does not decrease SQLite's ability
to recover from a torn page.</p>
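
<p>For illustration only, a torn page can be modeled as a page image in which
some sectors hold new content and the rest still hold old content. The
<tt>torn_page</tt> helper below is hypothetical, and it assumes that sectors
are written in order from the start of the page, which real storage does not
guarantee.</p>

```python
def torn_page(old: bytes, new: bytes, sector_size: int, sectors_written: int) -> bytes:
    """Return the on-disk page image if power is lost after some sectors.

    Assumption of this sketch: sectors are written front to back, and each
    individual sector write either fully completes or never starts.
    """
    if len(old) != len(new):
        raise ValueError("old and new page images must be the same size")
    cut = min(sectors_written * sector_size, len(new))
    return new[:cut] + old[cut:]


old = b"o" * 4096  # previous page content
new = b"n" * 4096  # content being written

# Power is lost after 3 of the 8 sectors (512 bytes each) reach the disk.
page = torn_page(old, new, 512, 3)
print(page.count(b"n"), page.count(b"o"))  # 1536 2560
```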

<h2>Changes In SQLite Version 3.7.10</h2>

<p>The <a href="vfs.html">VFS</a> for SQLite <a href="releaselog/3_7_10.html">version 3.7.10</a> (2012-01-16)
adds a new device characteristic
named <a href="c3ref/c_iocap_atomic.html">SQLITE_IOCAP_POWERSAFE_OVERWRITE</a>. Database files that report this
characteristic are assumed to reside on storage systems that have the
powersafe overwrite property.
The default unix and windows <a href="vfs.html">VFSes</a> now report
<a href="c3ref/c_iocap_atomic.html">SQLITE_IOCAP_POWERSAFE_OVERWRITE</a> if SQLite is compiled with
<a href="compile.html#powersafe_overwrite">-DSQLITE_POWERSAFE_OVERWRITE=1</a>, or they
make the legacy assumption that storage does not have the powersafe
overwrite property if SQLite is compiled with
<a href="compile.html#powersafe_overwrite">-DSQLITE_POWERSAFE_OVERWRITE=0</a>.
For now, the default is for powersafe overwrite to be turned on, though
we may revisit this in the future and default it off.

<p>The powersafe overwrite property for individual databases can be
specified as the database is opened using the "psow" query parameter
with a <a href="uri.html">URI filename</a>. For example, to always assume powersafe
overwrite for a file (perhaps to ensure maximum write performance),
open it as

<blockquote>
file:somefile.db?psow=1
</blockquote>

<p>Or to be extra safe with a database and to force SQLite to assume the
database lacks powersafe overwrite, open it using

<blockquote>
file:somefile.db?psow=0
</blockquote>
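
<p>As a rough sketch of how an application might pass the "psow" query
parameter, the following uses the <tt>sqlite3</tt> module from the Python
standard library, which accepts URI filenames when <tt>uri=True</tt> is given.
The <tt>open_with_psow</tt> helper and the temporary file location are
hypothetical and exist only for this example. Whether the parameter has any
effect depends on the VFS in use and on how the underlying SQLite library was
built.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_with_psow(path: str, psow: int) -> sqlite3.Connection:
    """Open a database via a URI filename carrying the psow query parameter."""
    # as_uri() yields a percent-encoded file: URI for an absolute path.
    uri = f"{Path(path).resolve().as_uri()}?psow={1 if psow else 0}"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmpdir:
    db_path = os.path.join(tmpdir, "somefile.db")
    # psow=0 asks SQLite to assume the storage lacks powersafe overwrite.
    conn = open_with_psow(db_path, 0)
    conn.execute("CREATE TABLE t(x)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    row = conn.execute("SELECT x FROM t").fetchone()
    conn.close()

print(row)  # (1,)
```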

<p>There is also a new <a href="c3ref/c_fcntl_begin_atomic_write.html#sqlitefcntlpowersafeoverwrite">SQLITE_FCNTL_POWERSAFE_OVERWRITE</a> opcode for
the <a href="c3ref/file_control.html">sqlite3_file_control()</a> interface that allows
an application to query the powersafe overwrite property for a database
file.

<hr>
<h2>Notes:</h2>
<ol><li value=1><p>
SQLite never assumes atomic page writes <em>in its default configurations</em>.
But a custom <a href="vfs.html">VFS</a> can set one of the
<a href="c3ref/c_iocap_atomic.html">SQLITE_IOCAP_ATOMIC</a> bits in the result of the xDeviceCharacteristics()
method and then SQLite will assume that page writes are atomic. The
application must supply a custom VFS to accomplish this, however, since
none of the standard VFSes will ever set any of the atomic bits in the
xDeviceCharacteristics() vector.
</ol>