| <!DOCTYPE html>
 | |
| <html><head>
 | |
| <meta name="viewport" content="width=device-width, initial-scale=1.0">
 | |
| <meta http-equiv="content-type" content="text/html; charset=UTF-8">
 | |
| <link href="sqlite.css" rel="stylesheet">
 | |
| <title>Powersafe Overwrite</title>
 | |
| <!-- path= -->
 | |
| </head>
 | |
| <body>
 | |
| <div class=nosearch>
 | |
| <a href="index.html">
 | |
| <img class="logo" src="images/sqlite370_banner.gif" alt="SQLite" border="0">
 | |
| </a>
 | |
| <div><!-- IE hack to prevent disappearing logo --></div>
 | |
| <div class="tagline desktoponly">
 | |
| Small. Fast. Reliable.<br>Choose any three.
 | |
| </div>
 | |
| <div class="menu mainmenu">
 | |
| <ul>
 | |
| <li><a href="index.html">Home</a>
 | |
| <li class='mobileonly'><a href="javascript:void(0)" onclick='toggle_div("submenu")'>Menu</a>
 | |
| <li class='wideonly'><a href='about.html'>About</a>
 | |
| <li class='desktoponly'><a href="docs.html">Documentation</a>
 | |
| <li class='desktoponly'><a href="download.html">Download</a>
 | |
| <li class='wideonly'><a href='copyright.html'>License</a>
 | |
| <li class='desktoponly'><a href="support.html">Support</a>
 | |
| <li class='desktoponly'><a href="prosupport.html">Purchase</a>
 | |
| <li class='search' id='search_menubutton'>
 | |
| <a href="javascript:void(0)" onclick='toggle_search()'>Search</a>
 | |
| </ul>
 | |
| </div>
 | |
| <div class="menu submenu" id="submenu">
 | |
| <ul>
 | |
| <li><a href='about.html'>About</a>
 | |
| <li><a href='docs.html'>Documentation</a>
 | |
| <li><a href='download.html'>Download</a>
 | |
| <li><a href='support.html'>Support</a>
 | |
| <li><a href='prosupport.html'>Purchase</a>
 | |
| </ul>
 | |
| </div>
 | |
| <div class="searchmenu" id="searchmenu">
 | |
| <form method="GET" action="search">
 | |
| <select name="s" id="searchtype">
 | |
| <option value="d">Search Documentation</option>
 | |
| <option value="c">Search Changelog</option>
 | |
| </select>
 | |
| <input type="text" name="q" id="searchbox" value="">
 | |
| <input type="submit" value="Go">
 | |
| </form>
 | |
| </div>
 | |
| </div>
 | |
| <script>
 | |
| function toggle_div(nm) {
 | |
| var w = document.getElementById(nm);
 | |
| if( w.style.display=="block" ){
 | |
| w.style.display = "none";
 | |
| }else{
 | |
| w.style.display = "block";
 | |
| }
 | |
| }
 | |
| function toggle_search() {
 | |
| var w = document.getElementById("searchmenu");
 | |
| if( w.style.display=="block" ){
 | |
| w.style.display = "none";
 | |
| } else {
 | |
| w.style.display = "block";
 | |
| setTimeout(function(){
 | |
| document.getElementById("searchbox").focus()
 | |
| }, 30);
 | |
| }
 | |
| }
 | |
| function div_off(nm){document.getElementById(nm).style.display="none";}
 | |
| window.onbeforeunload = function(e){div_off("submenu");}
 | |
| /* Disable the Search feature if we are not operating from CGI, since */
 | |
| /* Search is accomplished using CGI and will not work without it. */
 | |
| if( !location.origin || !location.origin.match || !location.origin.match(/http/) ){
 | |
| document.getElementById("search_menubutton").style.display = "none";
 | |
| }
 | |
| /* Used by the Hide/Show button beside syntax diagrams, to toggle the */
 | |
| function hideorshow(btn,obj){
 | |
| var x = document.getElementById(obj);
 | |
| var b = document.getElementById(btn);
 | |
| if( x.style.display!='none' ){
 | |
| x.style.display = 'none';
 | |
| b.innerHTML='show';
 | |
| }else{
 | |
| x.style.display = '';
 | |
| b.innerHTML='hide';
 | |
| }
 | |
| return false;
 | |
| }
 | |
| </script>
 | |
| </div>
 | |
| 
 | |
| 
 | |
| 
 | |
| <h1 align="center">Powersafe Overwrite</h1>
 | |
| 
 | |
| <p>"Powersafe overwrite" is a term used by the SQLite team to describe
 | |
| a behavior of some filesystems and disk-controllers related to
 | |
| data preservation during a power loss.  Powersafe overwrite
 | |
| is a boolean property: either the storage system has it or it does not.
 | |
| 
 | |
| <p>We say that a system has the powersafe overwrite property if the following
 | |
| statement is true:
 | |
| 
 | |
| <blockquote>
 | |
|   <b>When an application writes a range of bytes in a file, no
 | |
|   bytes outside of that range will change, even if the write occurs
 | |
|   just before a crash or power failure.</b>
 | |
| </blockquote>
 | |
| 
 | |
| <p>The powersafe overwrite property says nothing about the state of the
 | |
| bytes that were written.  Those bytes might contain their old values,
 | |
| their new values, random values, or some combination of these.  The powersafe
 | |
| overwrite property merely states that writes cannot change bytes outside
 | |
| of the range of bytes written.
 | |
| 
 | |
| <p>In other words, powersafe overwrite means that there is no "collateral
 | |
| damage" when a power loss occurs while writing.  Only those bytes actually
 | |
| being written might be damaged.
 | |
| 
 | |
| <p>In practical terms, the powersafe overwrite property means that when
 | |
| the disk controller detects an impending power loss, it finishes writing
 | |
| whatever sector it is working on prior to parking the heads.  It means that
 | |
| individual sector writes will complete once started, even if
 | |
| there is a power loss.
 | |
| 
 | |
| <p>Consider what would happen if disk sector writes are interrupted
 | |
| by a power loss.  If an application writes two or three bytes in the middle
 | |
| of some file, the operating system will implement this by first reading
 | |
| the entire sector containing those bytes, making the change to the
 | |
| sector in memory, then writing the entire sector back to the disk.  If a power
 | |
| loss occurs during the writeback and the sector was not completely written,
 | |
| then on the next read after reboot, error correcting codes
 | |
| in the sector will probably detect irreparable damage and the disk 
 | |
| controller will read out the sector as all zeros or all ones.  Thus
 | |
| values will have changed outside of the range of the two or three bytes 
 | |
| that were written at the application level - a violation of the powersafe
 | |
| overwrite property.
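<p>The scenario above can be illustrated with a toy model.  The following
Python sketch is not SQLite code: the names <code>write_bytes</code> and
<code>collateral_damage</code>, the 512-byte sector size, and the choice to
model an interrupted sector as all zeros are assumptions made purely for
illustration.

```python
SECTOR_SIZE = 512  # assumed sector size for this toy model


def write_bytes(disk: bytearray, offset: int, data: bytes,
                power_fails: bool, psow: bool) -> None:
    """Toy read-modify-write of a few bytes that fit inside one sector."""
    start = (offset // SECTOR_SIZE) * SECTOR_SIZE
    rel = offset - start
    assert rel + len(data) <= SECTOR_SIZE, "sketch handles one sector only"

    # Read the whole sector and apply the change in memory.
    sector = bytearray(disk[start:start + SECTOR_SIZE])
    sector[rel:rel + len(data)] = data

    if power_fails and not psow:
        # Interrupted writeback: the sector is unreadable afterwards,
        # modeled here as being read back as all zeros.
        disk[start:start + SECTOR_SIZE] = bytes(SECTOR_SIZE)
    else:
        # With powersafe overwrite, a sector write completes once started.
        disk[start:start + SECTOR_SIZE] = sector


def collateral_damage(before: bytes, after: bytes,
                      offset: int, length: int) -> bool:
    """True if any byte outside [offset, offset+length) changed."""
    end = offset + length
    return before[:offset] != after[:offset] or before[end:] != after[end:]


original = bytes([0xAA]) * 1024

disk_a = bytearray(original)
write_bytes(disk_a, 100, b"xyz", power_fails=True, psow=False)
damaged_without_psow = collateral_damage(original, bytes(disk_a), 100, 3)

disk_b = bytearray(original)
write_bytes(disk_b, 100, b"xyz", power_fails=True, psow=True)
damaged_with_psow = collateral_damage(original, bytes(disk_b), 100, 3)
```

<p>In this model only the run without powersafe overwrite changes bytes
outside the three bytes that the application wrote.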
 | |
| 
 | |
| <h2>SQLite Assumptions About Powersafe Overwrite</h2>
 | |
| 
 | |
| <p>All versions of SQLite up to and including <a href="releaselog/3_7_9.html">version 3.7.9</a>
 | |
| (2011-11-01) assume that
 | |
| the filesystem does <u>not</u> provide powersafe overwrite.  SQLite 
 | |
| has traditionally assumed that when any one byte of a file changes, all
 | |
| other bytes within the same sector as that byte have the potential of
 | |
| being corrupted on a power loss.  When writing, SQLite has made sure
 | |
| to journal all bytes in the same sector as any modification
 | |
| and it pads journal files out to the next sector boundary so that
 | |
| subsequent appends to that journal cannot damage prior records.
 | |
| SQLite understands the sector size to be the value returned by the
 | |
| xSectorSize method in the <a href="vfs.html">VFS</a>.  The SQLite team has often referred
 | |
| to the value returned by xSectorSize as the "blast radius" of a write,
 | |
| since it expresses the range of bytes that might be damaged if a power
 | |
| loss occurs during the write.
 | |
| The default <a href="vfs.html">VFSes</a> for unix and windows have always returned 512 as 
 | |
| the sector size (or blast radius) for all versions of SQLite up to
 | |
| and including version 3.7.9.
 | |
| 
 | |
| <p>However, newer disk drives have begun using 4096-byte sectors.  Beginning
 | |
| with SQLite <a href="releaselog/3_7_10.html">version 3.7.10</a> (2012-01-16), 
 | |
| the SQLite development team experimented with 
 | |
| changing xSectorSize to report 4096 bytes as the blast radius.
 | |
| This had the effect of increasing write overhead on
 | |
| many databases.  For a database with a <a href="pragma.html#pragma_page_size">PRAGMA page_size</a> of 1024
 | |
| (a very common choice) making a change to a single page in the database
 | |
| now requires SQLite to back up three other adjacent pages to the rollback
 | |
| journal, whereas formerly it only had to back up the one page that was
 | |
| changing.  In <a href="wal.html">WAL mode</a>, each transaction had to be padded out to the
 | |
| next 4096-byte boundary in the WAL file, rather than the next 512-byte
 | |
| boundary, resulting in thousands of extra bytes being written
 | |
| per transaction.
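<p>The arithmetic behind these figures can be sketched as follows.  This is
a rough illustration rather than SQLite's actual implementation: the helper
names <code>extra_pages_journaled</code> and <code>padded_size</code> are
hypothetical, and the 1100-byte transaction size is an arbitrary example
value.

```python
def extra_pages_journaled(page_size: int, sector_size: int) -> int:
    """Pages sharing a sector with one modified page (excluding that page)."""
    pages_per_sector = max(1, sector_size // page_size)
    return pages_per_sector - 1


def padded_size(nbytes: int, sector_size: int) -> int:
    """Round nbytes up to the next multiple of sector_size."""
    return -(-nbytes // sector_size) * sector_size


# 1024-byte pages: blast radius of 512 bytes versus 4096 bytes.
extra_512 = extra_pages_journaled(1024, 512)     # 0 extra pages
extra_4096 = extra_pages_journaled(1024, 4096)   # 3 extra pages

# Padding added to a hypothetical 1100-byte append.
pad_512 = padded_size(1100, 512) - 1100          # 436 bytes
pad_4096 = padded_size(1100, 4096) - 1100        # 2996 bytes
```

<p>With a 4096-byte blast radius, a single-page change drags three neighboring
pages into the journal, and the padding per append grows from hundreds of
bytes to thousands.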
 | |
| 
 | |
| <p>The extra write overhead prompted a reexamination of assumptions about
 | |
| powersafe overwrite.  With modern disk drives, the capacity has become
 | |
| so large and the data density so great that a single sector is very
 | |
| small and writing a single sector takes very little time.  We know that
 | |
| disk drives can detect an impending power loss and continue
 | |
| to operate for some small amount of time on residual energy because those
 | |
| drives are able to park their heads before spinning down.  And
 | |
| so if an impending power loss is detectable by the disk controller, it
 | |
| seems reasonable that the controller will finish writing
 | |
| whatever sector it is currently working on when the imminent power loss 
 | |
| is first detected, prior to parking the heads, as long as doing so
 | |
| does not take too long, which it should not with
 | |
| small and dense sectors.  Hence it seems reasonable
 | |
| to assume powersafe overwrite for modern disks.  Indeed, BerkeleyDB has
 | |
| made this assumption for decades, we are told.  Caution is advised
 | |
| though. As Roger Binns noted on the SQLite developers mailing list:
 | |
| "'poorly written' should be the main assumption about drive firmware."
 | |
| 
 | |
| <a name="tornpage"></a>
 | |
| 
 | |
| <h2>Torn Pages</h2>
 | |
| 
 | |
| <p>A torn page occurs when a database page is larger than a disk sector,
 | |
| the database page is written to disk, but a power loss occurs prior to
 | |
| all sectors of the database page being written.  Then, upon recovery, part of
 | |
| the database page will have the old content while some other parts of the
 | |
| page will have the new content.  Some database engines assume that 
 | |
| page writes are atomic and hence a torn page is an unrecoverable error.
 | |
| </p>
 | |
| 
 | |
| <p>SQLite never assumes that database page writes are atomic,
 | |
| regardless of the PSOW setting.<sup>(1)</sup>
 | |
| And hence SQLite is always able to automatically recover from torn pages
 | |
| induced by a crash.  Enabling PSOW does not decrease SQLite's ability
 | |
| to recover from a torn page.</p>
 | |
| 
 | |
| <h2>Changes In SQLite Version 3.7.10</h2>
 | |
| 
 | |
| <p>The <a href="vfs.html">VFS</a> for SQLite <a href="releaselog/3_7_10.html">version 3.7.10</a> (2012-01-16)
 | |
| adds a new device characteristic 
 | |
| named <a href="c3ref/c_iocap_atomic.html">SQLITE_IOCAP_POWERSAFE_OVERWRITE</a>.  Database files that report this
 | |
| characteristic are assumed to reside on storage systems that have the
 | |
| powersafe overwrite property.
 | |
| The default unix and windows <a href="vfs.html">VFSes</a> now report
 | |
| <a href="c3ref/c_iocap_atomic.html">SQLITE_IOCAP_POWERSAFE_OVERWRITE</a> if SQLite is compiled with
 | |
| <a href="compile.html#powersafe_overwrite">-DSQLITE_POWERSAFE_OVERWRITE=1</a> or they
 | |
| make the legacy assumption that storage does not have the powersafe
 | |
| overwrite property if compiled with
 | |
| <a href="compile.html#powersafe_overwrite">-DSQLITE_POWERSAFE_OVERWRITE=0</a>.
 | |
| For now, the default is for powersafe overwrite to be turned on, though
 | |
| we may revisit this in the future and default it off.
 | |
| 
 | |
| <p>The powersafe overwrite property for individual databases can be
 | |
| specified as the database is opened using the "psow" query parameter
 | |
| with a <a href="uri.html">URI filename</a>.  For example, to always assume powersafe
 | |
| overwrite for a file (perhaps to ensure maximum write performance), 
 | |
| open it as
 | |
| 
 | |
| <blockquote>
 | |
|    file:somefile.db?psow=1
 | |
| </blockquote>
 | |
| 
 | |
| <p>Or to be extra safe with a database and to force SQLite to assume the
 | |
| database lacks powersafe overwrite, open it using
 | |
| 
 | |
| <blockquote>
 | |
|    file:somefile.db?psow=0
 | |
| </blockquote>
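<p>As one way to try these URI filenames, the sketch below uses Python's
standard <code>sqlite3</code> module, which passes URI filenames through to
SQLite when <code>uri=True</code> is given.  The helper name
<code>open_with_psow</code> and the temporary file location are assumptions
for illustration.  The "psow" parameter only has an effect if the linked
SQLite library is version 3.7.10 or later, and the Python module offers no
way to read the setting back.

```python
import os
import sqlite3
import tempfile


def open_with_psow(path: str, psow: int) -> sqlite3.Connection:
    """Open `path` via a URI filename carrying the psow query parameter.

    Assumes `path` contains no characters that need URI escaping
    (such as '?' or '#').
    """
    uri = f"file:{path}?psow={psow}"
    return sqlite3.connect(uri, uri=True)


db_path = os.path.join(tempfile.mkdtemp(), "somefile.db")

# Be extra safe: assume the storage lacks powersafe overwrite.
conn = open_with_psow(db_path, 0)
conn.execute("CREATE TABLE t(x)")
conn.execute("INSERT INTO t VALUES (42)")
conn.commit()
conn.close()

# Reopen the same file, this time assuming powersafe overwrite.
conn = open_with_psow(db_path, 1)
rows = conn.execute("SELECT x FROM t").fetchall()
conn.close()
```

<p>The setting applies to the connection rather than to the file, so the same
database can be opened with either value.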
 | |
| 
 | |
| <p>There is also a new <a href="c3ref/c_fcntl_begin_atomic_write.html#sqlitefcntlpowersafeoverwrite">SQLITE_FCNTL_POWERSAFE_OVERWRITE</a> opcode for
 | |
| the <a href="c3ref/file_control.html">sqlite3_file_control()</a> interface that allows
 | |
| an application to query the powersafe overwrite property for a database
 | |
| file.
 | |
| 
 | |
| <hr>
 | |
| <h2>Notes:</h2>
 | |
| <ol><li value=1><p>
 | |
| SQLite never assumes atomic page writes <em>in its default configurations</em>.
 | |
| But a custom <a href="vfs.html">VFS</a> can set one of the 
 | |
| <a href="c3ref/c_iocap_atomic.html">SQLITE_IOCAP_ATOMIC</a> bits in the result of the xDeviceCharacteristics()
 | |
| method and then SQLite will assume that page writes are atomic.  The
 | |
| application must supply a custom VFS to accomplish this, however, since
 | |
| none of the standard VFSes will ever set any of the atomic bits in the
 | |
| xDeviceCharacteristics() vector.
 | |
| </ol>
 | |
| 
 |