1242 lines
57 KiB
HTML
1242 lines
57 KiB
HTML
<!DOCTYPE html>
|
|
<html><head>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
|
|
<link href="sqlite.css" rel="stylesheet">
|
|
<title>How SQLite Is Tested</title>
|
|
<!-- path= -->
|
|
</head>
|
|
<body>
|
|
<div class=nosearch>
|
|
<a href="index.html">
|
|
<img class="logo" src="images/sqlite370_banner.gif" alt="SQLite" border="0">
|
|
</a>
|
|
<div><!-- IE hack to prevent disappearing logo --></div>
|
|
<div class="tagline desktoponly">
|
|
Small. Fast. Reliable.<br>Choose any three.
|
|
</div>
|
|
<div class="menu mainmenu">
|
|
<ul>
|
|
<li><a href="index.html">Home</a>
|
|
<li class='mobileonly'><a href="javascript:void(0)" onclick='toggle_div("submenu")'>Menu</a>
|
|
<li class='wideonly'><a href='about.html'>About</a>
|
|
<li class='desktoponly'><a href="docs.html">Documentation</a>
|
|
<li class='desktoponly'><a href="download.html">Download</a>
|
|
<li class='wideonly'><a href='copyright.html'>License</a>
|
|
<li class='desktoponly'><a href="support.html">Support</a>
|
|
<li class='desktoponly'><a href="prosupport.html">Purchase</a>
|
|
<li class='search' id='search_menubutton'>
|
|
<a href="javascript:void(0)" onclick='toggle_search()'>Search</a>
|
|
</ul>
|
|
</div>
|
|
<div class="menu submenu" id="submenu">
|
|
<ul>
|
|
<li><a href='about.html'>About</a>
|
|
<li><a href='docs.html'>Documentation</a>
|
|
<li><a href='download.html'>Download</a>
|
|
<li><a href='support.html'>Support</a>
|
|
<li><a href='prosupport.html'>Purchase</a>
|
|
</ul>
|
|
</div>
|
|
<div class="searchmenu" id="searchmenu">
|
|
<form method="GET" action="search">
|
|
<select name="s" id="searchtype">
|
|
<option value="d">Search Documentation</option>
|
|
<option value="c">Search Changelog</option>
|
|
</select>
|
|
<input type="text" name="q" id="searchbox" value="">
|
|
<input type="submit" value="Go">
|
|
</form>
|
|
</div>
|
|
</div>
|
|
<script>
|
|
function toggle_div(nm) {
|
|
var w = document.getElementById(nm);
|
|
if( w.style.display=="block" ){
|
|
w.style.display = "none";
|
|
}else{
|
|
w.style.display = "block";
|
|
}
|
|
}
|
|
function toggle_search() {
|
|
var w = document.getElementById("searchmenu");
|
|
if( w.style.display=="block" ){
|
|
w.style.display = "none";
|
|
} else {
|
|
w.style.display = "block";
|
|
setTimeout(function(){
|
|
document.getElementById("searchbox").focus()
|
|
}, 30);
|
|
}
|
|
}
|
|
function div_off(nm){document.getElementById(nm).style.display="none";}
|
|
window.onbeforeunload = function(e){div_off("submenu");}
|
|
/* Disable the Search feature if we are not operating from CGI, since */
|
|
/* Search is accomplished using CGI and will not work without it. */
|
|
if( !location.origin || !location.origin.match || !location.origin.match(/http/) ){
|
|
document.getElementById("search_menubutton").style.display = "none";
|
|
}
|
|
/* Used by the Hide/Show button beside syntax diagrams, to toggle the */
|
|
function hideorshow(btn,obj){
|
|
var x = document.getElementById(obj);
|
|
var b = document.getElementById(btn);
|
|
if( x.style.display!='none' ){
|
|
x.style.display = 'none';
|
|
b.innerHTML='show';
|
|
}else{
|
|
x.style.display = '';
|
|
b.innerHTML='hide';
|
|
}
|
|
return false;
|
|
}
|
|
</script>
|
|
</div>
|
|
<div class=fancy>
|
|
<div class=nosearch>
|
|
<div class="fancy_title">
|
|
How SQLite Is Tested
|
|
</div>
|
|
<div class="fancy_toc">
|
|
<a onclick="toggle_toc()">
|
|
<span class="fancy_toc_mark" id="toc_mk">►</span>
|
|
Table Of Contents
|
|
</a>
|
|
<div id="toc_sub"><div class="fancy-toc1"><a href="#introduction">1. Introduction</a></div>
|
|
<div class="fancy-toc2"><a href="#executive_summary">1.1. Executive Summary</a></div>
|
|
<div class="fancy-toc1"><a href="#test_harnesses">2. Test Harnesses</a></div>
|
|
<div class="fancy-toc1"><a href="#anomaly_testing">3. Anomaly Testing</a></div>
|
|
<div class="fancy-toc2"><a href="#out_of_memory_testing">3.1. Out-Of-Memory Testing</a></div>
|
|
<div class="fancy-toc2"><a href="#i_o_error_testing">3.2. I/O Error Testing</a></div>
|
|
<div class="fancy-toc2"><a href="#crash_testing">3.3. Crash Testing</a></div>
|
|
<div class="fancy-toc2"><a href="#compound_failure_tests">3.4. Compound failure tests</a></div>
|
|
<div class="fancy-toc1"><a href="#fuzz_testing">4. Fuzz Testing</a></div>
|
|
<div class="fancy-toc2"><a href="#sql_fuzz">4.1. SQL Fuzz</a></div>
|
|
<div class="fancy-toc3"><a href="#sql_fuzz_using_the_american_fuzzy_lop_fuzzer">4.1.1. SQL Fuzz Using The American Fuzzy Lop Fuzzer</a></div>
|
|
<div class="fancy-toc3"><a href="#google_oss_fuzz">4.1.2. Google OSS Fuzz</a></div>
|
|
<div class="fancy-toc3"><a href="#the_dbsqlfuzz_fuzzer">4.1.3. The dbsqlfuzz fuzzer</a></div>
|
|
<div class="fancy-toc3"><a href="#other_third_party_fuzzers">4.1.4. Other third-party fuzzers</a></div>
|
|
<div class="fancy-toc3"><a href="#the_fuzzcheck_test_harness">4.1.5. The fuzzcheck test harness</a></div>
|
|
<div class="fancy-toc3"><a href="#tension_between_fuzz_testing_and_100_mc_dc_testing">4.1.6. Tension Between Fuzz Testing And 100% MC/DC Testing</a></div>
|
|
<div class="fancy-toc2"><a href="#malformed_database_files">4.2. Malformed Database Files</a></div>
|
|
<div class="fancy-toc2"><a href="#boundary_value_tests">4.3. Boundary Value Tests</a></div>
|
|
<div class="fancy-toc1"><a href="#regression_testing">5. Regression Testing</a></div>
|
|
<div class="fancy-toc1"><a href="#automatic_resource_leak_detection">6. Automatic Resource Leak Detection</a></div>
|
|
<div class="fancy-toc1"><a href="#test_coverage">7. Test Coverage</a></div>
|
|
<div class="fancy-toc2"><a href="#statement_versus_branch_coverage">7.1. Statement versus branch coverage</a></div>
|
|
<div class="fancy-toc2"><a href="#coverage_testing_of_defensive_code">7.2. Coverage testing of defensive code</a></div>
|
|
<div class="fancy-toc2"><a href="#forcing_coverage_of_boundary_values_and_boolean_vector_tests">7.3. Forcing coverage of boundary values and boolean vector tests</a></div>
|
|
<div class="fancy-toc2"><a href="#branch_coverage_versus_mc_dc">7.4. Branch coverage versus MC/DC</a></div>
|
|
<div class="fancy-toc2"><a href="#measuring_branch_coverage">7.5. Measuring branch coverage</a></div>
|
|
<div class="fancy-toc2"><a href="#mutation_testing">7.6. Mutation testing</a></div>
|
|
<div class="fancy-toc2"><a href="#experience_with_full_test_coverage">7.7. Experience with full test coverage</a></div>
|
|
<div class="fancy-toc1"><a href="#dynamic_analysis">8. Dynamic Analysis</a></div>
|
|
<div class="fancy-toc2"><a href="#assert">8.1. Assert</a></div>
|
|
<div class="fancy-toc2"><a href="#valgrind">8.2. Valgrind</a></div>
|
|
<div class="fancy-toc2"><a href="#memsys2">8.3. Memsys2</a></div>
|
|
<div class="fancy-toc2"><a href="#mutex_asserts">8.4. Mutex Asserts</a></div>
|
|
<div class="fancy-toc2"><a href="#journal_tests">8.5. Journal Tests</a></div>
|
|
<div class="fancy-toc2"><a href="#undefined_behavior_checks">8.6. Undefined Behavior Checks</a></div>
|
|
<div class="fancy-toc1"><a href="#disabled_optimization_tests">9. Disabled Optimization Tests</a></div>
|
|
<div class="fancy-toc1"><a href="#checklists">10. Checklists</a></div>
|
|
<div class="fancy-toc1"><a href="#static_analysis">11. Static Analysis</a></div>
|
|
<div class="fancy-toc1"><a href="#summary">12. Summary</a></div>
|
|
</div>
|
|
</div>
|
|
<script>
|
|
function toggle_toc(){
|
|
var sub = document.getElementById("toc_sub")
|
|
var mk = document.getElementById("toc_mk")
|
|
if( sub.style.display!="block" ){
|
|
sub.style.display = "block";
|
|
mk.innerHTML = "▼";
|
|
} else {
|
|
sub.style.display = "none";
|
|
mk.innerHTML = "►";
|
|
}
|
|
}
|
|
</script>
|
|
</div>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<h1 id="introduction"><span>1. </span>Introduction</h1>
|
|
|
|
<p>The reliability and robustness of SQLite is achieved in part
|
|
by thorough and careful testing.</p>
|
|
|
|
<p>As of <a href="releaselog/3_33_0.html">version 3.33.0</a> (2020-08-14),
|
|
the SQLite library consists of approximately
|
|
143.4 KSLOC of C code.
|
|
(KSLOC means thousands of "Source Lines Of Code" or, in other words,
|
|
lines of code excluding blank lines and comments.)
|
|
By comparison, the project has
|
|
640 times as much
|
|
test code and test scripts -
|
|
91911.0 KSLOC.</p>
|
|
|
|
<h2 id="executive_summary"><span>1.1. </span>Executive Summary</h2>
|
|
|
|
<ul>
|
|
<li> Four independently developed test harnesses
|
|
</li><li> 100% branch test coverage in an as-deployed configuration
|
|
</li><li> Millions and millions of test cases
|
|
</li><li> Out-of-memory tests
|
|
</li><li> I/O error tests
|
|
</li><li> Crash and power loss tests
|
|
</li><li> Fuzz tests
|
|
</li><li> Boundary value tests
|
|
</li><li> Disabled optimization tests
|
|
</li><li> Regression tests
|
|
</li><li> Malformed database tests
|
|
</li><li> Extensive use of assert() and run-time checks
|
|
</li><li> Valgrind analysis
|
|
</li><li> Undefined behavior checks
|
|
</li><li> Checklists
|
|
</li></ul>
|
|
|
|
<a name="harnesses"></a>
|
|
|
|
<h1 id="test_harnesses"><span>2. </span>Test Harnesses</h1>
|
|
|
|
<p>There are four independent test harnesses used for testing the
|
|
core SQLite library.
|
|
Each test harness is designed, maintained, and managed separately
|
|
from the others.
|
|
</p>
|
|
|
|
<ol>
|
|
<li><p>
|
|
<a name="tcl"></a>
|
|
|
|
The <b>TCL Tests</b> are the oldest set of tests for SQLite.
|
|
They are contained in the same source tree as the
|
|
SQLite core and like the SQLite core are in the public domain. The
|
|
TCL tests are the primary tests used during development.
|
|
The TCL tests are written using the
|
|
<a href="http://www.tcl-lang.org/">TCL scripting language</a>.
|
|
The TCL test harness itself consists of 27.0 KSLOC
|
|
of C code used to create the TCL interface. The test scripts are contained
|
|
in 1272 files totaling
|
|
21.1MB in size. There are
|
|
46363 distinct test cases, but many of the test
|
|
cases are parameterized and run multiple times (with different parameters)
|
|
so that on a full test run millions of
|
|
separate tests are performed.
|
|
</p>
|
|
</li>
|
|
|
|
<li><p>
|
|
The <b><a href="th3.html">TH3</a></b> test harness is a set of proprietary tests, written in
|
|
C that provide 100% branch test coverage
|
|
(and <a href="testing.html#mcdc">100% MC/DC test coverage</a>) to
|
|
the core SQLite library. The TH3 tests are designed to run
|
|
on embedded and specialized platforms that would not easily support
|
|
TCL or other workstation services. TH3 tests use only the published
|
|
SQLite interfaces. TH3 consists of about
|
|
71.5 MB or 978.3 KSLOC
|
|
of C code implementing 46622 distinct test cases.
|
|
TH3 tests are heavily parameterized, though, so a full-coverage test runs
|
|
about 1.9 million different test
|
|
instances. The cases that provide 100% branch test coverage constitute
|
|
a subset of the total TH3 test suite. A soak test
|
|
prior to release does hundreds of millions of tests.
|
|
Additional information on TH3 is <a href="th3.html">available separately</a>.</p></li>
|
|
|
|
<li><p>
|
|
<a name="slt"></a>
|
|
|
|
The <a href="http://www.sqlite.org/sqllogictest"><b>SQL Logic Test</b></a>
|
|
or SLT test harness is used to run huge numbers
|
|
of SQL statements against both SQLite and several other SQL database engines
|
|
and verify that they all get the same answers. SLT currently compares
|
|
SQLite against PostgreSQL, MySQL, Microsoft SQL Server, and Oracle 10g.
|
|
SLT runs 7.2 million queries comprising
|
|
1.12GB of test data.
|
|
</p></li>
|
|
|
|
<li><p>
|
|
The <a href="#dbsqlfuzz"><b>dbsqlfuzz</b></a> engine is a
|
|
proprietary fuzz tester. Other <a href="testing.html#fuzztesting">fuzzers for SQLite</a>
|
|
mutate either the SQL inputs or the database file. Dbsqlfuzz mutates
|
|
both the SQL and the database file at the same time, and is thus able
|
|
to reach new error states. Dbsqlfuzz is built using the
|
|
<a href="http://llvm.org/docs/LibFuzzer.html">libFuzzer</a> framework of LLVM
|
|
with a custom mutator. Starting from
|
|
236 seed files, the dbsqlfuzz fuzzer
|
|
has examined many tens of millions of mutations, resulting in
|
|
63493 distinct test cases that form
|
|
the basis of each new fuzzing session. Dbsqlfuzz helps ensure
|
|
that SQLite is robust against attack via malicious SQL or database
|
|
inputs.
|
|
</p></li></ol>
|
|
|
|
<p>In addition to the four main test harnesses, there several other
|
|
small programs that implement specialized tests.
|
|
</p><ol>
|
|
<li value="5">The "speedtest1.c" program
|
|
estimates the performance of SQLite under a typical workload.
|
|
</li><li>The "mptester.c" program is a stress test for multiple processes
|
|
concurrently reading and writing a single database.
|
|
</li><li>The "threadtest3.c" program is a stress test for multiple threads using
|
|
SQLite simultaneously.
|
|
</li><li>The "fuzzershell.c" program is used to
|
|
run some <a href="#fuzztesting">fuzz tests</a>.
|
|
</li></ol>
|
|
|
|
|
|
<p>All of the tests above must run successfully, on multiple platforms
|
|
and under multiple compile-time configurations,
|
|
before each release of SQLite.</p>
|
|
|
|
<p>Prior to each check-in to the SQLite source tree, developers
|
|
typically run a subset (called "veryquick") of the Tcl tests
|
|
consisting of about
|
|
248.5 thousand test cases.
|
|
The veryquick tests include most tests other than the anomaly, fuzz, and
|
|
soak tests. The idea behind the veryquick tests are that they are
|
|
sufficient to catch most errors, but also run in only a few minutes
|
|
instead of a few hours.</p>
|
|
|
|
<a name="anomaly"></a>
|
|
|
|
<h1 id="anomaly_testing"><span>3. </span>Anomaly Testing</h1>
|
|
|
|
<p>Anomaly tests are tests designed to verify the correct behavior
|
|
of SQLite when something goes wrong. It is (relatively) easy to build
|
|
an SQL database engine that behaves correctly on well-formed inputs
|
|
on a fully functional computer. It is more difficult to build a system
|
|
that responds sanely to invalid inputs and continues to function following
|
|
system malfunctions. The anomaly tests are designed to verify the latter
|
|
behavior.</p>
|
|
|
|
<a name="oomtesting"></a>
|
|
|
|
<h2 id="out_of_memory_testing"><span>3.1. </span>Out-Of-Memory Testing</h2>
|
|
|
|
<p>SQLite, like all SQL database engines, makes extensive use of
|
|
malloc() (See the separate report on
|
|
<a href="malloc.html">dynamic memory allocation in SQLite</a> for
|
|
additional detail.)
|
|
On servers and workstations, malloc() never fails in practice and so correct
|
|
handling of out-of-memory (OOM) errors is not particularly important.
|
|
But on embedded devices, OOM errors are frighteningly common and since
|
|
SQLite is frequently used on embedded devices, it is important that
|
|
SQLite be able to gracefully handle OOM errors.</p>
|
|
|
|
<p>OOM testing is accomplished by simulating OOM errors.
|
|
SQLite allows an application to substitute an alternative malloc()
|
|
implementation using the <a href="c3ref/config.html">sqlite3_config</a>(<a href="c3ref/c_config_covering_index_scan.html#sqliteconfigmalloc">SQLITE_CONFIG_MALLOC</a>,...)
|
|
interface. The TCL and TH3 test harnesses are both capable of
|
|
inserting a modified version of malloc() that can be rigged to fail
|
|
after a certain number of allocations. These instrumented mallocs
|
|
can be set to fail only once and then start working again, or to
|
|
continue failing after the first failure. OOM tests are done in a
|
|
loop. On the first iteration of the loop, the instrumented malloc
|
|
is rigged to fail on the first allocation. Then some SQLite operation
|
|
is carried out and checks are done to make sure SQLite handled the
|
|
OOM error correctly. Then the time-to-failure counter
|
|
on the instrumented malloc is increased by one and the test is
|
|
repeated. The loop continues until the entire operation runs to
|
|
completion without ever encountering a simulated OOM failure.
|
|
Tests like this are run twice, once with the instrumented malloc
|
|
set to fail only once, and again with the instrumented malloc set
|
|
to fail continuously after the first failure.</p>
|
|
|
|
<a name="ioerrtesting"></a>
|
|
|
|
<h2 id="i_o_error_testing"><span>3.2. </span>I/O Error Testing</h2>
|
|
|
|
<p>I/O error testing seeks to verify that SQLite responds sanely
|
|
to failed I/O operations. I/O errors might result from a full disk drive,
|
|
malfunctioning disk hardware, network outages when using a network
|
|
file system, system configuration or permission changes that occur in the
|
|
middle of an SQL operation, or other hardware or operating system
|
|
malfunctions. Whatever the cause, it is important that SQLite be able
|
|
to respond correctly to these errors and I/O error testing seeks to
|
|
verify that it does.</p>
|
|
|
|
<p>I/O error testing is similar in concept to OOM testing; I/O errors
|
|
are simulated and checks are made to verify that SQLite responds
|
|
correctly to the simulated errors. I/O errors are simulated in both
|
|
the TCL and TH3 test harnesses by inserting a new
|
|
<a href="c3ref/vfs.html">Virtual File System object</a> that is specially rigged
|
|
to simulate an I/O error after a set number of I/O operations.
|
|
As with OOM error testing, the I/O error simulators can be set to
|
|
fail just once, or to fail continuously after the first failure.
|
|
Tests are run in a loop, slowly increasing the point of failure until
|
|
the test case runs to completion without error. The loop is run twice,
|
|
once with the I/O error simulator set to simulate only a single failure
|
|
and a second time with it set to fail all I/O operations after the first
|
|
failure.</p>
|
|
|
|
<p>In I/O error tests, after the I/O error simulation failure mechanism
|
|
is disabled, the database is examined using
|
|
<a href="pragma.html#pragma_integrity_check">PRAGMA integrity_check</a> to make sure that the I/O error has not
|
|
introduced database corruption.</p>
|
|
|
|
<a name="crashtesting"></a>
|
|
|
|
<h2 id="crash_testing"><span>3.3. </span>Crash Testing</h2>
|
|
|
|
<p>Crash testing seeks to demonstrate that an SQLite database will not
|
|
go corrupt if the application or operating system crashes or if there
|
|
is a power failure in the middle of a database update. A separate
|
|
white-paper titled
|
|
<a href="atomiccommit.html">Atomic Commit in SQLite</a> describes the
|
|
defensive measure SQLite takes to prevent database corruption following
|
|
a crash. Crash tests strive to verify that those defensive measures
|
|
are working correctly.</p>
|
|
|
|
<p>It is impractical to do crash testing using real power failures, of
|
|
course, and so crash testing is done in simulation. An alternative
|
|
<a href="c3ref/vfs.html">Virtual File System</a> is inserted that allows the test
|
|
harness to simulate the state of the database file following a crash.</p>
|
|
|
|
<p>In the TCL test harness, the crash simulation is done in a separate
|
|
process. The main testing process spawns a child process which runs
|
|
some SQLite operation and randomly crashes somewhere in the middle of
|
|
a write operation. A special <a href="vfs.html">VFS</a> randomly reorders and corrupts
|
|
the unsynchronized
|
|
write operations to simulate the effect of buffered filesystems. After
|
|
the child dies, the original test process opens and reads the test
|
|
database and verifies that the changes attempted by the child either
|
|
completed successfully or else were completely rolled back. The
|
|
<a href="pragma.html#pragma_integrity_check">integrity_check</a> <a href="pragma.html#syntax">PRAGMA</a> is used to make sure no database corruption
|
|
occurs.</p>
|
|
|
|
<p>The TH3 test harness needs to run on embedded systems that do not
|
|
necessarily have the ability to spawn child processes, so it uses
|
|
an in-memory <a href="vfs.html">VFS</a> to simulate crashes. The in-memory <a href="vfs.html">VFS</a> can be rigged
|
|
to make a snapshot of the entire filesystem after a set number of I/O
|
|
operations. Crash tests run in a loop. On each iteration of the loop,
|
|
the point at which a snapshot is made is advanced until the SQLite
|
|
operations being tested run to completion without ever hitting a
|
|
snapshot. Within the loop, after the SQLite operation under test has
|
|
completed, the filesystem is reverted to the snapshot and random file
|
|
damage is introduced that is characteristic of the kinds of damage
|
|
one expects to see following a power loss. Then the database is opened
|
|
and checks are made to ensure that it is well-formed and that the
|
|
transaction either ran to completion or was completely rolled back.
|
|
The interior of the loop is repeated multiple times for each
|
|
snapshot with different random damage each time.</p>
|
|
|
|
<a name="multifail"></a>
|
|
|
|
<h2 id="compound_failure_tests"><span>3.4. </span>Compound failure tests</h2>
|
|
|
|
<p>The test suites for SQLite also explore the result of stacking
|
|
multiple failures. For example, tests are run to ensure correct behavior
|
|
when an I/O error or OOM fault occurs while trying to recover from a
|
|
prior crash.
|
|
|
|
<a name="fuzztesting"></a>
|
|
|
|
</p><h1 id="fuzz_testing"><span>4. </span>Fuzz Testing</h1>
|
|
|
|
<p><a href="http://en.wikipedia.org/wiki/Fuzz_testing">Fuzz testing</a>
|
|
seeks to establish that SQLite responds correctly to invalid, out-of-range,
|
|
or malformed inputs.</p>
|
|
|
|
<h2 id="sql_fuzz"><span>4.1. </span>SQL Fuzz</h2>
|
|
|
|
<p>SQL fuzz testing consists of creating syntactically correct yet
|
|
wildly nonsensical SQL statements and feeding them to SQLite to see
|
|
what it will do with them. Usually some kind of error is returned
|
|
(such as "no such table"). Sometimes, purely by chance, the SQL
|
|
statement also happens to be semantically correct. In that case, the
|
|
resulting prepared statement is run to make sure it gives a reasonable
|
|
result.</p>
|
|
|
|
<a name="aflfuzz"></a>
|
|
|
|
<h3 id="sql_fuzz_using_the_american_fuzzy_lop_fuzzer"><span>4.1.1. </span>SQL Fuzz Using The American Fuzzy Lop Fuzzer</h3>
|
|
|
|
<p>The concept of fuzz testing has been around for decades, but fuzz
|
|
testing was not an effective way to find bugs until 2014 when
|
|
Michal Zalewski invented the first practical profile-guided fuzzer,
|
|
<a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a> or "AFL".
|
|
Unlike prior fuzzers that blindly generate random inputs, AFL
|
|
instruments the program being tested (by modifying the assembly-language
|
|
output from the C compiler) and uses that instrumentation to detect when
|
|
an input causes the program to do something different - to follow
|
|
a new control path or loop a different number of times. Inputs that provoke
|
|
new behavior are retained and further mutated. In this way, AFL is able
|
|
to "discover" new behaviors of the program under test, including behaviors
|
|
that were never envisioned by the designers.
|
|
|
|
</p><p>AFL proved adept at finding arcane bugs in SQLite.
|
|
Most of the findings have been assert() statements where the conditional
|
|
was false under obscure circumstances. But AFL has also found
|
|
a fair number of crash bugs in SQLite, and even a few cases where SQLite
|
|
computed incorrect results.
|
|
|
|
</p><p>Because of its past success, AFL became a standard part of the testing
|
|
strategy for SQLite beginning with <a href="releaselog/3_8_10.html">version 3.8.10</a> (2015-05-07) until
|
|
it was superseded by better fuzzers in <a href="releaselog/3_29_0.html">version 3.29.0</a> (2019-07-10).
|
|
|
|
<a name="ossfuzz"></a>
|
|
|
|
</p><h3 id="google_oss_fuzz"><span>4.1.2. </span>Google OSS Fuzz</h3>
|
|
|
|
<p>Beginning in 2016, a team of engineers at Google started the
|
|
<a href="https://github.com/google/oss-fuzz">OSS Fuzz</a> project.
|
|
OSS Fuzz uses a AFL-style guided fuzzer running on Google's infrastructure.
|
|
The Fuzzer automatically downloads the latest check-ins for participating
|
|
projects, fuzzes them, and sends email to the developers reporting any
|
|
problems. When a fix is checked in, the fuzzer automatically detects this
|
|
and emails a confirmation to the developers.
|
|
|
|
</p><p>SQLite is one of many open-source projects that OSS Fuzz tests. The
|
|
<a href="https://www.sqlite.org/src/file/test/ossfuzz.c">test/ossfuzz.c</a> source file
|
|
in the SQLite repository is SQLite's interface to OSS fuzz.
|
|
|
|
</p><p>OSS Fuzz no longer finds historical bugs in SQLite. But it is still
|
|
running and does occasionally find issues in new development check-ins.
|
|
Examples:
|
|
<a href="https://www.sqlite.org/src/timeline?y=ci&c=c422afb507dc8757">[1]</a>
|
|
<a href="https://www.sqlite.org/src/timeline?y=ci&c=0a2eb949f8a759e5">[2]</a>
|
|
<a href="https://www.sqlite.org/src/timeline?y=ci&c=62f2235adf796c72">[3]</a>.
|
|
|
|
<a name="dbsqlfuzz"></a>
|
|
|
|
</p><h3 id="the_dbsqlfuzz_fuzzer"><span>4.1.3. </span>The dbsqlfuzz fuzzer</h3>
|
|
|
|
<p>Beginning in late 2018, SQLite has been fuzzed using a new proprietary
|
|
fuzzer called "dbsqlfuzz". Dbsqlfuzz is built using the
|
|
<a href="http://llvm.org/docs/LibFuzzer.html">libFuzzer</a> framework of LLVM.
|
|
|
|
</p><p>The dbsqlfuzz fuzzer mutates both the SQL input and the database file
|
|
at the same time. Dbsqlfuzz uses a custom
|
|
<a href="https://github.com/google/fuzzer-test-suite/blob/master/tutorial/structure-aware-fuzzing.md">Structure-Aware Mutator</a>
|
|
on a specialized input file that defines both an input database and SQL
|
|
text to be run against that database. Because it mutates both the input
|
|
database and the input SQL at the same time, dbsqlfuzz has been able to
|
|
find some obscure faults in SQLite that were missed by prior fuzzers that
|
|
mutated only SQL inputs or only the database file.
|
|
The SQLite developers usually leave an instance or two of dbsqlfuzz
|
|
running on the latest trunk code of SQLite whenever they are away from
|
|
the office for an extended period, such as overnight.</p>
|
|
|
|
<p>The dbsqlfuzz fuzzer has been so successful at hardening the
|
|
SQLite code base against malicious attack that it is now considered
|
|
one of the four primary test harnesses for SQLite.
|
|
|
|
<a name="3pfuzz"></a>
|
|
|
|
</p><h3 id="other_third_party_fuzzers"><span>4.1.4. </span>Other third-party fuzzers</h3>
|
|
|
|
<p>SQLite seems to be a popular target for third-parties to fuzz.
|
|
The developers hear about many attempts to fuzz SQLite
|
|
and they do occasionally get bug reports found by independent
|
|
fuzzers. All such reports are promptly fixed, so the product is
|
|
improved and that the entire SQLite user community benefits.
|
|
This mechanism of having many independent testers is similar to
|
|
<a href="https://en.wikipedia.org/wiki/Linus%27s_law">Linus's law</a>:
|
|
"given enough eyeballs, all bugs are shallow".
|
|
|
|
</p><p>One fuzzing researcher of particular note is
|
|
<a href="https://www.manuelrigger.at/">Manuel Rigger</a>, currently
|
|
(as this paragraph is written on 2019-12-21)
|
|
at <a href="https://ethz.ch/en.html">ETH Zurich</a>.
|
|
Most fuzzers only look for assertion faults, crashes, undefined behavior (UB),
|
|
or other easily detected anomalies. Dr. Rigger's fuzzers, on the other hand,
|
|
are able to find cases where SQLite computes an incorrect answer.
|
|
Rigger has found
|
|
<a href="https://www.sqlite.org/src/timeline?y=t&u=mrigger&n=all">many such cases</a>.
|
|
Most of these finds are fairly obscure corner cases involving type
|
|
conversions and affinity transformations, and a good number of the finds
|
|
are against unreleased features. Nevertheless, his finds are still important
|
|
as they are real bugs,
|
|
and the SQLite developers are grateful to be able to identify and fix
|
|
the underlying problems. Rigger's work is currently unpublished. When it
|
|
is released, it could be as influential as Zalewski's invention of AFL
|
|
and profile-guided fuzzing.
|
|
|
|
<a name="fuzzcheck"></a>
|
|
|
|
</p><h3 id="the_fuzzcheck_test_harness"><span>4.1.5. </span>The fuzzcheck test harness</h3>
|
|
|
|
<p>Historical test cases from <a href="testing.html#aflfuzz">AFL</a>, <a href="testing.html#ossfuzz">OSS Fuzz</a>, and <a href="testing.html#dbsqlfuzz">dbsqlfuzz</a> are
|
|
collected in a set of database files in the main SQLite source tree
|
|
and then rerun by the "fuzzcheck" utility program whenever one runs
|
|
"make test". Fuzzcheck only runs a few thousand "interesting" cases
|
|
out of the hundreds of millions of cases that the various fuzzers have
|
|
examined over the years. "Interesting" cases are cases that exhibit
|
|
previously unseen behavior. Actual bugs found by fuzzers are always
|
|
included among the interesting test cases, but most of the cases run
|
|
by fuzzcheck were never actual bugs.
|
|
|
|
<a name="tension"></a>
|
|
|
|
</p><h3 id="tension_between_fuzz_testing_and_100_mc_dc_testing"><span>4.1.6. </span>Tension Between Fuzz Testing And 100% MC/DC Testing</h3>
|
|
|
|
<p>Fuzz testing and <a href="testing.html#mcdc">100% MC/DC testing</a> are in tension with
|
|
one another.
|
|
That is to say, code tested to 100% MC/DC will tend to be
|
|
more vulnerable to problems found by fuzzing and code that performs
|
|
well during fuzz testing will tend to have (much) less than
|
|
100% MC/DC.
|
|
This is because MC/DC testing discourages <a href="testing.html#defcode">defensive code</a> with
|
|
unreachable branches, but without defensive code, a fuzzer is
|
|
more likely to find a path that causes problems. MC/DC testing
|
|
seems to work well for building code that is robust during typical
|
|
normal use, whereas fuzz testing is good for building code that is
|
|
robust against malicious attack.
|
|
|
|
</p><p>Of course, users would prefer code that is both robust in normal
|
|
use and resistant to malicious attack. The SQLite developers are
|
|
dedicated to providing that. The purpose of this section is merely
|
|
to point out that doing both at the same time is hard.
|
|
|
|
</p><p>For much of its history SQLite has been focused on 100% MC/DC testing.
|
|
Resistance to fuzzing attacks only became a concern with the introduction
|
|
of AFL in 2014. For a while there, fuzzers were finding many problems
|
|
in SQLite. In more recent years, the testing strategy of SQLite has
|
|
evolved to place more emphasis on fuzz testing. We still maintain
|
|
100% MC/DC of the core SQLite code, but most testing CPU cycles are
|
|
now devoted to fuzzing.
|
|
|
|
</p><p>While fuzz testing and 100% MC/DC testing are in tension, they
|
|
are not completely at cross-purposes. The fact that the SQlite test
|
|
suite does test to 100% MC/DC means that when fuzzers do find problems,
|
|
those problems can be fixed quickly and with little risk of introducing
|
|
new errors.
|
|
|
|
</p><h2 id="malformed_database_files"><span>4.2. </span>Malformed Database Files</h2>
|
|
|
|
<p>There are numerous test cases that verify that SQLite is able to
|
|
deal with malformed database files.
|
|
These tests first build a well-formed database file, then add
|
|
corruption by changing one or more bytes in the file by some means
|
|
other than SQLite. Then SQLite is used to read the database.
|
|
In some cases, the bytes changes are in the middle of data.
|
|
This causes the content of the database to change while keeping the
|
|
database well-formed.
|
|
In other cases, unused bytes of the file are modified, which has
|
|
no effect on the integrity of the database.
|
|
The interesting cases are when bytes of the file that
|
|
define database structure get changed. The malformed database tests
|
|
verify that SQLite finds the file format errors and reports them
|
|
using the <a href="rescode.html#corrupt">SQLITE_CORRUPT</a> return code without overflowing
|
|
buffers, dereferencing NULL pointers, or performing other
|
|
unwholesome actions.</p>
|
|
|
|
<p>The <a href="testing.html#dbsqlfuzz">dbsqlfuzz</a> fuzzer also does an excellent job of verifying
|
|
that SQLite responds sanely to malformed database files.</p>
|
|
|
|
<h2 id="boundary_value_tests"><span>4.3. </span>Boundary Value Tests</h2>
|
|
|
|
<p>SQLite defines certain <a href="limits.html">limits</a> on its operation, such as the
|
|
maximum number of columns in a table, the maximum length of an
|
|
SQL statement, or the maximum value of an integer. The TCL and TH3 test
|
|
suites both contains numerous tests that push SQLite right to the edge
|
|
of its defined limits and verify that it performs correctly for
|
|
all allowed values. Additional tests go beyond the defined limits
|
|
and verify that SQLite correctly returns errors. The source code
|
|
contains <a href="testing.html#testcase">testcase macros</a> to verify that both sides of each boundary
|
|
have been tested.</p>
|
|
|
|
<a name="regressiontesting"></a>
|
|
|
|
<h1 id="regression_testing"><span>5. </span>Regression Testing</h1>
|
|
|
|
<p>Whenever a bug is reported against SQLite, that bug is not considered
|
|
fixed until new test cases that would exhibit the bug have been added
|
|
to either the TCL or TH3 test suites.
|
|
Over the years,
|
|
this has resulted in thousands and thousands of new tests.
|
|
These regression tests ensure that bugs that have
|
|
been fixed in the past are not reintroduced into future versions of
|
|
SQLite.</p>
|
|
|
|
<a name="leakcheck"></a>
|
|
|
|
<h1 id="automatic_resource_leak_detection"><span>6. </span>Automatic Resource Leak Detection</h1>
|
|
|
|
<p>Resource leak occurs when system resources
|
|
are allocated and never freed. The most troublesome resource leaks
|
|
in many applications are memory leaks - when memory is allocated using
|
|
malloc() but never released using free(). But other kinds of resources
|
|
can also be leaked: file descriptors, threads, mutexes, etc.</p>
|
|
|
|
<p>Both the TCL and TH3 test harnesses automatically track system
|
|
resources and report resource leaks on <u>every</u> test run.
|
|
No special configuration or setup is required. The test harnesses
|
|
are especially vigilant with regard to memory leaks. If a change
|
|
causes a memory leak, the test harnesses will recognize this
|
|
quickly. SQLite is designed to never leak memory, even after
|
|
an exception such as an OOM error or disk I/O error. The test
|
|
harnesses are zealous to enforce this.</p>
|
|
|
|
<a name="coverage"></a>
|
|
|
|
<h1 id="test_coverage"><span>7. </span>Test Coverage</h1>
|
|
|
|
<p>The SQLite core, including the unix <a href="vfs.html">VFS</a>,
|
|
has 100% branch test coverage under <a href="th3.html">TH3</a> in
|
|
its default configuration as measured by
|
|
<a href="http://gcc.gnu.org/onlinedocs/gcc/Gcov.html">gcov</a>.
|
|
Extensions such as FTS3 and RTree are excluded from this
|
|
analysis.</p>
|
|
|
|
<a name="stmtvbr"></a>
|
|
|
|
<h2 id="statement_versus_branch_coverage"><span>7.1. </span>Statement versus branch coverage</h2>
|
|
|
|
<p>There are many ways to measure test coverage. The most popular
|
|
metric is "statement coverage". When you hear someone say that their
|
|
program as "XX% test coverage" without further explanation, they usually
|
|
mean statement coverage. Statement coverage measures what percentage
|
|
of lines of code are executed at least once by the test suite.</p>
|
|
|
|
<p>Branch coverage is more rigorous than statement coverage. Branch
|
|
coverage measures the number of machine-code branch instructions that
|
|
are evaluated at least once on both directions.</p>
|
|
|
|
<p>To illustrate the difference between statement coverage and
|
|
branch coverage, consider the following hypothetical
|
|
line of C code:</p>
|
|
|
|
<div class="codeblock"><pre>if( a>b && c!=25 ){ d++; }
|
|
</pre></div>
|
|
|
|
<p>Such a line of C code might generate a dozen separate machine code
|
|
instructions. If any one of those instructions is ever evaluated, then
|
|
we say that the statement has been tested. So, for example, it might
|
|
be the case that the conditional expression is
|
|
always false and the "d" variable is
|
|
never incremented. Even so, statement coverage counts this line of
|
|
code as having been tested.</p>
|
|
|
|
<p>Branch coverage is more strict. With branch coverage, each test and
|
|
each subblock within the statement is considered separately. In order
|
|
to achieve 100% branch coverage in the example above, there must be at
|
|
least three test cases:</p>
|
|
|
|
<p></p><ul>
|
|
<li> a<=b
|
|
</li><li> a>b && c==25
|
|
</li><li> a>b && c!=25
|
|
</li></ul>
|
|
|
|
<p>Any one of the above test cases would provide 100% statement coverage
|
|
but all three are required for 100% branch coverage. Generally speaking,
|
|
100% branch coverage implies 100% statement coverage, but the converse is
|
|
not true. To reemphasize, the
|
|
<a href="th3.html">TH3</a> test harness for SQLite provides the stronger form of
|
|
test coverage - 100% branch test coverage.</p>
|
|
|
|
<a name="defcode"></a>
|
|
|
|
<h2 id="coverage_testing_of_defensive_code"><span>7.2. </span>Coverage testing of defensive code</h2>
|
|
|
|
<p>A well-written C program will typically contain some defensive
|
|
conditionals which in practice are always true or always false.
|
|
This leads to a
|
|
programming dilemma: Does one remove defensive code in order to obtain
|
|
100% branch coverage?</p>
|
|
|
|
<p>In SQLite, the answer to the previous question is "no".
|
|
For testing purposes, the SQLite source code defines
|
|
macros called ALWAYS() and NEVER(). The ALWAYS() macro
|
|
surrounds conditions
|
|
which are expected to always evaluate as true and NEVER() surrounds
|
|
conditions that are always evaluated to false. These macros serve as
|
|
comments to indicate that the conditions are defensive code.
|
|
In release builds, these macros are pass-throughs:</p>
|
|
|
|
<div class="codeblock"><pre>#define ALWAYS(X) (X)
|
|
#define NEVER(X) (X)
|
|
</pre></div>
|
|
|
|
<p>During most testing, however, these macros will throw an assertion
|
|
fault if their argument does not have the expected truth value. This
|
|
alerts the developers quickly to incorrect design assumptions.
|
|
|
|
</p><div class="codeblock"><pre>#define ALWAYS(X) ((X)?1:assert(0),0)
|
|
#define NEVER(X) ((X)?assert(0),1:0)
|
|
</pre></div>
|
|
|
|
<p>When measuring test coverage, these macros are defined to be constant
|
|
truth values so that they do not generate assembly language branch
|
|
instructions, and hence do not come into play when calculating the
|
|
branch coverage:</p>
|
|
|
|
<div class="codeblock"><pre>#define ALWAYS(X) (1)
|
|
#define NEVER(X) (0)
|
|
</pre></div>
|
|
|
|
<p>The test suite is designed to be run three times, once for each of
|
|
the ALWAYS() and NEVER() definitions shown above. All three test runs
|
|
should yield exactly the same result. There is a run-time test using
|
|
the <a href="c3ref/test_control.html">sqlite3_test_control</a>(<a href="c3ref/c_testctrl_always.html">SQLITE_TESTCTRL_ALWAYS</a>, ...) interface that
|
|
can be used to verify that the macros are correctly set to the first
|
|
form (the pass-through form) for deployment.</p>
|
|
|
|
<a name="testcase"></a>
|
|
|
|
<h2 id="forcing_coverage_of_boundary_values_and_boolean_vector_tests"><span>7.3. </span>Forcing coverage of boundary values and boolean vector tests</h2>
|
|
|
|
<p>Another macro used in conjunction with test coverage measurement is
|
|
the <tt>testcase()</tt> macro. The argument is a condition for which
|
|
we want test cases that evaluate to both true and false.
|
|
In non-coverage builds (that is to say, in release builds) the
|
|
<tt>testcase()</tt> macro is a no-op:</p>
|
|
|
|
<div class="codeblock"><pre>#define testcase(X)
|
|
</pre></div>
|
|
|
|
<p>But in a coverage measuring build, the <tt>testcase()</tt> macro
|
|
generates code that evaluates the conditional expression in its argument.
|
|
Then during analysis, a check
|
|
is made to ensure tests exist that evaluate the conditional to both true
|
|
and false. <tt>Testcase()</tt> macros are used, for example, to help verify
|
|
that boundary values are tested. For example:</p>
|
|
|
|
<div class="codeblock"><pre>testcase( a==b );
|
|
testcase( a==b+1 );
|
|
if( a>b && c!=25 ){ d++; }
|
|
</pre></div>
|
|
|
|
<p>Testcase macros are also used when two or more cases of a switch
|
|
statement go to the same block of code, to make sure that the code was
|
|
reached for all cases:</p>
|
|
|
|
<div class="codeblock"><pre>switch( op ){
|
|
case OP_Add:
|
|
case OP_Subtract: {
|
|
testcase( op==OP_Add );
|
|
testcase( op==OP_Subtract );
|
|
/* ... */
|
|
break;
|
|
}
|
|
/* ... */
|
|
}
|
|
</pre></div>
|
|
|
|
<p>For bitmask tests, <tt>testcase()</tt> macros are used to verify that every
|
|
bit of the bitmask affects the outcome. For example, in the following block
|
|
of code, the condition is true if the mask contains either of two bits
|
|
indicating either a MAIN_DB or a TEMP_DB is being opened.
|
|
The <tt>testcase()</tt>
|
|
macros that precede the if statement verify that both cases are tested:</p>
|
|
|
|
<div class="codeblock"><pre>testcase( mask & SQLITE_OPEN_MAIN_DB );
|
|
testcase( mask & SQLITE_OPEN_TEMP_DB );
|
|
if( (mask & (SQLITE_OPEN_MAIN_DB|SQLITE_OPEN_TEMP_DB))!=0 ){ ... }
|
|
</pre></div>
|
|
|
|
<p>The SQLite source code contains 1084
|
|
uses of the <tt>testcase()</tt> macro.</p>
|
|
|
|
<a name="mcdc"></a>
|
|
|
|
<h2 id="branch_coverage_versus_mc_dc"><span>7.4. </span>Branch coverage versus MC/DC</h2>
|
|
|
|
<p>Two methods of measuring test coverage were described above:
|
|
"statement" and "branch" coverage. There are many other test coverage
|
|
metrics besides these two. Another popular metric is "Modified
|
|
Condition/Decision Coverage" or MC/DC.
|
|
<a href="http://en.wikipedia.org/wiki/Modified_Condition/Decision_Coverage">Wikipedia</a>
|
|
defines MC/DC as follows:</p>
|
|
|
|
<ul>
|
|
<li> Each decision tries every possible outcome.
|
|
</li><li> Each condition in a decision takes on every possible outcome.
|
|
</li><li> Each entry and exit point is invoked.
|
|
</li><li> Each condition in a decision is shown to independently
|
|
affect the outcome of the decision.
|
|
</li></ul>
|
|
|
|
<p>In the C programming language
|
|
where <b><tt>&&</tt></b> and <b><tt>||</tt></b>
|
|
are "short-circuit" operators, MC/DC and branch coverage are very nearly
|
|
the same thing. The primary difference is in boolean vector tests.
|
|
One can test for any of several bits in bit-vector and still obtain
|
|
100% branch test coverage even though the second element of MC/DC - the
|
|
requirement that each condition in a decision take on every possible outcome -
|
|
might not be satisfied.</p>
|
|
|
|
<p>SQLite uses <tt>testcase()</tt> macros as described in the previous
|
|
subsection to make sure that every condition in a bit-vector decision takes
|
|
on every possible outcome. In this way, SQLite also achieves 100% MC/DC
|
|
in addition to 100% branch coverage.</p>
|
|
|
|
<h2 id="measuring_branch_coverage"><span>7.5. </span>Measuring branch coverage</h2>
|
|
|
|
<p>Branch coverage in SQLite is currently measured
|
|
using <a href="https://gcc.gnu.org/onlinedocs/gcc/Gcov.html">gcov</a> with the "-b"
|
|
option. First the test program is compiled using options
|
|
"-g -fprofile-arcs -ftest-coverage" and then the test program is run.
|
|
Then "gcov -b" is run to generate a coverage report.
|
|
The coverage report is verbose and inconvenient to read,
|
|
so the gcov-generated report is processed using
|
|
some simple scripts to put it into a more human-friendly format.
|
|
This entire process is automated using scripts, of course.
|
|
|
|
</p><p>Note that running SQLite with gcov is not a test of SQLite —
|
|
it is a test of the test suite. The gcov run does not test SQLite because
|
|
the -fprofile-args and -ftest-coverage options cause the compiler to
|
|
generate different code.
|
|
The gcov run merely verifies that the test suite provides 100% branch test
|
|
coverage. The gcov run is a test of the test - a meta-test.
|
|
|
|
</p><p>After gcov has been run to verify 100% branch test coverage,
|
|
then the test program is recompiled using delivery compiler options
|
|
(without the special -fprofile-arcs and -ftest-coverage options)
|
|
and the test program is rerun.
|
|
This second run is the actual test of SQLite.
|
|
|
|
</p><p>It is important to verify that the gcov test run
|
|
and the second real test run both give the same output. Any
|
|
differences in output indicate either the use of undefined or
|
|
indeterminate behavior in the SQLite code (and hence a bug),
|
|
or a bug in the compiler.
|
|
Note that SQLite has, over the previous decade, encountered bugs
|
|
in each of GCC, Clang, and MSVC. Compiler bugs, while rare, do happen,
|
|
which is why it is so important to test the code in an as-delivered
|
|
configuration.
|
|
|
|
<a name="mutationtests"></a>
|
|
|
|
</p><h2 id="mutation_testing"><span>7.6. </span>Mutation testing</h2>
|
|
|
|
<p>Using gcov (or similar) to show that every branch instruction is taken
|
|
at least once in both directions is good measure of test suite quality.
|
|
But even better is showing that every branch instruction makes
|
|
a difference in the output. In other words, we want to show
|
|
not only that every branch instruction both jumps and falls through but also
|
|
that every branch is doing useful work and that the test suite is able
|
|
to detect and verify that work. When a branch is found that does not
|
|
make a difference in the output, that suggests that the code associated
|
|
the branch can be removed (reducing the size of the library and perhaps
|
|
making it run faster) or that the test suite is inadequately testing the
|
|
feature that the branch implements.
|
|
|
|
</p><p>SQLite strives to verify that every branch instruction makes a difference
|
|
using <a href="https://en.wikipedia.org/wiki/Mutation_testing">mutation testing</a>.
|
|
<a href="th3.html#muttest">A script</a>
|
|
first compiles the SQLite source code into assembly language
|
|
(using, for example, the -S option to gcc). Then the script steps through
|
|
the generated assembly language and, one by one, changes each branch
|
|
instruction into either an unconditional jump or a no-op, compiles the
|
|
result, and verifies that the test suite catches the mutation.
|
|
|
|
</p><p>
|
|
Unfortunately, SQLite contains many branch instructions that
|
|
help the code run faster without changing the output.
|
|
Such branches generate false-positives during mutation testing.
|
|
As an example, consider the following
|
|
<a href="https://www.sqlite.org/src/artifact/55b5fb474?ln=55-62">hash function</a>
|
|
used to accelerate table-name lookup:
|
|
|
|
</p><div class="codeblock"><pre>55 static unsigned int strHash(const char *z){
|
|
56 unsigned int h = 0;
|
|
57 unsigned char c;
|
|
58 while( (c = (unsigned char)*z++)!=0 ){ /*OPTIMIZATION-IF-TRUE*/
|
|
59 h = (h<<3) ^ h ^ sqlite3UpperToLower[c];
|
|
60 }
|
|
61 return h;
|
|
62 }
|
|
</pre></div>
|
|
|
|
<p>
|
|
If the branch instruction that implements the "c!=0" test on line 58
|
|
is changed into a no-op, then the while-loop will loop forever and the
|
|
test suite will fail with a time-out. But if that branch is changed
|
|
into an unconditional jump, then the hash function will always return 0.
|
|
The problem is that 0 is a valid hash. A hash function that always
|
|
returns 0 still works in the sense that SQLite still always gets the correct
|
|
answer. The table-name hash table degenerates into a linked-list
|
|
and so the table-name lookups that occur while parsing SQL statements
|
|
might be a little slower, but the end result will be the same.
|
|
|
|
</p><p>
|
|
To work around this problem, comments of the form
|
|
"<code>/*OPTIMIZATION-IF-TRUE*/</code>" and
|
|
"<code>/*OPTIMIZATION-IF-FALSE*/</code>" are inserted into the SQLite
|
|
source code to tell the mutation testing script to ignore some branch
|
|
instructions.
|
|
|
|
<a name="thoughts1"></a>
|
|
|
|
</p><h2 id="experience_with_full_test_coverage"><span>7.7. </span>Experience with full test coverage</h2>
|
|
|
|
<p>The developers of SQLite have found that full coverage testing is an
|
|
extremely effective method for locating and preventing bugs.
|
|
Because every single branch
|
|
instruction in SQLite core code is covered by test cases, the developers
|
|
can be confident that changes made in one part of the code
|
|
do not have unintended consequences in other parts of the code.
|
|
The many new features and performance improvements that have been
|
|
added to SQLite in recent years would not have been possible without
|
|
the availability full-coverage testing.</p>
|
|
|
|
<p>Maintaining 100% MC/DC is laborious and time-consuming.
|
|
The level of effort needed to maintain full-coverage testing
|
|
is probably not cost effective for a typical application.
|
|
However, we think that full-coverage testing is justified for a
|
|
<a href="mostdeployed.html">very widely deployed</a> infrastructure library
|
|
like SQLite, and especially for a database library which by its very
|
|
nature "remembers" past mistakes.
|
|
|
|
|
|
<a name="dynamicanalysis"></a>
|
|
|
|
</p><h1 id="dynamic_analysis"><span>8. </span>Dynamic Analysis</h1>
|
|
|
|
<p>Dynamic analysis refers to internal and external checks on the
|
|
SQLite code which are performed while the code is live and running.
|
|
Dynamic analysis has proven to be a great help in maintaining the
|
|
quality of SQLite.</p>
|
|
|
|
<a name="asserts"></a>
|
|
|
|
<h2 id="assert"><span>8.1. </span>Assert</h2>
|
|
|
|
<p>The SQLite core contains 5855 <tt>assert()</tt>
|
|
statements that verify function preconditions and postconditions and
|
|
loop invariants. Assert() is a macro which is a standard part of
|
|
ANSI-C. The argument is a boolean value that is assumed to always be
|
|
true. If the assertion is false, the program prints an error message
|
|
and halts.</p>
|
|
|
|
<p>Assert() macros are disabled by compiling with the NDEBUG macro defined.
|
|
In most systems, asserts are enabled by default. But in SQLite, the
|
|
asserts are so numerous and are in such performance critical places, that
|
|
the database engine runs about three times slower when asserts are enabled.
|
|
Hence, the default (production) build of SQLite disables asserts.
|
|
Assert statements are only enabled when SQLite is compiled with the
|
|
SQLITE_DEBUG preprocessor macro defined.</p>
|
|
|
|
<p>See the <a href="assert.html">Use Of assert in SQLite</a> document
|
|
for additional information about how SQLite uses assert().</p>
|
|
|
|
<a name="valgrind"></a>
|
|
|
|
<h2 id="valgrind"><span>8.2. </span>Valgrind</h2>
|
|
|
|
<p><a href="http://valgrind.org/">Valgrind</a> is perhaps the most amazing
|
|
and useful developer tool in the world. Valgrind is a simulator - it simulates
|
|
an x86 running a Linux binary. (Ports of Valgrind for platforms other
|
|
than Linux are in development, but as of this writing, Valgrind only
|
|
works reliably on Linux, which in the opinion of the SQLite developers
|
|
means that Linux should be the preferred platform for all software development.)
|
|
As Valgrind runs a Linux binary, it looks for all kinds of interesting
|
|
errors such as array overruns, reading from uninitialized memory,
|
|
stack overflows, memory leaks, and so forth. Valgrind finds problems
|
|
that can easily slip through all of the other tests run against SQLite.
|
|
And, when Valgrind does find an error, it can dump the developer directly
|
|
into a symbolic debugger at the exact point where the error occur, to
|
|
facilitate a quick fix.</p>
|
|
|
|
<p>Because it is a simulator, running a binary in Valgrind is slower than
|
|
running it on native hardware. (To a first approximation, an application
|
|
running in Valgrind on a workstation will perform about the same as it
|
|
would running natively on a smartphone.) So it is impractical to run the full
|
|
SQLite test suite through Valgrind. However, the veryquick tests and
|
|
the coverage of the TH3 tests are run through Valgrind prior to every
|
|
release.</p>
|
|
|
|
<a name="memtesting"></a>
|
|
|
|
<h2 id="memsys2"><span>8.3. </span>Memsys2</h2>
|
|
|
|
<p>SQLite contains a pluggable
|
|
<a href="malloc.html">memory allocation subsystem</a>.
|
|
The default implementation uses system malloc() and free().
|
|
However, if SQLite is compiled with <a href="compile.html#memdebug">SQLITE_MEMDEBUG</a>, an alternative
|
|
memory allocation wrapper (<a href="malloc.html#memdebug">memsys2</a>)
|
|
is inserted that looks for memory allocation
|
|
errors at run-time. The memsys2 wrapper checks for memory leaks, of
|
|
course, but also looks for buffer overruns, uses of uninitialized memory,
|
|
and attempts to use memory after it has been freed. These same checks
|
|
are also done by valgrind (and, indeed, Valgrind does them better)
|
|
but memsys2 has the advantage of being much faster than Valgrind, which
|
|
means the checks can be done more often and for longer tests.</p>
|
|
|
|
<a name="mutextesting"></a>
|
|
|
|
<h2 id="mutex_asserts"><span>8.4. </span>Mutex Asserts</h2>
|
|
|
|
<p>SQLite contains a pluggable mutex subsystem. Depending on
|
|
compile-time options, the default mutex system contains interfaces
|
|
<a href="c3ref/mutex_held.html">sqlite3_mutex_held()</a> and <a href="c3ref/mutex_held.html">sqlite3_mutex_notheld()</a> that detect
|
|
whether or not a particular mutex is held by the calling thread.
|
|
These two interfaces are used extensively within assert() statements
|
|
in SQLite to verify mutexes are held and released at all the right
|
|
moments, in order to double-check that SQLite does work correctly
|
|
in multi-threaded applications.</p>
|
|
|
|
<a name="journaltest"></a>
|
|
|
|
<h2 id="journal_tests"><span>8.5. </span>Journal Tests</h2>
|
|
|
|
<p>One of the things that SQLite does to ensure that transactions
|
|
are atomic across system crashes and power failures is to write
|
|
all changes into the rollback journal file prior to changing the
|
|
database. The TCL test harness contains an alternative
|
|
<a href="vfs.html">OS backend</a> implementation that helps to
|
|
verify this is occurring correctly. The "journal-test VFS" monitors
|
|
all disk I/O traffic between the database file and rollback journal,
|
|
checking to make sure that nothing is written into the database
|
|
file which has not first been written and synced to the rollback journal.
|
|
If any discrepancies are found, an assertion fault is raised.</p>
|
|
|
|
<p>The journal tests are an additional double-check over and above
|
|
the crash tests to make sure that SQLite transactions will be atomic
|
|
across system crashes and power failures.</p>
|
|
|
|
<a name="intoverflow"></a>
|
|
|
|
<h2 id="undefined_behavior_checks"><span>8.6. </span>Undefined Behavior Checks</h2>
|
|
|
|
<p>In the C programming language, it is very easy to write code that
|
|
has "undefined" or "implementation defined" behavior.
|
|
That means that the code might work during development, but then give
|
|
a different answer on a different system, or when recompiled using different
|
|
compiler options.
|
|
Examples of undefined and implementation-defined behavior in
|
|
ANSI C include:
|
|
</p><ul>
|
|
<li>Signed integer overflow. (Signed integer overflow does <u>not</u>
|
|
necessarily wrap around, as most people expect.)
|
|
</li><li>Shifting an N-bit integer by more than N bits.
|
|
</li><li>Shifting by a negative amount.
|
|
</li><li>Shifting a negative number.
|
|
</li><li>Using the memcpy() function on overlapping buffers.
|
|
</li><li>The order of evaluation of function arguments.
|
|
</li><li>Whether or not "char" variables are signed or unsigned.
|
|
</li><li>And so forth....
|
|
</li></ul>
|
|
|
|
<p>Since undefined and implementation-defined behavior is non-portable
|
|
and can easily lead to incorrect answers, SQLite works very hard to avoid it.
|
|
For example,
|
|
when adding two integer column values together as part of an SQL statement,
|
|
SQLite does not simply add them together using the C-language "+" operator.
|
|
Instead, it first checks to make sure the
|
|
addition will not overflow, and if it will, it does the addition using
|
|
floating point instead.
|
|
|
|
</p><p>To help ensure that SQLite does not make use of undefined or
|
|
implementation defined behavior, the test suites are rerun using
|
|
instrumented builds that try to detect undefined behavior. For example,
|
|
test suites are run using the "-ftrapv" option of GCC. And they
|
|
are run again using the "-fsanitize=undefined" option on Clang. And
|
|
again using the "/RTC1" option in MSVC. Then the test suites are rerun
|
|
using options like "-funsigned-char" and "-fsigned-char" to make sure
|
|
that implementation differences do not matter either. Tests are then repeated
|
|
on 32-bit and 64-bit systems and on big-endian and little-endian systems,
|
|
using a variety of CPU architectures.
|
|
Furthermore, the test suites are augmented with many test cases that are
|
|
deliberately designed to provoke undefined behavior. For example:
|
|
"<b>SELECT -1*(-9223372036854775808);</b>".
|
|
|
|
<a name="disopttest"></a>
|
|
|
|
</p><h1 id="disabled_optimization_tests"><span>9. </span>Disabled Optimization Tests</h1>
|
|
|
|
<p>The <a href="c3ref/test_control.html">sqlite3_test_control</a>(<a href="c3ref/c_testctrl_always.html">SQLITE_TESTCTRL_OPTIMIZATIONS</a>, ...) interface
|
|
allows selected SQL statement optimizations to be disabled at run-time.
|
|
SQLite should always generate exactly the same answer with optimizations
|
|
enabled and with optimizations disabled; the answer simply arrives quicker
|
|
with the optimizations turned on. So in a production environment, one always
|
|
leaves the optimizations turned on (the default setting).</p>
|
|
|
|
<p>One verification technique used on SQLite is to run an entire test suite
|
|
twice, once with optimizations left on and a second time with optimizations
|
|
turned off, and verify that the same output is obtained both times. This
|
|
shows that the optimizations do not introduce errors.</p>
|
|
|
|
<p>Not all test cases can be handled this way. Some test cases check
|
|
to verify that the optimizations really are reducing the amount of
|
|
computation by counting the number of disk accesses, sort operations,
|
|
full-scan steps, or other processing steps that occur during queries.
|
|
Those test cases will appear to fail when optimizations are disabled.
|
|
But the majority of test cases simply check that the correct answer
|
|
was obtained, and all of those cases can be run successfully with and
|
|
without the optimizations, in order to show that the optimizations do not
|
|
cause malfunctions.</p>
|
|
|
|
|
|
<a name="cklist"></a>
|
|
|
|
<h1 id="checklists"><span>10. </span>Checklists</h1>
|
|
|
|
<p>The SQLite developers use an on-line checklist to coordinate testing
|
|
activity and to verify that all tests pass prior each SQLite release.
|
|
<a href="http://www.sqlite.org/checklists/index.html">Past checklists</a>
|
|
are retained for historical reference.
|
|
(The checklists are read-only for anonymous internet viewers, but
|
|
developers can log in and update checklist items in their web
|
|
browsers.)
|
|
The use of checklists for SQLite testing and other development activities
|
|
is inspired by <i>
|
|
<a href="http://atulgawande.com/book/the-checklist-manifesto/">The Checklist Manifesto</a>
|
|
</i>.</p>
|
|
|
|
<p>The latest checklists contain approximately 200 items that are
|
|
individually verified for each release. Some checklist items only take
|
|
a few seconds to verify and mark off. Others involve test suites
|
|
that run for many hours.</p>
|
|
|
|
<p>The release checklist is not automated: developers run each item on
|
|
the checklist manually. We find that it is important to keep a human in
|
|
the loop. Sometimes problems are found while running a checklist item
|
|
even though the test itself passed. It is important to have a human
|
|
reviewing the test output at the highest level, and constantly asking
|
|
"Is this really right?"</p>
|
|
|
|
<p>The release checklist is continuously evolving. As new problems or
|
|
potential problems are discovered, new checklist items are added to
|
|
make sure those problems do not appear in subsequent releases. The
|
|
release checklist has proven to be an invaluable tool in helping to
|
|
ensure that nothing is overlooked during the release process.</p>
|
|
|
|
|
|
<a name="staticanalysis"></a>
|
|
|
|
<h1 id="static_analysis"><span>11. </span>Static Analysis</h1>
|
|
|
|
<p>Static analysis means analyzing source code at compile-time to
|
|
check for correctness. Static analysis includes compiler
|
|
warning messages and more in-depth analysis engines such as the
|
|
<a href="http://clang-analyzer.llvm.org/">Clang Static Analyzer</a>.
|
|
SQLite compiles without warnings on GCC and Clang using
|
|
the -Wall and -Wextra flags on Linux and Mac and on MSVC on Windows.
|
|
No valid warnings are generated by the Clang Static Analyzer tool "scan-build"
|
|
either (though recent versions of clang seem to generate many false-positives.)
|
|
Nevertheless, some warnings might be generated by other
|
|
static analyzers. Users are encouraged not to stress over these
|
|
warnings and to instead take solace in the intense testing of SQLite
|
|
described above.
|
|
</p>
|
|
|
|
<p>Static analysis has not been helpful in finding
|
|
bugs in SQLite. Static analysis has found a few bugs in SQLite, but
|
|
those are the exceptions. More bugs have been
|
|
introduced into SQLite while trying to get it to compile without
|
|
warnings than have been found by static analysis.</p>
|
|
|
|
<a name="summary"></a>
|
|
|
|
<h1 id="summary"><span>12. </span>Summary</h1>
|
|
|
|
<p>SQLite is open source. This gives many people the idea that
|
|
it is not well tested as commercial software and is perhaps unreliable.
|
|
But that impression is false.
|
|
SQLite has exhibited very high reliability in the field and
|
|
a very low defect rate, especially considering how rapidly it is evolving.
|
|
The quality of SQLite is achieved in part by careful code design and
|
|
implementation. But extensive testing also plays a vital role in
|
|
maintaining and improving the quality of SQLite. This document has
|
|
summarized the testing procedures that every release of SQLite undergoes
|
|
with the hope of inspiring confidence that SQLite is
|
|
suitable for use in mission-critical applications.</p>
|
|
|