ITS#6906 Update cachesize recommendations to remove references to
indexes in Hash format

Fix whitespace error -- hyc
commit 45b41fb41a (parent 57ce05ec86)
Author:    Tim Mooney, 2011-04-12 17:57:57 -05:00
Committed: Howard Chu


@@ -234,10 +234,10 @@ will tell you how many internal pages are present in a database. You should
 check this number for both dn2id and id2entry.
 
 Also note that {{id2entry}} always uses 16KB per "page", while {{dn2id}} uses whatever
-the underlying filesystem uses, typically 4 or 8KB. To avoid thrashing the,
+the underlying filesystem uses, typically 4 or 8KB. To avoid thrashing,
 your cache must be at least as large as the number of internal pages in both
-the {{dn2id}} and {{id2entry}} databases, plus some extra space to accommodate the actual
-leaf data pages.
+the {{dn2id}} and {{id2entry}} databases, plus some extra space to accommodate
+the actual leaf data pages.
 
 For example, in my OpenLDAP 2.4 test database, I have an input LDIF file that's
 about 360MB. With the back-hdb backend this creates a {{dn2id.bdb}} that's 68MB,
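The cache rule in this hunk can be sketched numerically. This is not part of the patch: the function name and the db_stat page counts below are hypothetical (chosen to land near the 2.5MB figure mentioned later in the text), with dn2id assumed to use 4KB filesystem pages and id2entry its fixed 16KB pages.

```python
# Sketch of the rule above: the cache must hold at least the internal
# pages of both dn2id and id2entry, plus some extra for leaf data pages.
# dn2id pages follow the filesystem page size (assumed 4KB here);
# id2entry always uses 16KB pages.

def min_cache_bytes(dn2id_internal, id2entry_internal,
                    fs_page_size=4096, extra_leaf_pages=0):
    """Rough lower bound on the BDB cache for one back-hdb database."""
    dn2id = dn2id_internal * fs_page_size
    id2entry = id2entry_internal * 16 * 1024
    extra = extra_leaf_pages * fs_page_size
    return dn2id + id2entry + extra

# Hypothetical db_stat results: 400 internal pages in dn2id,
# 60 internal pages in id2entry.
print(min_cache_bytes(400, 60))  # -> 2621440 bytes, i.e. 2.5MB
```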
@@ -252,23 +252,17 @@ This doesn't take into account other library overhead, so this is even lower
 than the barest minimum. The default cache size, when nothing is configured,
 is only 256KB.
 
-This 2.5MB number also doesn't take indexing into account. Each indexed attribute
-uses another database file of its own, using a Hash structure.
+This 2.5MB number also doesn't take indexing into account. Each indexed
+attribute results in another database file. Earlier versions of OpenLDAP
+kept these index databases in Hash format, but from OpenLDAP 2.2 onward
+the index databases are in B-tree format so the same procedure can
+be used to calculate the necessary amount of cache for each index database.
 
-Unlike the B-trees, where you only need to touch one data page to find an entry
-of interest, doing an index lookup generally touches multiple keys, and the
-point of a hash structure is that the keys are evenly distributed across the
-data space. That means there's no convenient compact subset of the database that
-you can keep in the cache to insure quick operation, you can pretty much expect
-references to be scattered across the whole thing. My strategy here would be to
-provide enough cache for at least 50% of all of the hash data.
+For example, if your only index is for the objectClass attribute and db_stat
+reveals that {{objectClass.bdb}} has 339 internal pages and uses 4096 byte
+pages, the additional cache needed for just this attribute index is
 
-> (Number of hash buckets + number of overflow pages + number of duplicate pages) * page size / 2.
+> (339+1) * 4KB =~ 1.3MB.
 
-The objectClass index for my example database is 5.9MB and uses 3 hash buckets
-and 656 duplicate pages. So:
-
-> ( 3 + 656 ) * 4KB / 2 =~ 1.3MB.
 
 With only this index enabled, I'd figure at least a 4MB cache for this backend.
 (Of course you're using a single cache shared among all of the database files,
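The replacement formula in the hunk above is easy to verify. A sketch, not part of the commit; the helper name is invented, and the numbers come from the example in the new text (objectClass.bdb with 339 internal pages and 4096-byte pages).

```python
# Per-index cache for a B-tree index database, per the new text:
# roughly (internal pages + 1 leaf page) * page size.

def index_cache_bytes(internal_pages, page_size):
    """Cache needed for one B-tree index database file."""
    return (internal_pages + 1) * page_size

# objectClass.bdb example: 339 internal pages, 4096-byte pages.
print(index_cache_bytes(339, 4096))  # -> 1392640 bytes, the ~1.3MB in the text
```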