Memcached backend?

Hello,

If somebody runs many caches behind a load balancer, cache coherency can be an issue. For example, one server may hold an "old" record with the maximum TTL, fetched from an authoritative server whose admin changed the zone without lowering the TTL first, while another server has already cached the "new" record.
This way users will get inconsistent answers, which is much worse than a consistent "bad" (the old record) answer.

Another thing: even if each machine has room to cache x records, that memory can't be aggregated. Two machines won't be able to hold 2*x distinct entries, because they waste memory caching everything twice (if requested on both, of course).

So I wonder, would anybody like the idea of having memcached as a(n optional) storage backend for unbound (and, of course, take the time to write the code)? :)

Pros:
- if you have n machines, you can use n times the memory and increase hit rate
- you will get consistent results, no matter which server you ask
- memory can be decoupled from the servers, if needed
- unbound doesn't have to manage the cache (expiring records, limiting the overall size), because memcached does this
- cache management could be done from "outside", and deleting a record would take effect on all servers at once, so you wouldn't have to delete it on each of them

Cons:
- additional latency: answers would be served from a local or nearby (though not as distant as a remote nameserver) memcached server, not from the memory space of the unbound process itself
- if any memcached server gets "ill" (still answers, but slowly), answers from the cache will be slower, and they will be slower for all unbound instances
- response times can't be "guaranteed"; they may be more erratic and depend on more factors (network latency, error rates, the other servers' load, etc.)

But overall, on a fast local network I think it would be a gain and not a loss.
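
To make the idea a bit more concrete, below is a minimal sketch of what the store/lookup/delete path of such a backend could look like, written against libmemcached. The server names, the name/type/class key scheme and the placeholder value are just my assumptions; a real backend would have to store unbound's serialized rrsets and decide what to do when memcached times out:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libmemcached/memcached.h>

int main(void)
{
    memcached_return_t rc;
    memcached_st *mc = memcached_create(NULL);

    /* hypothetical pool of shared cache machines on the local network */
    memcached_server_add(mc, "cache1.example.net", 11211);
    memcached_server_add(mc, "cache2.example.net", 11211);

    /* store a cached answer under an assumed name/type/class key;
     * memcached expires the entry by itself when the TTL runs out,
     * so unbound would not have to manage expiry */
    const char *key = "example.com./A/IN";
    const char *val = "...serialized rrset...";   /* wire format left open */
    rc = memcached_set(mc, key, strlen(key), val, strlen(val),
                       3600 /* TTL in seconds */, 0);
    if (rc != MEMCACHED_SUCCESS)
        fprintf(stderr, "set: %s\n", memcached_strerror(mc, rc));

    /* every unbound instance behind the load balancer would try this
     * lookup before recursing */
    size_t len;
    uint32_t flags;
    char *answer = memcached_get(mc, key, strlen(key), &len, &flags, &rc);
    if (answer != NULL) {
        printf("cache hit: %.*s\n", (int)len, answer);
        free(answer);
    }

    /* "outside" cache management: one delete removes the record for
     * all servers at once */
    memcached_delete(mc, key, strlen(key), 0);

    memcached_free(mc);
    return 0;
}

Expiry and overall size limiting would come for free from memcached's TTL handling and LRU eviction, which is exactly the fourth pro above.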

What do you think?

ps: http://www.danga.com/memcached/

> So I wonder, would anybody like the idea of having memcached as a(n optional) storage backend for unbound (and, of course, take the time to write the code)? :)

> Pros:
> - if you have n machines, you can use n times the memory and increase hit rate

Do resolvers these days actually use more than 8 GB of RAM? Because a 1000 euro
1U server comes with 8 GB and a quad-core CPU.

> - you will get consistent results, no matter which server you ask

Mind you that you're just changing the time stamp of the old->new record change.
While you can argue about helping bad administrators get rid of bad long-TTL
records, you can also reason the other way, where a bad administrator's mistake
will show up sooner, before he corrects it. I think in general one should not base
an architecture on such a corner case.

> ps: http://www.danga.com/memcached/

Interesting concept...

Paul

Paul Wouters wrote:

>> Pros:
>> - if you have n machines, you can use n times the memory and increase hit rate

> Do resolvers these days actually use more than 8 GB of RAM? Because a 1000 euro
> 1U server comes with 8 GB and a quad-core CPU.

Well, I don't know. With 4 GB (unbound using about 3.5 GB), I get numbers similar to these:
info: server stats for thread 0: 205293878 queries, 166523599 answers from cache, 38770279 recursions
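That works out to a cache hit rate of roughly 166523599 / 205293878 ≈ 81%.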

But will try with 8 GB.

>> - you will get consistent results, no matter which server you ask

> Mind you that you're just changing the time stamp of the old->new record change.
> While you can argue about helping bad administrators get rid of bad long-TTL
> records, you can also reason the other way, where a bad administrator's mistake
> will show up sooner, before he corrects it. I think in general one should not base
> an architecture on such a corner case.

I don't think that having a shared cache (from which all servers answer queries equally) means the architecture is built on such a corner case.
The world is not perfect, and if you run a caching server that is used by many people, they will find you first when they get the "old" website from one machine and the "new" one from another.
They can understand and tolerate getting first the old and then the new, but this kind of inconsistency hits them hard.
Of course with one server (and one unbound process running) you won't have these problems. But with two or more, these cases will generate customer calls, which could easily be avoided (or at least minimized) with a shared cache.
I mentioned this only as a positive side effect; it is not the main driver behind the shared cache "concept".