AAAA filter patch proposal

Hello,

There was a post several days ago about AAAA filter, and questions about an implementation as a Python module by Christophe.

https://unbound.net/pipermail/unbound-users/2014-October/003579.html

I work at the same company as him (a Japanese ISP; all of them are subject to NTT's flaky practices around IPv6), and have been working with him on this issue.

To sum up the scenario we are trying to fix:
- customers in Japan have a physical carrier (NTT in most cases) on top of which they get their own internet provider, to which they connect via PPPoE. In our case, we currently provide IPv4 service only.
- some customers also get an on-demand video service via their carrier, NTT, which gives them a default IPv6 route via IPoE, or they raise a second PPPoE session (NTT usage terms allow for up to 2 sessions for this scenario).
- the catch: said IPv6 default route does not lead out to the internet, and only enables access to NTT's closed network. Therefore, a customer in this configuration trying to access any IPv6 site is in for a world of pain as their browser times out and retries, hopefully falling back to IPv4. This prompted every Internet service provider in Japan to either provide native IPv6 or to filter AAAA records for "non-v6 only sites", which in this instance pretty much means everyone uses BIND.

To answer Bill Manning's earlier statement: "we can not change providers", first because we ARE a provider, but also because even if we wanted to change carriers, everyone in Japan is entirely dependent on the majority physical carrier, i.e. NTT. Since the problem happens at a lower layer than the ones we control (we work at the PPPoE-encapsulated level, they work at the Ethernet level), we have no control whatsoever over said carrier-provided route, short of providing IPv6 service ourselves, either over our PPPoE with an overriding route, or via IPoE (and then telling NTT to buzz off and provide our route, if we had one). This is obviously scheduled as the proper solution, but it requires overhauling all of the network infrastructure, which cannot be done instantly.

Also, thanks a lot to Daisuke Higashi for his statement. Using "private-address/private-domain" is what we initially planned on doing, except this gets scary when we think about "what if NTT springs yet another domain on top of that, that we need to allow access to?" or "what if another customer tries to access yet another IPv6-only site in the future?", and the "whack-a-mole" administration nightmare it might become.

In the meantime, we still need to get rid of BIND, which can't handle the resource exhaustion DDoS attacks (DNS Amplification Attacks Observer: Authoritative Name Server attack, http://dnsamplificationattacks.blogspot.jp/2014/02/authoritative-name-server-attack.html) we have been seeing since February of this year. This is where we wanted to go with unbound, except that since it does not have AAAA-filter functionality, we could not use it in production for most of our customers.

This is where Christophe attempted to intercept queries with Python, and we found out that:
- the Python API does not allow spawning subqueries (for each AAAA query, the relevant A record has to exist in cache for the AAAA filter logic to work)
- the Python API does not allow looking up the RRset cache for a given record (in this case, for an A record matching the queried name)
- the Python API does not allow for easy scrubbing of packets (it IS possible, but very painful)

I therefore came to the conclusion that the Python API was not appropriate to do these things, and that the most appropriate place to implement a filtering/scrubbing logic was the iterator module itself.

I coded the following patch (also attached): http://www.yomi.darkbsd.org/~darksoul/aaaa-filter-iterator.patch
It has been under test and running in pre-production for roughly a month now (while undergoing some tuning as I got around to understanding how the state machine works).

The patch provides:
- an "aaaa-filter" config option which is off by default, so as to not be intrusive (I am fully aware this functionality is enough of an abomination as is). It can also be used in conjunction with private-address/private-domain without any issues.
- the relevant manual entry
- modifications to iterator/iter_scrub.c in scrub_sanitize() to remove AAAA records for queries that either are not of AAAA type, OR that did return an A record, IF cfg->aaaa_filter is enabled.
- modifications to iterator/iter_utils.c to provide AAAA filter "on/off" info to the iterator environment.
- modifications to iterator/iterator.c:
-- a new ASN_FETCH_A_FOR_AAAA_STATE, into which we branch from QUERYTARGETS_STATE if this is an AAAA query (modifies iter_handle(), iter_state_to_string(), iter_state_is_responsestate())
-- an asn_processQueryAAAA() function that spawns a subquery and flags the parent query as "already fetching an A subquery" so as to not loop
-- modifications to iter_inform_super() to handle the new state for AAAA parent queries having an A subquery.
-- an asn_processAAAAResponse() function that basically takes after what error_supers() and processTargetResponse() do, except it does not alter the target query counters.
- modifications to iterator/iterator.h: declaration of new flags for iter_env (configuration option), iter_qstate (status flag), and the new iter_state
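
As a rough illustration of the filtering rule described in the list above, here is a small Python model. It is only a sketch and not the patch's C code: the names should_strip_aaaa(), scrub(), answer() and the ZONE table are hypothetical, the dictionary lookup merely stands in for the A subquery, and the addresses are documentation-range examples.

```python
A, AAAA = "A", "AAAA"

# Hypothetical stand-in for upstream DNS data: (name, rrtype) -> list of rdata.
ZONE = {
    ("dual.example", A): ["192.0.2.1"],
    ("dual.example", AAAA): ["2001:db8::1"],
    ("v6only.example", AAAA): ["2001:db8::2"],
}


def should_strip_aaaa(filter_enabled, qtype, name_has_a):
    """Rule as described above: with the filter on, AAAA records are removed
    when the query is not of type AAAA, or when the name also has an A record."""
    if not filter_enabled:
        return False
    return qtype != AAAA or name_has_a


def scrub(filter_enabled, qtype, name_has_a, rrs):
    """Return the records that survive filtering; rrs is a list of (rrtype, rdata)."""
    if not should_strip_aaaa(filter_enabled, qtype, name_has_a):
        return list(rrs)
    return [(rrtype, rdata) for rrtype, rdata in rrs if rrtype != AAAA]


def answer(name, qtype, zone, filter_enabled):
    """Answer a query from the zone table, applying the filter if enabled."""
    rrs = [(qtype, rdata) for rdata in zone.get((name, qtype), [])]
    name_has_a = False
    if filter_enabled and qtype == AAAA:
        # Stand-in for the A subquery spawned on behalf of the AAAA query.
        name_has_a = bool(zone.get((name, A)))
    return scrub(filter_enabled, qtype, name_has_a, rrs)
```

In this model, answer("dual.example", AAAA, ZONE, True) comes back empty because the name also has an A record, while the same query for "v6only.example" keeps its AAAA record, which is the behavior that avoids whitelisting IPv6-only names by hand.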

At this point, the patch is pretty much stable and performing as expected, but I am looking for pointers as to what I could improve in it, especially style-wise, to ensure it remains applicable for as long as possible. In its current state, I can apply it up to 1.4.22.

I also know from previous postings that unbound development staff's opinion is that this functionality as a whole would harm IPv6 adoption, and therefore can probably not be officially endorsed, but I still intend to provide it freely (my company has given approval) to people suffering from the same scenario. (that is, mainly Japanese users at this point...)

Thanks for your time,

(attachments)

aaaa-filter-iterator.patch (14.1 KB)

Hi Stephane,

The patch looks to have nice clean code.

If you are looking for feedback on the code, this is what I can find:

iterator.h, comment for fetch_a_for_aaaa is misleading: say that a
subquery has been made for fetching A records. It now seems as if the
flag is set in the subquery, but it is set in the superquery (to avoid
asking twice).

iterator.c, asn_processAAAAResponse: this routine can be shortened, I
think. After changing the super_iq->state and log_query_info lines,
it can simply return. However, the current code does not fail either;
it might be more 'optimal' and save the statemachine some work.

Thank you for publishing the patch. Are you all right if I put this
patch in the source contrib/ directory to make it more easily
available to the users?

We don't provide support for the contrib material, but it may be
useful for users in weird circumstances.

Best regards,
   Wouter

Hello Wouter,

Many thanks for the quick reply and review.

Right, that comment in iterator.h is a remnant of something I initially planned on doing before changing my mind along the way.
I just fixed it.

As for processAAAAResponse, I wrote it this way to get a verbose trace of what is going on should it fail. (based mainly on how the other handler functions operate)
Actually, thinking back, the name for this function is probably not the best…

Of course, I am totally fine with contributing this patch.
Though, I was just wondering if there is a show-stopper to integrating it in the main code, since I provide an option to enable this behavior or not, and it is designed so as not to impact default behavior. (Of course, this would add one config/environment flag check per query at execution)

Now, I do understand this is a feature you are not exactly keen on in the first place, let alone providing support for it.
However, if I can further brush up the code to make it seaworthy for the main repository, I am fine with putting in the effort.

I understand a lot of Japanese users would be extremely thankful for easy availability of this feature via their favorite distribution, instead of manual building/packaging.
(Of course, there would remain the option of applying the contrib/ patch at distro packaging level, like Debian and *BSD do, but this would multiply efforts)

Cheers,