<character-string> processing

Hi all

We were discussing <character-string> (RFC 1035) parsing in BIND 10,
when parsing the following line in a master file:

example.org. IN TXT Test-String"Test-String"

BIND 9 parses this as two <character-string>s (one unquoted and one
quoted), and they end up in a single RR, with RDATA as:

0x0b, 'T', 'e', 's', 't', '-', 'S', 't', 'r', 'i', 'n', 'g',
0x0b, 'T', 'e', 's', 't', '-', 'S', 't', 'r', 'i', 'n', 'g'

RDLENGTH = 0x0018

As you are aware, this wire data represents the TXT data
"Test-StringTest-String" (as a concatenation of multiple
<character-string>s; see TXT in RFC 1035 and RFC 4408 section 3.1.3 if
you want a description).
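
To make the octet counts above easy to check, here is a minimal Python sketch of the wire encoding. The helper names encode_txt_rdata and txt_value are made up for this illustration and are not taken from BIND or NSD; the concatenation step follows the convention described in RFC 4408 section 3.1.3.

```python
def encode_txt_rdata(strings):
    """Encode byte strings as TXT RDATA: one length octet, then the data."""
    rdata = bytearray()
    for s in strings:
        if len(s) > 255:
            raise ValueError("a <character-string> holds at most 255 octets")
        rdata.append(len(s))  # one-octet length prefix
        rdata += s
    return bytes(rdata)


def txt_value(rdata):
    """Concatenate the <character-string>s found in TXT RDATA."""
    parts = []
    i = 0
    while i < len(rdata):
        n = rdata[i]
        parts.append(rdata[i + 1:i + 1 + n])
        i += 1 + n
    return b"".join(parts)


rdata = encode_txt_rdata([b"Test-String", b"Test-String"])
print(hex(rdata[0]))    # length octet of the first string: 0xb
print(hex(len(rdata)))  # RDLENGTH: 0x18
print(txt_value(rdata)) # b'Test-StringTest-String'
```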

We wanted to implement the same behavior in BIND 10 (as the master
loader is based on BIND 9 and we want to support existing BIND 9 master
files). But there is some argument^wdiscussion whether this is the right
behaviour, because the description of <character-string> in 1035 when
taken very strictly does not state that " can be the start of a new
token. If one were to strictly read it, it would mean that the entire
Test-String"Test-String" is to be lexed as a single <character-string>
token with all these octets appearing in it.
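
The two readings can be sketched as two toy lexers in Python. Both functions are hypothetical and written only for this mail; neither is the actual lexer of BIND 9, BIND 10 or NSD, and escape handling is reduced to "backslash takes the next character literally" (no \DDD support).

```python
def lex_split_on_quote(text):
    """Reading 1: an unescaped '"' always starts or ends a quoted token."""
    tokens, current, in_quotes = [], [], False
    i = 0
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text):
            current.append(text[i + 1])  # escaped character, taken literally
            i += 2
            continue
        if c == '"':
            # A quote closes the token collected so far (if any).
            if current or in_quotes:
                tokens.append("".join(current))
                current = []
            in_quotes = not in_quotes
        elif c in " \t" and not in_quotes:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(c)
        i += 1
    if in_quotes:
        raise ValueError("unbalanced quotes")
    if current:
        tokens.append("".join(current))
    return tokens


def lex_strict(text):
    """Reading 2: '"' is a delimiter only when it begins a token."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        if text[i] in " \t":
            i += 1
            continue
        current = []
        if text[i] == '"':
            i += 1
            while i < n and text[i] != '"':
                if text[i] == "\\" and i + 1 < n:
                    i += 1
                current.append(text[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted string")
            i += 1  # skip the closing quote
        else:
            # Unquoted token: runs to the next whitespace, quotes included.
            while i < n and text[i] not in " \t":
                if text[i] == "\\" and i + 1 < n:
                    i += 1
                current.append(text[i])
                i += 1
        tokens.append("".join(current))
    return tokens


line = 'Test-String"Test-String"'
print(lex_split_on_quote(line))  # two tokens
print(lex_strict(line))          # one token, quotes kept as data
```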

I tried to test what NSD's behavior is with this line, but nsd-3.2.15 in
Fedora 20 failed to parse it and didn't load the zone. I was asked by
Shane to inform you, so that's the report. :)

    Mukund

Hi,

According to RFC 1035, Section 5.1:

<character-string> is expressed in one or two ways: as a contiguous set
of characters without interior spaces, or as a string beginning with a "
and ending with a ". Inside a " delimited string any character can
occur, except for a " itself, which must be quoted using \ (back slash).

I interpret this to mean that NSD should be able to accept this record.

example.org. IN TXT Test-String"Test-String"

I too interpret the text in 1035 such that it should be considered as
one <character-string>, and thus the RDATA would become:

0x18, 'T', 'e', 's', 't', '-', 'S', 't', 'r', 'i', 'n', 'g', '"', 'T',
'e', 's', 't', '-', 'S', 't', 'r', 'i', 'n', 'g', '"'

Otherwise, failing to load this record seems appropriate, and the user
should escape the '"' characters or whitespace, depending on what is
actually meant.
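
As a quick check of the single-<character-string> reading, the following Python sketch compares its length octet and RDLENGTH with the two-string encoding. The helper name char_string is an assumption made for this example only.

```python
def char_string(data):
    """Encode one <character-string>: a length octet followed by the data."""
    if len(data) > 255:
        raise ValueError("a <character-string> holds at most 255 octets")
    return bytes([len(data)]) + data


# One string, with both '"' characters kept as data (24 octets of text).
single = char_string(b'Test-String"Test-String"')

# Two strings of 11 octets each, as produced by BIND 9.
double = char_string(b"Test-String") + char_string(b"Test-String")

print(hex(single[0]), len(single))  # 0x18 25
print(hex(double[0]), len(double))  # 0xb 24
```

Under these assumptions the single-string form has a length octet of 0x18 and an RDLENGTH of 25 (0x0019), one octet more than the two-string form.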

Best regards,
  Matthijs