A DNS Vulnerability that Can Take Down a Server with a Single Data Packet
On July 28, 2015, a serious denial-of-service vulnerability (CVE-2015-5477) was disclosed in bind9, the most widely used DNS server in the world.
A bit of background: DNS is the service that maps domain names to IP addresses. When you visit google.com, your computer asks the DNS server in your area: what is the IP address of google.com? If your neighbor has just visited google.com, the DNS server returns the IP directly from its cache; otherwise, it asks Google’s official DNS server, gets the IP address of google.com, and returns it to you. The DNS server in your area is called a recursive DNS server; if it goes down, the area it serves can no longer access the internet. Google’s official DNS server is called an authoritative DNS server; if it goes down, the websites it serves disappear from the earth.
DNS Recursive Query (Image Source)
How serious is this vulnerability? A single UDP packet is enough to take down a DNS server. Whether the server is recursive or authoritative, and however bind9 is configured, as soon as the bind9 process receives this packet it hits an assertion failure and terminates.
Roy Zhang, the maintainer of LUG DNS, learned of this vulnerability from the Debian Security Notice and quickly patched it. I wrote a POC and tested some DNS servers, took down the school’s DNS, and reported it to James at the network center (who later thanked me). Most of the ISP DNS servers and some of the smaller public DNS servers I tested were also affected. More than 72 hours have now passed since the vulnerability was made public, but it still has not received enough attention. Here is the POC (Proof of Concept exploit code), and I will also share the process of writing it.
Where is the Vulnerability
To learn about vulnerabilities in a timely manner, I suggest subscribing to the Security Tracker of the distribution you care about. Take Debian’s announcement about this vulnerability as an example: the Source column links to the origin of the vulnerability (usually a CVE) and to the security announcements of other distributions. The Description is as follows:
named in ISC BIND 9.x before 9.9.7-P2 and 9.10.x before 9.10.2-P3 allows remote attackers to cause a denial of service (REQUIRE assertion failure and daemon exit) via TKEY queries.
The best way to understand this vulnerability further is to read the source code: what did bind9 change to fix it, and where exactly was the bug? Ask Google for the bind9 source tree (Gitweb); in the commit log you can find this line:
2015-07-14 Mark Andrews add CVE-2015-5477
This is just a note, the real code modification is before it. We can browse the commit log to find the real code modification.
The careful reader may have noticed that the commit date is July 14, 2015, half a month before the disclosure! Yes, the process of fixing and disclosing a vulnerability goes like this:
- Vulnerability report, at this time only the vulnerability reporter and bind9’s security team know.
- bind9 fixes the vulnerability.
- Notify some “important manufacturers” (including major distributions, large companies with cooperative relationships).
- Publicly release at the agreed time.
If you keep an eye on the repositories of open source software, you will find that some security vulnerabilities have already been fixed while there is almost no information about them on the internet. A few days later, the CVE database entry becomes available, major distributions release security announcements, and media like Hacker News start reporting. In other words, by the time we learn about a vulnerability from the “official channel”, it is no longer a 0-day, and not even a 1-day.
Trouble Starts with ASSERTION
Let’s get to the point. The fix for this vulnerability is simple: a single added line that sets name = NULL. The vulnerability description says that an illegal packet causes an assertion failure and makes the daemon exit.
A DNS query is a UDP packet that poses a question; the DNS server responds with a UDP packet that carries the answer. Queries and responses share the same packet format: a header followed by the question, answer, authority, and additional sections.
DNS request format (Image Source)
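To make the packet layout concrete, here is a minimal sketch in C that reads the fixed 12-byte DNS header and extracts how many entries each of the four sections holds. The names dns_counts and dns_parse_header are made up for this illustration and do not come from bind9; the field layout follows RFC 1035.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the fixed 12-byte DNS header (RFC 1035, section 4.1.1).
 * The struct and function names here are illustrative only. */
struct dns_counts {
    uint16_t id;       /* query identifier */
    uint16_t flags;    /* QR, Opcode, AA, TC, RD, RA, Z, RCODE */
    uint16_t qdcount;  /* entries in the QUESTION section */
    uint16_t ancount;  /* entries in the ANSWER section */
    uint16_t nscount;  /* entries in the AUTHORITY section */
    uint16_t arcount;  /* entries in the ADDITIONAL section */
};

/* Read one 16-bit value in network (big-endian) byte order. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the header at the start of buf.
 * Returns 0 on success, -1 if the buffer is too short or an argument is NULL. */
int dns_parse_header(const uint8_t *buf, size_t len, struct dns_counts *out)
{
    if (buf == NULL || out == NULL || len < 12)
        return -1;
    out->id      = read_be16(buf + 0);
    out->flags   = read_be16(buf + 2);
    out->qdcount = read_be16(buf + 4);
    out->ancount = read_be16(buf + 6);
    out->nscount = read_be16(buf + 8);
    out->arcount = read_be16(buf + 10);
    return 0;
}
```

For the header bytes 12 34 01 00 00 01 00 00 00 00 00 01, this yields one entry in QUESTION and one in ADDITIONAL, which is the shape of request discussed in the rest of this post.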
The problematic code is in the dns_tkey_processquery function. Its calling process is as follows:
- Find the name to be queried from the QUESTION block of the DNS request and store it in qname. For example, if we query google.com, there is a question in the QUESTION block, and its name is google.com.
- Find the name that matches the name to be queried (qname) in the ADDITIONAL block of the DNS request and store it in name. For a legal TKEY request, this block should contain the transaction key (this is not important here; interested readers can look at RFC 2930).
- If not found in the ADDITIONAL block, try to find it from the ANSWER block. (Damn, did the developers of Win2000 have a brain cramp? This is clearly a question, but they put the TKEY in the answer block, what’s the joke?)
Let’s look at the implementation of dns_message_findname.
It checks whether the name pointer passed in is null; if it is not, the REQUIRE macro throws an exception (strictly speaking the C language has no exceptions, but the effect is almost the same). Checking the legality of parameters at the start of a function is good coding practice. However, as the comment in this code admits, the check is too strict: why should the function care whether name held a value before the call?
Here Comes the Problem
If the first call to dns_message_findname in dns_tkey_processquery assigns a non-null value to name but returns failure, the second call to dns_message_findname receives that non-null name and trips the assertion. (The fix is to set name back to null before the second call.)
In the implementation of dns_message_findname:
- Take target (that is, the caller’s qname) and look it up in the ADDITIONAL block. If it is not found, an error code is returned and nothing goes wrong.
- If it is found, name is assigned. Next, we need to make the function return failure anyway.
- Then the type of the found record is checked. If it is a dns_rdatatype_any (i.e., ANY) or dns_rdatatype_tkey (i.e., TKEY) record, success is returned; otherwise failure is returned.
We just need to add a record with the same name but mismatched type in the ADDITIONAL block! For example, if we put a TKEY record asking about ring0.me in the QUESTION block, and put an A record of ring0.me in the ADDITIONAL block, it will trigger an exception and cause the bind9 process to exit.
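The logic above can be replayed with a small toy model in C. This is a simplified sketch, not bind9's code: toy_findname and toy_processquery are hypothetical stand-ins for dns_message_findname and dns_tkey_processquery, each name carries only one record, and the REQUIRE abort is modeled as the return code R_ASSERT so the outcome can be observed without killing the process.

```c
#include <stddef.h>
#include <string.h>

/* Record type numbers as assigned in the DNS specifications. */
enum { TYPE_A = 1, TYPE_TKEY = 249, TYPE_ANY = 255 };

/* Hypothetical result codes; R_ASSERT stands in for the REQUIRE abort. */
enum { R_SUCCESS, R_NOTFOUND, R_TYPEMISMATCH, R_ASSERT };

struct record  { const char *name; int type; };
struct section { const struct record *records; size_t count; };

/* Toy stand-in for dns_message_findname. */
int toy_findname(const struct section *sec, const char *target, int type,
                 const struct record **name)
{
    /* The strict precondition: the output slot must be NULL on entry. */
    if (name != NULL && *name != NULL)
        return R_ASSERT;

    for (size_t i = 0; i < sec->count; i++) {
        if (strcmp(sec->records[i].name, target) != 0)
            continue;
        if (name != NULL)
            *name = &sec->records[i];   /* assigned BEFORE the type check */
        if (type == TYPE_ANY || sec->records[i].type == type)
            return R_SUCCESS;
        return R_TYPEMISMATCH;          /* failure, but *name stays set */
    }
    return R_NOTFOUND;
}

/* Toy stand-in for the lookup sequence in dns_tkey_processquery.
 * apply_fix != 0 models the one-line patch (name = NULL). */
int toy_processquery(const struct section *additional,
                     const struct section *answer,
                     const char *qname, int apply_fix)
{
    const struct record *name = NULL;
    int result = toy_findname(additional, qname, TYPE_TKEY, &name);

    if (result != R_SUCCESS) {
        if (apply_fix)
            name = NULL;                /* the fix: reset before retrying */
        result = toy_findname(answer, qname, TYPE_TKEY, &name);
    }
    return result;
}
```

With an ADDITIONAL section holding only an A record for ring0.me and an empty ANSWER section, toy_processquery returns R_ASSERT when apply_fix is 0 and R_NOTFOUND when it is 1. In the real server the first case is the assertion failure that ends the process.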
Constructing Payload
This vulnerability only causes bind9 to exit and cannot be used to execute arbitrary commands, so writing the payload (attack payload) is much easier than for most buffer overflow vulnerabilities. The work is mainly analyzing the DNS protocol: reading the RFCs on one side, and reading the bind9 source code that parses DNS requests on the other. Once again, I recommend the combination of vim + cscope for reading C source code. POC is here.
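One protocol detail that any DNS packet builder runs into is the wire format of names: each label is prefixed with its length and the name ends with a zero byte (RFC 1035, section 3.1). The following is a minimal sketch of that encoding; dns_encode_name is a name invented for this illustration, not a function from bind9 or from the POC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a dotted name such as "ring0.me" into DNS wire format:
 * one length byte per label, the label bytes, then a terminating zero.
 * Returns the number of bytes written, or 0 on error (empty label,
 * label longer than 63 bytes, name longer than 255 bytes, or buffer
 * too small). Illustrative helper only. */
size_t dns_encode_name(const char *name, uint8_t *out, size_t cap)
{
    size_t pos = 0;
    const char *label = name;

    while (*label != '\0') {
        const char *dot = strchr(label, '.');
        size_t len = dot ? (size_t)(dot - label) : strlen(label);

        /* Reserve room for the length byte, the label, and the final zero. */
        if (len == 0 || len > 63 || pos + 1 + len + 1 > cap)
            return 0;

        out[pos++] = (uint8_t)len;
        memcpy(out + pos, label, len);
        pos += len;

        label += len;
        if (*label == '.')
            label++;                    /* skip the separator */
    }

    if (pos + 1 > cap || pos + 1 > 255)
        return 0;
    out[pos++] = 0;                     /* root label terminates the name */
    return pos;
}
```

For "ring0.me" the result is the 10 bytes 05 'r' 'i' 'n' 'g' '0' 02 'm' 'e' 00.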
When doing network programming in C, the first thing to watch out for is that the compiler must not insert extra padding between the members of a structure, which can be achieved with the GCC extension syntax __attribute__((packed)). Secondly, you can use bit-field members to represent the individual bits of a network protocol, without having to do the bit operations yourself.
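As an illustration of both points, here is a sketch of a DNS header declared as a packed structure with bit fields. It assumes GCC or Clang: the packed attribute, uint8_t bit fields, and the __BYTE_ORDER__ macros are compiler extensions, and the order in which bit fields are laid out is implementation-defined, which is why the two flag bytes are declared twice. The struct name dns_header is chosen for this example and is not taken from the POC.

```c
#include <stdint.h>

/* DNS header (RFC 1035, section 4.1.1) as a packed struct.
 * Assumes GCC/Clang bit-field layout; other compilers may differ. */
struct dns_header {
    uint16_t id;              /* query identifier, network byte order */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Big-endian targets allocate bit fields from the most significant bit. */
    uint8_t qr:1, opcode:4, aa:1, tc:1, rd:1;
    uint8_t ra:1, z:3, rcode:4;
#else
    /* Little-endian targets allocate from the least significant bit,
     * so the fields of each byte are listed in reverse. */
    uint8_t rd:1, tc:1, aa:1, opcode:4, qr:1;
    uint8_t rcode:4, z:3, ra:1;
#endif
    uint16_t qdcount;         /* section counts, network byte order */
    uint16_t ancount;
    uint16_t nscount;
    uint16_t arcount;
} __attribute__((packed));

/* Without padding, the struct matches the 12 bytes on the wire. */
_Static_assert(sizeof(struct dns_header) == 12,
               "DNS header must be exactly 12 bytes");
```

The 16-bit members still hold values in network byte order, so they need htons and ntohs when written or read. The bit fields trade portability for convenience; shifting and masking by hand is the portable alternative.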
Attach GDB to the vulnerable bind9 and run the POC. Send a single UDP DNS query to the target server, and bind9 receives the SIGABRT signal and terminates. The call stack shows that dns_tkey_processquery triggered isc_assertion_failed after calling dns_message_findname. The attack condition of this DNS vulnerability is very similar to that of last year’s Linux SCTP protocol stack buffer overflow vulnerability: one packet is enough.
When I started writing this POC, I had a brain cramp and it did not occur to me to use an A record, so I wrote a SIG record instead (it also carries a domain name and has a type different from TKEY), but execution never reached the dns_tkey_processquery function. Debugging with gdb, I found that the signature of a SIG record is checked at an earlier stage; if the signature is wrong, processing of the DNS request stops there. When a POC does not work as expected, gdb is quite important.
Conclusion
What was originally a good practice of defensive programming has become the source of a denial-of-service vulnerability. We can say that the person who wrote dns_tkey_processquery was not careful enough and did not notice that the function being called requires its output parameters to be initialized to NULL. We can also say that the testing of this code was not thorough and never exercised the situation that causes the assertion failure. But this situation really is a corner case: a test that happened to cover it would have been a stroke of luck, and missing it is hardly negligence. Perhaps only advanced techniques such as automatic code verification can catch logic errors like this.
When a program encounters an unanticipated situation, the UNIX programming philosophy is to fail fast rather than keep running in a faulty state. During development and testing, failing fast is necessary so that we can locate bugs as early and as accurately as possible. However, for network services where availability is critical, continuing to run in a faulty state may be the better choice in a production environment (the abnormal situation can still be detected through logs and alarms).
Unlike ordinary client and server vulnerabilities, this vulnerability affects DNS, part of the infrastructure of the Internet. Many operators have weak operation and maintenance capabilities, leave their DNS servers without upgrades for years, and are therefore more likely to be taken down. Official organizations like CNCERT should issue security bulletins reminding operators, universities, and others to upgrade their DNS servers.