The PowerDNS Recursor collects many statistics about itself.
Every half hour or so (configurable with statistics-interval), the recursor outputs a few lines of statistics. To force the output of statistics, send the process a SIGUSR1. The output looks like this:
stats: 346362 questions, 7388 cache entries, 1773 negative entries, 18% cache hits
stats: cache contended/acquired 1583/56041728 = 0.00282468%
stats: throttle map: 3, ns speeds: 1487, failed ns: 15, ednsmap: 1363
stats: outpacket/query ratio 54%, 0% throttled, 0 no-delegation drops
stats: 217 outgoing tcp connections, 0 queries running, 9155 outgoing timeouts
stats: 4536 packet cache entries, 82% packet cache hits
stats: thread 0 has been distributed 175728 queries
stats: thread 1 has been distributed 169484 queries
stats: 1 qps (average over 1800 seconds)
This means that in total 346362 queries were received and that there are 7388 different name/type combinations in the record cache; each entry may have multiple records attached to it.
There are 1773 items in the negative cache: items that are known not to exist and will not do so in the near future. 18% of incoming questions not handled by the packet cache could be answered without any additional queries going out to the net. The record cache was consulted or modified 56041728 times, and 1583 of those accesses caused lock contention.
Next a line with the sizes of maps that can be consulted by rec_control is printed.
The outpacket/query ratio means that on average, 0.54 packets were needed to answer a question. This ratio can be greater than 100% since additional queries could be needed to actually recurse the DNS and figure out the addresses of nameservers.
0% of queries were not performed because identical queries had gone out previously and failed, saving load on servers worldwide. 217 outgoing TCP connections were made, 0 queries were running at that moment, and 9155 queries to authoritative servers timed out.
The packet cache had 4536 entries and 82% of queries were served from it. The two worker threads were handed 175728 and 169484 queries respectively. Finally, measured over the last half hour, the Recursor handled an average of 1 qps.
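For log processing, the lines above can be turned into numbers with a few regular expressions. The following is a minimal sketch in Python, assuming the log layout matches the sample output shown here (it may differ between Recursor versions); the function name parse_stats and the dictionary keys are made up for this illustration.

```python
import re

# Patterns modelled on the sample output above; the exact layout is an
# assumption and may differ between Recursor versions.
QUESTIONS_RE = re.compile(
    r"stats: (\d+) questions, (\d+) cache entries, "
    r"(\d+) negative entries, (\d+)% cache hits"
)
CONTENTION_RE = re.compile(r"stats: cache contended/acquired (\d+)/(\d+)")
THREAD_RE = re.compile(r"stats: thread (\d+) has been distributed (\d+) queries")


def parse_stats(lines):
    """Collect numbers from the periodic 'stats:' log lines into a dict."""
    result = {"threads": {}}
    for line in lines:
        m = QUESTIONS_RE.search(line)
        if m:
            keys = ("questions", "cache_entries", "negative_entries", "cache_hit_pct")
            result.update(zip(keys, map(int, m.groups())))
            continue
        m = CONTENTION_RE.search(line)
        if m:
            contended, acquired = map(int, m.groups())
            # Recompute the percentage instead of trusting the printed value.
            result["contention_pct"] = 100.0 * contended / acquired
            continue
        m = THREAD_RE.search(line)
        if m:
            result["threads"][int(m.group(1))] = int(m.group(2))
    return result
```

Lines that match none of the patterns are ignored, so the function can be fed a whole log file.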
Some metrics are collected in thread-local variables, and an aggregate value is computed to report. Other statistics are recorded in global memory and each thread updates the one instance, taking proper precautions to make sure consistency is maintained. The only exceptions are the cpu-msec-thread-N metrics, which report per-thread data.
For carbon/graphite/metronome, we use the following namespace. Everything starts with ‘pdns.’, which is then followed by the local hostname. Thirdly, we add ‘recursor’ to signify the daemon generating the metrics. This is then rounded off with the actual name of the metric. As an example: ‘pdns.ns1.recursor.questions’.
Care has been taken to make the sending of statistics as unobtrusive as possible: the daemons will not be hindered by an unreachable carbon server, timeouts or connection refused situations.
To benefit from our carbon/graphite support, either install Graphite, or use our own lightweight statistics daemon, Metronome, currently available on GitHub.
To enable sending metrics, set carbon-server, possibly carbon-interval and possibly carbon-ourname in the configuration.
Warning
If your hostname includes dots, they will be replaced by underscores so as not to confuse the namespace.
If you include dots in carbon-ourname, they will not be replaced by underscores, as PowerDNS assumes you know what you are doing if you override your hostname.
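The naming rules described above can be sketched in a few lines of Python. This is an illustration of the rules as stated in this section, not the Recursor's own code; the function name carbon_metric_name and its parameters are hypothetical.

```python
def carbon_metric_name(hostname, metric, ourname=None):
    """Build a metric path of the form pdns.<name>.recursor.<metric>.

    Dots in the local hostname are replaced by underscores; a name
    supplied via carbon-ourname (the `ourname` argument here) is used
    verbatim, dots included.
    """
    node = ourname if ourname is not None else hostname.replace(".", "_")
    return f"pdns.{node}.recursor.{metric}"
```

For example, a host named ns1.example.com would report questions as pdns.ns1_example_com.recursor.questions, while an override of dc1.ns1 would yield pdns.dc1.ns1.recursor.questions.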
Should Carbon not be the preferred way of receiving metrics, several other techniques can be employed to retrieve them.
The API exposes a statistics endpoint at
GET /api/v1/servers/:server_id/statistics
This endpoint exports all statistics in a single JSON document.
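A client for this endpoint could look roughly like the Python sketch below. Several details are assumptions and should be checked against the HTTP API documentation: that the document is a JSON list of objects carrying "name" and "value" fields, that authentication uses an X-API-Key header, and that the server_id is localhost. The function names are hypothetical.

```python
import json
import urllib.request


def stats_to_dict(items):
    """Flatten a list of {"name": ..., "value": ...} objects into a dict.

    The list-of-objects shape is an assumption about the JSON document.
    """
    return {
        item["name"]: item["value"]
        for item in items
        if "name" in item and "value" in item
    }


def fetch_statistics(base_url, api_key, server_id="localhost"):
    """Fetch all statistics from the API and return them as a dict."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/servers/{server_id}/statistics",
        headers={"X-API-Key": api_key},  # assumed authentication header
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return stats_to_dict(json.load(resp))
```

Keeping the conversion separate from the network call makes it easy to test against a saved response.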
rec_control
Metrics can also be gathered on the system itself by invoking rec_control:
rec_control get-all
Single statistics can also be retrieved with the get command, e.g.:
rec_control get all-outqueries
External programs can use this technique to scrape metrics, though it is preferred to use a Prometheus export.
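Where scraping rec_control is the only option, a wrapper along the following lines might do. This Python sketch assumes that get-all prints one metric per line as a name followed by whitespace and a value, which is not specified in this section; the function names are hypothetical.

```python
import subprocess


def parse_get_all(text):
    """Parse 'name value' lines into a dict, converting integers where possible."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, value = parts
        try:
            stats[name] = int(value)
        except ValueError:
            stats[name] = value  # keep non-integer values as strings
    return stats


def read_all_metrics():
    """Run 'rec_control get-all' and return the parsed metrics."""
    completed = subprocess.run(
        ["rec_control", "get-all"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_get_all(completed.stdout)
```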
The internal web server exposes Prometheus formatted metrics at
GET /metrics
The Prometheus names are the names listed in metricnames, prefixed with pdns_recursor_ and with hyphens replaced by underscores.
For example:
# HELP pdns_recursor_all_outqueries Number of outgoing UDP queries since starting
# TYPE pdns_recursor_all_outqueries counter
pdns_recursor_all_outqueries 7
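The name mapping is simple enough to express directly. A small Python sketch of the rule as described here (the function name prometheus_name is made up for this illustration):

```python
def prometheus_name(metric):
    """Map a Recursor metric name to its Prometheus name.

    Applies the rule stated above: prefix with pdns_recursor_ and
    replace hyphens by underscores.
    """
    return "pdns_recursor_" + metric.replace("-", "_")
```

Applied to all-outqueries, this gives pdns_recursor_all_outqueries, matching the sample output above.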
The recursor can export statistics over SNMP and send traps from Lua, provided support is compiled into the Recursor and snmp-agent set.
For the details of all values that can be retrieved using SNMP, see the SNMP MIB.
These statistics are gathered.
It should be noted that answers0-1 + answers1-10 + answers10-100 + answers100-1000 + answers-slow + packetcache-hits + over-capacity-drops + policy-drops = questions.
Also note that unauthorized-tcp and unauthorized-udp packets do not end up in the ‘questions’ count.
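The identity above lends itself to a sanity check in monitoring. The Python sketch below computes the difference between questions and the sum of its components from a dictionary of metric values; the function name is hypothetical, and since the counters are presumably not read at exactly the same instant, a small non-zero result on a busy server might not indicate a problem.

```python
# The components that, according to the identity above, add up to 'questions'.
ANSWER_COMPONENTS = (
    "answers0-1",
    "answers1-10",
    "answers10-100",
    "answers100-1000",
    "answers-slow",
    "packetcache-hits",
    "over-capacity-drops",
    "policy-drops",
)


def questions_discrepancy(stats):
    """Return questions minus the sum of its components (0 if they agree)."""
    return stats["questions"] - sum(stats[name] for name in ANSWER_COMPONENTS)
```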
New in version 4.6.
number of almost-expired tasks that caused an exception
New in version 4.5.
number of negative answers generated from NSEC entries by the aggressive NSEC cache
New in version 4.5.
number of answers synthesized from NSEC entries and wildcards by the NSEC aggressive cache
New in version 4.5.
number of answers synthesized from NSEC entries and wildcards by the NSEC3 aggressive cache
counts the number of outgoing queries since starting, this includes UDP, TCP, DoT queries both over IPv4 and IPv6
counts the number of queries answered after 1 second
counts the number of queries answered within 1 millisecond
counts the number of queries answered within 10 milliseconds
counts the number of queries answered within 100 milliseconds
counts the number of queries answered within 1 second
counts the number of queries answered by auth4s after 1 second (4.0)
counts the number of queries answered by auth4s within 1 millisecond (4.0)
counts the number of queries answered by auth4s within 10 milliseconds (4.0)
counts the number of queries answered by auth4s within 100 milliseconds (4.0)
counts the number of queries answered by auth4s within 1 second (4.0)
counts the number of queries answered by auth6s after 1 second (4.0)
counts the number of queries answered by auth6s within 1 millisecond (4.0)
counts the number of queries answered by auth6s within 10 milliseconds (4.0)
counts the number of queries answered by auth6s within 100 milliseconds (4.0)
counts the number of queries answered by auth6s within 1 second (4.0)
where xxx is an rcode name (noerror, formerr, servfail, nxdomain, notimp, refused, yxdomain, yxrrset, nxrrset, notauth, rcode10, rcode11, rcode12, rcode13, rcode14, rcode15).
Counts the rcodes returned by authoritative servers.
The corresponding Prometheus metrics consist of multiple entries of the form pdns_recursor_auth_rcode_answers{rcode="xxx"}.
counts the number of queries to locally hosted authoritative zones (auth-zones) since starting
size of the cache in bytes (disabled by default, see stats-rec-control-disabled-list). This metric is a rough estimate and takes a long time to compute, and is therefore not enabled in default outputs.
shows the number of entries in the cache
counts the number of cache hits since starting, this does not include hits that got answered from the packet-cache
counts the number of cache misses since starting
counts the number of mismatches in character case since starting
counts the number of times a chain limit (size or age) has been hit
number of queries chained to existing outstanding query
counts number of client packets that could not be parsed
shows the number of MThreads currently running
shows the number of milliseconds spent in thread n. Available since 4.1.12.
New in version 4.4.
Time spent waiting for I/O to complete by the whole system, in units of USER_HZ.
New in version 4.4.
Stolen time, which is the time spent by the whole system in other operating systems when running in a virtualized environment, in units of USER_HZ.
New in version 4.6.
Cumulative counts of answer times of authoritative servers in buckets less than x microseconds. (disabled by default, see stats-rec-control-disabled-list) These metrics are useful for Prometheus and not listed in other outputs by default.
New in version 4.6.
Cumulative counts of our answer times to clients in buckets less than or equal to x microseconds. These metrics include packet cache hits. These metrics are useful for Prometheus and not listed in other outputs by default.
New in version 4.6.
number of AAAA and PTR answers generated by dns64-prefix matching.
number of queries received with the DO bit set
number of responses sent, packet-cache hits excluded, that were in the DNSSEC Bogus state. Since 4.4.2 detailed counters are available, see below.
Since 4.5.0, if x-dnssec-names is set, a separate set of x-dnssec-result-... metrics becomes available, counting the DNSSEC validation results for names suffix-matching a name in x-dnssec-names.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because a valid DNSKEY could not be found.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because a valid denial of existence proof could not be found.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because a valid DS could not be retrieved.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because a valid DNSKEY could not be retrieved.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because a DS record was signed by itself.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because required RRSIG records were not present in an answer.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because only invalid RRSIG records were present in an answer.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because a NODATA or NXDOMAIN answer lacked the required SOA and/or NSEC(3) records.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because the signature inception time in the RRSIG was not yet valid.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because the signature expired time in the RRSIG was in the past.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because a DNSKEY RRset contained only unsupported DNSSEC algorithms.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because a DS RRset contained only unsupported digest types.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because no DNSKEY with the Zone Key bit set was found.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because all DNSKEYs were revoked.
New in version 4.4.2.
number of responses sent, packet-cache hits excluded, that were in the Bogus state because all DNSKEYs had invalid protocols.
number of DNSSEC validations that had the Indeterminate state
number of responses sent, packet-cache hits excluded, that were in the Insecure state
number of responses sent, packet-cache hits excluded, that were in the NTA (negative trust anchor) state
number of responses sent, packet-cache hits excluded, that were in the Secure state
number of responses sent, packet-cache hits excluded, for which a DNSSEC validation was requested by either the client or the configuration
number of outgoing queries dropped because of dont-query setting (since 3.3)
counts the number of outgoing DoT queries since starting, both using IPv4 and IPv6
New in version 4.3.0.
number of successful queries due to fallback mechanism within qname-minimization setting.
number of outgoing queries adorned with an EDNS Client Subnet option (since 4.1)
number of responses received from authoritative servers with an EDNS Client Subnet option we used (since 4.1)
New in version 4.2.0.
number of responses received from authoritative servers with an IPv4 EDNS Client Subnet option we used, of this subnet size (1 to 32). (disabled by default, see stats-rec-control-disabled-list)
New in version 4.2.0.
number of responses received from authoritative servers with an IPv6 EDNS Client Subnet option we used, of this subnet size (1 to 128). (disabled by default, see stats-rec-control-disabled-list)
number of servers that sent a valid EDNS PING response
number of servers that sent an invalid EDNS PING response
number of addresses in the failed NS cache.
Number of currently used file descriptors. Currently, this metric is available on Linux and OpenBSD only.
counts the number of non-query packets received on server sockets that should only get query packets
number of outgoing queries over IPv6 using UDP, since version 5.0.0 also including TCP and DoT
counts all client initiated queries using IPv6
time spent doing internal maintenance, including Lua maintenance
number of times internal maintenance has been called, including Lua maintenance
returns the number of bytes allocated by the process (broken, always returns 0)
currently configured maximum number of cache entries
maximum chain length
maximum chain weight. The weight of a chain of outgoing queries is the product of the number of chained queries and the size of the response received from the external authoritative server.
currently configured maximum number of packet cache entries
maximum amount of thread stack ever used
shows the number of entries in the negative answer cache
number of erroneous received packets
Number of NOD lookups dropped because they would exceed the maximum name length
number of queries sent out without EDNS
counts the number of times it answered NOERROR since starting
number of entries in the non-resolving NS name cache
number of queries sent out without EDNS PING
number of times an nsset was dropped because it no longer worked
shows the number of entries in the NS speeds map
counts the number of times it answered NXDOMAIN since starting
counts the number of timeouts on outgoing UDP queries since starting
counts the number of timeouts on outgoing UDP IPv4 queries since starting (since 4.0)
counts the number of timeouts on outgoing UDP IPv6 queries since starting (since 4.0)
questions dropped because over maximum concurrent query limit (since 3.2)
size of the packet cache in bytes (since 3.3.1) (disabled by default, see stats-rec-control-disabled-list). This metric is a rough estimate and takes a long time to compute, and is therefore not enabled in default outputs.
size of packet cache (since 3.2)
packet cache hits (since 3.2)
packet cache misses (since 3.2)
packets dropped because of (Lua) policy decision
Number of policy decisions based on Lua (type = "filter"), or RPZ (type = "rpz"). RPZ hits include the policyName.
These metrics are useful for Prometheus and not listed in other outputs by default.
packets that were not acted upon by the RPZ/filter engine
packets that were dropped by the RPZ/filter engine
packets that were replied to with NXDOMAIN by the RPZ/filter engine
packets that were replied to with no data by the RPZ/filter engine
packets that were forced to TCP by the RPZ/filter engine
packets that were sent a custom answer by the RPZ/filter engine
shows the current latency average, in microseconds, exponentially weighted over past ‘latency-statistic-size’ packets
New in version 4.2.
questions dropped because the query distribution pipe was full
counts all end-user initiated queries with the RD bit set
New in version 4.1.12.
number of queries balanced to a different worker thread because the first selected one was above the target load configured with ‘distribution-load-factor’
Counts the number of queries that could not be performed because of resource limits. This counter is increased when Recursor encounters a network issue that does not seem to be caused by the remote end. For example when it runs out of file descriptors (monitor fd-usage) or when there is no route to a given IP address.
security status based on Security Polling
counts number of server replied packets that could not be parsed
counts the number of times it answered SERVFAIL since starting
number of times PowerDNS considered itself spoofed, and dropped the data
number of CPU milliseconds spent in ‘system’ mode
number of times an incoming TCP connection was closed immediately because there were too many open connections already
number of times an IP address was denied TCP access because it already had too many connections
counts the number of currently active TCP/IP clients
counts the number of outgoing TCP queries since starting, both using IPv4 and IPv6
counts all incoming TCP queries (since starting)
shows the number of entries in the throttle map
counts the number of throttled outgoing UDP queries since starting
identical to throttled-out
questions dropped that were too old
number of TCP questions denied because of allow-from restrictions
number of UDP questions denied because of allow-from restrictions
number of NOTIFY operations denied because of allow-notify-from restrictions
number of NOTIFY operations denied because of allow-notify-for restrictions
number of answers from remote servers that were unexpected (might point to spoofing)
number of times nameservers were unreachable since starting
number of seconds process has been running (since 3.1.5)
number of CPU milliseconds spent in ‘user’ mode
New in version 4.2.
Responses that were marked as ‘variable’. This could be because of EDNS Client Subnet or Lua rules that indicate this variable status (dependent on time or who is asking, for example).
New in version 4.1: Not yet proven to be reliable
PowerDNS measures per query how much time has been spent waiting on authoritative servers. In addition, the Recursor measures the total amount of time needed to answer a question. The difference between these two durations is a measure of how much time was spent within PowerDNS. This metric is the average of that difference, in microseconds.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 0 and 1 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 1 and 2 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 2 and 4 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 4 and 8 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 8 and 16 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 16 and 32 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where more than 32 milliseconds was spent within the Recursor. See x-our-latency for further details.
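To make the relationship between x-our-latency and the bucket counters above concrete, the following Python sketch classifies a single response. It assumes half-open intervals (a value of exactly 1 millisecond falls in the 1 to 2 bucket), which this section does not specify; the function name and the bucket labels are made up for this illustration and are not the metric names.

```python
# Upper bounds, in milliseconds, of the buckets described above.
BUCKET_UPPER_MS = (1, 2, 4, 8, 16, 32)


def our_time_bucket(total_usec, auth_wait_usec):
    """Classify the time spent within the Recursor for one response.

    The time spent within the Recursor is the total time needed to
    answer minus the time spent waiting on authoritative servers.
    """
    ours_ms = (total_usec - auth_wait_usec) / 1000.0
    lower = 0
    for upper in BUCKET_UPPER_MS:
        if ours_ms < upper:  # assumed half-open interval [lower, upper)
            return f"{lower}-{upper} ms"
        lower = upper
    return "over 32 ms"
```

A response that took 10000 microseconds in total, of which 7000 were spent waiting on authoritative servers, would be counted in the 2 to 4 millisecond bucket under these assumptions.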