[dns] Add dual-stack fake IP rewriting with nftables maps

Support A and AAAA DNS answer rewriting, CNAME alias handling, and temporary IPv4/IPv6 NAT mappings backed by nftables sets/maps.

Add example nftables rules and expand the README with usage, behavior, and setup notes.
2026-05-19 13:43:17 +08:00
parent 63c471be8e
commit b066e36770
3 changed files with 680 additions and 239 deletions

197
README.md

@@ -1,3 +1,196 @@
# dotp
# DNS-TProxy
Domain Transport Proxy with DNS and Netfilter
DNS-based transparent proxy using fake IPs and nftables to redirect and NAT selected domain traffic.
`dotp` could stand for Domain Over Transparent Proxy.
## Overview
`dotp` is a small DNS proxy that selectively rewrites A and AAAA records for
specified domains to "fake" IP addresses and maintains a temporary
one-to-one mapping between each fake address and the real address it replaces,
using nftables. This allows traffic to be transparently redirected and NAT-ed
while keeping the mapping state in userspace and in kernel nftables sets and
maps.
The program:
- Listens for DNS queries on a UDP socket.
- Forwards queries to an upstream DNS server.
- Inspects the DNS responses and, for configured domains, replaces the
returned IPv4/IPv6 addresses with fake addresses from configured prefixes.
- Programs nftables sets and maps so that fake addresses are translated back
to the corresponding real addresses for a short TTL.
## How it works
### DNS path
1. A client sends a DNS query (UDP/53) to `dotp`.
2. `dotp` receives the packet, allocates a per-request `client_ctx`, and opens
a temporary UDP socket to the upstream DNS server.
3. The upstream response is read and parsed:
- DNS header and question section are copied as-is.
- Answer section is scanned record by record.
- For each name that matches a configured domain:
- CNAME records are copied over, preserving compression and structure.
- A/AAAA records are intercepted: their RDATA (IP address) is replaced
with a fake IP allocated from the configured IPv4/IPv6 prefix pools.
4. For every substituted address:
- A NAT entry is created or looked up in an in-memory hash table.
- A libev timer is armed for 120 seconds (`NAT_TTL`) for the mapping.
- nftables commands are issued (via libnftables) to:
- Add the real address to `inet <table> nat_addr` / `nat_addr6` sets.
- Add a mapping from fake to real in the `inet <table> nat_map` /
`nat_map6` maps.
5. The modified DNS response is sent back to the client.
6. When the NAT timer expires, `dotp` removes the corresponding entries from
the nftables sets and frees the mapping.
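
The A-record substitution in step 3 can be sketched as follows. This is a minimal illustration, not the code from `src/main.c`: `rewrite_a_record` and the `RR_*` constants are hypothetical names, and the record is patched in place for brevity, whereas the program copies records into a second buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of a resource record that follows the owner name:
 * TYPE(2) CLASS(2) TTL(4) RDLENGTH(2), all big-endian, then RDATA. */
enum { RR_FIXED_LEN = 10, RR_TYPE_A = 1, RR_CLASS_IN = 1 };

/* Hypothetical helper (not in the diff): if `rr` is an IN/A record with a
 * 4-byte RDATA, overwrite the address with `fake` and the TTL with `ttl`.
 * Returns 1 when the record was rewritten, 0 when it was left untouched. */
static int rewrite_a_record(uint8_t *rr, size_t avail, const uint8_t fake[4],
                            uint32_t ttl) {
  assert(rr != NULL && fake != NULL);
  if (avail < (size_t)(RR_FIXED_LEN + 4)) {
    return 0; /* not enough bytes for the fixed part plus IPv4 RDATA */
  }
  uint16_t type = (uint16_t)((rr[0] << 8) | rr[1]);
  uint16_t klass = (uint16_t)((rr[2] << 8) | rr[3]);
  uint16_t rd_len = (uint16_t)((rr[8] << 8) | rr[9]);
  if (type != RR_TYPE_A || klass != RR_CLASS_IN || rd_len != 4) {
    return 0;
  }
  /* Store the new TTL in network byte order. */
  rr[4] = (uint8_t)(ttl >> 24);
  rr[5] = (uint8_t)(ttl >> 16);
  rr[6] = (uint8_t)(ttl >> 8);
  rr[7] = (uint8_t)ttl;
  memcpy(rr + RR_FIXED_LEN, fake, 4);
  return 1;
}
```

Records of any other type, class, or RDATA length are left unchanged by this sketch.
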
If the upstream response is larger than the internal buffer
(`MAX_MESSAGE_SIZE`), the reply sent to the client is truncated to that size
and the TC (truncation) flag is set in the DNS header.
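
Setting the TC flag amounts to OR-ing one bit into the 16-bit flags word at bytes 2 and 3 of the header. The sketch below assumes the standard 12-byte DNS header layout; `mark_truncated` is a hypothetical helper, not a function from the program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DNS_FLAG_TC 0x0200 /* TC bit within the 16-bit flags word */

/* Hypothetical helper: set the TC bit in a raw DNS message.
 * The flags word occupies bytes 2-3 of the header, in network byte order. */
static void mark_truncated(uint8_t *msg, size_t len) {
  assert(msg != NULL && len >= 12); /* a DNS header is 12 bytes */
  uint16_t flags = (uint16_t)((msg[2] << 8) | msg[3]);
  flags |= DNS_FLAG_TC;
  msg[2] = (uint8_t)(flags >> 8);
  msg[3] = (uint8_t)(flags & 0xff);
}
```

Working on individual bytes avoids any dependence on host byte order or struct packing.
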
### Domain matching
Domains are supplied via the `-d` option and stored in a tree of labels. During
DNS parsing, domain names in questions and answers are decoded, including
compression pointers, and matched against this tree:
- Only names under one of the configured domains are subject to rewriting.
- CNAME chains are followed so that subsequent A/AAAA answers for the aliased
name are also rewritten.
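
The "under a configured domain" rule is a suffix match on label boundaries. The sketch below shows the idea on dotted strings; it is a simplification, since the program walks a tree of labels over the wire format. `name_in_domain` is a hypothetical helper, and the comparison is byte-for-byte, like the `memcmp` on labels in the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `name` equals `domain` or is a subdomain of
 * it. A match must start at a label boundary, so "badexample.com" is not
 * considered to be under "example.com". */
static bool name_in_domain(const char *name, const char *domain) {
  assert(name != NULL && domain != NULL);
  size_t nlen = strlen(name), dlen = strlen(domain);
  if (dlen == 0 || nlen < dlen) {
    return false;
  }
  const char *suffix = name + (nlen - dlen);
  if (strcmp(suffix, domain) != 0) {
    return false;
  }
  /* Either an exact match, or the character before the suffix is a dot. */
  return nlen == dlen || suffix[-1] == '.';
}
```

With `-d example.com`, both `example.com` and `foo.example.com` would match under this rule.
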
### NAT pool
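`dotp` manages two address pools (IPv4 and IPv6) defined by prefixes passed on
the command line. A fake address is derived from a hash of the real address
(`city_hash_mix`) combined with the prefix; on a collision the hash is
incremented and the lookup retried, up to `MAX_FAKE_ADDRESS_SEARCH_ATTEMPS`
(20) times: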
- For each real address, a `struct ip_nat` is created containing:
- real address and fake address
- family (AF_INET / AF_INET6)
- links for two separate hash chains (by real and by fake address)
- a libev timer (`expire`) that removes the mapping after `NAT_TTL` seconds.
- Lookups by real address refresh the timer.
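
The IPv4 candidate selection can be sketched as below. The mixing function is copied from `src/main.c`; `fake_candidate` is a hypothetical helper that works in host byte order and assumes a prefix length between 1 and 30, whereas the program keeps addresses in network byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Mixing step as it appears in src/main.c. */
static uint64_t city_hash_mix(uint64_t a, uint64_t b) {
  a ^= b;
  a *= 0x9ddfea08eb382d69ULL;
  a ^= (a >> 47);
  b ^= a;
  b *= 0x9ddfea08eb382d69ULL;
  b ^= (b >> 47);
  return b * 0x9ddfea08eb382d69ULL;
}

/* Hypothetical helper: candidate fake IPv4 address for `real` on the given
 * retry `attempt`. All addresses are in host byte order. */
static uint32_t fake_candidate(uint32_t net, unsigned prefix_len,
                               uint32_t real, unsigned attempt) {
  assert(prefix_len >= 1 && prefix_len <= 30);
  uint32_t mask = ~((UINT32_C(1) << (32 - prefix_len)) - 1);
  /* Hash the real address, then step forward once per retry. */
  uint64_t h = city_hash_mix((uint64_t)real << 3, real) + attempt;
  /* Keep the network bits of the prefix, fill the host bits from the hash. */
  return (net & mask) | ((uint32_t)h & ~mask);
}
```

A caller would try `attempt` values 0, 1, 2, ... until the candidate is not already present in the fake-address table, giving up after the configured number of attempts.
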
## nftables integration
The code assumes an existing nftables table named `inet dotp` with appropriate
sets, maps, and rules. It manipulates the following sets and maps:
- `nat_addr` / `nat_addr6`: sets holding the real (destination) addresses.
- `nat_map` / `nat_map6`: maps from fake to real addresses.
It uses commands like:
- `add element inet dotp nat_addr { REAL }`
- `add element inet dotp nat_map { FAKE:REAL }`
- `add element inet dotp nat_addr6 { [REAL6] }`
- `add element inet dotp nat_map6 { [FAKE6]:[REAL6] }`
and their corresponding `delete element` variants on expiry.
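
As an illustration of how such a command string might be assembled, the sketch below formats the IPv4 `nat_map` command into a caller-supplied buffer. `format_nat_map_add` is a hypothetical helper; the program itself builds the string with its own `malloc_sprintf` and passes it to `nft_run_cmd_from_buffer`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: write the "add element" command for the IPv4 map
 * into `buf`. Returns the command length, or -1 if formatting fails or the
 * buffer is too small. */
static int format_nat_map_add(char *buf, size_t size, const char *table,
                              const struct in_addr *fake,
                              const struct in_addr *real) {
  char fake_p[INET_ADDRSTRLEN], real_p[INET_ADDRSTRLEN];
  assert(buf != NULL && table != NULL && fake != NULL && real != NULL);
  if (!inet_ntop(AF_INET, fake, fake_p, sizeof fake_p) ||
      !inet_ntop(AF_INET, real, real_p, sizeof real_p)) {
    return -1;
  }
  int n = snprintf(buf, size, "add element inet %s nat_map { %s:%s }", table,
                   fake_p, real_p);
  return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The matching `delete element` command on expiry names only the map key, i.e. the fake address.
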
The nftables rules themselves (e.g. for DNAT/SNAT using these sets) are not
set up by the program; an example configuration is provided in `src/rules.nft`.
## Command-line usage
From the `main` function, the expected usage is:
```text
Usage: dotp -H LISTEN_HOST -p LISTEN_PORT
-4 FAKE_IP_PREFIX -6 FAKE_IP6_PREFIX
--upstream-host UPSTREAM_HOST
--upstream-port UPSTREAM_PORT
[ -d DOMAIN ]
[ --daemonize ]
```
### Options
- `-H, --host LISTEN_HOST`
IP address to bind the local UDP DNS listener to (IPv4 only in current
implementation). Required.
- `-p, --port LISTEN_PORT`
Local UDP port to listen on. Defaults to 53.
- `-d, --domain DOMAIN`
Domain to subject to address rewriting (may be specified multiple times).
Domain syntax is validated (alphanumeric plus `-`, no leading/trailing `-`,
each label up to 63 chars).
- `-4, --ipv4-prefix FAKE_IP_PREFIX`
IPv4 prefix (e.g. `100.64.0.0/24`) from which fake IPv4 addresses will be
allocated. Prefix length must be ≤ 30. Required.
- `-6, --ipv6-prefix FAKE_IP6_PREFIX`
IPv6 prefix (e.g. `fd00::/64`) from which fake IPv6 addresses will be
allocated. Prefix length must be ≤ 64. Required.
- `--upstream-host UPSTREAM_HOST`
IPv4 address of the upstream DNS server to which queries are forwarded.
Required.
- `--upstream-port UPSTREAM_PORT`
UDP port of the upstream DNS server. Defaults to 53.
- `--daemonize`
Run the process in the background using `daemon(3)`.
`dotp` exits with an error status if required options are missing or invalid.
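
Validation of the `-4` argument can be sketched as follows. `parse_ipv4_prefix` is a hypothetical helper, not the code in `main`; it copies the address part instead of modifying the argument in place, and it requires `sscanf` to convert exactly one field, since `sscanf` returns 0 rather than -1 when the prefix length is not a number.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse "A.B.C.D/LEN" into a network address and a
 * mask, both in network byte order. Returns 0 on success, -1 on bad input. */
static int parse_ipv4_prefix(const char *text, struct in_addr *net,
                             struct in_addr *mask) {
  char addr[INET_ADDRSTRLEN];
  unsigned prefix_len;
  assert(text != NULL && net != NULL && mask != NULL);
  const char *sep = strchr(text, '/');
  if (!sep || (size_t)(sep - text) >= sizeof addr) {
    return -1;
  }
  memcpy(addr, text, (size_t)(sep - text));
  addr[sep - text] = '\0';
  if (inet_pton(AF_INET, addr, net) != 1) {
    return -1;
  }
  /* Require exactly one converted field and the documented upper bound. */
  if (sscanf(sep + 1, "%u", &prefix_len) != 1 || prefix_len > 30) {
    return -1;
  }
  uint32_t host_mask =
      prefix_len == 0 ? 0 : ~((UINT32_C(1) << (32 - prefix_len)) - 1);
  mask->s_addr = htonl(host_mask);
  net->s_addr &= mask->s_addr; /* clear any host bits in the prefix */
  return 0;
}
```

For `100.64.0.0/24` this yields the mask `255.255.255.0`, leaving 8 host bits for fake addresses.
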
## Runtime behavior
- Uses `libev` for event-driven I/O:
- One main `server_ctx` for the listening socket.
- A short-lived `client_ctx` for each in-flight upstream query with a
5-second timeout.
- On `SIGINT` or `SIGTERM`, the event loop is broken and the server exits
cleanly, freeing domain/NAT structures and nftables context.
## Dependencies
Build-time and runtime dependencies inferred from `src/main.c`:
- POSIX sockets (`AF_INET`, `SOCK_DGRAM`, `recvfrom`, `sendto`, `bind`)
- `libev`
- `libnftables`
On Debian/Ubuntu-like systems, packages may be named:
- `libev-dev`
- `libnftables-dev`
## Limitations
- Only UDP DNS is supported.
- Listener and upstream are currently IPv4-only.
- DNS payload size is limited to `MAX_MESSAGE_SIZE` (0x200 = 512 bytes).
- nftables table/sets must be created externally (see `src/rules.nft`).
## Example
Assuming:
- You have an nftables table `inet dotp` set up according to `src/rules.nft`.
- You want to redirect traffic for `example.com` and its subdomains, such as
`foo.example.com`.
- You have fake address ranges:
- IPv4: `100.64.0.0/24`
- IPv6: `fd00::/64`
- Your upstream resolver is `1.1.1.1`.
You might run:
```sh
./dotp \
-H 0.0.0.0 -p 53 \
-4 100.64.0.0/24 \
-6 fd00::/64 \
--upstream-host 1.1.1.1 \
--upstream-port 53 \
-d example.com
```
Point your clients' DNS to the host running `dotp`. For matching domains,
clients will see fake IPs, while nftables rules will map those fake addresses
back to the real ones for the lifetime of the mapping.


@@ -17,12 +17,17 @@
#include <nftables/libnftables.h>
#define MAX_MESSAGE_SIZE 0x200
#define MAX_FAKE_ADDRESS_SEARCH_ATTEMPS 20
#define DNS_FLAG_TC 0x200
#define DNS_TYPE_A 1
#define DNS_TYPE_NS 2
#define DNS_TYPE_CNAME 5
#define DNS_TYPE_SOA 6
#define DNS_TYPE_AAAA 28
#define DNS_CLASS_IN 0x0001
#define NAT_TTL 60
#define NAT_TTL 120
typedef struct __attribute__((packed)) dns_header {
uint16_t id;
@@ -40,6 +45,11 @@ typedef struct __attribute__((packed)) dns_answer_header {
uint16_t rd_len;
} dns_answer_header_t;
typedef union ip_address {
struct in_addr in;
struct in6_addr in6;
} ip_address_t;
typedef struct domain_set {
struct domain_set *next;
struct domain_set *head;
@@ -50,15 +60,17 @@ typedef struct domain_set {
typedef struct domain_name {
struct domain_name *next;
ssize_t ptr;
domain_set_t *match;
ssize_t ptr, new_ptr;
size_t label_len;
struct domain_name *alias;
domain_set_t *exact_match, *match;
} domain_name_t;
typedef struct domain_msg {
domain_set_t *match_root;
ssize_t len;
uint8_t *raw;
uint8_t *raw, *new_raw;
domain_name_t *name_head;
} domain_msg_t;
@@ -69,15 +81,15 @@ typedef struct ip_nat {
ev_timer expire;
ip_pool_t *pool;
struct ip_nat *fake_next, *real_next;
uint32_t fake, real;
int dst_handle, src_handle;
sa_family_t family;
ip_address_t fake_addr, real_addr;
} ip_nat_t;
typedef struct ip_pool {
uint32_t pf;
uint32_t pf_mask;
struct in_addr net, net_mask;
struct in6_addr net6, net6_mask;
// Indexed by fake
// must be a power of 2
size_t size;
ip_nat_t **fake;
ip_nat_t **real;
@@ -107,7 +119,52 @@ typedef struct client_ctx {
static struct nft_ctx *nft_ctx;
static const char *nft_nat_table = "dotp";
static char *nft_fake_set;
static uint64_t city_hash_mix(uint64_t a, uint64_t b) {
a ^= b;
a *= 0x9ddfea08eb382d69ULL;
a ^= (a >> 47);
b ^= a;
b *= 0x9ddfea08eb382d69ULL;
b ^= (b >> 47);
return b * 0x9ddfea08eb382d69ULL;
}
static uint64_t city_hash_4(const uint32_t *s) {
return city_hash_mix(((uint64_t)s[0] << 3), s[0]);
}
static uint64_t city_hash_16(const uint64_t *s) {
return city_hash_mix(s[0], s[1]);
}
static size_t address_hash(sa_family_t family, ip_address_t const *addr) {
if (family == AF_INET) {
return city_hash_4((uint32_t const *)&addr->in.s_addr);
} else if (family == AF_INET6) {
return city_hash_16((uint64_t const *)&addr->in6.s6_addr);
}
return 0;
}
static int address_cmp(sa_family_t family, ip_address_t const *addr_a,
ip_address_t const *addr_b) {
if (family == AF_INET) {
return memcmp(&addr_a->in.s_addr, &addr_b->in.s_addr, 4);
} else if (family == AF_INET6) {
return memcmp(&addr_a->in6.s6_addr, &addr_b->in6.s6_addr, 16);
}
return 0;
}
static void address_copy(sa_family_t family, ip_address_t *addr_dst,
ip_address_t const *addr_src) {
if (family == AF_INET) {
memcpy(&addr_dst->in.s_addr, &addr_src->in.s_addr, 4);
} else if (family == AF_INET6) {
memcpy(&addr_dst->in6.s6_addr, &addr_src->in6.s6_addr, 16);
}
}
char *malloc_sprintf(const char *fmt, ...) {
char *buffer = NULL;
@@ -134,22 +191,14 @@ finish:
return buffer;
}
static uint32_t ip_hash(uint32_t addr) {
return addr ^ ((addr << 3) | (addr >> 29)) ^ ((addr << 7) | (addr >> 25)) ^
((addr << 13) | (addr >> 19));
}
static void ip_pool_init(ip_pool_t *pool, uint32_t pf, uint32_t pf_mask) {
pool->pf = pf;
pool->pf_mask = pf_mask;
static void ip_pool_init(ip_pool_t *pool) {
pool->size = 0x1000;
pool->fake = (ip_nat_t **)malloc(pool->size * sizeof(ip_nat_t *));
pool->real = (ip_nat_t **)malloc(pool->size * sizeof(ip_nat_t *));
bzero(pool->fake, pool->size * sizeof(ip_nat_t *));
bzero(pool->real, pool->size * sizeof(ip_nat_t *));
memset(pool->fake, 0, pool->size * sizeof(ip_nat_t *));
memset(pool->real, 0, pool->size * sizeof(ip_nat_t *));
}
static void nat_free_chain(ip_nat_t *nat) {
@@ -167,41 +216,19 @@ static void ip_pool_fini(ip_pool_t *pool) {
free(pool->real);
}
static void nat_expire(EV_P_ ev_timer *w, int revents) {
ip_nat_t *nat = (ip_nat_t *)w;
static void nat_remove(ip_nat_t *nat) {
ip_pool_t *pool = nat->pool;
ev_timer_stop(EV_A, w);
size_t real_bucket =
address_hash(nat->family, &nat->real_addr) & (pool->size - 1);
size_t fake_bucket =
address_hash(nat->family, &nat->fake_addr) & (pool->size - 1);
char *cmd = malloc_sprintf("delete rule ip %s postrouting handle %d",
nft_nat_table, nat->src_handle);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
cmd = malloc_sprintf("delete rule ip %s prerouting handle %d", nft_nat_table,
nat->dst_handle);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
if (nft_fake_set) {
char real_ip[16];
uint32_t af_real = htonl(nat->real);
inet_ntop(AF_INET, &af_real, real_ip, 16);
char *cmd =
malloc_sprintf("delete element %s { %s }", nft_fake_set, real_ip);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
}
uint32_t mask = pool->size - 1;
ip_nat_t **p_other = &pool->real[nat->real & mask];
ip_nat_t **p_other = &pool->real[real_bucket];
ip_nat_t *other = *p_other;
for (; other; other = other->real_next) {
if (nat == other) {
for (; other;) {
if (other == nat) {
*p_other = nat->real_next;
break;
}
@@ -209,11 +236,11 @@ static void nat_expire(EV_P_ ev_timer *w, int revents) {
other = *p_other;
}
p_other = &pool->fake[nat->fake & mask];
p_other = &pool->fake[fake_bucket];
other = *p_other;
for (; other; other = other->fake_next) {
if (nat == other) {
for (; other;) {
if (other == nat) {
*p_other = nat->fake_next;
break;
}
@@ -224,26 +251,89 @@ static void nat_expire(EV_P_ ev_timer *w, int revents) {
free(nat);
}
static ip_nat_t *find_nat(EV_P_ ip_pool_t *pool, uint32_t real_addr) {
size_t mask = pool->size - 1;
static void nat_expire(EV_P_ ev_timer *w, int revents) {
ip_nat_t *nat = (ip_nat_t *)w;
uint32_t h = ip_hash(real_addr);
uint32_t fake_addr;
for (ip_nat_t *nat = pool->real[real_addr & mask]; nat;
nat = nat->real_next) {
if (nat->real == real_addr) {
ev_timer_stop(EV_A, w);
if (nat->family == AF_INET) {
char fake_p[16], real_p[16];
inet_ntop(AF_INET, &nat->fake_addr.in, fake_p, 16);
inet_ntop(AF_INET, &nat->real_addr.in, real_p, 16);
{
char *cmd = malloc_sprintf("delete element inet %s nat_map { %s }",
nft_nat_table, fake_p);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
}
{
char *cmd = malloc_sprintf("delete element inet %s nat_addr { %s }",
nft_nat_table, real_p);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
}
} else if (nat->family == AF_INET6) {
char fake_p[40], real_p[40];
inet_ntop(AF_INET6, &nat->fake_addr.in6, fake_p, 40);
inet_ntop(AF_INET6, &nat->real_addr.in6, real_p, 40);
{
char *cmd = malloc_sprintf("delete element inet %s nat_map6 { [%s] }",
nft_nat_table, fake_p);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
}
{
char *cmd = malloc_sprintf("delete element inet %s nat_addr6 { [%s] }",
nft_nat_table, real_p);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
}
}
nat_remove(nat);
}
static ip_nat_t *nat_find(EV_P_ ip_pool_t *pool, sa_family_t family,
ip_address_t const *addr) {
size_t real_bucket = address_hash(family, addr) & (pool->size - 1);
for (ip_nat_t *nat = pool->real[real_bucket]; nat; nat = nat->real_next) {
if (nat->family == family && !address_cmp(family, &nat->real_addr, addr)) {
ev_timer_again(EV_A, &nat->expire);
return nat;
}
}
int ok = 0;
for (; h != 0;) {
fake_addr = pool->pf | (h & ~pool->pf_mask);
ip_address_t fake_addr;
size_t fake_bucket;
uint64_t h = address_hash(family, addr);
for (int k = 0; k < MAX_FAKE_ADDRESS_SEARCH_ATTEMPS; k++) {
ok = 1;
for (ip_nat_t *nat = pool->fake[fake_addr & mask]; nat;
nat = nat->fake_next) {
if (nat->fake == fake_addr) {
if (family == AF_INET) {
fake_addr.in.s_addr =
pool->net.s_addr | ((uint32_t)h & ~pool->net_mask.s_addr);
} else {
fake_addr.in6.s6_addr32[0] =
pool->net6.s6_addr32[0] |
(addr->in6.s6_addr32[0] & ~pool->net6_mask.s6_addr32[0]);
fake_addr.in6.s6_addr32[1] =
pool->net6.s6_addr32[1] |
(addr->in6.s6_addr32[1] & ~pool->net6_mask.s6_addr32[1]);
fake_addr.in6.s6_addr32[2] =
pool->net6.s6_addr32[2] |
((uint32_t)(h >> 32) & ~pool->net6_mask.s6_addr32[2]);
fake_addr.in6.s6_addr32[3] =
pool->net6.s6_addr32[3] |
((uint32_t)h & ~pool->net6_mask.s6_addr32[3]);
}
fake_bucket = address_hash(family, &fake_addr) & (pool->size - 1);
for (ip_nat_t *nat = pool->fake[fake_bucket]; nat; nat = nat->fake_next) {
if (nat->family == family &&
!address_cmp(family, &nat->fake_addr, &fake_addr)) {
ok = 0;
break;
}
@@ -253,9 +343,9 @@ static ip_nat_t *find_nat(EV_P_ ip_pool_t *pool, uint32_t real_addr) {
}
h++;
}
if (h == 0) {
if (!ok) {
return NULL;
} else {
}
ip_nat_t *nat = (ip_nat_t *)malloc(sizeof(ip_nat_t));
nat->pool = pool;
@@ -264,53 +354,53 @@ static ip_nat_t *find_nat(EV_P_ ip_pool_t *pool, uint32_t real_addr) {
nat->expire.repeat = NAT_TTL;
ev_timer_again(EV_A, &nat->expire);
nat->fake = fake_addr;
nat->real = real_addr;
nat->family = family;
address_copy(family, &nat->fake_addr, &fake_addr);
address_copy(family, &nat->real_addr, addr);
nat->fake_next = pool->fake[fake_addr & mask];
pool->fake[fake_addr & mask] = nat;
nat->fake_next = pool->fake[fake_bucket];
pool->fake[fake_bucket] = nat;
nat->real_next = pool->real[real_addr & mask];
pool->real[real_addr & mask] = nat;
nat->real_next = pool->real[real_bucket];
pool->real[real_bucket] = nat;
char real_ip[16], fake_ip[16];
uint32_t af_fake = htonl(fake_addr);
uint32_t af_real = htonl(real_addr);
if (family == AF_INET) {
char real_p[16], fake_p[16];
inet_ntop(AF_INET, &nat->fake_addr.in, fake_p, 16);
inet_ntop(AF_INET, &nat->real_addr.in, real_p, 16);
inet_ntop(AF_INET, &af_fake, fake_ip, 16);
inet_ntop(AF_INET, &af_real, real_ip, 16);
if (nft_fake_set) {
char *cmd =
malloc_sprintf("add element %s { %s }", nft_fake_set, real_ip);
{
char *cmd = malloc_sprintf("add element inet %s nat_addr { %s }",
nft_nat_table, real_p);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
}
nft_ctx_buffer_output(nft_ctx);
char *cmd =
malloc_sprintf("add rule ip %s prerouting ip daddr %s dnat to %s",
nft_nat_table, fake_ip, real_ip);
{
char *cmd = malloc_sprintf("add element inet %s nat_map { %s:%s }",
nft_nat_table, fake_p, real_p);
nft_run_cmd_from_buffer(nft_ctx, cmd);
char *echo_fmt = malloc_sprintf("%s # handle %%d", cmd);
sscanf(nft_ctx_get_output_buffer(nft_ctx), echo_fmt, &nat->dst_handle);
free(echo_fmt);
free(cmd);
nft_ctx_unbuffer_output(nft_ctx);
}
} else if (family == AF_INET6) {
char real_p[40], fake_p[40];
inet_ntop(AF_INET6, &nat->fake_addr.in6, fake_p, 40);
inet_ntop(AF_INET6, &nat->real_addr.in6, real_p, 40);
nft_ctx_buffer_output(nft_ctx);
cmd = malloc_sprintf("add rule ip %s postrouting ip saddr %s snat to %s",
nft_nat_table, real_ip, fake_ip);
{
char *cmd = malloc_sprintf("add element inet %s nat_addr6 { [%s] }",
nft_nat_table, real_p);
nft_run_cmd_from_buffer(nft_ctx, cmd);
echo_fmt = malloc_sprintf("%s # handle %%d", cmd);
sscanf(nft_ctx_get_output_buffer(nft_ctx), echo_fmt, &nat->src_handle);
free(echo_fmt);
free(cmd);
nft_ctx_unbuffer_output(nft_ctx);
}
{
char *cmd = malloc_sprintf("add element inet %s nat_map6 { [%s]:[%s] }",
nft_nat_table, fake_p, real_p);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
}
}
return nat;
}
}
static void free_domain_names(domain_name_t *head) {
if (head) {
@@ -323,12 +413,6 @@ static ssize_t read_domain_name(domain_msg_t *msg, ssize_t ptr,
domain_name_t **p_name) {
domain_name_t *name = NULL;
for (name = msg->name_head; name; name = name->next) {
if (name->ptr == ptr) {
*p_name = name;
return 0;
}
}
if (ptr >= msg->len) {
*p_name = NULL;
return -1;
@@ -337,32 +421,42 @@ static ssize_t read_domain_name(domain_msg_t *msg, ssize_t ptr,
if ((label_len & 0xc0) == 0) {
name = (domain_name_t *)malloc(sizeof(domain_name_t));
name->ptr = ptr;
name->new_ptr = 0;
name->label_len = label_len;
ptr++;
name->alias = name;
if (label_len == 0) {
name->match = msg->match_root;
name->match = name->exact_match = msg->match_root;
} else {
if (ptr + label_len > msg->len) {
goto label_fail;
}
domain_name_t *parent;
ssize_t new_ptr = read_domain_name(msg, ptr + label_len, &parent);
ssize_t next_ptr = read_domain_name(msg, ptr + label_len, &parent);
if (!parent) {
goto label_fail;
}
if (parent->match && parent->match->head) {
if (parent->exact_match && parent->exact_match->head) {
domain_set_t *child;
for (child = parent->match->head; child; child = child->next) {
for (child = parent->exact_match->head; child; child = child->next) {
if (child->label_len == label_len &&
!memcmp(child->label, msg->raw + ptr, label_len)) {
break;
}
}
name->match = child;
if (child) {
name->match = name->exact_match = child;
} else {
name->exact_match = NULL;
name->match = parent->match;
}
ptr = new_ptr;
} else {
name->exact_match = NULL;
name->match = parent->match;
}
ptr = next_ptr;
}
name->next = msg->name_head;
@@ -379,28 +473,69 @@ static ssize_t read_domain_name(domain_msg_t *msg, ssize_t ptr,
ptr++;
if (ptr >= msg->len) {
*p_name = NULL;
goto ptr_fail;
return -1;
}
ssize_t new_ptr = ((label_len & 0x3f) << 8) | *(msg->raw + ptr);
if (new_ptr >= msg->len) {
size_t ref_ptr = ((label_len & 0x3f) << 8) | *(msg->raw + ptr);
if (ref_ptr >= msg->len) {
*p_name = NULL;
goto ptr_fail;
return -1;
}
ptr++;
if (read_domain_name(msg, new_ptr, p_name) < 0) {
goto ptr_fail;
} else {
return ptr;
while (ref_ptr < ptr - 2 && (*(msg->raw + ref_ptr) & 0xc0)) {
ref_ptr =
((*(msg->raw + ref_ptr) & 0x3f) << 8) | *(msg->raw + ref_ptr + 1);
}
ptr_fail:
for (name = msg->name_head; name; name = name->next) {
if (name->ptr == ref_ptr) {
*p_name = name;
break;
}
}
if (!name) {
*p_name = NULL;
return -1;
}
return ptr;
} else {
*p_name = NULL;
return -1;
}
}
static ssize_t copy_domain_name(domain_msg_t *msg, domain_name_t *name,
ssize_t new_ptr) {
if (name->new_ptr) {
if (new_ptr + 2 > msg->len) {
return -1;
}
*(msg->new_raw + new_ptr) = 0xc0 | ((name->new_ptr >> 8) & 0x3f);
*(msg->new_raw + new_ptr + 1) = name->new_ptr & 0xff;
return new_ptr + 2;
} else if (name->label_len == 0) {
if (new_ptr + 1 > msg->len) {
return -1;
}
name->new_ptr = new_ptr;
*(msg->new_raw + new_ptr) = 0;
new_ptr++;
return new_ptr;
} else {
if (new_ptr + 1 + name->label_len > msg->len) {
return -1;
}
name->new_ptr = new_ptr;
memcpy(msg->new_raw + new_ptr, msg->raw + name->ptr, name->label_len + 1);
new_ptr += name->label_len + 1;
if (!name->next) {
return -1;
}
return copy_domain_name(msg, name->next, new_ptr);
}
}
static void client_read(EV_P_ ev_io *w, int revents) {
client_ctx_t *client_ctx = (client_ctx_t *)w;
@@ -412,6 +547,7 @@ static void client_read(EV_P_ ev_io *w, int revents) {
msg->name_head = NULL;
msg->raw = malloc(MAX_MESSAGE_SIZE);
msg->new_raw = malloc(MAX_MESSAGE_SIZE);
msg->len = recvfrom(client_ctx->fd, msg->raw, MAX_MESSAGE_SIZE, MSG_TRUNC,
NULL, NULL);
@@ -419,63 +555,151 @@ static void client_read(EV_P_ ev_io *w, int revents) {
goto fail;
}
dns_header_t *header = (dns_header_t *)msg->raw;
memcpy(msg->new_raw, msg->raw, sizeof(dns_header_t));
dns_header_t *new_header = (dns_header_t *)msg->new_raw;
ssize_t new_ptr = MAX_MESSAGE_SIZE;
if (msg->len > MAX_MESSAGE_SIZE) {
header->flags = htons(ntohs(header->flags) | DNS_FLAG_TC);
memcpy(msg->new_raw, msg->raw, MAX_MESSAGE_SIZE);
new_header->flags = htons(ntohs(new_header->flags) | DNS_FLAG_TC);
goto send;
}
ssize_t ptr = sizeof(dns_header_t);
new_header->ns_count = 0;
new_header->ar_count = 0;
for (uint16_t i = 0; i < ntohs(header->qd_count); i++) {
ssize_t ptr = sizeof(dns_header_t);
new_ptr = sizeof(dns_header_t);
uint16_t new_an_count = 0;
for (uint16_t i = 0; i < ntohs(new_header->qd_count); i++) {
domain_name_t *name;
ssize_t new_ptr = read_domain_name(msg, ptr, &name);
ptr = read_domain_name(msg, ptr, &name);
if (ptr < 0) {
goto fail;
}
// Copy the query name to new response
new_ptr = copy_domain_name(msg, name, new_ptr);
if (new_ptr < 0) {
goto fail;
}
ptr = new_ptr;
// Skip type and class
// Copy type and class
if (ptr + 4 > msg->len) {
goto fail;
}
memcpy(msg->new_raw + new_ptr, msg->raw + ptr, 4);
new_ptr += 4;
ptr += 4;
}
for (uint16_t i = 0; i < ntohs(header->an_count); i++) {
for (uint16_t i = 0; i < ntohs(new_header->an_count); i++) {
domain_name_t *name;
ssize_t new_ptr = read_domain_name(msg, ptr, &name);
ptr = read_domain_name(msg, ptr, &name);
if (ptr < 0) {
goto fail;
}
if (ptr + sizeof(dns_answer_header_t) > msg->len) {
goto fail;
}
dns_answer_header_t *an_header = (dns_answer_header_t *)(msg->raw + ptr);
ptr += sizeof(dns_answer_header_t);
size_t record_len = ntohs(an_header->rd_len);
if (ptr + record_len > msg->len) {
goto fail;
}
if (name->alias->match && ntohs(an_header->an_type) == DNS_TYPE_CNAME) {
// Copy CNAME records
domain_name_t *cname;
ptr = read_domain_name(msg, ptr, &cname);
if (ptr < 0) {
goto fail;
}
cname->alias = name->alias;
new_ptr = copy_domain_name(msg, name, new_ptr);
if (new_ptr < 0) {
goto fail;
}
ptr = new_ptr;
dns_answer_header_t *new_an_header =
(dns_answer_header_t *)(msg->new_raw + new_ptr);
dns_answer_header_t *an_header = (dns_answer_header_t *)(msg->raw + ptr);
ptr += sizeof(dns_answer_header_t);
new_an_header->an_type = an_header->an_type;
new_an_header->an_class = an_header->an_class;
new_an_header->an_ttl = an_header->an_ttl;
if (name->match && ntohs(an_header->an_type) == DNS_TYPE_A &&
new_ptr += sizeof(dns_answer_header_t);
ssize_t next_new_ptr = copy_domain_name(msg, cname, new_ptr);
if (next_new_ptr < 0) {
goto fail;
}
new_an_header->rd_len = htons(next_new_ptr - new_ptr);
new_ptr = next_new_ptr;
new_an_count++;
} else if (name->alias->match &&
ntohs(an_header->an_class) == DNS_CLASS_IN &&
ntohs(an_header->rd_len) == 4) {
// Replace answer with fake ip
(ntohs(an_header->an_type) == DNS_TYPE_A ||
ntohs(an_header->an_type) == DNS_TYPE_AAAA)) {
// Replace address with fake ip, and set a short TTL
new_ptr = copy_domain_name(msg, name, new_ptr);
if (new_ptr < 0) {
goto fail;
}
uint32_t *p_addr = (uint32_t *)(msg->raw + ptr);
if (new_ptr + sizeof(dns_answer_header_t) + record_len > msg->len) {
goto fail;
}
dns_answer_header_t *new_an_header =
(dns_answer_header_t *)(msg->new_raw + new_ptr);
new_ptr += sizeof(dns_answer_header_t);
uint32_t real_addr = ntohl(*p_addr);
ip_nat_t *nat = find_nat(EV_A, client_ctx->ip_pool, real_addr);
new_an_header->an_type = an_header->an_type;
new_an_header->an_class = an_header->an_class;
sa_family_t family;
if (ntohs(an_header->an_type) == DNS_TYPE_A) {
family = AF_INET;
} else if (ntohs(an_header->an_type) == DNS_TYPE_AAAA) {
family = AF_INET6;
}
ip_nat_t *nat = nat_find(EV_A, client_ctx->ip_pool, family,
(ip_address_t *)(msg->raw + ptr));
new_an_header->rd_len = an_header->rd_len;
if (nat) {
*p_addr = htonl(nat->fake);
an_header->an_ttl = htonl(NAT_TTL);
new_an_header->an_ttl = htonl(NAT_TTL);
address_copy(family, (ip_address_t *)(msg->new_raw + new_ptr),
&nat->fake_addr);
} else {
new_an_header->an_ttl = an_header->an_ttl;
address_copy(family, (ip_address_t *)(msg->new_raw + new_ptr),
(ip_address_t *)(msg->raw + ptr));
}
ptr += record_len;
new_ptr += record_len;
new_an_count++;
} else {
ptr += record_len;
}
}
ptr += ntohs(an_header->rd_len);
}
new_header->an_count = htons(new_an_count);
send:
sendto(client_ctx->server_fd, msg->raw, msg->len, 0,
sendto(client_ctx->server_fd, msg->new_raw, new_ptr, 0,
(struct sockaddr *)&client_ctx->client_addr,
client_ctx->client_addr_len);
fail:
free_domain_names(msg->name_head);
free(msg->new_raw);
free(msg->raw);
close(client_ctx->fd);
@@ -621,8 +845,7 @@ static const char *parse_nft_string(const char *s) {
#define OPT_UPSTREAM_HOST 0x101
#define OPT_UPSTREAM_PORT 0x102
#define OPT_FAKE_SET 0x103
#define OPT_DAEMONIZE 0x104
#define OPT_DAEMONIZE 0x103
int main(int argc, char *const *argv) {
domain_set_t *domain_set = domain_set_new("", 0);
@@ -637,15 +860,15 @@ int main(int argc, char *const *argv) {
listen_addr.sin_family = AF_INET;
upstream_addr.sin_family = AF_INET;
const char *options = "h:p:d:x:";
const char *options = "H:p:d:4:6:";
struct option long_options[] = {
{"host", required_argument, NULL, 'h'},
{"host", required_argument, NULL, 'H'},
{"port", required_argument, NULL, 'p'},
{"domain", required_argument, NULL, 'd'},
{"prefix", required_argument, NULL, 'x'},
{"ipv4-prefix", required_argument, NULL, '4'},
{"ipv6-prefix", required_argument, NULL, '6'},
{"upstream-host", required_argument, NULL, OPT_UPSTREAM_HOST},
{"upstream-port", required_argument, NULL, OPT_UPSTREAM_PORT},
{"fake-set", required_argument, NULL, OPT_FAKE_SET},
{"daemonize", no_argument, NULL, OPT_DAEMONIZE},
{NULL, 0, NULL, 0}};
int option_index;
@@ -660,7 +883,7 @@ int main(int argc, char *const *argv) {
while ((o = getopt_long(argc, argv, options, long_options, &option_index)) !=
-1) {
switch (o) {
case 'h':
case 'H':
if (inet_pton(AF_INET, optarg, &listen_addr.sin_addr) != 1) {
goto fail;
} else {
@@ -677,26 +900,49 @@ int main(int argc, char *const *argv) {
goto fail;
}
break;
case 'x': {
case '4': {
char *sep = strchr(optarg, '/');
if (!sep) {
goto fail;
}
*sep = 0;
uint32_t af_addr;
unsigned pf_len;
if (inet_pton(AF_INET, optarg, &af_addr) != 1 ||
sscanf(sep + 1, "%u", &pf_len) == -1 || pf_len > 30) {
unsigned prefix_len;
if (inet_pton(AF_INET, optarg, &ip_pool.net) != 1 ||
sscanf(sep + 1, "%u", &prefix_len) != 1 || prefix_len > 30) {
goto fail;
}
uint32_t pf = ntohl(af_addr);
uint32_t pf_mask = ~((1 << (32 - pf_len)) - 1);
if (pf & ~pf_mask) {
ip_pool.net_mask.s_addr = htonl(~((1UL << (32 - prefix_len)) - 1));
ip_pool.net.s_addr &= ip_pool.net_mask.s_addr;
prefix_set++;
break;
}
case '6': {
char *sep = strchr(optarg, '/');
if (!sep) {
goto fail;
}
ip_pool_init(&ip_pool, pf, pf_mask);
prefix_set = 1;
*sep = 0;
unsigned prefix_len;
if (inet_pton(AF_INET6, optarg, &ip_pool.net6) != 1 ||
sscanf(sep + 1, "%u", &prefix_len) != 1 || prefix_len > 64) {
goto fail;
}
for (int i = 0; i < 4; i++) {
if (prefix_len <= i << 5) {
ip_pool.net6_mask.s6_addr32[i] = 0UL;
} else if (prefix_len >= (i + 1) << 5) {
ip_pool.net6_mask.s6_addr32[i] = 0xffffffffUL;
} else {
ip_pool.net6_mask.s6_addr32[i] =
htonl(~((1UL << (32 - (prefix_len - (i << 5)))) - 1));
}
ip_pool.net6.s6_addr32[i] &= ip_pool.net6_mask.s6_addr32[i];
}
prefix_set++;
break;
}
case OPT_UPSTREAM_HOST:
@@ -710,33 +956,6 @@ int main(int argc, char *const *argv) {
goto fail;
}
break;
case OPT_FAKE_SET: {
const char *s = optarg;
size_t t[3];
t[0] = parse_nft_string(s) - s;
if (s[t[0]] != '#' || t[0] == 0) {
goto fail;
}
t[1] = parse_nft_string(s + t[0] + 1) - s;
if (s[t[1]] != '#' || t[1] == t[0] + 1) {
goto fail;
}
t[2] = parse_nft_string(s + t[1] + 1) - s;
if (s[t[2]] != '\0' || t[2] == t[1] + 1) {
goto fail;
}
char *fake_set = malloc(t[2] + 1);
memcpy(fake_set, s, t[2] + 1);
fake_set[t[0]] = ' ';
fake_set[t[1]] = ' ';
nft_fake_set = fake_set;
break;
}
case OPT_DAEMONIZE:
if (daemon(0, 0) == -1) {
domain_set_fini(domain_set);
@@ -750,21 +969,34 @@ int main(int argc, char *const *argv) {
case '?':
fprintf(stderr, "Unrecognized option: %s\n", optarg);
opterr = 1;
// clang-format off
fprintf(stderr,
"Usage: %s -h LISTEN_HOST -p LISTEN_PORT\n"
" -x FAKE_IP_PREFIX [-d DOMAIN]\n"
" --upstream-host UPSTREAM_HOST\n"
" --upstream-port UPSTREAM_PORT\n"
" [ --fake-set FAKE_SET ]\n"
" [ --daemonize ]\n",
"Usage: %s [OPTIONS]\n\n"
"Required options:\n"
" -H, --host ADDR IPv4 address to listen on for DNS queries.\n"
" -4, --ipv4-prefix PREFIX IPv4 fake address prefix (e.g. 100.64.0.0/24).\n"
" -6, --ipv6-prefix PREFIX IPv6 fake address prefix (e.g. fd00::/64).\n"
" --upstream-host ADDR IPv4 address of upstream DNS server.\n\n"
"Optional options:\n"
" -p, --port PORT UDP port to listen on (default: 53).\n"
" --upstream-port PORT UDP port of upstream DNS server (default: 53).\n"
" -d, --domain DOMAIN Domain to rewrite (may be given multiple times).\n"
" --daemonize Run in background (daemonize).\n",
argv[0]);
// clang-format on
break;
}
}
if (!opterr && !(listen_host_set && prefix_set && upstream_host_set)) {
if (!opterr && !(listen_host_set && prefix_set == 2 && upstream_host_set)) {
// clang-format off
fprintf(stderr,
"LISTEN_ADDR, FAKE_IP_PREFIX, UPSTREAM_HOST must be set.\n");
"Missing required options. You must provide:\n"
" -H / --host (listen address),\n"
" -4 / --ipv4-prefix (fake IPv4 prefix),\n"
" -6 / --ipv6-prefix (fake IPv6 prefix),\n"
" --upstream-host (upstream DNS IPv4 address).\n");
// clang-format on
opterr = 1;
}
@@ -778,26 +1010,11 @@ int main(int argc, char *const *argv) {
nft_ctx = nft_ctx_new(NFT_CTX_DEFAULT);
if (!nft_ctx) {
perror("Failed to create nftables context.");
perror("Failed to create nftables context");
}
nft_ctx_output_set_flags(nft_ctx,
NFT_CTX_OUTPUT_HANDLE | NFT_CTX_OUTPUT_ECHO);
nft_ctx_output_set_flags(nft_ctx, NFT_CTX_OUTPUT_ECHO);
char *cmd = malloc_sprintf("add table ip %s", nft_nat_table);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
cmd = malloc_sprintf("add chain ip %s prerouting"
"{ type nat hook prerouting priority dstnat; }",
nft_nat_table);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
cmd = malloc_sprintf("add chain ip %s postrouting"
"{ type nat hook postrouting priority srcnat; }",
nft_nat_table);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
ip_pool_init(&ip_pool);
struct ev_loop *loop = ev_default_loop(0);
@@ -833,12 +1050,6 @@ int main(int argc, char *const *argv) {
ip_pool_fini(&ip_pool);
cmd = malloc_sprintf("delete table ip %s", nft_nat_table);
nft_run_cmd_from_buffer(nft_ctx, cmd);
free(cmd);
free(nft_fake_set);
nft_ctx_free(nft_ctx);
domain_set_fini(domain_set);

37
src/rules.nft Normal file

@@ -0,0 +1,37 @@
table inet dotp {
map nat_map {
type ipv4_addr : ipv4_addr
}
map nat_map6 {
type ipv6_addr : ipv6_addr
}
chain prerouting {
type nat hook prerouting priority dstnat; policy accept;
dnat ip to ip daddr map @nat_map
dnat ip6 to ip6 daddr map @nat_map6
}
set nat_addr {
type ipv4_addr
}
set nat_addr6 {
type ipv6_addr
}
chain proxy {
meta mark set 1
meta l4proto tcp counter tproxy ip to 127.0.0.1:2040 accept
meta l4proto udp counter tproxy ip to 127.0.0.1:2040 accept
meta l4proto tcp counter tproxy ip6 to [::1]:2040 accept
meta l4proto udp counter tproxy ip6 to [::1]:2040 accept
}
chain proxy_dst_check {
type filter hook prerouting priority dstnat + 1; policy accept;
ip daddr @nat_addr goto proxy
ip6 daddr @nat_addr6 goto proxy
}
}