rawWrite passed the iovec pointer to syscall.Syscall as a uintptr, so
the Go compiler's escape analysis could not keep the underlying
[]unix.Iovec (or the payload slices its Base pointers reach) rooted
across the syscall. Under heavy sustained write load, the GC could
collect or move these before tun_chr_write_iter finished reading
them, leaving the kernel to read freed memory.
This was observed on a UniFi UXG-Pro (Annapurna Labs Alpine V2, arm64,
Linux 4.19.152) forwarding 1 Gbps iperf3 -R between the LAN and a
remote Nebula peer, where it showed up as two paired kernel warnings
within the same second:
    refcount_t: underflow; use-after-free
        sock_wfree -> skb_release_head_state -> kfree_skb
        -> skb_release_data -> __kfree_skb -> tcp_recvmsg ...
    refcount_t: addition on 0; use-after-free
        skb_set_owner_w -> sock_alloc_send_pskb
        -> tun_get_user -> tun_chr_write_iter -> do_iter_write
        -> vfs_writev -> do_writev -> __arm64_sys_writev
The Annapurna watchdog then soft-rebooted the device. After patching
there was no crash or kernel WARN, and the box ran sustained 1 Gbps
iperf3 -R without issue.
Fix: add a variadic `keepAlive ...interface{}` parameter to
rawWrite, and call runtime.KeepAlive on the iovec plus every
supplied root after the syscall returns. writeWithScratch now
passes its buffer and iovec; WriteGSO passes the iovec array, the
header buffer, and the payload fragment slice.
runtime.KeepAlive is a compiler directive, not a runtime barrier,
so the cost is effectively zero: it just forces the compiler's
liveness analysis to treat the object as used at that point.
Remove runtime.LockOSThread(), because it now makes things worse.
Remove the "custom" Write() method from tun_linux.go; the stdlib path via os.File performs better.
We should change our guidance on the number of routines: about 2 per thread (that you wish Nebula to use) now seems about right.
The recent merge of cert-v2 support introduced the ability to tunnel IPv6. However, IPv6 tunneling on FreeBSD does not work, for two reasons:
* The ifconfig commands did not work for IPv6 addresses
* The tunnel device was not configured for link-layer mode, so it only supported IPv4
This PR improves FreeBSD tunneling support in three ways:
* Use ioctl instead of exec'ing ifconfig to configure the interface, with additional logic to support IPv6
* Configure the tunnel in link-layer mode, allowing IPv6 traffic
* Use readv() and writev() to communicate with the tunnel device, to avoid the need to copy the packet buffer
We switched to yaml.v3 with #1148, but missed this spot, which was still
casting to `map[any]any` even though yaml.v3 produces `map[string]any`.
Also clean up a few more `interface{}` uses that were added while we were
changing them all to `any` in #1148.
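A minimal sketch of why the stale cast broke: yaml.v3 decodes a nested mapping held in an `any` as `map[string]any`, so asserting `map[any]any` on it simply fails. `lookup` is a hypothetical helper, not Nebula's config API, and a map literal can stand in for decoded YAML so the example needs no dependencies.

```go
package main

// lookup walks nested mappings as yaml.v3 decodes them into an `any`
// (as map[string]any) and reports the value at path and whether it exists.
func lookup(root any, path ...string) (any, bool) {
	cur := root
	for _, key := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			// A map[any]any (the yaml.v2 shape) also ends up here, just as
			// asserting map[any]any against yaml.v3 output fails.
			return nil, false
		}
		next, found := m[key]
		if !found {
			return nil, false
		}
		cur = next
	}
	return cur, true
}
```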
* upgrade to yaml.v3
The main improvement here is that maps unmarshal into `map[string]any`
instead of `map[any]any`, which cleans things up a bit.
* add config.AsBool
yaml.v3 no longer automatically converts `yes` to a bool, so add this
for backwards compatibility.
* use type aliases for m
* more cleanup
* go mod cleanup
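yaml.v3 only treats YAML 1.1 words such as `yes`/`no` and `on`/`off` as booleans when decoding into a typed bool; decoded into `any`, they arrive as strings. A minimal sketch of a backwards-compatible conversion, written as a hypothetical `asBool` helper rather than the actual `config.AsBool`.

```go
package main

import "strings"

// asBool converts a decoded config value to a bool. It accepts real bools
// plus the YAML 1.1 boolean words (matched case-insensitively) that yaml.v3
// leaves as strings when the target is `any`. ok is false for anything else.
func asBool(v any) (b bool, ok bool) {
	switch x := v.(type) {
	case bool:
		return x, true
	case string:
		switch strings.ToLower(x) {
		case "y", "yes", "true", "on":
			return true, true
		case "n", "no", "false", "off":
			return false, true
		}
	}
	return false, false
}
```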
* firewall: add option to send REJECT replies
This change allows you to configure the firewall to send REJECT packets
when a packet is denied.
    firewall:
      # Action to take when a packet is not allowed by the firewall rules.
      # Can be one of:
      #   `drop` (default): silently drop the packet.
      #   `reject`: send a reject reply.
      #     - For TCP, this will be a RST "Connection Reset" packet.
      #     - For other protocols, this will be an ICMP port unreachable packet.
      outbound_action: drop
      inbound_action: drop
These packets are only sent to established tunnels, and only on the
overlay network (currently IPv4 only).
    $ ping -c1 192.168.100.3
    PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
    From 192.168.100.3 icmp_seq=2 Destination Port Unreachable
    --- 192.168.100.3 ping statistics ---
    2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 31ms
    $ nc -nzv 192.168.100.3 22
    (UNKNOWN) [192.168.100.3] 22 (?) : Connection refused
This change also modifies the smoke test to capture tcpdump pcaps on
both the inside and the outside, so we can inspect what is going on
over the wire. It now also runs TCP and UDP packet tests using the Nmap
version of ncat.
* calculate seq and ack the same way as the kernel
The logic is a bit confusing, so we copy it straight from how the kernel
implements iptables `--reject-with tcp-reset`:
- https://github.com/torvalds/linux/blob/v5.19/net/ipv4/netfilter/nf_reject_ipv4.c#L193-L221
* cleanup
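The kernel logic referenced above reduces to two cases. If the denied segment had ACK set, the RST reuses that acknowledgment number as its own sequence number and carries no ACK. Otherwise the RST has sequence number 0 and sets ACK, acknowledging the original sequence number plus the payload length plus one each for SYN and FIN. A minimal sketch of that rule; the `tcpSeg` type and `rstReply` function are hypothetical and not Nebula's firewall code.

```go
package main

// tcpSeg holds the TCP header fields that matter for building a reset,
// plus the payload length of the segment (IP length minus IP and TCP headers).
type tcpSeg struct {
	SrcPort, DstPort   uint16
	Seq, Ack           uint32
	SYN, FIN, ACK, RST bool
	PayloadLen         uint32
}

// rstReply builds the reset for a denied segment following the rule in
// nf_reject_ip_tcphdr_put: swap the ports, always set RST, and derive the
// sequence and acknowledgment numbers from the original's ACK flag.
// uint32 arithmetic wraps modulo 2^32, just like TCP sequence numbers.
func rstReply(orig tcpSeg) tcpSeg {
	reply := tcpSeg{SrcPort: orig.DstPort, DstPort: orig.SrcPort, RST: true}
	if orig.ACK {
		// Use the peer's acknowledgment number as our sequence number; no ACK.
		reply.Seq = orig.Ack
		return reply
	}
	// Seq stays 0. Acknowledge everything the segment occupied in sequence
	// space: its payload plus one each for SYN and FIN.
	reply.ACK = true
	reply.Ack = orig.Seq + orig.PayloadLen
	if orig.SYN {
		reply.Ack++
	}
	if orig.FIN {
		reply.Ack++
	}
	return reply
}
```

For a bare SYN to a blocked port (sequence 1000, no payload), this yields a RST|ACK with sequence 0 and acknowledgment 1001, which is what a client expects when a connection is refused.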