Commit Graph

62 Commits

Author SHA1 Message Date
rawdigits 3dea496c7f overlay/tio: KeepAlive poll-path readv/writev buffers too
The Poll fallback (used when IFF_VNET_HDR can't be enabled) has the
same unsafe-pointer-via-uintptr pattern as Offload.rawWrite:
readOne/writeOne build a [2]syscall.Iovec on the stack, pass it to
syscall.Syscall as uintptr, and the kernel then DMAs in/out of the
to/from slices whose Base pointers the iovec holds.

Escape analysis can't see that use, so under GC pressure the
backing memory could be collected or moved mid-syscall.

Add runtime.KeepAlive on the iovec and the user buffer around both
the SYS_READV and SYS_WRITEV syscalls. Same pattern and rationale
as the prior commit on the offload path.
2026-04-24 21:44:37 +00:00
rawdigits 7c38aa7e6b overlay/tio: KeepAlive writev iovec and payloads through the syscall
rawWrite passed the iovec pointer to syscall.Syscall as a uintptr, so
the Go compiler's escape analysis could not keep the underlying
[]unix.Iovec (or the payload slices its Base pointers reach) rooted
across the syscall. Under heavy sustained write load, GC could
collect or move these before tun_chr_write_iter finished reading
them, at which point the kernel read freed memory.

Observed on a UniFi UXG-Pro (Annapurna Labs Alpine V2, arm64, Linux
4.19.152) forwarding 1 Gbps iperf3 -R between LAN and a remote
Nebula peer, as two paired kernel warnings in the same second:

  refcount_t: underflow; use-after-free
    sock_wfree -> skb_release_head_state -> kfree_skb
    -> skb_release_data -> __kfree_skb -> tcp_recvmsg ...

  refcount_t: addition on 0; use-after-free
    skb_set_owner_w -> sock_alloc_send_pskb
    -> tun_get_user -> tun_chr_write_iter -> do_iter_write
    -> vfs_writev -> do_writev -> __arm64_sys_writev

The Annapurna watchdog then soft-rebooted the device. No crash or
kernel WARN after patching; box ran sustained 1 Gbps iperf3 -R
without issue.

Fix: add a variadic `keepAlive ...interface{}` parameter to
rawWrite, and call runtime.KeepAlive on the iovec plus every
supplied root after the syscall returns. writeWithScratch now
passes its buffer + iovec; WriteGSO passes the iovec array, the
header buffer, and the payload fragment slice.

runtime.KeepAlive is a compiler directive, not a runtime barrier,
so the cost is effectively zero: it just forces the compiler's
liveness analysis to treat the object as used at that point.
2026-04-24 21:40:51 +00:00
JackDoan 8fd724d762 fix? 2026-04-24 16:27:23 -05:00
JackDoan 90f2938f9c cruft 2026-04-23 13:12:24 -05:00
JackDoan f76ac2e216 fix tests 2026-04-23 11:35:51 -05:00
JackDoan 4104a48a86 checksum speed 2026-04-21 17:07:50 -05:00
JackDoan 50d6632845 fix 2026-04-21 14:52:28 -05:00
JackDoan 78af44068f typo! 2026-04-21 14:02:15 -05:00
JackDoan ad6b918e4d checkpt 2026-04-21 13:31:16 -05:00
JackDoan d0825514a0 GSO again 2026-04-21 11:25:47 -05:00
Nate Brown 8c71f2f3f9 FreeBSD tun needs to be non blocking as well (#1666) 2026-04-21 10:45:46 -05:00
Jack Doan e80b9830a3 Remove more os.Exit calls and give a more reliable wait for stop function (attempt 3) (#1661) 2026-04-20 16:08:26 -05:00
Jack Doan b3194236aa udp_linux: wrap socket operations with syscall.RawConn for clean teardown (#1654)
gofmt / Run gofmt (push) Failing after 3s
smoke-extra / Run extra smoke tests (push) Failing after 2s
smoke / Run multi node smoke test (push) Failing after 3s
Build and test / Build all and test on ubuntu-linux (push) Failing after 3s
Build and test / Build and test on linux with boringcrypto (push) Failing after 2s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
remove runtime.LockOSThread() because it makes things worse now

remove the "custom" Write() method from tun_linux.go, the stdlib path via os.File performs better

We should change our guidance around number of routines, ~2 per thread (that you wish to use for Nebula) seems to be about right now
2026-04-14 18:25:24 -05:00
Jack Doan 42bee7cf17 Report if Nebula start fails because of tun device name (#1588)
gofmt / Run gofmt (push) Failing after 2s
smoke-extra / Run extra smoke tests (push) Failing after 2s
smoke / Run multi node smoke test (push) Failing after 2s
Build and test / Build all and test on ubuntu-linux (push) Failing after 2s
Build and test / Build and test on linux with boringcrypto (push) Failing after 2s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
* specifically report if nebula start fails because of tun device name

* close all routines when closing the tun
2026-01-28 10:03:36 -06:00
Wade Simmons e1e92f017c initialize routesFromSystem (#1580)
gofmt / Run gofmt (push) Failing after 3s
smoke-extra / Run extra smoke tests (push) Failing after 2s
smoke / Run multi node smoke test (push) Failing after 3s
Build and test / Build all and test on ubuntu-linux (push) Failing after 2s
Build and test / Build and test on linux with boringcrypto (push) Failing after 2s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
This is a regression introduced by #1573. We need to initialize this
map.

Fixes: #1579
2026-01-20 11:15:20 -05:00
Nate Brown ac3bd9cdd0 Avoid losing system originated unsafe routes on reload (#1573)
gofmt / Run gofmt (push) Failing after 3s
smoke-extra / Run extra smoke tests (push) Failing after 2s
smoke / Run multi node smoke test (push) Failing after 3s
Build and test / Build all and test on ubuntu-linux (push) Failing after 2s
Build and test / Build and test on linux with boringcrypto (push) Failing after 3s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
2026-01-15 13:48:17 -06:00
Bryan Lee 12cf348c80 feat: support via gateway for v6 multihop for v4 routes (#1521)
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
2025-11-19 22:21:03 -06:00
Nate Brown 7aff313a17 Relax the restriction on routines from the config (#1531) 2025-11-19 13:10:11 -06:00
Nate Brown 45c1d3eab3 Support for multi proto tun device on OpenBSD (#1495) 2025-10-08 16:56:42 -05:00
Nate Brown eb89839d13 Support for multi proto tun device on NetBSD (#1492) 2025-10-07 20:17:50 -05:00
Nate Brown fb7f0c3657 Use x/net/route to manage routes directly (#1488) 2025-10-03 10:59:53 -05:00
sl274 b1f53d8d25 Support IPv6 tunneling in FreeBSD (#1399)
Recent merge of cert-v2 support introduced the ability to tunnel IPv6. However, FreeBSD's IPv6 tunneling does not work for 2 reasons:
* The ifconfig commands did not work for IPv6 addresses
* The tunnel device was not configured for link-layer mode, so it only supported IPv4

This PR improves FreeBSD tunneling support in 3 ways:
* Use ioctl instead of exec'ing ifconfig to configure the interface, with additional logic to support IPv6
* Configure the tunnel in link-layer mode, allowing IPv6 traffic
* Use readv() and writev() to communicate with the tunnel device, to avoid the need to copy the packet buffer
2025-10-02 21:54:30 -05:00
Jack Doan 65cc253c19 prevent linux from assigning ipv6 link-local addresses (#1476) 2025-09-09 13:25:23 -05:00
Jack Doan 768325c9b4 cert-v2 chores (#1466) 2025-09-05 15:08:22 -05:00
Wade Simmons 5cff83b282 netlink: ignore route updates with no destination (#1437)
Currently we assume each route update must have a destination, but we
should check that it is set before we try to use it.

See: #1436
2025-08-25 13:05:35 -05:00
Andriyanov Nikita e5ce8966d6 add netlink options (#1326)
* add netlink options

* force use buffer

* fix namings and add config examples

* fix linter
2025-04-21 13:44:33 -04:00
Wade Simmons 36bc9dd261 fix parseUnsafeRoutes for yaml.v3 (#1371)
We switched to yaml.v3 with #1148, but missed this spot that was still
casting into `map[any]any` when yaml.v3 makes it `map[string]any`. Also
clean up a few more `interface{}` that were added as we changed them all
to `any` with #1148.
2025-04-01 09:49:26 -04:00
Wade Simmons 879852c32a upgrade to yaml.v3 (#1148)
gofmt / Run gofmt (push) Successful in 37s
smoke-extra / Run extra smoke tests (push) Failing after 20s
smoke / Run multi node smoke test (push) Failing after 1m25s
Build and test / Build all and test on ubuntu-linux (push) Failing after 18m51s
Build and test / Build and test on linux with boringcrypto (push) Failing after 2m44s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2m27s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
* upgrade to yaml.v3

The main nice fix here is that maps unmarshal into `map[string]any`
instead of `map[any]any`, so it cleans things up a bit.

* add config.AsBool

Since yaml.v3 doesn't automatically convert yes to bool now, for
backwards compat

* use type aliases for m

* more cleanup

* more cleanup

* more cleanup

* go mod cleanup
2025-03-31 16:08:34 -04:00
dioss-Machiel f86953ca56 Implement ECMP for unsafe_routes (#1332)
gofmt / Run gofmt (push) Successful in 27s
smoke-extra / Run extra smoke tests (push) Failing after 18s
smoke / Run multi node smoke test (push) Failing after 1m26s
Build and test / Build all and test on ubuntu-linux (push) Failing after 21m43s
Build and test / Build and test on linux with boringcrypto (push) Failing after 3m45s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2m59s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
2025-03-24 17:15:59 -05:00
Caleb Jasik 088af8edb2 Enable running testifylint in CI (#1350)
gofmt / Run gofmt (push) Successful in 10s
smoke-extra / Run extra smoke tests (push) Failing after 18s
smoke / Run multi node smoke test (push) Failing after 1m28s
Build and test / Build all and test on ubuntu-linux (push) Failing after 19m44s
Build and test / Build and test on linux with boringcrypto (push) Failing after 3m1s
Build and test / Build and test on linux with pkcs11 (push) Failing after 3m6s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
2025-03-10 17:38:14 -05:00
Caleb Jasik 612637f529 Fix testifylint lint errors (#1321)
gofmt / Run gofmt (push) Successful in 11s
smoke-extra / Run extra smoke tests (push) Failing after 19s
smoke / Run multi node smoke test (push) Failing after 1m28s
Build and test / Build all and test on ubuntu-linux (push) Failing after 19m3s
Build and test / Build and test on linux with boringcrypto (push) Failing after 2m44s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2m54s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
* Fix bool-compare

* Fix empty

* Fix encoded-compare

* Fix error-is-as

* Fix error-nil

* Fix expected-actual

* Fix len
2025-03-10 10:18:34 -04:00
Nate Brown d97ed57a19 V2 certificate format (#1216)
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Jack Doan <jackdoan@rivian.com>
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
Co-authored-by: Jack Doan <me@jackdoan.com>
2025-03-06 11:28:26 -06:00
Nate Brown e264a0ff88 Switch most everything to netip in prep for ipv6 in the overlay (#1173) 2024-07-31 10:18:56 -05:00
John Maguire b5c3486796 Push Docker images as part of the release workflow (#1037) 2024-05-02 09:37:11 -04:00
Nate Brown bbb15f8cb1 Unsafe route reload (#1083) 2024-03-28 15:17:28 -05:00
John Maguire af2fc48378 Fix mobile builds (#1035) 2023-12-06 16:18:21 -05:00
Tristan Rice 1083279a45 add gvisor based service library (#965)
* add service/ library
2023-11-21 11:50:18 -05:00
Nate Brown 5181cb0474 Use generics for CIDRTrees to avoid casting issues (#1004) 2023-11-02 17:05:08 -05:00
Nate Brown 5fccbb8676 Retry wintun creation (#985) 2023-10-16 10:06:43 -05:00
Nate Brown 0bffa76b5e Build for openbsd (#812) 2023-07-27 14:27:35 -05:00
c0repwn3r 03e70210a5 Add support for NetBSD (#916) 2023-07-27 13:44:47 -05:00
Nate Brown 9c6592b159 Guard e2e udp and tun channels when closed (#934) 2023-07-26 12:52:14 -05:00
John Maguire 8ba5d64dbc Add support for naming FreeBSD tun devices (#903) 2023-06-22 12:13:31 -04:00
Nate Brown a9cb2e06f4 Add ability to respect the system route table for unsafe route on linux (#839) 2023-05-09 10:36:55 -05:00
Nate Brown 397fe5f879 Add ability to skip installing unsafe routes on the os routing table (#831) 2023-04-10 12:32:37 -05:00
brad-defined 2801fb2286 Fix relay (#827)
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
2023-03-30 11:09:20 -05:00
Wade Simmons 6e0ae4f9a3 firewall: add option to send REJECT replies (#738)
* firewall: add option to send REJECT replies

This change allows you to configure the firewall to send REJECT packets
when a packet is denied.

    firewall:
      # Action to take when a packet is not allowed by the firewall rules.
      # Can be one of:
      #   `drop` (default): silently drop the packet.
      #   `reject`: send a reject reply.
      #     - For TCP, this will be a RST "Connection Reset" packet.
      #     - For other protocols, this will be an ICMP port unreachable packet.
      outbound_action: drop
      inbound_action: drop

These packets are only sent to established tunnels, and only on the
overlay network (currently IPv4 only).

    $ ping -c1 192.168.100.3
    PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
    From 192.168.100.3 icmp_seq=2 Destination Port Unreachable

    --- 192.168.100.3 ping statistics ---
    2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 31ms

    $ nc -nzv 192.168.100.3 22
    (UNKNOWN) [192.168.100.3] 22 (?) : Connection refused

This change also modifies the smoke test to capture tcpdump pcaps from
both the inside and outside to inspect what is going on over the wire.
It also now does TCP and UDP packet tests using the Nmap version of
ncat.

* calculate seq and ack the same was as the kernel

The logic a bit confusing, so we copy it straight from how the kernel
does iptables `--reject-with tcp-reset`:

- https://github.com/torvalds/linux/blob/v5.19/net/ipv4/netfilter/nf_reject_ipv4.c#L193-L221

* cleanup
2023-03-13 15:08:40 -04:00
Nate Brown 92cc32f844 Remove handshake race avoidance (#820)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2023-03-13 12:35:14 -05:00
John Maguire 85f5849d0b Fix a hang when shutting down Android (#772) 2022-11-11 10:18:43 -06:00
Nate Brown feb3e1317f Add a simple benchmark to e2e tests (#739) 2022-09-01 09:44:58 -05:00