Commit Graph

65 Commits

Author SHA1 Message Date
Wade Simmons
510a8912a9 Merge remote-tracking branch 'origin/master' into multiport 2025-12-04 15:22:14 -05:00
Nate Brown
56067afca2 Stab at better logging when a relay is being used (#1533)
Some checks failed
gofmt / Run gofmt (push) Failing after 5s
smoke-extra / Run extra smoke tests (push) Failing after 2s
smoke / Run multi node smoke test (push) Failing after 3s
Build and test / Build all and test on ubuntu-linux (push) Failing after 2s
Build and test / Build and test on linux with boringcrypto (push) Failing after 3s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
2025-12-03 17:48:29 -06:00
Jack Doan
a89f95182c Firewall types and cross-stack subnet stuff (#1509)
* firewall can distinguish if the host connecting has an overlapping network, is a VPN peer without an overlapping network, or is a unsafe network

* Cross stack subnet stuff (#1512)

* experiment with not filtering out non-common addresses in hostinfo.networks

* allow handshakes without overlaps

* unsafe network test

* change HostInfo.buildNetworks argument to reference the cert
2025-11-12 13:40:20 -06:00
Jack Doan
01909f4715 try to make certificate addition/removal reloadable in some cases (#1468)
* try to make certificate addition/removal reloadable in some cases

* very spicy change to respond to handshakes with cert versions we cannot match with a cert that we can indeed match

* even spicier change to rehandshake if we detect our cert is lower-version than our peer, and we have a newer-version cert available

* make tryRehandshake easier to understand
2025-11-03 19:38:44 -06:00
Wade Simmons
ae9de47dd9 Merge remote-tracking branch 'origin/master' into multiport 2025-07-11 12:57:52 -04:00
brad-defined
b158eb0c4c Use a list for relay IPs instead of a map (#1423)
* Use a list for relay IPs instead of a map

* linter
2025-07-02 08:47:05 -04:00
Wade Simmons
b8ea55eb90 optimize usage of bart (#1395)
Some checks failed
gofmt / Run gofmt (push) Successful in 9s
smoke-extra / Run extra smoke tests (push) Failing after 19s
smoke / Run multi node smoke test (push) Failing after 1m19s
Build and test / Build all and test on ubuntu-linux (push) Failing after 18m41s
Build and test / Build and test on linux with boringcrypto (push) Failing after 2m47s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2m47s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
Use `bart.Lite` and `.Contains` as suggested by the bart maintainer:

- 9455952eed (commitcomment-155362580)
2025-04-18 12:37:20 -04:00
Wade Simmons
4eb86afa54 Merge remote-tracking branch 'origin/master' into multiport 2025-03-07 14:01:35 -05:00
Nate Brown
f8734ffa43 Improve logging when handshaking with an invalid cert (#1345) 2025-03-07 10:45:31 -06:00
Wade Simmons
f36db374ac Merge remote-tracking branch 'origin/master' into multiport 2025-03-06 16:11:32 -05:00
Nate Brown
d97ed57a19 V2 certificate format (#1216)
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Jack Doan <jackdoan@rivian.com>
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
Co-authored-by: Jack Doan <me@jackdoan.com>
2025-03-06 11:28:26 -06:00
Nate Brown
08ac65362e Cert interface (#1212) 2024-10-10 18:00:22 -05:00
Wade Simmons
dabce8a1b4 Merge tag 'v1.9.4' into multiport
1.9.4 Release
2024-09-13 10:17:59 -04:00
Jack Doan
248cf194cd fix integer wraparound in the calculation of handshake timeouts on 32-bit targets (#1185)
Fixes: #1169
2024-08-13 09:25:18 -04:00
Wade Simmons
f5f6c269ac fix rare panic when local index collision happens (#1191)
A local index collision happens when two tunnels attempt to use the same
random int32 index ID. This is a rare chance, and we have code to deal
with it, but we have a panic because we return the wrong thing in this
case. This change should fix the panic.
2024-08-07 11:53:32 -04:00
Nate Brown
e264a0ff88 Switch most everything to netip in prep for ipv6 in the overlay (#1173) 2024-07-31 10:18:56 -05:00
Wade Simmons
6b78e9cdb3 Merge remote-tracking branch 'origin/master' into multiport 2024-07-10 13:38:11 -04:00
Wade Simmons
4eb1da0958 remove deadlock in GetOrHandshake (#1151)
We had a rare deadlock in GetOrHandshake because we kept the hostmap
lock when we do the call to StartHandshake. StartHandshake can block
while sending to the lighthouse query worker channel, and that worker
needs to be able to grab the hostmap lock to do its work. Other calls
for StartHandshake don't hold the hostmap lock so we should be able to
drop it here.

This lock was originally added with: https://github.com/slackhq/nebula/pull/954
2024-05-29 12:52:52 -04:00
Wade Simmons
b445d14ddb Merge remote-tracking branch 'origin/master' into multiport 2024-05-08 11:22:19 -04:00
Wade Simmons
7efa750aef avoid deadlock in lighthouse queryWorker (#1112)
* avoid deadlock in lighthouse queryWorker

If the lighthouse queryWorker tries to grab to call StartHandshake on
a lighthouse vpnIp, we can deadlock on the handshake_manager lock. This
change drops the handshake_manager lock before we send on the lighthouse
queryChan (which could block), and also avoids sending to the channel if
this is a lighthouse IP itself.

* need to hold lock during cacheCb
2024-04-11 17:00:01 -04:00
Nate Brown
a390125935 Support reloading preferred_ranges (#1043) 2024-04-03 22:14:51 -05:00
Wade Simmons
659d7fece6 Merge tag 'v1.8.2' into multiport
1.8.2 Release
2024-01-26 10:45:15 -05:00
Nate Brown
072edd56b3 Fix re-entrant GetOrHandshake issues (#1044) 2023-12-19 11:58:31 -06:00
Nate Brown
a44e1b8b05 Clean up a hostinfo to reduce memory usage (#955) 2023-11-02 16:53:59 -05:00
Wade Simmons
f2aef0d6eb Merge remote-tracking branch 'origin/master' into multiport 2023-10-27 08:48:13 -04:00
Nate Brown
50d6a1e8ca QueryServer needs to be done outside of the lock (#996) 2023-10-17 15:43:51 -05:00
Nate Brown
076ebc6c6e Simplify getting a hostinfo or starting a handshake with one (#954) 2023-08-21 18:51:45 -05:00
Nate Brown
7edcf620c0 We only need the certificate in ConnectionState (#953) 2023-08-21 14:11:06 -05:00
Nate Brown
a10baeee92 Pull hostmap and pending hostmap apart, remove unused functions (#843) 2023-07-24 12:37:52 -05:00
Nate Brown
3bbf5f4e67 Use an interface for udp conns (#901) 2023-06-14 10:48:52 -05:00
Wade Simmons
0e593ad582 Merge branch 'master' into multiport 2023-05-09 15:37:30 -04:00
Nate Brown
03e4a7f988 Rehandshaking (#838)
Co-authored-by: Brad Higgins <brad@defined.net>
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2023-05-04 15:16:37 -05:00
Wade Simmons
28ecfcbc03 Merge remote-tracking branch 'origin/master' into multiport 2023-05-03 10:50:06 -04:00
brad-defined
9b03053191 update EncReader and EncWriter interface function args to have concrete types (#844)
* Update LightHouseHandlerFunc to remove EncWriter param.
* Move EncWriter to interface
* EncReader, too
2023-04-07 14:28:37 -04:00
Nate Brown
d3fe3efcb0 Fix handshake retry regression (#842) 2023-04-05 10:04:30 -05:00
Wade Simmons
e71059a410 Merge remote-tracking branch 'origin/master' into multiport 2023-04-03 11:30:41 -04:00
Nate Brown
ee8e1348e9 Use connection manager to drive NAT maintenance (#835)
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
2023-03-31 15:45:05 -05:00
Nate Brown
1a6c657451 Normalize logs (#837) 2023-03-30 15:07:31 -05:00
brad-defined
2801fb2286 Fix relay (#827)
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
2023-03-30 11:09:20 -05:00
Nate Brown
f0ef80500d Remove dead code and re-order transit from pending to main hostmap on stage 2 (#828) 2023-03-17 15:36:24 -05:00
Wade Simmons
e1af37e46d add calculated_remotes (#759)
* add calculated_remotes

This setting allows us to "guess" what the remote might be for a host
while we wait for the lighthouse response. For networks that hard
designed with in mind, it can help speed up handshake performance, as well as
improve resiliency in the case that all lighthouses are down.

Example:

    lighthouse:
      # ...

      calculated_remotes:
        # For any Nebula IPs in 10.0.10.0/24, this will apply the mask and add
        # the calculated IP as an initial remote (while we wait for the response
        # from the lighthouse). Both CIDRs must have the same mask size.
        # For example, Nebula IP 10.0.10.123 will have a calculated remote of
        # 192.168.1.123

        10.0.10.0/24:
          - mask: 192.168.1.0/24
            port: 4242

* figure out what is up with this test

* add test

* better logic for sending handshakes

Keep track of the last light of hosts we sent handshakes to. Only log
handshake sent messages if the list has changed.

Remove the test Test_NewHandshakeManagerTrigger because it is faulty and
makes no sense. It relys on the fact that no handshake packets actually
get sent, but with these changes we would send packets now (which it
should!)

* use atomic.Pointer

* cleanup to make it clearer

* fix typo in example
2023-03-13 15:09:08 -04:00
Wade Simmons
aec7f5f865 Merge remote-tracking branch 'origin/master' into multiport 2023-03-13 15:07:32 -04:00
Nate Brown
92cc32f844 Remove handshake race avoidance (#820)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2023-03-13 12:35:14 -05:00
Nate Brown
5278b6f926 Generic timerwheel (#804) 2023-01-18 10:56:42 -06:00
Caleb Jasik
12dbbd3dd3 Fix typos found by https://github.com/crate-ci/typos (#735) 2022-12-19 11:28:27 -06:00
Wade Simmons
326fc8758d Support multiple UDP source ports (multiport)
The goal of this work is to send packets between two hosts using more than one
5-tuple. When running on networks like AWS where the underlying network driver
and overlay fabric makes routing, load balancing, and failover decisions based
on the flow hash, this enables more than one flow between pairs of hosts.

Multiport spreads outgoing UDP packets across multiple UDP send ports,
which allows nebula to work around any issues on the underlay network.
Some example issues this could work around:

- UDP rate limits on a per flow basis.
- Partial underlay network failure in which some flows work and some don't

Agreement is done during the handshake to decide if multiport mode will
be used for a given tunnel (one side must have tx_enabled set, the other
side must have rx_enabled set)

NOTE: you cannot use multiport on a host if you are relying on UDP hole
punching to get through a NAT or firewall.

NOTE: Linux only (uses raw sockets to send). Also currently only works
with IPv4 underlay network remotes.

This is implemented by opening a raw socket and sending packets with
a source port that is based on a hash of the overlay source/destiation
port. For ICMP and Nebula metadata packets, we use a random source port.

Example configuration:

    multiport:
      # This host support sending via multiple UDP ports.
      tx_enabled: false

      # This host supports receiving packets sent from multiple UDP ports.
      rx_enabled: false

      # How many UDP ports to use when sending. The lowest source port will be
      # listen.port and go up to (but not including) listen.port + tx_ports.
      tx_ports: 100

      # NOTE: All of your hosts must be running a version of Nebula that supports
      # multiport if you want to enable this feature. Older versions of Nebula
      # will be confused by these multiport handshakes.
      #
      # If handshakes are not getting a response, attempt to transmit handshakes
      # using random UDP source ports (to get around partial underlay network
      # failures).
      tx_handshake: false

      # How many unresponded handshakes we should send before we attempt to
      # send multiport handshakes.
      tx_handshake_delay: 2
2022-10-17 12:58:06 -04:00
brad-defined
1a7c575011 Relay (#678)
Co-authored-by: Wade Simmons <wsimmons@slack-corp.com>
2022-06-21 13:35:23 -05:00
Wade Simmons
304b12f63f create ConnectionState before adding to HostMap (#535)
We have a few small race conditions with creating the HostInfo.ConnectionState
since we add the host info to the pendingHostMap before we set this
field. We can make everything a lot easier if we just add an "init"
function so that we can set this field in the hostinfo before we add it
to the hostmap.
2021-11-08 14:46:22 -05:00
Nate Brown
bcabcfdaca Rework some things into packages (#489) 2021-11-03 20:54:04 -05:00
brad-defined
6ae8ba26f7 Add a context object in nebula.Main to clean up on error (#550) 2021-11-02 13:14:26 -05:00