59 Commits

Author SHA1 Message Date
Wade Simmons
1c9fdba403 Merge remote-tracking branch 'origin/master' into mutex-debug 2025-04-02 09:22:18 -04:00
Nate Brown
f8734ffa43
Improve logging when handshaking with an invalid cert (#1345) 2025-03-07 10:45:31 -06:00
Nate Brown
d97ed57a19
V2 certificate format (#1216)
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Jack Doan <jackdoan@rivian.com>
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
Co-authored-by: Jack Doan <me@jackdoan.com>
2025-03-06 11:28:26 -06:00
Nate Brown
08ac65362e
Cert interface (#1212) 2024-10-10 18:00:22 -05:00
Jack Doan
248cf194cd
fix integer wraparound in the calculation of handshake timeouts on 32-bit targets (#1185)
Fixes: #1169
2024-08-13 09:25:18 -04:00
Wade Simmons
f5f6c269ac
fix rare panic when local index collision happens (#1191)
A local index collision happens when two tunnels attempt to use the same
random int32 index ID. This is a rare chance, and we have code to deal
with it, but we have a panic because we return the wrong thing in this
case. This change should fix the panic.
2024-08-07 11:53:32 -04:00
Nate Brown
e264a0ff88
Switch most everything to netip in prep for ipv6 in the overlay (#1173) 2024-07-31 10:18:56 -05:00
Wade Simmons
4eb1da0958
remove deadlock in GetOrHandshake (#1151)
We had a rare deadlock in GetOrHandshake because we kept the hostmap
lock when we do the call to StartHandshake. StartHandshake can block
while sending to the lighthouse query worker channel, and that worker
needs to be able to grab the hostmap lock to do its work. Other calls
for StartHandshake don't hold the hostmap lock so we should be able to
drop it here.

This lock was originally added with: https://github.com/slackhq/nebula/pull/954
2024-05-29 12:52:52 -04:00
Wade Simmons
7efa750aef
avoid deadlock in lighthouse queryWorker (#1112)
* avoid deadlock in lighthouse queryWorker

If the lighthouse queryWorker tries to grab to call StartHandshake on
a lighthouse vpnIp, we can deadlock on the handshake_manager lock. This
change drops the handshake_manager lock before we send on the lighthouse
queryChan (which could block), and also avoids sending to the channel if
this is a lighthouse IP itself.

* need to hold lock during cacheCb
2024-04-11 17:00:01 -04:00
Wade Simmons
dffaaf38d4 Merge branch 'lighthouse-query-chan-lock' into mutex-debug 2024-04-11 13:25:09 -04:00
Wade Simmons
2ff26b261d need to hold lock during cacheCb 2024-04-11 13:02:13 -04:00
Wade Simmons
c7f1bed882 avoid deadlock in lighthouse queryWorker
If the lighthouse queryWorker tries to grab to call StartHandshake on
a lighthouse vpnIp, we can deadlock on the handshake_manager lock. This
change drops the handshake_manager lock before we send on the lighthouse
queryChan (which could block), and also avoids sending to the channel if
this is a lighthouse IP itself.
2024-04-11 12:58:25 -04:00
Wade Simmons
0ccfad1a1e Merge remote-tracking branch 'origin/master' into mutex-debug 2024-04-11 12:15:52 -04:00
Nate Brown
a390125935
Support reloading preferred_ranges (#1043) 2024-04-03 22:14:51 -05:00
Wade Simmons
1be8dc43a7 more 2024-02-05 11:13:20 -05:00
Wade Simmons
91ec6bb1ff Merge remote-tracking branch 'origin/master' into mutex-debug 2023-12-19 13:30:40 -05:00
Nate Brown
072edd56b3
Fix re-entrant GetOrHandshake issues (#1044) 2023-12-19 11:58:31 -06:00
Wade Simmons
6f27f46965 simplify 2023-12-19 09:10:00 -05:00
Wade Simmons
bcaefce4ac more types 2023-12-18 22:38:52 -05:00
Wade Simmons
540a171ef8 WIP more locks 2023-12-18 22:28:24 -05:00
Wade Simmons
4d88c0711a gofmt 2023-12-18 21:04:05 -05:00
Wade Simmons
fdb78044ba Merge remote-tracking branch 'origin/master' into mutex-debug 2023-12-17 09:19:48 -05:00
Nate Brown
a44e1b8b05
Clean up a hostinfo to reduce memory usage (#955) 2023-11-02 16:53:59 -05:00
Nate Brown
50d6a1e8ca
QueryServer needs to be done outside of the lock (#996) 2023-10-17 15:43:51 -05:00
Nate Brown
076ebc6c6e
Simplify getting a hostinfo or starting a handshake with one (#954) 2023-08-21 18:51:45 -05:00
Nate Brown
7edcf620c0
We only need the certificate in ConnectionState (#953) 2023-08-21 14:11:06 -05:00
Wade Simmons
4c89b3c6a3 cleanup 2023-08-21 13:09:25 -04:00
Nate Brown
a10baeee92
Pull hostmap and pending hostmap apart, remove unused functions (#843) 2023-07-24 12:37:52 -05:00
Nate Brown
3bbf5f4e67
Use an interface for udp conns (#901) 2023-06-14 10:48:52 -05:00
Nate Brown
03e4a7f988
Rehandshaking (#838)
Co-authored-by: Brad Higgins <brad@defined.net>
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2023-05-04 15:16:37 -05:00
brad-defined
9b03053191
update EncReader and EncWriter interface function args to have concrete types (#844)
* Update LightHouseHandlerFunc to remove EncWriter param.
* Move EncWriter to interface
* EncReader, too
2023-04-07 14:28:37 -04:00
Nate Brown
d3fe3efcb0
Fix handshake retry regression (#842) 2023-04-05 10:04:30 -05:00
Nate Brown
ee8e1348e9
Use connection manager to drive NAT maintenance (#835)
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
2023-03-31 15:45:05 -05:00
Nate Brown
1a6c657451
Normalize logs (#837) 2023-03-30 15:07:31 -05:00
brad-defined
2801fb2286
Fix relay (#827)
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
2023-03-30 11:09:20 -05:00
Nate Brown
f0ef80500d
Remove dead code and re-order transit from pending to main hostmap on stage 2 (#828) 2023-03-17 15:36:24 -05:00
Wade Simmons
e1af37e46d
add calculated_remotes (#759)
* add calculated_remotes

This setting allows us to "guess" what the remote might be for a host
while we wait for the lighthouse response. For networks that hard
designed with in mind, it can help speed up handshake performance, as well as
improve resiliency in the case that all lighthouses are down.

Example:

    lighthouse:
      # ...

      calculated_remotes:
        # For any Nebula IPs in 10.0.10.0/24, this will apply the mask and add
        # the calculated IP as an initial remote (while we wait for the response
        # from the lighthouse). Both CIDRs must have the same mask size.
        # For example, Nebula IP 10.0.10.123 will have a calculated remote of
        # 192.168.1.123

        10.0.10.0/24:
          - mask: 192.168.1.0/24
            port: 4242

* figure out what is up with this test

* add test

* better logic for sending handshakes

Keep track of the last light of hosts we sent handshakes to. Only log
handshake sent messages if the list has changed.

Remove the test Test_NewHandshakeManagerTrigger because it is faulty and
makes no sense. It relys on the fact that no handshake packets actually
get sent, but with these changes we would send packets now (which it
should!)

* use atomic.Pointer

* cleanup to make it clearer

* fix typo in example
2023-03-13 15:09:08 -04:00
Nate Brown
92cc32f844
Remove handshake race avoidance (#820)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2023-03-13 12:35:14 -05:00
Nate Brown
5278b6f926
Generic timerwheel (#804) 2023-01-18 10:56:42 -06:00
Caleb Jasik
12dbbd3dd3
Fix typos found by https://github.com/crate-ci/typos (#735) 2022-12-19 11:28:27 -06:00
brad-defined
1a7c575011
Relay (#678)
Co-authored-by: Wade Simmons <wsimmons@slack-corp.com>
2022-06-21 13:35:23 -05:00
Wade Simmons
304b12f63f
create ConnectionState before adding to HostMap (#535)
We have a few small race conditions with creating the HostInfo.ConnectionState
since we add the host info to the pendingHostMap before we set this
field. We can make everything a lot easier if we just add an "init"
function so that we can set this field in the hostinfo before we add it
to the hostmap.
2021-11-08 14:46:22 -05:00
Nate Brown
bcabcfdaca
Rework some things into packages (#489) 2021-11-03 20:54:04 -05:00
brad-defined
6ae8ba26f7
Add a context object in nebula.Main to clean up on error (#550) 2021-11-02 13:14:26 -05:00
John Maguire
98c391396c
Remove log when no handshake message is sent (#452) 2021-04-30 18:19:40 -05:00
Wade Simmons
44cb697552
Add more metrics (#450)
* Add more metrics

This change adds the following counter metrics:

Metrics to track packets dropped at the firewall:

    firewall.dropped.local_ip
    firewall.dropped.remote_ip
    firewall.dropped.no_rule

Metrics to track handshakes attempts that have been initiated and ones
that have timed out (ones that have completed are tracked by the
existing "handshakes" histogram).

    handshake_manager.initiated
    handshake_manager.timed_out

Metrics to track when cached_packets are dropped because we run out of
buffer space, and how many are sent once the handshake completes.

    hostinfo.cached_packets.dropped
    hostinfo.cached_packets.sent

This change also notes how many cached packets we have when we log the
final "Handshake received" message for either stage1 for stage2.

* separate incoming/outgoing metrics

* remove "allowed" firewall metrics

We don't need this on the hotpath, they aren't worh it.

* don't need pointers here
2021-04-27 22:23:18 -04:00
Nathan Brown
db23fdf9bc
Dont apply race avoidance to existing handshakes, use the handshake time to determine who wins (#451)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2021-04-27 21:15:34 -05:00
Nathan Brown
710df6a876
Refactor remotes and handshaking to give every address a fair shot (#437) 2021-04-14 13:50:09 -05:00
Nathan Brown
3ea7e1b75f
Don't use a global logger (#423) 2021-03-26 09:46:30 -05:00
Wade Simmons
6c55d67f18
Refactor handshake_ix (#401)
There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake.

Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Ryan Huber <rhuber@gmail.com>
Co-authored-by: forfuncsake <drussell@slack-corp.com>
2021-03-12 14:16:25 -05:00