nebula

mirror of https://github.com/slackhq/nebula.git synced 2025-11-10 06:43:57 +01:00

Author	SHA1	Message	Date
Wade Simmons	f36db374ac	Merge remote-tracking branch 'origin/master' into multiport	2025-03-06 16:11:32 -05:00
Nate Brown	d97ed57a19	V2 certificate format (#1216) Co-authored-by: Nate Brown <nbrown.us@gmail.com> Co-authored-by: Jack Doan <jackdoan@rivian.com> Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com> Co-authored-by: Jack Doan <me@jackdoan.com>	2025-03-06 11:28:26 -06:00
Nate Brown	08ac65362e	Cert interface (#1212)	2024-10-10 18:00:22 -05:00
Wade Simmons	dabce8a1b4	Merge tag 'v1.9.4' into multiport (1.9.4 Release)	2024-09-13 10:17:59 -04:00
Jack Doan	248cf194cd	fix integer wraparound in the calculation of handshake timeouts on 32-bit targets (#1185) Fixes: #1169	2024-08-13 09:25:18 -04:00
Wade Simmons	f5f6c269ac	fix rare panic when local index collision happens (#1191) A local index collision happens when two tunnels attempt to use the same random int32 index ID. This is rare, and we have code to deal with it, but that code panicked because it returned the wrong value in this case. This change should fix the panic.	2024-08-07 11:53:32 -04:00
Nate Brown	e264a0ff88	Switch most everything to netip in prep for ipv6 in the overlay (#1173)	2024-07-31 10:18:56 -05:00
Wade Simmons	6b78e9cdb3	Merge remote-tracking branch 'origin/master' into multiport	2024-07-10 13:38:11 -04:00
Wade Simmons	4eb1da0958	remove deadlock in GetOrHandshake (#1151) We had a rare deadlock in GetOrHandshake because we kept holding the hostmap lock while calling StartHandshake. StartHandshake can block while sending to the lighthouse query worker channel, and that worker needs to be able to grab the hostmap lock to do its work. Other calls to StartHandshake don't hold the hostmap lock, so we should be able to drop it here. This lock was originally added with: https://github.com/slackhq/nebula/pull/954	2024-05-29 12:52:52 -04:00
Wade Simmons	b445d14ddb	Merge remote-tracking branch 'origin/master' into multiport	2024-05-08 11:22:19 -04:00
Wade Simmons	7efa750aef	avoid deadlock in lighthouse queryWorker (#1112) * avoid deadlock in lighthouse queryWorker If the lighthouse queryWorker tries to call StartHandshake on a lighthouse vpnIp, we can deadlock on the handshake_manager lock. This change drops the handshake_manager lock before we send on the lighthouse queryChan (which could block), and also avoids sending to the channel if this is a lighthouse IP itself. * need to hold lock during cacheCb	2024-04-11 17:00:01 -04:00
Nate Brown	a390125935	Support reloading preferred_ranges (#1043)	2024-04-03 22:14:51 -05:00
Wade Simmons	659d7fece6	Merge tag 'v1.8.2' into multiport (1.8.2 Release)	2024-01-26 10:45:15 -05:00
Nate Brown	072edd56b3	Fix re-entrant `GetOrHandshake` issues (#1044)	2023-12-19 11:58:31 -06:00
Nate Brown	a44e1b8b05	Clean up a hostinfo to reduce memory usage (#955)	2023-11-02 16:53:59 -05:00
Wade Simmons	f2aef0d6eb	Merge remote-tracking branch 'origin/master' into multiport	2023-10-27 08:48:13 -04:00
Nate Brown	50d6a1e8ca	QueryServer needs to be done outside of the lock (#996)	2023-10-17 15:43:51 -05:00
Nate Brown	076ebc6c6e	Simplify getting a hostinfo or starting a handshake with one (#954)	2023-08-21 18:51:45 -05:00
Nate Brown	7edcf620c0	We only need the certificate in ConnectionState (#953)	2023-08-21 14:11:06 -05:00
Nate Brown	a10baeee92	Pull hostmap and pending hostmap apart, remove unused functions (#843)	2023-07-24 12:37:52 -05:00
Nate Brown	3bbf5f4e67	Use an interface for udp conns (#901)	2023-06-14 10:48:52 -05:00
Wade Simmons	0e593ad582	Merge branch 'master' into multiport	2023-05-09 15:37:30 -04:00
Nate Brown	03e4a7f988	Rehandshaking (#838) Co-authored-by: Brad Higgins <brad@defined.net> Co-authored-by: Wade Simmons <wadey@slack-corp.com>	2023-05-04 15:16:37 -05:00
Wade Simmons	28ecfcbc03	Merge remote-tracking branch 'origin/master' into multiport	2023-05-03 10:50:06 -04:00
brad-defined	9b03053191	update EncReader and EncWriter interface function args to have concrete types (#844) * Update LightHouseHandlerFunc to remove EncWriter param. * Move EncWriter to interface * EncReader, too	2023-04-07 14:28:37 -04:00
Nate Brown	d3fe3efcb0	Fix handshake retry regression (#842)	2023-04-05 10:04:30 -05:00
Wade Simmons	e71059a410	Merge remote-tracking branch 'origin/master' into multiport	2023-04-03 11:30:41 -04:00
Nate Brown	ee8e1348e9	Use connection manager to drive NAT maintenance (#835) Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>	2023-03-31 15:45:05 -05:00
Nate Brown	1a6c657451	Normalize logs (#837)	2023-03-30 15:07:31 -05:00
brad-defined	2801fb2286	Fix relay (#827) Co-authored-by: Nate Brown <nbrown.us@gmail.com>	2023-03-30 11:09:20 -05:00
Nate Brown	f0ef80500d	Remove dead code and re-order transit from pending to main hostmap on stage 2 (#828)	2023-03-17 15:36:24 -05:00
Wade Simmons	e1af37e46d	add calculated_remotes (#759)	2023-03-13 15:09:08 -04:00
    * add calculated_remotes
      This setting allows us to "guess" what the remote might be for a host while we wait for the lighthouse response. For networks that are designed with this in mind, it can help speed up handshake performance, as well as improve resiliency in the case that all lighthouses are down.
      Example:
        lighthouse:
          # ...
          calculated_remotes:
            # For any Nebula IPs in 10.0.10.0/24, this will apply the mask and add
            # the calculated IP as an initial remote (while we wait for the response
            # from the lighthouse). Both CIDRs must have the same mask size.
            # For example, Nebula IP 10.0.10.123 will have a calculated remote of
            # 192.168.1.123
            10.0.10.0/24:
              - mask: 192.168.1.0/24
                port: 4242
    * figure out what is up with this test
    * add test
    * better logic for sending handshakes
      Keep track of the last list of hosts we sent handshakes to. Only log handshake sent messages if the list has changed.
      Remove the test Test_NewHandshakeManagerTrigger because it is faulty and makes no sense. It relies on the fact that no handshake packets actually get sent, but with these changes we would send packets now (which it should!)
    * use atomic.Pointer
    * cleanup to make it clearer
    * fix typo in example
Wade Simmons	aec7f5f865	Merge remote-tracking branch 'origin/master' into multiport	2023-03-13 15:07:32 -04:00
Nate Brown	92cc32f844	Remove handshake race avoidance (#820) Co-authored-by: Wade Simmons <wadey@slack-corp.com>	2023-03-13 12:35:14 -05:00
Nate Brown	5278b6f926	Generic timerwheel (#804)	2023-01-18 10:56:42 -06:00
Caleb Jasik	12dbbd3dd3	Fix typos found by https://github.com/crate-ci/typos (#735)	2022-12-19 11:28:27 -06:00
Wade Simmons	326fc8758d	Support multiple UDP source ports (multiport)	2022-10-17 12:58:06 -04:00
    The goal of this work is to send packets between two hosts using more than one 5-tuple. When running on networks like AWS where the underlying network driver and overlay fabric make routing, load balancing, and failover decisions based on the flow hash, this enables more than one flow between pairs of hosts.
    Multiport spreads outgoing UDP packets across multiple UDP send ports, which allows nebula to work around any issues on the underlay network. Some example issues this could work around:
    - UDP rate limits on a per-flow basis.
    - Partial underlay network failure in which some flows work and some don't.
    Agreement is done during the handshake to decide if multiport mode will be used for a given tunnel (one side must have tx_enabled set, the other side must have rx_enabled set).
    NOTE: you cannot use multiport on a host if you are relying on UDP hole punching to get through a NAT or firewall.
    NOTE: Linux only (uses raw sockets to send). Also currently only works with IPv4 underlay network remotes.
    This is implemented by opening a raw socket and sending packets with a source port that is based on a hash of the overlay source/destination port. For ICMP and Nebula metadata packets, we use a random source port.
    Example configuration:
      multiport:
        # This host supports sending via multiple UDP ports.
        tx_enabled: false

        # This host supports receiving packets sent from multiple UDP ports.
        rx_enabled: false

        # How many UDP ports to use when sending. The lowest source port will be
        # listen.port and go up to (but not including) listen.port + tx_ports.
        tx_ports: 100

        # NOTE: All of your hosts must be running a version of Nebula that supports
        # multiport if you want to enable this feature. Older versions of Nebula
        # will be confused by these multiport handshakes.
        #
        # If handshakes are not getting a response, attempt to transmit handshakes
        # using random UDP source ports (to get around partial underlay network
        # failures).
        tx_handshake: false

        # How many unresponded handshakes we should send before we attempt to
        # send multiport handshakes.
        tx_handshake_delay: 2
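The source-port selection described above can be sketched as: hash the overlay source/destination ports and map the hash into [listen.port, listen.port + tx_ports). This is a hedged sketch, not nebula's code. FNV-1a is an assumed hash (the commit only says "a hash"), the function names are hypothetical, and 4242 is assumed as the listen.port value for illustration. It also assumes tx_ports > 0 and listen.port + tx_ports ≤ 65536 so the `uint16` addition cannot wrap.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// multiportSourcePort maps an overlay flow (source/destination port pair) to
// one UDP source port in [listenPort, listenPort+txPorts). The same flow always
// gets the same underlay port; different flows spread across the range.
// Assumptions: FNV-1a as the hash, txPorts > 0, listenPort+txPorts <= 65536.
func multiportSourcePort(listenPort, txPorts, overlaySrc, overlayDst uint16) uint16 {
	h := fnv.New32a()
	h.Write([]byte{byte(overlaySrc >> 8), byte(overlaySrc), byte(overlayDst >> 8), byte(overlayDst)})
	return listenPort + uint16(h.Sum32()%uint32(txPorts))
}

// randomSourcePort covers packets without ports, such as ICMP or Nebula
// metadata, which the commit says use a random source port.
func randomSourcePort(listenPort, txPorts uint16) uint16 {
	return listenPort + uint16(rand.Intn(int(txPorts)))
}

func main() {
	// Assumed listen.port = 4242 with the example tx_ports = 100.
	fmt.Println(multiportSourcePort(4242, 100, 50000, 443))
	fmt.Println(randomSourcePort(4242, 100))
}
```

Hashing per flow, rather than per packet, keeps each overlay connection on a single underlay 5-tuple, so per-flow ordering on the underlay is preserved while different flows can take different paths.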
brad-defined	1a7c575011	Relay (#678) Co-authored-by: Wade Simmons <wsimmons@slack-corp.com>	2022-06-21 13:35:23 -05:00
Wade Simmons	304b12f63f	create ConnectionState before adding to HostMap (#535) We have a few small race conditions with creating the HostInfo.ConnectionState since we add the host info to the pendingHostMap before we set this field. We can make everything a lot easier if we just add an "init" function so that we can set this field in the hostinfo before we add it to the hostmap.	2021-11-08 14:46:22 -05:00
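The "init" function idea above amounts to fully initializing an object before publishing it in a shared map, so no other goroutine can ever observe it half-built. A minimal sketch, assuming trimmed-down hypothetical types (`hostInfo`, `pendingHostMap`, `addHostInfo`) rather than nebula's real ones:

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, trimmed-down types for illustration.
type connectionState struct{ initiator bool }

type hostInfo struct {
	vpnIP           string
	ConnectionState *connectionState
}

type pendingHostMap struct {
	mu    sync.RWMutex
	hosts map[string]*hostInfo
}

// addHostInfo runs initFn on hi before publishing it in the map. Other
// goroutines can only reach hi through the map, so once they find it,
// ConnectionState is already set.
func (m *pendingHostMap) addHostInfo(hi *hostInfo, initFn func(*hostInfo)) {
	if initFn != nil {
		initFn(hi)
	}
	m.mu.Lock()
	m.hosts[hi.vpnIP] = hi
	m.mu.Unlock()
}

// query looks up a hostinfo under the read lock.
func (m *pendingHostMap) query(vpnIP string) *hostInfo {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.hosts[vpnIP]
}

func main() {
	m := &pendingHostMap{hosts: map[string]*hostInfo{}}
	m.addHostInfo(&hostInfo{vpnIP: "10.1.1.1"}, func(hi *hostInfo) {
		hi.ConnectionState = &connectionState{initiator: true}
	})
	fmt.Println(m.query("10.1.1.1").ConnectionState.initiator) // true
}
```

The map lock then provides the happens-before edge: any reader that acquires the lock after the insert also sees the fields written by `initFn`.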
Nate Brown	bcabcfdaca	Rework some things into packages (#489)	2021-11-03 20:54:04 -05:00
brad-defined	6ae8ba26f7	Add a context object in nebula.Main to clean up on error (#550)	2021-11-02 13:14:26 -05:00
John Maguire	98c391396c	Remove log when no handshake message is sent (#452)	2021-04-30 18:19:40 -05:00
Wade Simmons	44cb697552	Add more metrics (#450)	2021-04-27 22:23:18 -04:00
    * Add more metrics
      This change adds the following counter metrics:
      Metrics to track packets dropped at the firewall:
        firewall.dropped.local_ip
        firewall.dropped.remote_ip
        firewall.dropped.no_rule
      Metrics to track handshake attempts that have been initiated and ones that have timed out (ones that have completed are tracked by the existing "handshakes" histogram):
        handshake_manager.initiated
        handshake_manager.timed_out
      Metrics to track when cached_packets are dropped because we run out of buffer space, and how many are sent once the handshake completes:
        hostinfo.cached_packets.dropped
        hostinfo.cached_packets.sent
      This change also notes how many cached packets we have when we log the final "Handshake received" message for either stage1 or stage2.
    * separate incoming/outgoing metrics
    * remove "allowed" firewall metrics
      We don't need this on the hotpath, they aren't worth it.
    * don't need pointers here
Nathan Brown	db23fdf9bc	Don't apply race avoidance to existing handshakes, use the handshake time to determine who wins (#451) Co-authored-by: Wade Simmons <wadey@slack-corp.com>	2021-04-27 21:15:34 -05:00
Nathan Brown	710df6a876	Refactor remotes and handshaking to give every address a fair shot (#437)	2021-04-14 13:50:09 -05:00
Nathan Brown	3ea7e1b75f	Don't use a global logger (#423)	2021-03-26 09:46:30 -05:00
Wade Simmons	6c55d67f18	Refactor handshake_ix (#401) There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake. Co-authored-by: Nate Brown <nbrown.us@gmail.com> Co-authored-by: Ryan Huber <rhuber@gmail.com> Co-authored-by: forfuncsake <drussell@slack-corp.com>	2021-03-12 14:16:25 -05:00
Wade Simmons	d604270966	Fix most known data races (#396)	2021-03-05 21:18:33 -05:00
    This change fixes all of the known data races that `make smoke-docker-race` finds, except for one.
    Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages.
    Some of the other races are around consistently using `atomic` around the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer dereference as we can just point directly to the struct field).
    The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Because it is in the hot path for packets and this could only rarely be an issue, I am holding off on fixing that one for now.
    Here are the results of `make smoke-docker-race`:
    before:
      lighthouse1: Found 2 data race(s)
      host2: Found 36 data race(s)
      host3: Found 17 data race(s)
      host4: Found 31 data race(s)
    after:
      host2: Found 1 data race(s)
      host4: Found 1 data race(s)
    Fixes: #147
    Fixes: #226
    Fixes: #283
    Fixes: #316
Wade Simmons	1bae5b2550	more validation in pending hostmap deletes (#344) We are currently seeing some cases where we are not deleting entries correctly from the pending hostmap. I believe this is a case of an inbound timer tick firing and deleting the Hosts map entry for a newer handshake attempt than intended, thus leaving the old Indexes entry orphaned. This change adds some extra checking when deleting from the Indexes and Hosts maps to ensure we clean everything up correctly.	2021-03-01 12:40:46 -05:00
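The extra checking described above can be sketched as a compare-before-delete: remove a map entry only if it still points at the hostinfo being deleted. Then a stale timer tick for an old attempt cleans up that attempt's own Indexes entry without touching the newer attempt's Hosts entry. The types below are hypothetical and trimmed down, and locking is omitted; the caller is assumed to hold the pending hostmap lock.

```go
package main

import "fmt"

// Hypothetical, trimmed-down types; the caller is assumed to hold the lock.
type hostInfo struct {
	vpnIP        string
	localIndexID uint32
}

type pendingHostMap struct {
	hosts   map[string]*hostInfo // keyed by overlay IP
	indexes map[uint32]*hostInfo // keyed by local index
}

// deleteHostInfo removes hi from both maps, but only deletes an entry that
// still points at this exact hostinfo (hi must be non-nil).
func (m *pendingHostMap) deleteHostInfo(hi *hostInfo) {
	if m.hosts[hi.vpnIP] == hi {
		delete(m.hosts, hi.vpnIP)
	}
	if m.indexes[hi.localIndexID] == hi {
		delete(m.indexes, hi.localIndexID)
	}
}

func main() {
	oldAttempt := &hostInfo{vpnIP: "10.0.0.1", localIndexID: 1}
	newAttempt := &hostInfo{vpnIP: "10.0.0.1", localIndexID: 2}
	m := &pendingHostMap{
		hosts:   map[string]*hostInfo{"10.0.0.1": newAttempt},
		indexes: map[uint32]*hostInfo{1: oldAttempt, 2: newAttempt},
	}
	m.deleteHostInfo(oldAttempt)                                     // stale tick for the old attempt
	fmt.Println(m.hosts["10.0.0.1"] == newAttempt, len(m.indexes)) // true 1
}
```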
Wade Simmons	ee7c27093c	add HostMap.RemoteIndexes (#329)	2020-11-23 14:51:16 -05:00
    This change adds an index based on HostInfo.remoteIndexId. This allows us to use HostMap.QueryReverseIndex without having to loop over all entries in the map (this can be a bottleneck on high-traffic lighthouses).
    Without this patch, on a high-traffic lighthouse server receiving recv_error packets and lots of handshakes, a CPU pprof trace can look like this:
          flat  flat%   sum%        cum   cum%
        2000ms 32.26% 32.26%     3040ms 49.03%  github.com/slackhq/nebula.(*HostMap).QueryReverseIndex
         870ms 14.03% 46.29%     1060ms 17.10%  runtime.mapiternext
    This shows that roughly 50% of total CPU time is spent in QueryReverseIndex.
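The profile above shows a linear scan (`runtime.mapiternext`) dominating reverse lookups. A second map keyed by the remote index turns each lookup into a single map access, at the cost of keeping both maps in sync on every insert and delete. A minimal sketch with hypothetical types and method names; locking is omitted.

```go
package main

import "fmt"

// Hypothetical, trimmed-down types; locking is omitted for brevity.
type hostInfo struct {
	localIndexID  uint32
	remoteIndexID uint32
}

type hostMap struct {
	indexes       map[uint32]*hostInfo // keyed by our local index
	remoteIndexes map[uint32]*hostInfo // second index keyed by the peer's index
}

func newHostMap() *hostMap {
	return &hostMap{indexes: map[uint32]*hostInfo{}, remoteIndexes: map[uint32]*hostInfo{}}
}

// add keeps both indexes in sync; every insert (and delete) must touch both.
func (hm *hostMap) add(hi *hostInfo) {
	hm.indexes[hi.localIndexID] = hi
	hm.remoteIndexes[hi.remoteIndexID] = hi
}

// reverseLookupScan is the O(n) approach: walk every entry.
func (hm *hostMap) reverseLookupScan(remote uint32) *hostInfo {
	for _, hi := range hm.indexes {
		if hi.remoteIndexID == remote {
			return hi
		}
	}
	return nil
}

// reverseLookup is a single O(1) map lookup on the second index.
func (hm *hostMap) reverseLookup(remote uint32) *hostInfo {
	return hm.remoteIndexes[remote]
}

func main() {
	hm := newHostMap()
	for i := uint32(1); i <= 1000; i++ {
		hm.add(&hostInfo{localIndexID: i, remoteIndexID: i + 50000})
	}
	fmt.Println(hm.reverseLookup(50500).localIndexID) // 500
}
```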


56 Commits