veloren

mirror of https://gitlab.com/veloren/veloren.git synced 2024-08-30 18:12:32 +00:00

Author	SHA1	Message	Date
Marcel Märtens	2a82405df2	update toolchain to `nightly-2021-09-24`	2021-09-24 23:18:07 +02:00
Avi Weinstock	8c5d52b70a	Enable TCP_NODELAY.	2021-06-27 17:45:33 -04:00
Marcel Märtens	cf3188b412	remove Protocol from Quic, cleanup code, fix some log spam	2021-05-21 10:41:19 +02:00
Marcel Märtens	df7b65289d	fix error handling in networking and switch to hashbrown, fixing #1118	2021-05-04 15:29:42 +02:00
Marcel Märtens	653fb065e0	extract protocol specific listen code from scheduler and move it to channel.rs	2021-04-29 17:51:52 +02:00
Marcel Märtens	4afadf57dc	move connect code to channel and get rid of unwraps	2021-04-29 15:58:43 +02:00
Marcel Märtens	95b186e29a	QuicSink and QuicDrain work now. When the local SendProtocol opens a Stream, it sends an empty message to QuicDrain, which then knows that it's time to open a quic stream. It opens a QuicStream and sends its SID over to the remote. The RecvStream is sent to the local QuicSink. RemoteRecv notices that a new BiStream was opened, reads its SID and starts listening on it, while remote main gets the information that a stream was opened and notifies the frontend. In the participant, remote Recv is synced with remote Send (without triggering an empty message!). The RemoteRecv Sink sends the SendStream to the RemoteSend Drain, where it is used when a first message is sent on this stream.	2021-04-29 15:58:39 +02:00
Marcel Märtens	760c382ed9	protocoladdr change for listen and connect (remove a loop in quic protocol which wasn't an actual loop)	2021-04-29 15:58:34 +02:00
Marcel Märtens	9f0aceba4c	work on getting quic in the network	2021-04-29 15:58:26 +02:00
Marcel Märtens	383482a36e	Quic: We had the following problem: - locally we open a stream, our local Drain is sending OpenStream - remote Sink will know this and notify remote Drain - remote side sends a message - local sink does not know about the Stream, as there is no way (and there CAN'T be one) to notify local Sink from local Drain (it could introduce race conditions). One of the possible solutions was that the remote drain copies the OpenStream Msg ON the Quic::stream before the first data is sent. This would work but is complicated. Instead we now just mark such streams as "potentially open" and we listen for the first DataHeader to get its SID. add support for unreliable messages in quic protocol, benchmarks	2021-04-29 15:58:23 +02:00
Marcel Märtens	aea52d8b54	implement Upload Bandwidth prediction. It's available to `api` and `metrics` and can be used to slow down msg send in veloren. It uses a tokio::watch for now, as i plan to have a watch job in the scheduler that recalculates prio on change. Also cleaning up participant metrics after a disconnect	2021-03-26 08:58:03 +01:00
Marcel Märtens	514d5db038	Update Network Protocol - now last digit version is compatible 0.6.0 will connect to 0.6.1 - the TCP DATA Frames no longer contain START field, as it's not needed - the TCP OPENSTREAM Frames will now contain the BANDWIDTH field - MID is not Protocol internal Update network - update API with Bandwidth Update veloren - introduce better runtime and `async` things that are IO bound. - Remove `uvth` and instead use `tokio::runtime::Runtime::spawn_blocking` - remove futures_execute from client and server use tokio::runtime::Runtime instead - give threads a Name	2021-02-22 17:34:55 +01:00
Marcel Märtens	03af9937cf	Stabilize Network again: - completely switch to Bytes, even in api. speed up TCP by factor 2 - improve benchmarks - speed up mpsc metrics - gracefully handle shutdown by interpreting Ok(0) as tokio::tcpstream closed now. - fix hotloop in participants by adding `Some(n)` to fix endless handing. - fix closing bug by closing streams after `recv_mgr` is shutdown even if no shutdown is triggered locally. - fix prometheus - no longer throw when a `Stream` is dropped while participant still receives a msg for it. - fix the bandwidth handling, TCP network send speed is up to 1.5GiB/s while recv is 150MiB/s - add documentation - tmp require rt-multi-threaded in client for tokio, to not fail cargo check this is prob stable, i tested over 1 hour. after that some optimisations in priomgr. and impl. proper bandwidth. Speed is up to 2GB/s write and 150MB/s recv on a single core sync add documentation	2021-02-17 19:37:48 +01:00
Marcel Märtens	ea8ab1ce7a	Great improvements to the codebase: - better logging in network - we now notify the send of what happened in recv in participant. - works with veloren master servers - works in singleplayer, using an actual mid. - add `mpsc` in whole stack incl tests - speed up internal read/write with `Bytes` crate - use `prometheus-hyper` for metrics - use a metrics cache	2021-02-17 16:15:00 +01:00
Marcel Märtens	9884019963	COMPLETE REDESIGN of network crate - Implementing an async non-io protocol crate a) no tokio / no channels b) I/O is based on abstraction Sink/Drain c) different Protocols can have a different Drain Type This allows MPSC to send its content without splitting up messages at all! It allows UDP to have internal extra frames to care for security It allows better abstraction for tests Allows benchmarks on the mpsc variant Custom Handshakes to allow sth like Quic protocol easily - reduce the participant managers to 4: channel creations, send, recv and shutdown. keeping the `mut data` in one manager removes the need for all RwLocks. reducing complexity and parallel access problems - more strategic participant shutdown. first send. then wait for remote side to notice recv stop, then remote side will stop send, then local side can stop recv. - metrics are internally abstracted to fit protocol and network layer - in this commit network/protocol tests work and network tests work someway, veloren compiles but does not work - handshake compatible to async_std	2021-02-17 12:39:47 +01:00
Marcel Märtens	5aa1940ef8	get rid of `async_std::channel` switch to `tokio` and `async_channel` crate. I wanted to do tokio first, but it doesn't feature Sender::close(), thus i included async_channel Got rid of `futures` and only need `futures_core` and `futures_util`. Tokio does not support `Stream` and `StreamExt` so for now i need to use `tokio-stream`, i think this will go in `std` in the future Created `b2b_close_stream_opened_sender_r` as the shutdown procedure does not need a copy of a Sender, it just needs to stop it. Various adjustments, e.g. for `select!` which now requires a `&mut` for oneshots. Future things to do: - Use some better signalling than oneshot<()> in some cases. - Use a Watch for the Prio propagation (impl. it ofc) - Use Bounded Channels in order to improve performance - adjust tests coding bring tests to work	2021-02-17 12:38:53 +01:00
Marcel Märtens	a7b7ae3a2c	fix compiling with metrics	2020-08-27 09:35:06 +02:00
Marcel Märtens	9170622611	reduce load on metrics by A LOT! - first remove participant AND channel in same metric. this caused a matrix full of 0 values which bloated a lot. - then did the cid cache to be lazy loading to no longer generate that many 0 values - possible would also be no longer keeping metrics for INIT, HANDSHAKE and PARTICIPANTID	2020-08-27 01:55:13 +02:00
notoria	2be4202d01	Corrected some spelling errors	2020-08-25 12:21:25 +00:00
Marcel Märtens	1eb126736d	workaround for impossible RAW msg	2020-08-22 01:09:07 +02:00
Marcel Märtens	926d334082	Fixed the unclean disconnecting of participants. Till now, we just dropped the TCP connection and registered this as a clean shutdown. The protocol reader interpreted this and sent a Frame::Shutdown frame to its local processor. This is ofc wrong. So now the protocol reader will detect a Frame::Shutdown frame and send it over. If the Tcp connection gets closed it will return an Error up. The processor will then pick up this error, request an unclean shutdown and notify the user. Also when doing a clean shutdown we are sending a Frame::Shutdown now to the remote side to trigger this behavior. Before we wrongly added the feature of only using a `select` in channel. This is WRONG, as it could mean that the write maybe fails, but the read still had some Frames buffered which then get dropped. It's fixed now by the clean shutdown mechanism defined before. Also when a channel is closed now inside a participant we are closing the whole participant as a protection. However, we must not close the recv channel as the `handle_frames_mgr` might still be working on them, so we only stop writing/sending. Debugging this also let me introduce some smaller fixes: - PID in tests are now 0 and 1+164+164*64+... this makes the traces appear as AAAAAA and BBBBBB instead of ABAAAA and ACAAAA - veloren client now better separates between clean shutdown and unclean shutdown. - added a new type: C2pFrame for `(cid, Result<Frame, ()>)` - wrong frames inside the handshake are not counted in metrics	2020-08-21 18:00:28 +02:00
Marcel Märtens	12b46250f5	protocols no longer send a Close Frame in case the read fails. They just fail, let participant handle this! Participant will now handle a close in the `create_channel_mgr` rather than the `send` fn. It's the better place, which makes a HashMap better for delete lookup Since tcp_read now broke but tcp_write didn't and the Participant wasn't updated till both were broken, we changed CHANNEL tcp_read and tcp_write in protocols to be a `select` rather than a `join` However only do this in the CHANNEL, but not in the HANDSHAKE. it fails if you try to. Also the handshake will take care of any failed read or write manually and will handle a clear teardown in this case.	2020-08-21 18:00:07 +02:00
Marcel Märtens	6c59caf8e1	make `prometheus` optional in network and fix a panic in the server - an extra interface `new_with_regisitry` was created to make sure the interface doesn't depend on the features	2020-07-15 16:45:49 +02:00
Marcel Märtens	df45d35c0e	tcp protocol hardening - make it harder for the server to crash and also kill invalid sessions properly (instead of waiting for them to close) - introduce macros to reduce code duplication - added tests to check for valid handshake as well as garbage tcp	2020-07-13 13:03:25 +02:00
Marcel Märtens	4cefdcefea	zests fix - capitalize first letter	2020-07-13 13:03:01 +02:00
Marcel Märtens	6535fa5744	fix various clippy issues	2020-07-01 00:37:06 +02:00
Marcel Märtens	2e3d5f87db	StreamError::Deserialize is now triggered when `recv` fails because of wrong type - added PartialEq to StreamError for test purposes (only yet!) - removed async_recv example as it's no longer of any use. It was created before the COMPLETE REWRITE in order to verify that my own async interface on top of mio works. However it's now guaranteed by async-std and futures. no need for a special test - remove uvth from dependencies and replace it with a `FnOnce` - fix ALL clippy (network) lints - basic fix for a channel drop scenario: TODO: this needs some further fixes up to now only destruction of participant by api was covered correctly. we had an issue when the underlying channels got dropped. So now we have a participant without channels. We need to buffer the requests and try to reopen a channel ASAP! If no channel could be reopened we need to close the Participant, while a) leaving the BParticipant intact, knowing that it only waits for a proper close by scheduler b) close the BParticipant gracefully. Notifying the scheduler to remove its stuff (either scheduler should detect a stopped BParticipant or BParticipant will send Scheduler its own destruction, and then Scheduler just does the same like when API forces a close) Keep the Participant alive and wait for the api to access BParticipant to notice it's closed and then wait for a disconnect which isn't doing anything as it was already cleaned up in the background	2020-06-09 13:16:39 +02:00
Marcel Märtens	3324c08640	Fixing the DEADLOCK in handshake -> channel creation - this bug was initially called imbris bug, as it happened on his runners and i couldn't reproduce it locally at first :) - When in a Handshake a separate mpsc::Channel was created for (Cid, Frame) transport however the protocol could already catch non handshake data and push it into this mpsc::Channel. Then this channel got dropped and a fresh one was created for the network::Channel. These dropped Frames are ofc a BUG! I tried multiple things to solve this: - don't create a new mpsc::Channel, but instead bind it to the Protocol itself and always use 1. This would work theoretically, but in bParticipant side we are using 1 mpsc::Channel<(Cid, Frame)> to handle ALL the network::channel. If now every Protocol would have its own, and with that every network::Channel had its own, it would no longer work out. Bad Idea... - using the first method but creating the mpsc::Channel inside the scheduler instead of protocol doesn't work either, as the scheduler doesn't know the remote_pid yet - i don't want a hack to say the protocol only listens to 2 messages and then stops no matter what So i switched over to the simple method now: - Do everything like before with 2 mpsc::Channels - after the handshake, close the receiver and listen for all remaining (cid, frame) combinations - when starting the channel, reapply them to the new sender/listener combination - added tracing - switched Protocol RwLock to Mutex, as it's only ever 1 - Additionally changed the layout and introduced the c2w_frame_s and w2s_cid_frame_s name schema - Fixed a bug in scheduler which WOULD cause a DEADLOCK if handshake would fail - fixed a bug in api_send_send_main, i need to store the stream_p otherwise it's immediately closed and a stream_a.send() isn't guaranteed - add extra test to verify that a sent message is received even if the Stream is already closed - changed OutGoing to Outgoing - fixed a bug that `metrics.tick()` was never called - removed 2 unused nightly features and added `deny_code`	2020-06-09 01:24:21 +02:00
Marcel Märtens	6e776e449f	fixing all tests and doc tests including some deadlocks	2020-06-09 01:24:05 +02:00
Marcel Märtens	8b839afcae	move prios from `scheduler` to `participant` in order to fix closing of stream/participant however i need to coordinate the prio adjustments in scheduler from now on, so that ParticipantA doesn't get all the network bandwidth and ParticipantB nothing	2020-06-09 01:23:58 +02:00
Marcel Märtens	a8f1bc178a	Experiments with a `prometheus bug` which actually worked as designed because i had `client` and `server` running at the same time - https://github.com/tikv/rust-prometheus/issues/321 - split up channel into a handshake part and channel part. The handshake part is non endless and ends when it's either done or aborted. If it's okay i will send a request to the BParticipant which then opens a channel on the existing TCP or UDP connection. this streamlines the command chain a lot. also the channel is almost empty now, thinking about removing it completely. isn't perfect, as shutdown and udp don't work yet - make PID to print as Base64 - replace rouille with tiny_http	2020-06-09 01:23:49 +02:00
Marcel Märtens	9074de533a	handling frames no longer is channel -> scheduler -> participant, but it's directly channel -> participant, removing a lock and a single bottleneck in the scheduler	2020-06-09 01:23:45 +02:00
Marcel Märtens	661060808d	switch from serde to manual serialisation for speed, remove async_serde - removing async_serde as it seems to be not useful. The idea was that, because deserialising is slow, parallelising it could speed things up. However we need to keep the order of frames (at least for controlframes), so serialising in threads would be quite complicated. Also serialisation is quite fast, about 1 Gbit/s; such speed is enough for messaging, it's more important to serve parallel streams better. That's why i am removing async serde coding for now - frames are no longer serialized by serde, but byte by byte manually, incredible speed upgrade - more metrics - switch channel_creator into for_each_concurrent - removing some pool.spawn_ok() as they don't allow me to use self - reduce features needed	2020-06-09 01:23:42 +02:00
Marcel Märtens	2ee18b1fd8	Examples, HUGE fixes, test, make it a lot smoother - switch `listen` to async in order to verify if the bind was successful - Introduce the following examples - network speed - chat - fileshare - add additional tests - fix dropping stream before last messages can be handled bug, when dropping a stream, BParticipant will wait for prio to be empty before dropping the stream and sending the signal - correct closing of stream and participant - move tcp to protocols and create udp front and backend - tracing and fixing a bug that is caused by not waiting for configuration after receiving a frame - fix a bug in network-speed, but there is still a bug if trace=warn after 2.000.000 messages the server doesn't get that client has shut down and seems to lock somewhere. hard to reproduce open tasks [ ] verify UDP works correctly, especially the connect! [ ] implements UDP shutdown correctly, the one created in connect! [ ] unify logging [ ] fill metrics [ ] fix dropping stream before last messages can be handled bug [ ] add documentation [ ] add benchmarks [ ] remove async_serde??? [ ] add mpsc	2020-06-09 01:23:37 +02:00
Marcel Märtens	595f1502b3	COMPLETE REWRITE - use async_std and implement an async serialisation - new participant, stream and drop on the participant - sending and receiving on streams	2020-06-09 01:23:30 +02:00
Marcel Märtens	499a895922	shutdown and udp/mpsc - theoretically closing of streams and shutdown - mpsc and udp preparations - cleanup and build better tests	2020-06-09 01:23:26 +02:00
Marcel Märtens	9354952a7f	Code/Dependency Cleanup	2020-06-09 01:23:19 +02:00
Marcel Märtens	641df53f4a	Got some async tests to work	2020-06-09 01:23:15 +02:00
Marcel Märtens	74143e13d3	Implement an async recv test	2020-06-09 01:23:12 +02:00
Marcel Märtens	1e948389cc	Switch to iterator based ChannelProtocols	2020-06-09 01:23:09 +02:00
Marcel Märtens	19fb1d3be4	Experiment with TCP buffering	2020-06-09 01:23:05 +02:00
Marcel Märtens	35233d07f9	Cleanup: - We can now get rid of most sleep and get true remote part and stream working, however there seems to be a deadlock after registered new handle trace with 10% spawn chance - removal of the events trait, as we use channels - streams now directly communicate with each other for performance reasons, somewhere are still deadlocks, once directly at listening somehow and after the first message has been read, but i also got it to run perfectly through at this state without code change, maybe a sleep or more detailed rust-gdb session would help here!	2020-06-09 01:22:58 +02:00
Marcel Märtens	10863eed14	remove worker folder - flatten file structure	2020-06-09 01:22:55 +02:00

43 Commits