veloren

mirror of https://gitlab.com/veloren/veloren.git synced 2024-08-30 18:12:32 +00:00

Author	SHA1	Message	Date
Marcel Märtens	10970841cc	fix master / update toolchain to `2020-08-15`	2020-08-17 10:28:09 +02:00
Marcel Märtens	e618eeb386	Fix an issue that might occur when a participant is dropped while the remote wants to open a Stream and we get some bad timing condition. Increase the slowloris timeout. For some reason it seems to trigger a lot more often since commit `75c1d440`, but i have no idea why. My guess would be that the initial sync now sends way more data, which slows down TCP to be > 10ms and triggers it. Note: the fix might cause small lags when slow people try to connect to the server	2020-08-13 12:06:53 +02:00
Marcel Märtens	dd581bc6c0	Participant closure happened immediately, even in the case where a new participant was connected, sent a MSG and was then dropped immediately. The remote side should see it connect, be open for 1 single stream and read the message before it's notified that the participant is actually closed. This caused the failure in one of our API tests (in lib, with client and server), where it was possible that all messages were sent and one side was dropped before the other side asked for the opened stream. Also introduce better error detection in participant (and scheduler) by removing the std_async::Result and introducing `Result<(),ParticipantError>` instead	2020-07-22 09:18:15 +02:00
Marcel Märtens	6c59caf8e1	make `prometheus` optional in network and fix a panic in the server - an extra interface `new_with_regisitry` was created to make sure the interface doesn't depend on the features	2020-07-15 16:45:49 +02:00
Marcel Märtens	58cb98deaa	use `type` to reduce complexity	2020-07-15 16:45:44 +02:00
Marcel Märtens	c74e5e4b47	Changes requested in review	2020-07-13 23:41:32 +02:00
Marcel Märtens	6db9c6f91b	fix a followup bug: after a protocol fail the Participant is now closed, including all streams, so we get the stream errors. We MUST handle them and we are not allowed to act on a stream after it failed. As i am too lazy to change the structure to ensure the client is immediately dropped, i added an AtomicBool to it.	2020-07-13 13:03:35 +02:00
Marcel Märtens	187ec42aa2	fix Participant shutdown - we had the problem that Participants couldn't shut themselves down, only the scheduler could, which was controlled by the api. it's needed e.g. to handle the Shutdown Frame - my initial solution did a full shutdown, which was a problem if a 2nd shutdown was requested in parallel, no possibility of getting the error - new solution will only deactivate Participant and Stream, and then still functions correctly, till the api closes the participant and calls the scheduler which then calls the bparticipant again - i experimented with a Mutex<oneshot> or 2 and a `select` but it didn't work that well - also adjusted the Error messages to now either Disconnected when gracefully shutdown or ProtocolFailed when some msg couldn't be delivered (note: the latter might not be 100% returned correctly yet)	2020-07-13 13:03:30 +02:00
Marcel Märtens	df45d35c0e	tcp protocol hardening - make it harder for the server to crash and also kill invalid sessions properly (instead of waiting for them to close) - introduce macros to reduce code duplication - added tests to check for valid handshake as well as garbage tcp	2020-07-13 13:03:25 +02:00
Marcel Märtens	9d32e3f884	proper voxygen connect and code cleanups: - voxygen aborts when the server has an invalid veloren_network handshake, e.g. due to an outdated version, instead of trying again - rename Network `Address` to `ProtocolAddr` as suggested by zest as it's a combination of Protocol and std::io::Addr - replace the manual byte arrays in `protocols.rs` with something nicer	2020-07-13 13:03:20 +02:00
Marcel Märtens	041349be48	Switch API to return Participant rather than Arc<Participant> - API behavior switched! - the `Network` no longer holds a copy of participant, thus if the return of `connect` (before `Arc<Participant>`, now `Participant`) gets dropped, the `Participant::Drop` is triggered! - you can close a Participant async via `Participant::disconnect()`, no more need to know the network at this point - the `Network::Drop` will check and drop not yet disconnected Participants. - you can compare Participants via PartialEq, if they are equal they point to the same endpoint (it checks remote_pid) - Note: multiple Participants are only supported in theory, won't work yet Additionally: - fix some `debug!` - veloren-client will now drop the participant gracefully on shutdown - rename `error` to `debug` when 2 times Bparticipant shutdown is called, as it is to be expected in an async runtime	2020-07-13 13:03:14 +02:00
Marcel Märtens	8fb445b0e8	better lz4	2020-07-13 13:03:04 +02:00
Marcel Märtens	4cefdcefea	zests fix - capitalize first letter	2020-07-13 13:03:01 +02:00
Marcel Märtens	5f902b5eab	doing a clean shutdown of the BParticipant once the TCP connection is suddenly interrupted	2020-07-13 13:02:55 +02:00
Marcel Märtens	c212de00c2	updated dependencies and fixed stuff - replace serde_derive by feature of serde incl. source code modifications to compile - reduce futures-timer to "2.0" to be same as async_std - update notify - removed mio, bincode and lz4 compress in common as networking is now in own crate btw there is a better lz4 compress crate, which is newer than 2017 - update prometheus to 0.9 - can't update uvth yet due to usues - hashbrown to 7.2 to only need a single version - libsqlite3 update - image didn't change as there is a problem with `image 0.23` - switch old directories with newer directories-next - no num upgrade as we still depend on num 0.2 anyways - rodio and cpal upgrade - const-tweaker update - dispatch (untested) update - git2 update - iterations update	2020-07-07 09:43:49 +02:00
Joshua Barretto	3c1fddfb0b	Merge branch 'zesterer/server-fixes' into 'master' Server bug fixes See merge request veloren/veloren!1159	2020-07-05 19:41:30 +00:00
Joshua Barretto	dd2a81b1f3	Increased network test timeouts	2020-07-05 19:56:06 +01:00
Marcel Märtens	3a6319f2f6	compress everything	2020-07-05 20:14:47 +02:00
Marcel Märtens	092b1e0d6c	small fix	2020-07-05 18:54:34 +02:00
Marcel	2a7a8b05e6	Merge branch 'network-lockless' into 'master' Network lockless See merge request veloren/veloren!1153	2020-07-05 09:17:29 +00:00
Marcel Märtens	e7195b57ad	extend network with better Error codes for Network	2020-07-04 12:32:52 +02:00
Marcel Märtens	cbfd398035	remove Mutex in server as Stream is now 'Sync'	2020-07-04 12:31:59 +02:00
Marcel Märtens	6de2eadeb0	make crash -> error for now	2020-07-04 10:32:52 +02:00
Marcel Märtens	1da6f15a43	fixing various smaller network parts from issue 657	2020-07-04 02:04:33 +02:00
Marcel Märtens	f9895a7800	crossbeam-channel and log spam - swap out std::mpsc with crossbeam-channel in networking crate - remove log spam by only logging when populating a new cache entry and not on every get	2020-07-03 22:35:29 +02:00
Marcel Märtens	e1b27c51f5	fix clippy issues in tests and add it to CI	2020-07-01 00:37:15 +02:00
Marcel Märtens	6535fa5744	fix various clippy issues	2020-07-01 00:37:06 +02:00
Marcel Märtens	11e7b1f922	increase network sleep in order to fix flaky tests	2020-06-29 14:32:20 +02:00
Marcel Märtens	57453291e4	- switched participant to only error!() once a sec instead of every occurrence in order to not spam errors - switched the behavior of prio to keep the order of a stream, EVEN if messages of different length are used! This needs to be addressed to fulfill PROMISES_ORDERED. I think we need to think about whether PRIO needs to handle this, or we add additional meta info and handle it on the receiver side	2020-06-28 22:29:42 +02:00
Ben Wallis	c1c968f479	Globally suppressed clippy lint option_map_unit_fn for #587	2020-06-14 16:48:07 +00:00
Marcel Märtens	1435d8d6be	remove unused files	2020-06-12 11:53:59 +02:00
Marcel Märtens	0e59ee901e	dependency reduction: - authc no longer uses reqwest - image only supports PNG - replace rouille with tiny_http - several other dependencies - cargo upgrade - following improvement was measured on R7 1700X: before: - cargo build: 3076.73s user / 4:45 total / 589 dependencies - cargo test: 6118.38s user / 7:30 total / 959 dependencies after: - cargo build: 2680.54s user / 4:05 total / 480 dependencies - cargo test: 5351.81s user / 7:04 total / 791 dependencies - added xMAC94x to CODEOWNERS for Cargo.toml, he will protect them from now on and hit people with evil looks ;)	2020-06-11 20:55:34 +02:00
Marcel Märtens	2e3d5f87db	StreamError::Deserialize is now triggered when `recv` fails because of wrong type - added PartialEq to StreamError for test purposes (only yet!) - removed async_recv example as it's no longer of any use. It was created before the COMPLETE REWRITE in order to verify that my own async interface on top of mio works. However it's now guaranteed by async-std and futures, no need for a special test - remove uvth from dependencies and replace it with a `FnOnce` - fix ALL clippy (network) lints - basic fix for a channel drop scenario: TODO: this needs some further fixes. up to now only destruction of participant by api was covered correctly. we had an issue when the underlying channels got dropped. So now we have a participant without channels. We need to buffer the requests and try to reopen a channel ASAP! If no channel could be reopened we need to close the Participant, while a) leaving the BParticipant intact, knowing that it only waits for a proper close by scheduler b) close the BParticipant gracefully. Notifying the scheduler to remove its stuff (either scheduler should detect a stopped BParticipant or BParticipant will send Scheduler its own destruction, and then Scheduler just does the same like when API forces a close) Keep the Participant alive and wait for the api to access BParticipant to notice it's closed and then wait for a disconnect which isn't doing anything as it was already cleaned up in the background	2020-06-09 13:16:39 +02:00
Marcel Märtens	3324c08640	Fixing the DEADLOCK in handshake -> channel creation - this bug was initially called imbris bug, as it happened on his runners and i couldn't reproduce it locally at first :) - When in a Handshake a separate mpsc::Channel was created for (Cid, Frame) transport, however the protocol could already catch non-handshake data and push it into this mpsc::Channel. Then this channel got dropped and a fresh one was created for the network::Channel. These dropped Frames are ofc a BUG! I tried multiple things to solve this: - don't create a new mpsc::Channel, but instead bind it to the Protocol itself and always use 1. This would work theoretically, but on the bParticipant side we are using 1 mpsc::Channel<(Cid, Frame)> to handle ALL the network::channel. If now every Protocol had its own, and with that every network::Channel had its own, it would no longer work out. Bad Idea... - using the first method but creating the mpsc::Channel inside the scheduler instead of the protocol doesn't work either, as the scheduler doesn't know the remote_pid yet - i don't want a hack to say the protocol only listens to 2 messages and then stops no matter what. So i switched over to the simple method now: - Do everything like before with 2 mpsc::Channels - after the handshake, close the receiver and listen for all remaining (cid, frame) combinations - when starting the channel, reapply them to the new sender/listener combination - added tracing - switched Protocol RwLock to Mutex, as it's only ever 1 - Additionally changed the layout and introduced the c2w_frame_s and w2s_cid_frame_s name schema - Fixed a bug in scheduler which WOULD cause a DEADLOCK if handshake would fail - fixed a bug in api_send_send_main, i need to store the stream_p otherwise it's immediately closed and a stream_a.send() isn't guaranteed - add extra test to verify that a sent message is received even if the Stream is already closed - changed OutGoing to Outgoing - fixed a bug that `metrics.tick()` was never called - removed 2 unused nightly features and added `deny_code`	2020-06-09 01:24:21 +02:00
Marcel Märtens	2a7c5807ff	overall cleanup, more tests, fixing clashes, removing unwraps, hardening against protocol errors, prepare prio mgr to take commands from scheduler fix async_recv and double block_on panic on Network::drop and participant::drop include Cargo.lock from all examples Found a bug on imbris runners with doc tests of `stream::send` and `stream::recv` As neither a backtrace, nor tracing on runners in the doc tests seems to help, i disable them and add them as unit tests	2020-06-09 01:24:16 +02:00
Marcel Märtens	a86cfbae65	add new tests and increase coverage	2020-06-09 01:24:07 +02:00
Marcel Märtens	6e776e449f	fixing all tests and doc tests including some deadlocks	2020-06-09 01:24:05 +02:00
Marcel Märtens	9550da87b8	speeding up metrics by reducing string generation and Hashmap access with a metrics cache for msg/send and msg/recv	2020-06-09 01:24:01 +02:00
Marcel Märtens	8b839afcae	move prios from `scheduler` to `participant` in order to fix closing of stream/participant. however i need to coordinate the prio adjustments in scheduler from now on, so that ParticipantA doesn't get all the network bandwidth and ParticipantB nothing	2020-06-09 01:23:58 +02:00
Marcel Märtens	bd69b2ae28	renamed all Channels to new naming scheme and fixing shutting down bparticipant and scheduler correctly. Introducing structs to keep Info in `scheduler.rs` and `participant.rs`	2020-06-09 01:23:55 +02:00
Marcel Märtens	007f5cabaa	DOCUMENTATION for everything	2020-06-09 01:23:52 +02:00
Marcel Märtens	a8f1bc178a	Experiments with a `prometheus bug` which actually worked as designed because i had `client` and `server` running at the same time - https://github.com/tikv/rust-prometheus/issues/321 - split up channel into a handshake part and channel part. The handshake part is not endless and ends when it's either done or aborted. If it's okay i will send a request to the BParticipant which then opens a channel on the existing TCP or UDP connection. this streamlines the command chain a lot. also the channel is almost empty now, thinking about removing it completely. isn't perfect, as shutdown and udp don't work yet - make PID print as Base64 - replace rouille with tiny_http	2020-06-09 01:23:49 +02:00
Marcel Märtens	9074de533a	handling frames no longer is channel -> scheduler -> participant, but it's directly channel -> participant, removing a lock and a single bottleneck in the scheduler	2020-06-09 01:23:45 +02:00
Marcel Märtens	661060808d	switch from serde to manual for speed, remove async_serde - removing async_serde as it seems not to be useful. the idea was that because deserialising is slow, parallelising it could speed it up. However we need to keep the order of frames (at least for control frames), so serialising in threads would be quite complicated. Also serialisation is quite fast, about 1 Gbit/s, such speed is enough for messaging, it's more important to serve parallel streams better. That's why i am removing async serde coding for now - frames are no longer serialized by serde, but byte by byte manually, incredible speed upgrade - more metrics - switch channel_creator into for_each_concurrent - removing some pool.spawn_ok() as they don't allow me to use self - reduce features needed	2020-06-09 01:23:42 +02:00
Marcel Märtens	2ee18b1fd8	Examples, HUGE fixes, test, make it a lot smoother - switch `listen` to async in order to verify if the bind was successful - Introduce the following examples - network speed - chat - fileshare - add additional tests - fix dropping stream before last messages can be handled bug, when dropping a stream, BParticipant will wait for prio to be empty before dropping the stream and sending the signal - correct closing of stream and participant - move tcp to protocols and create udp front and backend - tracing and fixing a bug that is caused by not waiting for configuration after receiving a frame - fix a bug in network-speed, but there is still a bug if trace=warn after 2.000.000 messages the server doesn't get that the client has shut down and seems to lock somewhere. hard to reproduce open tasks [ ] verify UDP works correctly, especially the connect! [ ] implement UDP shutdown correctly, the one created in connect! [ ] unify logging [ ] fill metrics [ ] fix dropping stream before last messages can be handled bug [ ] add documentation [ ] add benchmarks [ ] remove async_serde??? [ ] add mpsc	2020-06-09 01:23:37 +02:00
Marcel Märtens	595f1502b3	COMPLETE REWRITE - use async_std and implement an async serialisation - new participant, stream and drop on the participant - sending and receiving on streams	2020-06-09 01:23:30 +02:00
Marcel Märtens	499a895922	shutdown and udp/mpsc - theoretically closing of streams and shutdown - mpsc and udp preparations - cleanup and build better tests	2020-06-09 01:23:26 +02:00
Marcel Märtens	9354952a7f	Code/Dependency Cleanup	2020-06-09 01:23:19 +02:00
Marcel Märtens	641df53f4a	Got some async tests to work	2020-06-09 01:23:15 +02:00
Marcel Märtens	74143e13d3	Implement an async recv test	2020-06-09 01:23:12 +02:00
Marcel Märtens	1e948389cc	Switch to iterator based ChannelProtocols	2020-06-09 01:23:09 +02:00
Marcel Märtens	ca45baeb76	Fix TCP buffering with a NetworkBuffer struct	2020-06-09 01:23:07 +02:00
Marcel Märtens	19fb1d3be4	Experiment with TCP buffering	2020-06-09 01:23:05 +02:00
Marcel Märtens	a6f1e3f176	Add a speedtest program to benchmark networking	2020-06-09 01:23:01 +02:00
Marcel Märtens	35233d07f9	Cleanup: - We can now get rid of most sleep and get true remote part and stream working, however there seems to be a deadlock after registered new handle trace with 10% spawn chance - removal of the events trait, as we use channels - streams now directly communicate with each other for performance reasons. somewhere there are still deadlocks, once directly at listening somehow and after the first message has been read, but i also got it to run perfectly through at this state without code change, maybe a sleep or a more detailed rust-gdb session would help here!	2020-06-09 01:22:58 +02:00
Marcel Märtens	10863eed14	remove worker folder - flatten file structure	2020-06-09 01:22:55 +02:00
Marcel Märtens	e388b40c54	Till now all operations were oneshots, now i actually wait for a participant handshake to complete and am able to return their PID. also fixed the correct pid, sid being sent	2020-06-09 01:22:52 +02:00
Marcel Märtens	88f6b36a4e	Differ Metrics to make it easier to implement your own metric coding! Implement my own metric coding in networking	2020-06-09 01:22:48 +02:00
Marcel Märtens	f3251c0879	Converting the API interface to Async and experimenting with a Channel implementation for TCP, UDP, MPSC, which will later be reverted. It should compile and tests run fine now. If not, the 2nd last squashed commit message said it currently only sends frames but not incoming messages, also recv would only handle frames. The last one said i added internal messages and a reverse path (prob for .recv)	2020-06-09 01:22:45 +02:00
Marcel Märtens	5c5b33bd2a	Bring networking tests to green - Separate worker into own directory - implement correct handshakes - implement correct receiving	2020-06-09 01:22:42 +02:00
Marcel Märtens	3d8ddcb4b3	Continue backend for networking and fill gaps, including: - introduce tlid to allow - introduce channel trait - remove old experimental handshake - separate mio_worker into multiple fn - implement stream in backend	2020-06-09 01:22:38 +02:00
Marcel Märtens	52078f2251	first implementation of connect and tcp using a mio worker protocol and: - introduce a loadtest, for tcp messages - cleanup api - added a unittest - prepared a handshake message, which will in next commits get removed again - experimental mio worker merges - using uuid for participant id	2020-06-09 01:22:35 +02:00
Marcel Märtens	a01afd0c86	initial implementation of a network api	2020-06-09 01:22:32 +02:00

113 Commits