Commit Graph

43 Commits

Author SHA1 Message Date
João Capucho
8b89903d57
Bump tracing-subscriber to 0.3.2
We were pulling two versions of it. Causing some compilation errors
2021-11-26 17:41:51 +00:00
Marcel Märtens
aa93b4b53c update other binaries 2021-11-20 20:19:48 +01:00
Marcel Märtens
e29ede7c97 updating dependencies,
cannot update the following dependencies:
 - vek: Sharps SIMD isnt upstream
 - tracing-subscriber: MakeWriter was adjusted and i was to lazy to fiddle with lifetimes,
 - refinery, rustsql: we have a custom refinery version which is incompatible with newer rustsql
 - equi + egui_winit + egui_wgpu_backend: i tried it in this commit but it turned out that they dependo n wgpu which we cant update
 - wgpu: cant update due new version doesnt support DX11

Got quinn updated which now require some dependencies to be explicit.
2021-11-20 20:17:49 +01:00
Marcel Märtens
88685cc016 update crates 2021-09-20 14:39:01 +02:00
Sam
9db8406e4c Fixed torvus 2021-05-27 19:56:16 -05:00
Marcel Märtens
df7b65289d fix error handling in networking and switch to hashbrown, fixing #1118 2021-05-04 15:29:42 +02:00
Marcel Märtens
760c382ed9 protocoladdr change for listen and connect
(remove a loop in quic protocol which wasnt a actual loop)
2021-04-29 15:58:34 +02:00
Marcel Märtens
9f0aceba4c work on getting quic in the network 2021-04-29 15:58:26 +02:00
Marcel Märtens
383482a36e Quic: We had the followuing problem:
- locally we open a stream, our local Drain is sending OpenStream
 - remote Sink will know this and notify remote Drain
 - remote side sends a message
 - local sink does not know about the Stream. as there is (and CANT) be a wat to notify local Sink from local Drain (it could introduce race conditions).

One of the possible solutions was, that the remote drain will copy the OpenStream Msg ON the Quic::stream before first data is send. This would work but is complicated.

Instead we now just mark such streams as "potentially open" and we listen for the first DataHeader to get it's SID.

add support for unreliable messages in quic protocol, benchmarks
2021-04-29 15:58:23 +02:00
Marcel Märtens
d7df741671 update dependencies, including removal of some tracy deps as they are get through common/tracy 2021-03-09 20:17:29 +01:00
Marcel Märtens
514d5db038 Update Network Protocol
- now last digit version is compatible 0.6.0 will connect to 0.6.1
 - the TCP DATA Frames no longer contain START field, as it's not needed
 - the TCP OPENSTREAM Frames will now contain the BANDWIDTH field
 - MID is not Protocol internal

Update network
 - update API with Bandwidth

Update veloren
 - introduce better runtime and `async` things that are IO bound.
 - Remove `uvth` and instead use `tokio::runtime::Runtime::spawn_blocking`
 - remove futures_execute from client and server use tokio::runtime::Runtime instead
 - give threads a Name
2021-02-22 17:34:55 +01:00
Marcel Märtens
03af9937cf Stabelize Network again:
- completly switch to Bytes, even in api. speed up TCP by fak 2
 - improve benchmarks
 - speed up mpsc metrics
 - gracefully handle shutdown by interpreting Ok(0) as tokio::tcpstream closed now.
 - fix hotloop in participants by adding `Some(n)` to fix endless handing.
 - fix closing bug by closing streams after `recv_mgr` is shutdown even if now shutdown is triggered locally.
 - fix prometheus
 - no longer throw when a `Stream` is dropped while participant still receives a msg for it.
 - fix the bandwith handling, TCP network send speed is up to 1.5GiB/s while recv is 150MiB/s
 - add documentation
 - tmp require rt-multi-threaded in client for tokio, to not fail cargo check

this is prob stable, i tested over 1 hour.
after that some optimisations in priomgr.
and impl. propper bandwith.
Speed is up to 2GB/s write and 150MB/s recv on a single core

sync add documentation
2021-02-17 19:37:48 +01:00
Marcel Märtens
ea8ab1ce7a Great improvements to the codebase:
- better logging in network
 - we now notify the send of what happened in recv in participant.
 - works with veloren master servers
 - works in singleplayer, using a actual mid.
 - add `mpsc` in whole stack incl tests
 - speed up internal read/write with `Bytes` crate
 - use `prometheus-hyper` for metrics
 - use a metrics cache
2021-02-17 16:15:00 +01:00
Marcel Märtens
9884019963 COMPLETE REDESIGN of network crate
- Implementing a async non-io protocol crate
    a) no tokio / no channels
    b) I/O is based on abstraction Sink/Drain
    c) different Protocols can have a different Drain Type
       This allow MPSC to send its content without splitting up messages at all!
       It allows UDP to have internal extra frames to care for security
       It allows better abstraction for tests
       Allows benchmarks on the mpsc variant
       Custom Handshakes to allow sth like Quic protocol easily
 - reduce the participant managers to 4: channel creations, send, recv and shutdown.
   keeping the `mut data` in one manager removes the need for all RwLocks.
   reducing complexity and parallel access problems
 - more strategic participant shutdown. first send. then wait for remote side to notice recv stop, then remote side will stop send, then local side can stop recv.
 - metrics are internally abstracted to fit protocol and network layer
 - in this commit network/protocol tests work and network tests work someway, veloren compiles but does not work
 - handshake compatible to async_std
2021-02-17 12:39:47 +01:00
Marcel Märtens
5aa1940ef8 get rid of async_std::channel
switch to `tokio` and `async_channel` crate.
I wanted to do tokio first, but it doesnt feature Sender::close(), thus i included async_channel
Got rid of `futures` and only need `futures_core` and `futures_util`.

Tokio does not support `Stream` and `StreamExt` so for now i need to use `tokio-stream`, i think this will go in `std` in the future

Created `b2b_close_stream_opened_sender_r` as the shutdown procedure does not need a copy of a Sender, it just need to stop it.

Various adjustments, e.g. for `select!` which now requieres a `&mut` for oneshots.

Future things to do:
 - Use some better signalling than oneshot<()> in some cases.
 - Use a Watch for the Prio propergation (impl. it ofc)
 - Use Bounded Channels in order to improve performance
 - adjust tests coding

bring tests to work
2021-02-17 12:38:53 +01:00
Marcel Märtens
1b77b6dc41 Initial switch to tokio for network, minimum working example. 2021-02-17 12:37:59 +01:00
Caelan
dda4931f46 Clean and update dependencies
* Remove tweak feature
 * Remove const-tweaker
 * Update tiny_http
 * Update bitvec to 0.21.0
 * Downgrade euc to avoid conflict with vek 0.12.0
 * Require exactly vek 0.12.0
 * Update all other dependencies automatically based on these changes
 * Update gilrs to latest at the request of Ada Lovegirls
 * Update meshing benchmarks to new criterion API
2021-02-17 01:27:06 -08:00
jiminycrick
8b97199245 Update rand dependency 2021-01-26 20:35:08 -08:00
Acrimon
9f16a946ee
update a few deps 2021-01-20 15:53:58 +01:00
Marcel Märtens
26918d10c9 update chrossbeam, tracy, prometheus (and reduce server deps to crossbeam-channel) 2020-12-16 00:51:07 +01:00
Marcel Märtens
40f5afc2b0 ci cleanup, dependency update 2020-11-06 14:34:42 +01:00
Marcel Märtens
e914c29728 FIX for hanging participant deletion.
There is a rare bug that recently got triggered more often with the release of xMAC94x/netfixA
if the bug triggeres, a Participant never gets cleaned up gracefully.

Reason:
When `participant_shutdown_mgr` was called it stopped all managers at once.
Especially stream_close_mgr and send_mgr.
The problem with stream_close_mgr is, it's responsible for gracefully flushing streams when the Participant is dropped locally.
So when it was interupted self.streams where no longer flushed gracefully.
The next problem was with send_mgr.
It is triggering the PrioManager, and the PrioManager is responsible for notifying once a stream is completly flushed.
This lead to the problem, that a stream flush could be requested, but was actually never executed (as send_mgr was already down).

Solution:
1. when stream_close_mgr is stopped it MUST flush all remaining streams
2. wait for stream_close_mgr to finish before shutting down the send_mgr
3. no longer delete streams when closing the API (this also wasn't tracked in metrics so far)

Additionally i added a dependency, so that the network/examples compile again, fixed some spelling.
I created a `delete_stream` fn that basically just moved the code over.
2020-10-14 15:03:49 +02:00
Marcel Märtens
8c883c200d Switch veloren_network over to use the official example layout.
adjusted those examples to run again
created a CI TEST to always `check` examples
fixed fmt in examples so that pipeline gets green
2020-08-26 10:07:22 +02:00
Marcel Märtens
144f88f811 Propper Compression support of network.
- Compression is no longer enabled always but can now be enabled per Stream.
   If a Stream is Compression enabled it will compress and decompress all msg (except for `raw` access) before handling them internally.
   You need to handle compression yourself for `raw` fn.
 - added a new feature to the network crate to enable or disable the compression
 - switched to `lz-fear` instead of `lz4-compression`
 - use `bitflags` to represent the `Promises` struct
2020-08-25 23:55:27 +02:00
Marcel Märtens
6c59caf8e1 make prometheus optional in network and fix a panic in the server
- an extra interface `new_with_regisitry` was created to make sure the interface doesn't depend on the features
2020-07-15 16:45:49 +02:00
Marcel Märtens
c212de00c2 updated dependencies and fixed stuff
- replace serde_derive by feature of serde
   incl. source code modifications to compile
 - reduce futures-timer to "2.0" to be same as async_std
 - update notify
 - removed mio, bincode and lz4 compress in common as networking is now in own crate
   btw there is a better lz4 compress crate, which is newer than 2017
 - update prometheus to 0.9
 - can't update uvth yet due to usues
 - hashbrown to 7.2 to only need a single version
 - libsqlite3 update
 - image didn't change as there is a problem with `image 0.23`
 - switch old directories with newer directories-next
 - no num upgrade as we still depend on num 0.2 anyways
 - rodio and cpal upgrade
 - const-tewaker update
 - dispatch (untested) update
 - git2 update
 - iterations update
2020-07-07 09:43:49 +02:00
Marcel Märtens
3a6319f2f6 compress everything 2020-07-05 20:14:47 +02:00
Marcel Märtens
f9895a7800 crossbeam-channel and log spam
- swap out std::mpsc with crossbeam-channel in networking crate
 - remove log spam by only logging when populating a new cache entry and not on every get
2020-07-03 22:35:29 +02:00
Marcel Märtens
0e59ee901e dependency reduction:
- authc no longer uses reqwest
 - image only supports PNG
 - replace routille with tiny_http
 - several other dependencies
 - cargo upgrade
 - following improvement was measured on R7 1700X:
   before:
    - cargo build: 3076.73s user / 4:45 total / 589 dependencies
    - cargo test: 6118.38s user / 7:30 total / 959 dependencies
   after:
    - cargo build: 2680.54s user / 4:05 total / 480 dependencies
    - cargo test: 5351.81s user / 7:04 total / 791 dependencies
 - added xMAC94x to CODEOWNERS for Cargo.toml, he will protect them from now on and hit people with evil looks ;)
2020-06-11 20:55:34 +02:00
Marcel Märtens
2e3d5f87db StreamError::Deserialize is now triggered when recv fails because of wrong type
- added PartialEq to StreamError for test purposes (only yet!)
 - removed async_recv example as it's no longer for any use.
   It was created before the COMPLETE REWRITE in order to verify that my own async interface on top of mio works.
   However it's now guaranteed by async-std and futures. no need for a special test
 - remove uvth from dependencies and replace it with a `FnOnce`
 - fix ALL clippy (network) lints
 - basic fix for a channel drop scenario:
   TODO: this needs some further fixes
   up to know only destruction of participant by api was covered correctly.
   we had an issue when the underlying channels got dropped. So now we have a participant without channels.
   We need to buffer the requests and try to reopen a channel ASAP!
   If no channel could be reopened we need to close the Participant, while
    a) leaving the BParticipant in takt, knowing that it only waits for a propper close by scheduler
    b) close the BParticipant gracefully. Notifying the scheduler to remove its stuff (either scheduler schould detect a stopped BParticipant or BParticipant will send Scheduler it's own destruction, and then Scheduler just does the same like when API forces a close)
       Keep the Participant alive and wait for the api to acces BParticipant to notice it's closed and then wait for a disconnect which isn't doing anything as it was already cleaned up in the background
2020-06-09 13:16:39 +02:00
Marcel Märtens
661060808d switch from serde to manually for speed, remove async_serde
- removing async_serde as it seems to be not usefull
  the idea was because deserialising is slow parallising it could speed up.
  Whoever we need to keep the order of frames, (at least for controlframes) so serialising in threads would be quite complicated.
  Also serialisation is quite fast, about 1 Gbit/s such speed is enough for messaging, it's more important to serve parallel streams better.
  Thats why i am removing async serde coding for now
- frames are no longer serialized by serde, by byte by byte manually, increadible speed upgrade
- more metrics
- switch channel_creator into for_each_concurrent
- removing some pool.spwan_ok() as they dont allow me to use self
- reduce features needed
2020-06-09 01:23:42 +02:00
Marcel Märtens
2ee18b1fd8 Examples, HUGE fixes, test, make it alot smother
- switch `listen` to async in oder to verify if the bind was successful
- Introduce the following examples
  - network speed
  - chat
  - fileshare
- add additional tests
- fix dropping stream before last messages can be handled bug, when dropping a stream, BParticipant will wait for prio to be empty before dropping the stream and sending the signal
- correct closing of stream and participant
- move tcp to protocols and create udp front and backend
- tracing and fixing a bug that is caused by not waiting for configuration after receiving a frame
- fix a bug in network-speed, but there is still a bug if trace=warn after 2.000.000 messages the server doesnt get that client has shut down and seems to lock somewhere. hard to reproduce

open tasks
[ ] verify UDP works correctly, especcially the connect!
[ ] implements UDP shutdown correctly, the one created in connect!
[ ] unify logging
[ ] fill metrics
[ ] fix dropping stream before last messages can be handled bug
[ ] add documentation
[ ] add benchmarks
[ ] remove async_serde???
[ ] add mpsc
2020-06-09 01:23:37 +02:00
Marcel Märtens
595f1502b3 COMPLETE REWRITE
- use async_std and implement a async serialisaition
- new participant, stream and drop on the participant
- sending and receiving on streams
2020-06-09 01:23:30 +02:00
Marcel Märtens
499a895922 shutdown and udp/mpsc
- theorectically closing of streams and shutdown
- mpsc and udp preparations
- cleanup and build better tests
2020-06-09 01:23:26 +02:00
Marcel Märtens
9354952a7f Code/Dependency Cleanup 2020-06-09 01:23:19 +02:00
Marcel Märtens
641df53f4a Got some async test to work 2020-06-09 01:23:15 +02:00
Marcel Märtens
a6f1e3f176 Add a speedtest program to benchmark networking 2020-06-09 01:23:01 +02:00
Marcel Märtens
88f6b36a4e Differ Metrics to make it easier to implement your own metric coding!
Implement my own metric coding in networking
2020-06-09 01:22:48 +02:00
Marcel Märtens
f3251c0879 Converting the API interface to Async and experimenting with a Channel implementation for TCP, UDP, MPSC, which will later be reverted
It should compile and tests run fine now.
If not, the 2nd last squashed commit message said it currently only send frames but not incomming messages, also recv would only handle frames. The last one said i added internal messages and a reverse path (prob for .recv)
2020-06-09 01:22:45 +02:00
Marcel Märtens
5c5b33bd2a Bring networking tests to green
- Seperate worker into own directory
 - implement correct handshakes
 - implement correct receiving
2020-06-09 01:22:42 +02:00
Marcel Märtens
3d8ddcb4b3 Continue backend for networking and fill gaps, including:
- introduce tlid to allow
 - introduce channel trait
 - remove old experimental handshake
 - seperate mio_worker into multiple fn
 - implement stream in backend
2020-06-09 01:22:38 +02:00
Marcel Märtens
52078f2251 first implementation of connect and tcp using a mio worker protocol and:
- introduce a loadtest, for tcp messages
 - cleanup api
 - added a unittest
 - prepared a handshake message, which will in next commits get removed again
 - experimental mio worker merges
 - using uuid for participant id
2020-06-09 01:22:35 +02:00
Marcel Märtens
a01afd0c86 initial implementation of a network api 2020-06-09 01:22:32 +02:00