Jitsi Meet offers strong protection even if you do not explicitly turn on end-to-end encryption (E2EE). Here are more details:
Jitsi meetings in general operate in 2 ways: peer-to-peer (P2P) or via the Jitsi Videobridge (JVB). This is transparent to the user. P2P mode is only used for 1-to-1 meetings. In this case, audio and video are encrypted using DTLS-SRTP all the way from the sender to the receiver, even if they traverse network components like TURN servers.
In multiparty meetings, all audio and video traffic is still encrypted on the network (again, using DTLS-SRTP). This outer layer of DTLS-SRTP encryption is removed while packets traverse the Jitsi Videobridge; however, packets are never written to persistent storage and only live in memory while being routed to the other participants in the meeting.
It is very important to note that when packets are also end-to-end encrypted, this second layer of encryption is never removed (nor can it be).
E2EE in Jitsi is implemented by adding an extra layer of encryption, that is, encrypting the audio / video media at the source, before it is encrypted with DTLS-SRTP. This way, when the SFU (videobridge) decrypts the DTLS-SRTP payload it will not be able to access the actual media contained within the payload.
This extra layer of encryption can currently only be implemented in browsers supporting insertable streams. To achieve E2EE, Jitsi Meet implements a slight variant of the SFrame specification, called JFrame. So this is not recommended.
Encryption is performed with AES-GCM (with a 128-bit key) and the WebCrypto API. AES-GCM needs a 96-bit initialization vector (IV), which is constructed from the SSRC, the RTP timestamp and a frame counter, similar to how the IV is constructed in SRTP with GCM.
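The IV construction described above can be sketched as follows. This is a minimal illustration, not Jitsi's actual code: the function name `make_iv`, the field order, and the choice of three 32-bit big-endian fields are assumptions made so that the three inputs add up to 96 bits.

```python
import struct

IV_LENGTH = 12  # 96 bits, the IV size used with AES-GCM here


def make_iv(ssrc: int, rtp_timestamp: int, frame_counter: int) -> bytes:
    """Pack three 32-bit big-endian fields into one 96-bit IV.

    Field order and widths are assumptions for illustration only.
    Each value is masked so that it wraps around at 32 bits.
    """
    return struct.pack(
        ">III",
        ssrc & 0xFFFFFFFF,
        rtp_timestamp & 0xFFFFFFFF,
        frame_counter & 0xFFFFFFFF,
    )


# Two frames that share an SSRC and RTP timestamp still get distinct IVs,
# because the frame counter differs.
iv_a = make_iv(ssrc=0x1234ABCD, rtp_timestamp=90000, frame_counter=0)
iv_b = make_iv(ssrc=0x1234ABCD, rtp_timestamp=90000, frame_counter=1)
```

The point of mixing in a counter is that AES-GCM must never reuse an IV under the same key; the SSRC and timestamp alone could repeat for frames produced in the same tick.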
This IV gets sent along with the packet, adding 12 bytes of overhead. The GCM tag length is the default 128 bits or 16 bytes. At a high level the encrypted frame format looks like this:
Jitsi does not encrypt the first few bytes of the packet that form the VP8 payload header (10 bytes for key frames, 3 bytes for interframes), nor the Opus TOC byte. This serves two purposes:
- Allows the SFU to have access to the required metadata to properly route the packet, while not having access to the payload nor knowing if it is E2EE or not
- Fools the decoder in the browser into processing the video frame (as the header is correct), resulting in a pixelated rainbow pattern of sorts.
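As a rough sketch of the split between clear and encrypted bytes, the snippet below separates a frame into the part left unencrypted and the part handed to AES-GCM, and estimates the per-frame overhead from the figures given above (12-byte IV, 16-byte tag). The names `split_frame`, `encrypted_size` and the frame-kind labels are hypothetical, and the size estimate ignores any additional trailer bytes the real frame format may carry.

```python
from typing import Tuple

# Bytes left in the clear, per the description above.
UNENCRYPTED_BYTES = {
    "key": 10,    # VP8 key frame
    "delta": 3,   # VP8 interframe
    "audio": 1,   # Opus TOC byte
}
IV_LENGTH = 12   # bytes, sent along with every frame
TAG_LENGTH = 16  # bytes, default GCM authentication tag


def split_frame(frame: bytes, kind: str) -> Tuple[bytes, bytes]:
    """Return (clear_header, payload_to_encrypt) for one encoded frame."""
    n = UNENCRYPTED_BYTES[kind]
    return frame[:n], frame[n:]


def encrypted_size(frame_len: int) -> int:
    """Estimate the frame size after E2EE.

    GCM ciphertext is as long as its plaintext, so the overhead is
    just the tag plus the IV.
    """
    return frame_len + TAG_LENGTH + IV_LENGTH


header, payload = split_frame(bytes(range(20)), "key")
```

With these numbers, every frame grows by 28 bytes regardless of its original size, which matters more for small audio frames than for video.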
Each participant will share their key with every other participant in the meeting using a secure channel. In the current implementation, the built-in signalling transport (XMPP) is used to negotiate an E2EE channel using Olm.
As participants join and leave the meeting, keys are replaced so that former participants can no longer decrypt any new media, and new participants cannot decrypt any previous media. When a participant leaves the meeting, a full key rotation is carried out. This is the same process as when creating the initial key: new random material is created, and a key is derived from it and distributed over the Olm channel. When a new participant joins the meeting, each participant ratchets their key (derives a new key based on the previous one) and shares the new key with the new participant.
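The difference between a full rotation and a ratchet can be sketched as below. This is an illustration under stated assumptions, not Jitsi's implementation: HKDF with SHA-256 is swapped in as a generic one-way derivation, the salt strings are hypothetical, and the helper names (`new_material`, `ratchet`, `derive_key`) are invented for the example.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(
            prk, block + info + bytes([counter]), hashlib.sha256
        ).digest()
        okm += block
        counter += 1
    return okm[:length]


def new_material() -> bytes:
    """Full rotation (participant left): fresh random key material."""
    return os.urandom(32)


def ratchet(material: bytes) -> bytes:
    """Ratchet (participant joined): one-way step from the old material.

    The salt is a hypothetical label, not a value taken from Jitsi.
    """
    return hkdf_sha256(material, b"example-ratchet-salt", b"", 32)


def derive_key(material: bytes) -> bytes:
    """Derive a 128-bit AES-GCM key from the current material."""
    return hkdf_sha256(material, b"example-key-salt", b"", 16)


m0 = bytes(32)    # fixed material, only to make the example repeatable
m1 = ratchet(m0)  # what a newly joined participant would be sent
```

Because the derivation is one-way, someone who only receives `m1` cannot recover `m0`, and so cannot decrypt media sent before they joined. A ratchet is not enough when someone leaves, since the departing participant could compute the next step themselves; that case needs fresh random material.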
Comparison of E2EE communication products
| Key management | Custom (over TLS) | Double ratchet | Double ratchet |
| Web browser support | No | No | Yes |
| Vulnerability to outsiders | Low | Low | Low |
| Vulnerability to insiders | Very High | Medium-High | Medium-High |
| Vulnerability to participants | High | High | High |
You can find more information in the Jitsi E2EE whitepaper.