How encryption works in Jitsi Meet

Introduction

Jitsi Meet offers very strong protection even if you do not explicitly turn on end-to-end encryption (E2EE). Here are the details.

Jitsi meetings operate in one of two modes: peer-to-peer (P2P) or via the Jitsi Videobridge (JVB). The choice is transparent to the user. P2P mode is only used for 1-to-1 meetings. In this case, audio and video are encrypted using DTLS-SRTP all the way from the sender to the receiver, even if they traverse network components like TURN servers.

In multiparty meetings, all audio and video traffic is still encrypted on the network (again, using DTLS-SRTP). This outer layer of DTLS-SRTP encryption is removed while packets traverse the Jitsi Videobridge; however, the packets are never written to persistent storage and only live in memory while being routed to other participants in the meeting.

It is very important to note that when packets are also end-to-end encrypted, this second layer of encryption is never removed (nor can it be).

End-to-End Encryption

E2EE in Jitsi is implemented by adding an extra layer of encryption, that is, encrypting the audio / video media at the source, before it is encrypted with DTLS-SRTP. This way, when the SFU (videobridge) decrypts the DTLS-SRTP payload it will not be able to access the actual media contained within the payload.

This extra layer of encryption can currently only be implemented in browsers that support insertable streams. To achieve E2EE, Jitsi Meet implements a slight variant of the SFrame specification, called JFrame. As a non-standard variant, it is not recommended.

Encryption

Encryption is performed with AES-GCM (with a 128-bit key) through the WebCrypto API. AES-GCM needs a 96-bit initialization vector (IV), which is constructed from the SSRC, the RTP timestamp and a frame counter, similar to how the IV is constructed in SRTP with GCM.
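As a rough illustration of the IV construction described above, the following Python sketch packs the three inputs into 96 bits. The field widths (32 bits each), the big-endian byte order, the masking of oversized values and the function name make_iv are assumptions made for illustration; the text only states which values feed into the IV.

```python
import struct

IV_LENGTH = 12  # 96 bits, the IV size used with AES-GCM here


def make_iv(ssrc: int, rtp_timestamp: int, frame_counter: int) -> bytes:
    """Pack SSRC, RTP timestamp and frame counter into a 96-bit IV.

    Assumed layout (not confirmed by the text): three unsigned 32-bit
    big-endian fields, in that order.
    """
    # Mask each value to 32 bits so oversized inputs wrap instead of raising.
    fields = [value & 0xFFFFFFFF for value in (ssrc, rtp_timestamp, frame_counter)]
    iv = struct.pack(">III", *fields)
    assert len(iv) == IV_LENGTH
    return iv


iv = make_iv(ssrc=0x12345678, rtp_timestamp=90000, frame_counter=1)
print(iv.hex())
```

Because the counter changes with every frame, two frames from the same source with the same timestamp still get distinct IVs. That matters because AES-GCM must never reuse an IV under the same key.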

This IV is sent along with the packet, adding 12 bytes of overhead. The GCM tag length is the default 128 bits, or 16 bytes. At a high level, an encrypted frame therefore carries an unencrypted header, the encrypted payload, the 16-byte tag and the 12-byte IV.

The scheme does not encrypt the first few bytes of the packet that form the VP8 payload header (10 bytes for key frames, 3 bytes for interframes), nor the Opus TOC byte.

This serves two purposes:

It allows the SFU to access the metadata it needs to route the packet properly, without having access to the payload or knowing whether it is E2EE or not.
It fools the decoder in the browser into processing the video frame (as the header is correct), resulting in a pixelated rainbow pattern of sorts.
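The header sizes and overhead figures above can be put together in a short sketch. The Python below only computes where the unencrypted header ends and how large the protected frame becomes; it performs no actual encryption. The FrameKind names and both helper functions are hypothetical, and the sketch assumes the only overhead is the IV plus the GCM tag.

```python
from enum import Enum
from typing import Tuple

IV_BYTES = 12   # IV sent along with each frame
TAG_BYTES = 16  # default 128-bit GCM authentication tag


class FrameKind(Enum):
    VP8_KEY = "vp8-key"
    VP8_DELTA = "vp8-delta"  # interframe
    OPUS = "opus"


# Number of leading bytes left unencrypted for each kind of frame.
UNENCRYPTED_BYTES = {
    FrameKind.VP8_KEY: 10,
    FrameKind.VP8_DELTA: 3,
    FrameKind.OPUS: 1,  # the Opus TOC byte
}


def split_frame(frame: bytes, kind: FrameKind) -> Tuple[bytes, bytes]:
    """Return (clear_header, payload_to_encrypt) for one encoded frame."""
    n = UNENCRYPTED_BYTES[kind]
    return frame[:n], frame[n:]


def protected_size(frame_len: int) -> int:
    """Size of a frame after encryption: original bytes + IV + tag.

    AES-GCM ciphertext has the same length as its plaintext, so only the
    IV and the tag add overhead.
    """
    return frame_len + IV_BYTES + TAG_BYTES


header, payload = split_frame(bytes(range(20)), FrameKind.VP8_DELTA)
print(len(header), len(payload), protected_size(20))
```

Under these assumptions every frame grows by a fixed 28 bytes regardless of codec, which weighs more heavily on small audio frames than on large video key frames.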

Key distribution

Each participant shares their key with every other participant in the meeting over a secure channel. In the current implementation, the built-in signalling transport (XMPP) is used to negotiate an E2EE channel using Olm.

Key rotation

As participants join and leave the meeting, keys are replaced so that former participants can no longer decrypt new media and new participants cannot decrypt previous media. When a participant leaves the meeting, a full key rotation is carried out. This is the same process as creating the initial key: new random material is generated, and a key is derived from it and distributed over the Olm channel. When a new participant joins the meeting, each participant ratchets their key (derives a new key based on the previous one) and shares the new key with the new participant.
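The difference between rotating and ratcheting a key can be sketched as follows. This is a minimal Python illustration, not Jitsi's implementation: the KeyState class, the use of HKDF with SHA-256 for the derivation and the salt label are all assumptions, and distribution over the Olm channel is left out.

```python
import hashlib
import hmac
import os
from typing import Optional

KEY_BYTES = 16  # 128-bit key material


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class KeyState:
    """Tracks one participant's own sending key (hypothetical helper)."""

    def __init__(self, material: Optional[bytes] = None):
        self.material = material if material is not None else os.urandom(KEY_BYTES)

    def rotate(self) -> bytes:
        """A participant left: discard the old key, start from fresh randomness."""
        self.material = os.urandom(KEY_BYTES)
        return self.material

    def ratchet(self) -> bytes:
        """A participant joined: derive the next key one-way from the current one."""
        self.material = hkdf_sha256(
            self.material, salt=b"example-ratchet-salt", info=b"", length=KEY_BYTES
        )
        return self.material


state = KeyState()
key_for_newcomer = state.ratchet()   # someone joined
key_after_departure = state.rotate() # someone left
```

The one-way derivation is what keeps a newcomer from recovering earlier keys, while fresh randomness on departure keeps a former participant from simply ratcheting forward.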

Comparison of E2EE communication products

Feature	Zoom	Signal	Jitsi Meet
Key management	Custom (over TLS)	Double ratchet	Double ratchet
Media encryption	AES-GCM	AES-CTR	AES-GCM
SAS verification	Yes	Yes	No
Web browser support	No	No	Yes
Participant limit	200	8	20
Open source	No	Yes	Yes
Vulnerability to outsiders	Low	Low	Low
Vulnerability to insiders	Very high	Medium-high	Medium-high
Vulnerability to participants	High	High	High

You can find more information in the Jitsi E2EE whitepaper.
