Blocking email spam is a constant, ever-evolving battle, and Gmail’s latest technique improves the spam detection rate by 38% over the baseline thanks to better text identification.
Spammers often use homoglyphs (characters that look similar to actual letters), invisible characters, keyword stuffing, and other “adversarial text manipulations” to bypass Gmail’s text classification models that identify phishing attacks, scams, and other harmful content.
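To make the problem concrete, here is a minimal Python sketch of how these manipulations slip past an exact-match keyword filter. It is not how Gmail or RETVec works. The blocklist, the sample messages, and the helper names `naive_flag` and `strip_invisible` are all hypothetical and exist only for illustration.

```python
import unicodedata

# Hypothetical blocklist for illustration only.
BLOCKLIST = {"free", "prize"}


def naive_flag(text: str) -> bool:
    """Flag a message if any whitespace-separated word is on the blocklist."""
    return any(word in BLOCKLIST for word in text.lower().split())


def strip_invisible(text: str) -> str:
    """Drop Unicode format characters (category Cf), such as U+200B."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


clean = "claim your free prize"

# Homoglyph attack: Cyrillic 'е' (U+0435) stands in for Latin 'e' in "free".
homoglyph = "claim your fr\u0435\u0435 gift"

# Invisible-character attack: a zero-width space (U+200B) splits "prize".
invisible = "claim your pri\u200bze"

flag_clean = naive_flag(clean)          # the unmodified message is caught
flag_homoglyph = naive_flag(homoglyph)  # looks identical on screen, not caught
flag_invisible = naive_flag(invisible)  # looks identical on screen, not caught

# Removing format characters recovers the invisible-character case...
flag_invisible_cleaned = naive_flag(strip_invisible(invisible))

# ...but Unicode NFKC normalization does not map Cyrillic letters to Latin,
# so the homoglyph message still gets through.
flag_homoglyph_cleaned = naive_flag(
    unicodedata.normalize("NFKC", strip_invisible(homoglyph))
)
```

Hand-written cleanup rules like `strip_invisible` handle one trick at a time and miss others, which is why a vectorizer designed to tolerate character-level manipulation without preprocessing is attractive.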
Google is countering with RETVec (Resilient & Efficient Text Vectorizer). Open sourced by Google Research, this approach “helps models achieve state-of-the-art classification performance and drastically reduces computational cost,” while supporting “every language and all UTF-8 characters without the need for text preprocessing.” This makes it ideal for on-device, web, and other large-scale use cases:
“Models trained with RETVec can be seamlessly converted to TFLite for mobile and edge devices, as a result of a native implementation in TensorFlow Text. For web application model deployment, we provide a TensorflowJS layer implementation that is available on Github and you can check out a demo web page running a RETVec-based model.”
In Gmail, RETVec has improved the “spam detection rate over the baseline by 38%,” while reducing both the false positive rate (by 19.4%) and Tensor Processing Unit usage (by 83%).
According to Google, RETVec achieves these improvements by “sporting a very lightweight word embedding model (~200k parameters), allowing us to reduce the Transformer model’s size at equal or better performance, and having the ability to split the computation between the host and TPU in a…”