Bloom filter is a confusing but powerful data structure that could save you memory, time, and maybe a career. What the heck is probabilistic data structure? It’s a structure that allows you to ensure that the element is not existing (for sure) but doesn’t guarantee that element exists.
Let’s say you have to serve some data from slow storage. You also have much smaller but faster storage for the cache. Access to the cache is not immediate but faster than a slow one.
If you can put all the keys of your cache in Hashmap, you are lucky. But if you can’t, you can take a look at Bloom filter structure.
You put cached keys in bloom filter, and every time users request data, you check the key in the bloom filter. If an item not in the bloom filter, you are retrieving the data from slow storage and put it into the cache. If an element exists, then you look at the cache but can expect that element is missing. That’s it. If you would like to decrease the probability of missing cache you can use multiple different Bloom filters.
If my explanation is confusing you can check out this short video.
More detailed explanation in wikipedia article.