We built an ML system that automatically detects which product is advertised in an online ad. The challenge was a very large classification problem, with tens of thousands of possible brands and categories.
Gemius is an international research and technology company that provides both media consumption research and tools to optimize advertising campaigns and ad serving.
Gemius has won awards five times in the prestigious European IAB Research Awards competition, in the Audience Measurement category.
We have created an efficient system that classifies ads in seven different languages from seven different countries.
Our solution has replaced much of the work previously done by humans. Moreover, we have designed the system so that adding support for new languages is very easy.
Our system works in two steps:
1. finds a set of good candidate brands for an ad,
2. for each pair of ad and candidate brand, predicts whether the candidate is an appropriate match.
This approach allowed us to scale our solution to millions of ads and tens of thousands of brands.
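The two steps above can be illustrated with a minimal Python sketch. It is not the production system: the brand catalogue, the keyword index and the `match_score` function are hypothetical stand-ins. In particular, `match_score` is a simple keyword-overlap heuristic used here in place of a trained binary classifier. The sketch only shows why the structure scales: step 1 narrows tens of thousands of brands down to a handful, so step 2 only has to score a few pairs per ad.

```python
from collections import defaultdict

# Hypothetical brand catalogue: brand name -> keywords associated with it.
BRANDS = {
    "Acme Cola": {"acme", "cola", "drink"},
    "Acme Bank": {"acme", "bank", "loan"},
    "Zed Phones": {"zed", "phone", "smartphone"},
}


def build_index(brands):
    """Build an inverted index: keyword -> set of brands that use it."""
    index = defaultdict(set)
    for brand, keywords in brands.items():
        for keyword in keywords:
            index[keyword].add(brand)
    return index


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, kept deliberately simple)."""
    return set(text.lower().split())


def find_candidates(ad_text, index):
    """Step 1: cheap retrieval of brands sharing at least one keyword with the ad."""
    candidates = set()
    for token in tokenize(ad_text):
        candidates |= index.get(token, set())
    return candidates


def match_score(ad_text, brand, brands):
    """Step 2 stand-in: fraction of the brand's keywords found in the ad.

    A real system would call a trained binary classifier here instead.
    """
    keywords = brands[brand]
    return len(keywords & tokenize(ad_text)) / len(keywords)


def classify(ad_text, brands, index, threshold=0.5):
    """Score every (ad, candidate) pair and keep the candidates above the threshold."""
    scores = {
        brand: match_score(ad_text, brand, brands)
        for brand in find_candidates(ad_text, index)
    }
    return sorted(brand for brand, score in scores.items() if score >= threshold)


index = build_index(BRANDS)
result = classify("Refreshing Acme Cola drink", BRANDS, index)
```

In this toy example both "Acme Cola" and "Acme Bank" are retrieved as candidates because they share the token "acme", and only the pairwise step rejects the bank.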
Candidate brands can be found by combining various techniques – extracting text from images, key phrases from ad descriptions, website analysis and logo detection. We identified suitable candidates using state-of-the-art algorithms – XGBoost, LightGBM, deep neural networks, transfer learning and factorization machines. Our solution was presented at the PyData Warsaw 2018 conference.
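As a rough illustration of how candidates from several sources might be combined, the sketch below merges brand proposals from image text, the ad description, the advertiser's website and detected logos, and records which source proposed each brand. All field names (`image_text`, `description`, `website_text`, `detected_logos`) and the `KNOWN_BRANDS` set are assumptions made for this example. The actual extraction steps (OCR, key-phrase extraction, logo detection) are assumed to have run already and are not shown.

```python
from collections import defaultdict

# Hypothetical list of known brand names, lowercased for matching.
KNOWN_BRANDS = {"acme cola", "zed phones"}

# Fixed order of sources, so feature vectors are comparable across ads.
SOURCES = ["image_text", "description", "website", "logo"]


def brands_in_text(text):
    """Return known brands whose name occurs as a substring of the text."""
    text = text.lower()
    return {brand for brand in KNOWN_BRANDS if brand in text}


def gather_candidates(ad):
    """Merge candidates from all sources; map each brand to the sources that proposed it."""
    per_source = {
        "image_text": brands_in_text(ad.get("image_text", "")),
        "description": brands_in_text(ad.get("description", "")),
        "website": brands_in_text(ad.get("website_text", "")),
        "logo": {name.lower() for name in ad.get("detected_logos", [])} & KNOWN_BRANDS,
    }
    proposed_by = defaultdict(set)
    for source, brands in per_source.items():
        for brand in brands:
            proposed_by[brand].add(source)
    return dict(proposed_by)


def pair_features(proposed_by):
    """One binary feature per source for every (ad, candidate) pair."""
    return {
        brand: [int(source in sources) for source in SOURCES]
        for brand, sources in proposed_by.items()
    }


ad = {
    "image_text": "ACME COLA now with less sugar",
    "description": "Try the new Zed Phones lineup",
    "detected_logos": ["Acme Cola"],
}
candidates = gather_candidates(ad)
features = pair_features(candidates)
```

Binary vectors like these are one plausible kind of input for a pairwise model such as a gradient-boosted tree classifier. The features used in the real system are not described here.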