adjoe Engineers’ Blog

How Does a Deep and Cross Network Enable Us to Replace Over 100 Individual Models?

Welcome to part two of our article on Deep and Cross Networks.

In this section, I’ll share how we developed a single model that replaced over 100 individual models—enhancing accuracy, speed, and coverage while significantly reducing maintenance costs.

Our Model's Architecture

At adjoe, we process millions of predictions daily across 100+ countries. Managing separate models fine-tuned for different data slices eventually became unsustainable due to growing complexity. We needed a scalable, generalized solution—and that’s where deep and cross networks came into the picture.

Let’s look at how our user engagement prediction system previously worked in detail before jumping into our new approach.

We used to have over 100 models based on gradient-boosted trees, each responsible for a specific slice of the data. This was necessary because each slice has its own unique properties. For example, each ad targets a different audience, and with traditional model architectures, no single model could effectively capture the differences between these audiences. We chose tree-based models for their speed and efficiency.
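To make the old setup concrete, here is a minimal sketch of per-slice routing. All names are illustrative (this is not adjoe's actual code), and a lambda stands in for a trained gradient-boosted model:

```python
# Hypothetical sketch of the old setup: one gradient-boosted model per
# data slice (e.g. per country or per ad audience).
class SliceRouter:
    """Routes each prediction request to the model trained for its slice."""
    def __init__(self):
        self.models = {}  # slice key -> trained model

    def register(self, slice_key, model):
        self.models[slice_key] = model

    def predict(self, slice_key, features):
        model = self.models.get(slice_key)
        if model is None:
            # New slice: no model yet, so no prediction until enough
            # data has been collected -- the coverage gap described below.
            return None
        return model(features)

router = SliceRouter()
router.register(("US", "ad_123"), lambda feats: 0.04)  # stand-in for a GBT model
print(router.predict(("US", "ad_123"), {"ads_seen": 3}))  # 0.04
print(router.predict(("DE", "ad_999"), {"ads_seen": 1}))  # None: unseen slice
```

The `None` branch is the crux: every new slice starts with zero coverage until its own model can be trained.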

However, one challenge with this approach was that we needed sufficient data from each slice to train a model effectively. If a new data slice appeared, we couldn't make any predictions until enough data had been gathered; at the same time, we couldn't create a single model to handle everything either.

Looking at Users and App Install Data

Let's analyze one example of our model's input features.

In this case, we aimed to estimate the value on the Y-axis, representing the probability of installation (as a percentage). The X-axis shows the number of ads a user has been exposed to. As we can see, the likelihood of installation decreases as the number of ads viewed increases. This is intuitive: excessive ad exposure typically leads to diminishing interest, with users becoming less engaged and less likely to install the advertised app.

The blue line represents users who have never installed an app from an ad, while the red line indicates users who have previously installed one app by clicking on an ad. As shown, users with prior installs have nearly double the likelihood of installation at every point on the graph.

Furthermore, the likelihood of installation increases with each additional app installed. The data reveals a clear multiplicative pattern: after two installs, the likelihood doubles, and after three, it doubles once more. This is exactly the core issue discussed in my previous article (predicting x²).

We can see that previous installs have a multiplicative effect on the installation likelihood. An interesting pattern emerges: we would expect users with three previous installs to have the highest chance of installing among all the groups shown. However, even at their peak, their measured likelihood is still lower than that of users with two previous installs.

We would expect the purple line to follow the dotted purple trajectory, but what the model takes from the data is that the green line has the highest value.

This discrepancy arises because we don’t have enough data for users who have installed three apps. It’s clear that you can’t install three apps after only seeing two ads, even if you install every app shown. The data just doesn’t align. Additionally, it’s uncommon for a user to see three ads and install all of them. As a result, we need more time to collect enough data for this scenario.

If we had more data for users with three installs, that line would likely sit higher on the graph. With limited information, traditional models, including plain neural networks, fall back on simplifying assumptions, such as treating two installs as more valuable than three. This is a common pitfall of standard models. In contrast, deep and cross networks, with their automatic feature crossing, are designed to avoid exactly this mistake.
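The mechanism behind that automatic feature crossing can be sketched in a few lines. This is a simplified DCNv2-style cross layer (shapes and values are illustrative, not our production code): x_{l+1} = x0 * (W @ xl + b) + xl. The elementwise product with x0 is what builds explicit crosses such as "previous installs × ads seen", so multiplicative patterns don't have to be inferred from sparse corners of the data:

```python
import numpy as np

def cross_layer(x0, xl, W, b):
    # DCNv2-style cross layer: x_{l+1} = x0 * (W @ xl + b) + xl
    return x0 * (W @ xl + b) + xl

x0 = np.array([1.0, 2.0, 3.0])
W = np.eye(3)          # identity weights make the math easy to check
b = np.zeros(3)

x1 = cross_layer(x0, x0, W, b)   # x0 * x0 + x0: degree-2 terms appear
print(x1)  # [ 2.  6. 12.]
```

With identity weights the output is x0² + x0, showing how each stacked layer raises the polynomial degree of the interactions the model can represent directly.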

Training Our Deep-and-Cross-Based Model

Now equipped with the power of deep and cross networks, we were no longer forced to maintain many different models, so we chose to train a single deep-and-cross-based model using all the data we had. By adding the slice identifier as a feature to the model input, we created a model that could make predictions across all possible scenarios.
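A hypothetical sketch of that idea (not adjoe's actual pipeline, and the feature names are invented): the slice keys that used to select one of 100+ models become ordinary categorical input features, so a single model is trained on all slices at once:

```python
import zlib

def to_features(country, ad_id, ads_seen, previous_installs):
    return {
        # categorical slice identifiers, hashed into fixed buckets
        "country_bucket": zlib.crc32(country.encode()) % 100,
        "ad_bucket": zlib.crc32(ad_id.encode()) % 1000,
        # behavioral features like the ones discussed above
        "ads_seen": ads_seen,
        "previous_installs": previous_installs,
    }

rows = [
    # (country, ad_id, ads_seen, previous_installs, installed?)
    ("US", "ad_123", 3, 1, 1),
    ("DE", "ad_123", 7, 0, 0),
    ("BR", "ad_456", 2, 2, 1),
]
X = [to_features(c, a, s, p) for c, a, s, p, _ in rows]
y = [label for *_, label in rows]
print(len(X), y)  # 3 [1, 0, 1]
```

Because the slice identity is now just another feature, the cross layers can learn interactions between a slice and the behavioral features, which is what the 100+ separate models used to capture implicitly.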

What benefits did this switch bring us?

  • This model considers all of the available data, leading to improved accuracy: we saw gains both in aggregated metrics and in the metrics for each individual slice.
  • Having a single model to maintain significantly reduced costs, both in infrastructure and in the time developers spent on maintenance.
  • With just one model, we could batch all our ads and requests together and process them in a single operation. This batching led to faster inference times.
  • Previously, when we onboarded a new app advertiser, we couldn't start making predictions right away due to the lack of data. Because the new model was trained on all available data and designed to handle a wide range of scenarios, we could start using it for any new case immediately. Its versatility allowed us to significantly increase coverage without any delays.
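The batching win can be sketched as follows. This is illustrative only (a logistic stand-in replaces the trained network): with a single model, requests for many different ads can be stacked into one matrix and scored in one vectorized call, instead of being routed to 100+ separate models:

```python
import numpy as np

def predict_batch(model_fn, feature_rows):
    X = np.stack(feature_rows)   # (n_requests, n_features)
    return model_fn(X)           # one forward pass for the whole batch

# stand-in for the trained model: a logistic score over summed features
fake_model = lambda X: 1.0 / (1.0 + np.exp(-X.sum(axis=1)))

scores = predict_batch(fake_model, [np.zeros(4), np.ones(4)])
print(scores.shape)  # (2,)
```

On GPUs or vectorized CPU code, one large forward pass is far cheaper than many small ones, which is where the faster inference time comes from.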

Final Thoughts: What to Consider When Using DCNs

When you want to use DCNs, there are some key points to keep in mind.

  1. Don’t overdo it. While adding layers to deep and cross networks might seem tempting, more isn’t always better. The vanishing gradient problem, common in neural networks, is even more pronounced here. Based on both my experience and the literature, keeping the depth to three or four layers tends to give the best results. Adding more layers can actually hurt performance.
  2. These models are fast at making predictions, but the training process can be much slower. Expect potential delays during training.
  3. There are variations of DCN worth considering. What we’ve covered is based on DCNv2, but for better results, check out GDCN (gated deep and cross networks). A new version of the DCN paper (v3) has also been released, which I haven’t covered here—be sure to check it out.

If you are working with structured data, and especially if you suspect that your models could learn more from it, deep and cross networks might be your solution.

Sources:

  1. Ruoxi Wang, Gang Fu, Bin Fu, and Mingliang Wang.
    Deep & Cross Network for Ad Click Predictions.
    Stanford University and Google Inc.
  2. Fangye Wang, Tun Lu, Hansu Gu, Dongsheng Li, Peng Zhang, and Ning Gu.
    Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction.
    Fudan University, Microsoft Research Asia, and Independent (Seattle).
  3. Honghao Li, Hanwei Li, Yiwen Zhang, Lei Sang, Yi Zhang, and Jieming Zhu.
    DCNv3: Towards Next Generation Deep Cross Network for Click-Through Rate Prediction.
    Anhui University and Huawei Noah’s Ark Lab.
