Master in Language Learning Apps and Privacy
A blog post about my Master's thesis on privacy in language learning apps.
Hello all, Matt here from Forensics with Matt. In today’s post I am going to talk about the research I did for my master’s degree and share some of the findings. This post will be a bit different from the rest: I will share the full version of the thesis that I wrote, but (like any scientific journal) I will ask that you subscribe to my newsletter with a paid subscription (as low as $8 a month) to read the full research. Otherwise, this post will be a short summary of what I worked on.
I: The Inspiration
In short, this project was inspired by my interest in digital forensics at the time.
To be a little bit less brief, I was inspired to do this project by my usage of language learning apps at the time combined with my deep interest in learning digital forensics.
II: The Original Form
Originally, this master’s project was going to be a fun Capture the Flag (CTF) event that I was going to test with college cybersecurity students. It was going to feature challenges regarding the information on a certain model of an iPhone. It also included reference videos on my YouTube channel that would guide players through challenges that they didn’t feel comfortable with.
I didn’t end up going in this direction for one big reason: my advisor. My CTF idea was shot down multiple times because my advisor didn’t feel that it had enough depth. I came up with the idea in Summer 2024 as I was playing Duolingo exercises.
III: Original Duolingo Idea
In section (II) I mentioned that I did forensics on an iPhone. One of the apps on this iPhone was Duolingo, and I was examining it to see how Duolingo stored its information. I documented how the app stores its information in this video. When I had to rethink the project, the next idea came to me while I was playing Duolingo: what about the privacy characteristics of the ads in Duolingo?
At this point, I had a solid idea of what to do, but no question to guide my research. Like any desperate student who didn’t want to graduate late, I kept on pushing as diligently as possible. I devised ways to capture and interpret the information found in the ads and make privacy determinations. I did in-depth research.
IV: Experimentation
Over the course of a few months, I ran my experiments: I played Duolingo on a new free account I created, looked at the ads it showed, and visited each of the advertisers to see whether they had suspicious content on their pages. I did this with the Duolingo app, as well as Babbel, Busuu, and Memrise.
In looking at the ads, I found that many, if not all, of them led to legitimate companies or apps online. Some of these apps were legitimate and others were AI apps by indie developers with questionable privacy policies at best. You can find details on this in the attached report, if you’re comfortable supporting this publication.
This process went well for the few months that it lasted, but in the end, although I had thorough documentation of what I saw, my advisor, yet again, told me to rethink, and I opted to compare the free versions against the paid versions of the apps. How is this different, you may ask? The answer is that ads connect to certain third-party domains called trackers. These trackers may be present across sites, other apps, or specific versions of the same app. Since I found that all of these iPhone apps used Unity3D ads for serving their ads, I knew that I had to use a network analyzer to search for traffic to Unity3D trackers.
The new experimentation lasted a couple of months and involved me looking at the paid and free versions of these apps and noting whether or not they made connections to advertising trackers. The specific tracker I found across many instances was *-mediation-unity3d.com.
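To make the "look for the tracker in the traffic" step a bit more concrete, here is a minimal Python sketch of how one could check a list of hostnames from a network capture against the pattern above. This is not the tooling I used in the thesis; the function name and the sample hostnames are made up for illustration, and it assumes you have already exported the hostnames from your network analyzer as plain strings.

```python
from fnmatch import fnmatchcase

# Pattern exactly as written in this post. Everything else here is illustrative.
TRACKER_PATTERN = "*-mediation-unity3d.com"


def find_tracker_hosts(hostnames, pattern=TRACKER_PATTERN):
    """Return the unique hostnames matching the pattern, in first-seen order."""
    matches = []
    for host in hostnames:
        # Normalize case and strip whitespace or a trailing dot before comparing.
        normalized = host.strip().lower().rstrip(".")
        if fnmatchcase(normalized, pattern) and normalized not in matches:
            matches.append(normalized)
    return matches


# Hypothetical hostnames, standing in for an export from a network capture.
captured_hosts = [
    "api.example-language-app.test",
    "abc-mediation-unity3d.com",
    "cdn.example.test",
    "ABC-mediation-unity3d.com.",
]

print(find_tracker_hosts(captured_hosts))
```

Running the same check on a capture from the free version and a capture from the paid version of an app gives a simple yes/no answer per version, which is the comparison described above.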
This domain was present across three of the free versions (Memrise did not connect to it at all!) and two of the paid versions. It was a very interesting and absolutely unexpected result. Next, I will finish by summarizing the result.
V: Results
In section (IV) I mentioned that there were connections to the mediation platform for Unity3D ads. This mediation tracker is what delivers the ads to one’s device. It requires data about the client device and the app that is to play the ad. This is the data that is transmitted in the first image below, which is for Duolingo.
Figure 1: Data that Duolingo generated.
As you can see, this data set shows a great deal of data about the client being used. The most important data that it shares is the device type, the region, and all of the device specifiers.
Notice the limitAdTrackingEnabled field in the image above as well. This field shares the status of personalized ads in the app. One can change this setting within the app in the settings menu by toggling the “Enable Personalized Ads” option.
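If you capture a request body like this yourself, a small script can pull out the limitAdTrackingEnabled field and list what else is being shared. The sketch below is only an illustration: it assumes the body is JSON with top-level keys, and every field name and value in the sample other than limitAdTrackingEnabled is a placeholder I made up, not what the real payload contains.

```python
import json


def summarize_ad_payload(raw_body):
    """Report the ad tracking flag and the names of the other top-level fields."""
    payload = json.loads(raw_body)
    return {
        # None means the field was not present in this payload.
        "limit_ad_tracking": payload.get("limitAdTrackingEnabled"),
        "other_fields": sorted(
            key for key in payload if key != "limitAdTrackingEnabled"
        ),
    }


# Hypothetical payload: only limitAdTrackingEnabled comes from the post itself.
sample_body = json.dumps(
    {
        "limitAdTrackingEnabled": True,
        "deviceModel": "placeholder-model",
        "region": "placeholder-region",
    }
)

summary = summarize_ad_payload(sample_body)
print(summary["limit_ad_tracking"])
print(summary["other_fields"])
```

Listing the field names this way makes it easier to compare what two different apps send to the mediation servers without reading each payload by hand.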
Busuu was the only other one that displayed ad tracker information in the network logs. This data did not have the same level of detail in the information it shared, but it did have a similar level of information.
Figure 2: Information sent to mediation servers in Busuu Premium.
None of the other apps I tested showed connections to mediation servers during the experimentation phase.
VI: Conclusions Drawn
In the end, I reverse-engineered a question that I found simple to write a report on with the info I found. I asked: do various language learning apps (namely Duolingo, Babbel, and Memrise) with free tiers share any user information with ad servers on their paid versions?
In the end, I found the answer to this question to be that they, in fact, do share information with the mediation servers. This is not allowed for privacy reasons, mainly because the user is not supposed to get ads. Not getting ads also means that the user is not supposed to have their information taken and funnelled into ad servers.
Conclusions
Overall, I really enjoyed the research process, which is documented more thoroughly in my thesis below. I highly recommend that you read it to get the full context. It includes an in-depth literature review and a more detailed explanation of the tools I used and how they were used.
Doing research like this is very fun and I am proud to share this with you, my community. I hope you enjoy this and learn something new from it. This has been Matt from Forensics with Matt, talking about my master’s research. Until next time, Matt, OUT!
!!The Thesis!!