Christina Animashaun/Vox

How covert code enables your phone’s apps to spy on you.


In the earlier days of the coronavirus pandemic, an animated map from a company called Tectonix went viral. It showed spring breakers leaving a Florida beach to return to their homes across the US, as a series of tiny orange dots congregating on a beach in early March scattered across the country over the following two weeks.

“It becomes clear just how massive the potential impact of just one single beach gathering can have in spreading this virus across our nation,” the video’s narrator said. “The data tells the stories we just can’t see.”

But there was another story there that most of us can’t see: how trackers hidden in smartphone apps are the source of incredible amounts of specific data about us, much of which gets sent to companies you’ve never heard of. This has been going on for years and is an essential part of the mobile app economy. But it took the Covid-19 pandemic to bring some of these companies, and what they’re capable of, to the forefront.

Your phone is the ideal tool for advertisers and data brokers, as a means of both collecting your information and serving you ads based on it. This is usually done through software development kits, or SDKs, which these companies provide to app developers for free in exchange for the information they can collect from them, or a cut of the ads they can sell through them. When you turn on location services for a weather app so it can give you a localized forecast, you may also be sending your location data to someone else.

That’s how X-Mode got the data that was used to create Tectonix’s spring breakers map. A company called Unacast used trackers in its SDK to grade counties on how well their residents socially distanced and stayed indoors. Then there’s Cuebiq, which collected location data through its SDK and shared that information with the New York Times for multiple articles about how social distancing changed as stay-at-home orders were lifted and states reopened. This was just a few months after the newspaper gave Cuebiq’s location collection practices a much more critical eye in an expansive feature, and shows a possible shift in public opinion now that this invasive data might be used to save lives or hasten the return to normality.

We’ve also recently seen how this data can be used in ways that many would argue do not contribute to the public good. A recent Wall Street Journal article revealed that location data was not just being sold to marketers or data brokers but also to law enforcement, where it was used to help catch undocumented immigrants. More recently, a data company called Mobilewalla boasted of its ability to track protesters’ cellphones, and despite such data supposedly being anonymized, the company claimed it could identify protesters’ age, gender, and race.

While most, if not all, apps on our phones use several SDKs, the people who use those apps rarely understand what they are or how they can be used to collect their data and power a massive economy behind the scenes. Here’s how it all works.

What is an SDK and how does it track me?

SDKs themselves are not trackers, but they are the means through which most tracking through mobile apps occurs. Simply put, an SDK is a package of tools that helps an app function in some way. Apple and Android offer operating system SDKs so developers can build their apps for their respective devices, and third parties offer SDKs that allow developers to add certain features to those apps quickly and with minimal effort.

“The name of the game for the past dozen years has been to make it as easy as possible for people to develop apps,” Norman Sadeh, director of Carnegie Mellon University’s Mobile Commerce Laboratory and e-Supply Chain Management Laboratory, and co-director of its MSIT-Privacy Engineering Program, told Recode.

Mark Makela/Getty Images
Software development kits, or SDKs, are used by app developers to build profiles of their users. However, that information is often sent not only to the app but to third parties that sell the data to marketers and data brokers.

For instance, if a developer wants to let users sign into an app with their Facebook accounts, they’d want Facebook’s Login SDK. If their app needs maps or map data, they could use Google’s Map SDK. Without SDKs, developers would have to build those things entirely from scratch. That’s time-consuming and could be beyond a small developer’s abilities or budgets. SDKs may also help apps communicate with third parties through what is called an Application Programming Interface, or API. Using the Facebook Login SDK as an example again, the SDK helps the developer build and implement the sign-in feature in their app, while the API allows the app and Facebook to communicate with each other so the sign-in can happen.

“You’ve got now all these third-party APIs and libraries that have been introduced into this ecosystem, whether it’s for advertising, to connect to social networks, for analytics purposes,” Sadeh said. “This ecosystem has become extremely complex, and the data flows that result from all this are extremely diverse and very, very concerning.”

Sometimes, SDKs collect and send data back to the third party that provides them, which isn’t part of the app’s functionality. A few months ago, Zoom’s iOS mobile app was caught sending extra data to Facebook through its SDK, which Zoom said was unintentional. Many other apps have done the same.
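To make the SDK-versus-API distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical: `FakeLoginAPI`, `LoginSDK`, the payload fields, and the device details are invented for illustration and do not describe Facebook's, Zoom's, or any real company's code. It only shows the general pattern described above, in which an SDK does the job the developer wanted while extra data rides along to the third party.

```python
import json


class FakeLoginAPI:
    """Stands in for a third party's server-side API (hypothetical)."""

    def __init__(self):
        self.received = []  # every payload the pretend server has seen

    def post(self, payload_json: str) -> dict:
        payload = json.loads(payload_json)
        self.received.append(payload)
        return {"ok": True, "user": payload["username"]}


class LoginSDK:
    """Hypothetical SDK that a developer bundles into an app."""

    def __init__(self, api: FakeLoginAPI, device_info: dict):
        self.api = api
        self.device_info = device_info

    def sign_in(self, username: str) -> dict:
        # What the developer asked for: a sign-in request.
        payload = {"username": username}
        # What the developer may never notice: device details attached
        # to the same request and sent to the SDK's provider.
        payload["extras"] = self.device_info
        return self.api.post(json.dumps(payload))


# Made-up device details, for illustration only.
device_info = {
    "device_model": "ExamplePhone 1",
    "timezone": "America/New_York",
    "ad_id": "0000-EXAMPLE",
}

api = FakeLoginAPI()
sdk = LoginSDK(api, device_info)
result = sdk.sign_in("alice")
```

From the app's point of view, `sign_in` simply worked. Only by inspecting what actually reached the third party (here, `api.received`) would a developer see the extra fields, which is roughly what a privacy audit is meant to catch.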

Here’s where the tracking comes in. The data your device’s app sends to a third party can be used to build a profile of the app’s user, which advertisers can then use for targeted ads. You likely don’t even know what data is leaving your device, how it can be used to track you, or where it’s going. Location data gets the most attention because it feels the most invasive (as the New York Times put it, “Your apps know where you were last night, and they’re not keeping it secret”), but there are plenty of other ways to track you or make inferences about who you are to target ads to you. Companies want to put their SDKs in as many apps as possible in order to collect as much information from as many people as possible. Developers themselves may not know (or care) when and how their users’ privacy is being invaded.

“If I’m a startup, I’m bootstrapping an app really quickly — I need to make something fast. I just bundle a bunch of SDKs in there, compile the app, and ship it off to the App Store,” Sean O’Brien, founder and executive director of the Yale Privacy Lab, told Recode. “And I may not even be aware, literally, as a developer, what is in my own app.”

There have also been stories of SDKs that intentionally and maliciously grab much more data than they’re supposed to, possibly without the developers’ knowledge, and certainly without the user’s. O’Brien recommends that developers do privacy audits on their apps to avoid this, but that’s not always something that even large companies like Zoom want to allocate resources to do.

The app ecosystem

Tracking via SDK is firmly, perhaps inextricably, entrenched in the app ecosystem. In this way, it’s similar to the internet. Pretty much everything we do online has been tracked and monetized since the start (see: cookies). Because apps are on the device itself, rather than accessed through a website — and because we now use apps for so many different things and carry the device they’re on around with us throughout the day — they’re able to collect a ton of information about us.

“SDKs are kind of like the mobile equivalent of cookies at this point, but with more power,” Whitney Merrill, a privacy lawyer and technologist, told Recode.

Developers will install ad network SDKs in their apps, which let them serve users targeted ads as well as collect some user data to send back to the ad network. For instance, Facebook’s ad SDK will show ads targeted to you, based on what Facebook knows about you, in any apps on your device that have the SDK. According to SDK and app intelligence company MightySignal, hundreds of thousands of apps do.

App Annie’s 2020 State of Mobile report. These are predominantly targeted ads that use data collected through SDKs as well as other sources, and are largely sent to apps through ad network SDKs. Free apps (and even, sometimes, the apps you pay for) usually only exist because of the money they make from ads or the location data they provide. Ads that aren’t targeted are worth less, and having to hire someone to get ads for your app costs money, whereas an ad network SDK that does it automatically is free.

Most of the companies that produce these SDKs will say that the data they collect is not personally identifiable (usually that just means it’s attached to the device ID, rather than the ID of the device’s owner), that customers must opt into its collection, and that privacy policies keep users informed about how their data is used. But privacy experts say de-identified data can often be re-identified and is never truly anonymous, especially when data brokers have so much of it from so many sources.

“The amount of data they have about us is unbelievable,” Sadeh said. “Brokers basically re-assemble all this data, and they’re pretty good at it.”
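A rough Python sketch shows why data tied only to a device ID may still point to a person. All of the data below is made up, and the approach (guessing a "home" from where a device sits overnight, then looking that spot up in a second dataset) is just one simple illustration of re-identification, not a description of how any particular broker works. The names `likely_home` and `reidentify` are hypothetical.

```python
from collections import Counter

# Made-up "anonymized" pings: (device ID, hour of day, rounded lat/lon).
pings = [
    ("device-a", 2, (40.71, -74.00)),
    ("device-a", 3, (40.71, -74.00)),
    ("device-a", 14, (40.75, -73.99)),
    ("device-a", 23, (40.71, -74.00)),
    ("device-b", 1, (34.05, -118.24)),
    ("device-b", 13, (34.10, -118.30)),
]

# A second made-up dataset that links locations to people,
# standing in for address records obtained from another source.
residents = {
    (40.71, -74.00): "Resident X",
    (34.05, -118.24): "Resident Y",
}


def likely_home(device_id, pings):
    """Guess a home as the most common overnight location (10 pm to 6 am)."""
    night = [
        loc
        for dev, hour, loc in pings
        if dev == device_id and (hour >= 22 or hour < 6)
    ]
    if not night:
        return None
    return Counter(night).most_common(1)[0][0]


def reidentify(pings, residents):
    """Join the two datasets: device ID -> guessed home -> name."""
    devices = sorted({dev for dev, _, _ in pings})
    return {dev: residents.get(likely_home(dev, pings)) for dev in devices}


matches = reidentify(pings, residents)
```

Neither dataset names a device's owner on its own. The link appears only when the two are combined, which is why having "so much of it from so many sources" matters.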

X-Mode and Cuebiq, which have SDKs in 300 and 180 apps with a location tracking opt-in rate of 55 to 85 percent and 20 to 45 percent, respectively, both told Recode that privacy is and always has been important to them, that they fully comply with privacy laws, and that they believe there is a way to preserve privacy while also getting valuable insights from the data collected.

“I am a believer in the importance of big data,” Antonio Tomarchio, CEO of Cuebiq, told Recode. “But I’m also a believer in the fact that it has to be done with the right framework.”

How you can minimize your exposure to SDK tracking

Over the years, app stores and operating systems have cracked down on some of this tracking. They’ve allowed users to select which apps can have access to certain parts of their phone, closed loopholes that allowed apps to track locations even with GPS services turned off, and created advertiser-specific device identifiers to obscure the device’s actual identifier — which can’t be changed and was one of the main ways data companies and advertisers tracked people across apps.
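The difference between a fixed device identifier and a resettable advertising identifier can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how iOS or Android implement their identifiers: the `Device` class, its fields, and the `record` helper are all hypothetical.

```python
import uuid


class Device:
    """Toy model of a phone with two kinds of identifiers (hypothetical)."""

    def __init__(self):
        self.hardware_id = uuid.uuid4().hex  # fixed for the life of the device
        self.ad_id = uuid.uuid4().hex        # advertising ID the user can reset

    def reset_ad_id(self):
        self.ad_id = uuid.uuid4().hex


# A tracker's view: activity filed under whatever identifier it was given.
profiles = {}


def record(identifier, event):
    profiles.setdefault(identifier, []).append(event)


device = Device()
hardware_before = device.hardware_id

record(device.ad_id, "opened weather app")
device.reset_ad_id()
record(device.ad_id, "opened game")
```

In this toy model, a tracker that sees only the advertising ID ends up with two separate one-event profiles after the reset. A tracker keyed to `hardware_id` would have kept a single continuous profile, which is why hiding that identifier from apps matters.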

Robyn Beck/AFP via Getty Images
An Apple advertisement that reads, “What happens on your iPhone stays on your iPhone” in Las Vegas, Nevada, on January 6, 2018.

It’s a bit like playing a game of whack-a-mole: data firms are constantly looking for new ways to track users, and operating systems, in turn, are constantly looking for ways to stop or better control that tracking.

X-Mode and Cuebiq both offer ways to do this directly. Most privacy experts believe it’s impossible to truly stop tracking on these devices and through their apps, but this should at least reduce it.

The uncertain future of tracking via SDK

Up until a few years ago, we largely relied on these companies to regulate themselves, which most of them say they do. But their data handling practices are often too opaque to know for sure if that’s true, and past precedent indicates it probably isn’t: the New York Times alone has gotten access to sensitive location data records not once, but twice. Only external pressure seems to have brought about any kind of change.

On an operating system level, Apple has instituted several privacy and control improvements over the years, and it recently announced that the upcoming iOS 14 builds on that. Among them: Apps will have to tell you that they want to track you and get your consent to do so; they’ll have to tell users what information about them is being collected by trackers and if it’s being linked to their identity.

But Apple also has to balance the needs of its App Store developers, whose business model may be dependent on ads, with the desires of its customers, who would likely prefer not to be tracked and to spend the minimum amount of effort to prevent it.

“There’s another prevailing school of thought, which is if you give people too much choice, they’ll get notice fatigue,” Merrill said. Opening a newly installed app and having to click through, say, 20 different device permissions likely isn’t the experience users want.

Merrill added, “That will be a horrible experience, because you’re getting all these pop-ups, and you’re like ‘I just want to use the darn app.’”

Apple told Recode that it’s constantly refining its OS to minimize the user data that leaves their device and is sent to apps while still enabling functionality and without forcing users to click through a bunch of permission pop-up windows.

There are also laws that require certain disclosures and consent, and there certainly seems to be momentum to enact more. Along with the European Union’s General Data Protection Regulation, there’s California’s Consumer Privacy Act. Other states are following suit with their own proposed data privacy laws, and several federal versions have been introduced. Many privacy experts believe such legislation, if done correctly, is the only way to truly regulate the data industry. The location data company CEOs say they welcome it.

“I think it’s going to legitimize and mature the industry,” Joshua Anton, founder and CEO of X-Mode, told Recode. “I think what we’re going through is similar to CAN-SPAM in the early 2000s. … Legislation is a positive thing. And I’m hoping that our company and many other companies like ours are part of the conversation in creating legislation that gives consumers more control over their location data.”

O’Brien, on the other hand, thinks the mobile ad tracking problem won’t be solved by laws, but by the same thing that created it: money.

“I do think there’s going to be a bit of a reckoning,” O’Brien said. “There already has started to be one for some of these companies especially as the economy starts tanking and as the bottom starts falling out of the targeted ad business — which seems to actually be happening. The companies pulling out of Facebook right now aren’t just pulling out of Facebook because they’re aghast that Mark Zuckerberg doesn’t moderate the platform or allowed Trump to do whatever. They are doing it because they have not seen the returns they have paid for, for a decade now, to Facebook for ads.”

Some research has now shown that targeted ads are only marginally more valuable to brands than non-targeted ones, and may even be worth less when the loss of user trust, ad network fees, and the expense of privacy law-compliant tools are factored in.

“The corporations that are traditionally funneling money into the Googles and the Facebooks and so on, they’re on very shaky ground right now,” O’Brien said. “And the ability for them to just treat Big Tech as sort of a casino, where they’re tossing money into the slot machine, that’s not going to happen much longer.”

Then again, Twitter’s advertising business suffered last year because, the company said, it had to cut down on how much data it collected (it was “accidentally” collecting too much information from users, even after they specifically asked the company not to) which would then be used to target ads. But this just goes to show that new regulations and user privacy desires are indeed having an effect on the targeted ad business — which could, in turn, lead to change.

For now, however, your data is what advertisers want and what the mobile app ecosystem has been set up to provide. If the information gathered about you through SDK trackers can be used to help stop the coronavirus, that might be a trade you’re willing to make. If it’s being used for disturbingly specific protester insights, that might not be so palatable. In the absence of good federal laws regulating how your data is collected and used, you just have to trust that location data companies and app developers really do care as much about your privacy as they say they do.

Open Sourced is made possible by Omidyar Network. All Open Sourced content is editorially independent and produced by our journalists.

