For the past two years, researchers at Northwestern University have been analyzing the habits of tens of thousands of scientists through their use of Dropbox. Looking at data about academics’ folder-sharing habits, they found that the most successful scientists have certain collaboration behaviors in common. And on Friday, they published their results in an article for the Harvard Business Review.
The study quickly attracted the notice of academics—but not for the reason Dropbox and the researchers had hoped. One sentence in particular caught readers’ attention: “Dropbox gave us access to project-folder-related data, which we aggregated and anonymized, for all the scientists using its platform over the period from May 2015 to May 2017—a group that represented 1,000 universities.” Written by Northwestern University Institute on Complex Systems professors Adam Pah and Brian Uzzi and Dropbox Manager of Enterprise Insights Rebecca Hinds, that wording suggested Dropbox had handed over personally identifiable information on hundreds of thousands of customers.
“Before sharing the activity data with NICO, we randomized or hashed the dataset and grouped it into wide ranges to further ensure that no identifying information could be derived,” Dropbox elaborated. “In addition, our research partners at NICO are bound by strict confidentiality obligations.” Northwestern’s Pah supported that statement, telling WIRED that he and his team were never able to see any personal information or the content of any Dropbox folders or files. His team sent Dropbox citation information from the Web of Science—an index that ranks researchers according to how often their work is cited—which Dropbox then paired with folder data, anonymized and aggregated, and sent back for analysis.
Even if personal names are removed, folder titles and file structures can potentially be used to identify individuals, according to University of Colorado Boulder professor Casey Fiesler, who teaches in the Department of Information Science. In a blog post Dropbox’s Hinds published on Friday, she appears to directly address that concern, writing “information like university ranks and number of citations were grouped into ranges.” Representatives for Dropbox say the techniques they used to anonymize and aggregate the data would make reverse identification impossible, though they couldn’t share details about how that process worked.
But it still appears this research was conducted without the express consent of the thousands of customers whose information Dropbox and the researchers accessed (the HBR article originally suggested that 400,000 users’ data was analyzed, while Dropbox says that the study dealt with data from 16,000 customers). Late Tuesday HBR added a second editors’ note indicating that the researchers started with information on 400,000 “unique users” but pared the data set down to 16,000 after incorporating data from Web of Science. HBR editors also updated the article to indicate that it wasn’t 1,000 universities that were included, but rather 1,000 separate departments.
Informed consent is one of the cornerstones of academic research, and its absence is one of the things that got Facebook in so much trouble back in 2014 when it published results from its controversial “Emotional Contagion Study.” That study was never approved by an institutional review board (IRB), which is tasked with maintaining ethical standards in research; since the data had already been collected by Facebook and was not identifiable, the university where it was conducted reportedly considered it IRB-exempt. Dropbox representatives said that the same was true for this study, because the data was delivered to the researchers deidentified.
Dropbox representatives told WIRED that users gave consent when they agreed to the company’s privacy terms, and pointed to a section of that policy about how data will be used to improve Dropbox services. That section reads: “We collect information related to how you use the Services, including actions you take in your account (like sharing, editing, viewing, and moving files or folders). We use this information to improve our Services, develop new services and features, and protect Dropbox users.” They also pointed to language about sharing data with third parties, which says “Dropbox uses certain trusted third parties (for example, providers of customer support and IT services) to help us provide, improve, protect, and promote our Services.”
Exactly how the study improved Dropbox services was not clear from the HBR article or the Dropbox blog post, though Dropbox representatives told WIRED the insights into how teams collaborate would help the company design better features.
Normally, research of this kind would be published in a peer-reviewed academic journal and include clear information about authorship and the provenance of data. Because this research appeared in a publication that is not peer-reviewed, it is very hard to assess. Hinds has not responded to requests for comment from WIRED, and on Tuesday her Twitter and LinkedIn pages were deleted. Dropbox representatives would not put WIRED in touch with Hinds directly.
“What’s the secret to a high-performing team? A star player? Veteran experience? In a joint study by Dropbox and the Northwestern Institute on Complex Systems (NICO), we set out to answer questions like these,” Hinds wrote in the Dropbox blog post Friday. But academics like Fiesler and Brudy have different questions. They wonder who had access to this data, and for how long. What kinds of Dropbox accounts were affected: paid or free? Are other studies like this in the works? Will this research be submitted for peer review? Those answers matter for the scientists at more than 6,000 universities who use Dropbox.