Apple Searches Your iPhone for Kiddie Porn
Early reports about Apple’s new anti-child-porn initiative indicated that Apple intended to scan the contents of users’ devices and computers for child sexual abuse material (CSAM) and report any such material it found to the National Center for Missing and Exploited Children (NCMEC) and, ultimately, to law enforcement. In an FAQ-style blog post, Apple attempted to clarify what it is actually doing. According to Apple, it will be doing two things: first, scanning files users upload to their iCloud storage for CSAM, and second, allowing parents to set up accounts for children that block the sending or receiving of sexually explicit content or images (not just CSAM) in things like iMessage.
Assuming that this is all Apple does, it is substantially less troubling than scanning people’s computers and iPhones but still has both technological and legal repercussions. And that is assuming that this is all that Apple does. Let’s take each of these separately.
On the Cloud
Apple proposes to scan every file uploaded by users to see if it matches a database of millions of known CSAM and “related-to-CSAM” images maintained by NCMEC, a non-profit organization to which certain entities are required to report (but are not required to scan for) child pornography. Currently, internet service providers like Comcast, Verizon and others, as well as web hosting companies and social media platforms, routinely scan for CSAM (voluntarily, sort of) and report what they find to NCMEC. Cloud platforms like Microsoft’s SkyDrive (now OneDrive) have been doing the same thing, using PhotoDNA to identify suspected CSAM.
What makes Apple’s proposal a bit different are (1) the Apple ecosystem and (2) Apple’s promised security. The Apple ecosystem is designed to take the entire contents of multiple devices (your iPad, your iPhone, your MacBook, etc.) and seamlessly upload that data to Apple’s iCloud. iCloud is intended to act as an extension of your device: not just to back up data for preservation, but to function as part of the device itself. Take a picture? It’s in the cloud (as victims of The Fappening, the leak of celebrities’ nude photos, learned). No need to buy a phone with a TB of storage, since you can use iCloud. Store and access corporate documents, chat logs, messages, etc. not on the device itself, but in the cloud. It’s quick. It’s easy. It’s effective. It’s secure. Right?
From a technical and legal perspective, however, there is a huge difference between accessing a file on your device and accessing the same file in your Apple iCloud account. The “transfer” from one drive to another likely means that you have abandoned, to one degree or another, your expectations of privacy as they relate to those files. It means that Apple can now be compelled to produce your files if requested. It means that Apple can be compelled to “hack” your user ID and password and deliver the contents of your files without your consent and without your knowledge. It means that you can no longer assert things like privacy rights and privileges with respect to these documents (at least not initially) because you simply don’t know that they are being given to some law enforcement agency or some private litigant. It can also mean that your documents and records are being simultaneously stored in Cupertino, California and in Dublin, Ireland or Belarus. Whatever meets Apple’s business needs. That’s the nature of the cloud. The cloud, like the honey badger, don’t care.
All of this is in the nature of the cloud, and it’s no different for Microsoft’s OneDrive, Google’s Drive or Apple’s iCloud. Except that Apple (like MS) ties the drive to the hardware. If you have an Apple device, you pretty much have to set up an Apple account. Pretty much. And while you don’t have to use iCloud, there are tremendous incentives to do so.
If you own or use an Apple device, Apple will search the contents of the hard drive for certain files, compare those files against a list of files prohibited by the U.S. government, and then, presumably, inform on its customers. But don’t worry: the stuff for which they will turn their customers in is only CSAM, which is prohibited by law. While the goals are laudable (prevention of the dissemination of child pornography, as well as of the child sexual abuse necessarily implied in the creation of such CSAM), the means (having a multi-billion-dollar company search all of the files on any device it manufactures) are creepy at best and provide ample opportunity for abuse, whether by Apple, by governments or by hackers.
News reports about Apple’s plans are somewhat cryptic about the how and when, but they note that “The software uses a matching technique, where photos stored on iPhones will be scanned and then compared with known child pornography.” Typically, this would involve MD5 or some other hash-matching algorithm. The government, or some quasi-government agency like NCMEC with government support, creates a massive database of “known” (read: “suspected”) child pornography and computes a hash value for each image. Hashes of the unknown files on the Apple user’s computer are then compared to the hashes of the “known” CSAM, and if there’s a match, they are the same picture. In fact, according to Apple, “there is a one-in-a-trillion chance of a person being incorrectly flagged.” Therefore, the image on the computer is kiddie porn, right?
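For the technically curious, here is a minimal sketch of that exact-match approach in Python. The hash algorithm (SHA-256), the file path and the “known” hash list are all placeholders; Apple has not published the details of its own matching system.

```python
import hashlib
from pathlib import Path

# Hypothetical set of hash values for known, previously reported images.
# In a real system this would come from a curated database, not be hard-coded.
KNOWN_HASHES = {
    "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
}

def sha256_of_file(path: Path) -> str:
    """Compute the SHA-256 digest of a file's raw bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def is_flagged(path: Path) -> bool:
    """True only if the file is byte-for-byte identical to a known image."""
    return sha256_of_file(path) in KNOWN_HASHES
```

The whole game is a set-membership test: if the digest of your file appears in the curated list, the file gets flagged.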
The problem with the hash-matching approach is that it can be massively underinclusive and slightly overinclusive. Because it is looking for matching hashes—that is, “identical” files—any change or modification to either of the “matching” files—a change of a single bit—will result in a different hash value and, therefore, a mismatch of the files. Open a picture, change the contrast, apply a filter, save it, and voila! A different picture that doesn’t match the original. Additionally, the NCMEC CSAM database contains hundreds of thousands, if not millions, of hashed images, and it is growing. (A sad commentary on the population and its propensities.)
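To make that brittleness concrete, here is a toy demonstration, assuming Python with the Pillow imaging library and a hypothetical photo.jpg: nudging the contrast and re-saving the file produces a completely different digest, so an exact-hash comparison no longer matches.

```python
import hashlib
from PIL import Image, ImageEnhance  # Pillow

def digest(path: str) -> str:
    """SHA-256 of the file's raw bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# "photo.jpg" is a hypothetical image file used for illustration.
img = Image.open("photo.jpg")
ImageEnhance.Contrast(img).enhance(1.05).save("photo_tweaked.jpg")

print(digest("photo.jpg"))
print(digest("photo_tweaked.jpg"))
# Two visually near-identical pictures, two unrelated digests: no match.
```

This is exactly why systems like PhotoDNA use perceptual (“fuzzy”) hashes rather than plain cryptographic hashes: they are designed to survive resizing, recompression and minor edits.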
While the database attempts to distinguish between images of pornography that include nudity and those that do not, or pornography that includes 17-year-olds from pornography that includes those who have passed their 18th birthday (but are trying to look younger), or images that are sexual as opposed to sensual, or those that are actual and those that are purely virtual (under current law, the images must be of an actual minor), the database as a whole may include images that are not, as a matter of fact and law, CSAM. For a subject with thousands of images, this likely is not a problem. For a person who has just one, the risk of false identification is heightened. Plus, “mere” possession of CSAM is not a crime. The law requires such possession to be “knowing.” A computer program can help identify characteristics that a jury could use to distinguish a “knowing” from an unintentional “possession,” e.g., how and when the image was taken, the context, whether the image is a full .tiff or .jpg image or merely a stub or placeholder, how many images there are, whether the image was “searched for” in the browser or simply found inadvertently, etc., but it would be a mistake to say that the programs look for crimes. They look for images, possession of which may—or may not—be a crime (but usually is). Also, the program will have to exclude from coverage certain law enforcement officials and employees of NCMEC, right? And that’s just for hash matching.
Because of the problems with hash matching, those in the CSAM enforcement community have shifted their efforts to image analysis programs—and AI-enhanced image analysis. That’s part of what makes the Apple proposal so scary. First, Apple does not say how it proposes to detect CSAM. AI programs are pretty damned good at looking at images and saying “This is a person,” or “This is a person without clothes on,” or “This is a child,” or “This is a child without clothes on,” based upon things like the ratio of “skin-colored” to “non-skin-colored” pixels or the “bag of words” model for defining context; other AI models use things like dense trajectories, space-time interest points or temporal robust features to try to distinguish pornography from non-pornography and CSAM from pornography. Of course, they cannot take into account the MTA problem—the New York City subway experience wherein you ask, “Is that dude really doing what I think he is doing?”
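To illustrate just how crude the simplest of these signals is, here is a sketch of the “skin-colored pixels” heuristic, again in Python with Pillow. The RGB thresholds are a rough rule of thumb from the older computer-vision literature and the flagging threshold is arbitrary; none of this is Apple’s method.

```python
from PIL import Image  # Pillow

def skin_pixel_ratio(path: str) -> float:
    """Fraction of pixels that fall inside a crude RGB 'skin tone' range."""
    img = Image.open(path).convert("RGB")
    pixels = list(img.getdata())

    def looks_like_skin(rgb) -> bool:
        r, g, b = rgb
        # Rough rule of thumb from early skin-detection papers; easily fooled
        # by sand, wood paneling and certain lighting.
        return r > 95 and g > 40 and b > 20 and r > g and r > b and abs(r - g) > 15

    return sum(looks_like_skin(p) for p in pixels) / len(pixels)

# A naive pipeline might flag anything above an arbitrary threshold for review:
# needs_review = skin_pixel_ratio("photo.jpg") > 0.4
```

A beach photo or a close-up of a hardwood floor can clear that bar, which is the gap the fancier AI models are trying to close.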
The problem of distinguishing pornography from non-pornography in either static or moving images, and CSAM from adult pornography, is computationally difficult and error-prone. AI techniques, including deep learning solutions, convolutional neural networks (CNNs) and recurrent neural networks (RNNs), can help “train” computers to “learn” what people think is CSAM, but this introduces elements of randomness and a lack of accountability into the program. When asked why the computer program thinks an image is CSAM, the answer will no longer make reference to the design and parameters of the program, pixel density and the like. The “answer” will be, “Why, artificial intelligence, my dear Watson!” as if that alone settled the argument.
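For readers who want to see what “training computers to learn” looks like in code, here is a deliberately tiny convolutional classifier sketched in PyTorch. The architecture, input size and two-class output (“flag” / “don’t flag”) are assumptions made for illustration; it bears no relation to whatever model Apple may actually use, and untrained it outputs noise.

```python
import torch
import torch.nn as nn

class TinyImageClassifier(nn.Module):
    """A minimal CNN for binary image classification, for illustration only."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, 2)  # assumes 224x224 RGB input

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyImageClassifier()
dummy = torch.randn(1, 3, 224, 224)          # stand-in for a real photo tensor
probs = torch.softmax(model(dummy), dim=1)   # untrained, so this is pure noise
print(probs)
```

And that is the accountability problem in miniature: once trained, the model emits a probability, but nothing in its millions of weights offers the kind of explanation a defendant, a judge or a jury could interrogate.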
Because these AI programs are not “perfect” (and may not even be “good”), Apple has an infallible backup system. The Cupertino company said, “Each instance [of suspected CSAM] will be manually reviewed by the company before an account is shut down and authorities are alerted.” I really would not want that job. This presupposes that humans are better than computers, or even “good,” at detecting CSAM. As Justice Potter Stewart famously wrote in Jacobellis v. Ohio about adult obscenity, after setting out the legal standard: “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.” People—cops, judges, juries and Apple employees—are the most dangerous combination: bad at distinguishing CSAM, but convinced they are good at it. Canadian courts and UK standards agencies banned American Apparel ads as child pornography. And the FBI investigated Calvin Klein for its “suggestive” ads involving children. While “true” CSAM (children engaging in sexual activity) may be easy to spot (or may not, as noted with respect to actual age and actual people), on the margins this is really, really difficult and involves judgment calls. And that involves human sensibilities and judgment. And that is difficult.
Some years ago, a friend sent out a New Year’s card that included her toddler’s “butt shot”—that is, an image of her toddler walking away from the camera with a naked butt, with the caption, “Here’s to the end of another year!” You know, the kind you can find at Walmart or CVS. It took almost seven years, tens of thousands of dollars in legal fees, the judgments of half a dozen trial and appellate courts, and dozens of foster families for her to get custody of her children back from the Missouri authorities. Such cases, while rare, are not an anomaly—particularly given our understandable desire to “protect children.” In addition, the computer programs don’t distinguish between images that are taken of minors and images that minors take of themselves. Thus, a 17-year-old who takes a sexually suggestive picture (or video) for his or her boyfriend or girlfriend will be flagged by the Apple program for possession (and creation) of child pornography and is subject to arrest and conviction. This may also flag streaming video or the use of things like FaceTime for racy video chats between minors. The program cannot tell the difference between two 17-year-olds “sexting” and a 50-year-old guy in his mother’s basement “luring” some 15-year-old into performing sex acts for his benefit. It just knows nudity and sexuality—if it even knows that. While Potter Stewart might “know it when he sees it,” others may disagree. Context is important, and the AI program lacks the necessary context. So does the human.
OK, so, big deal. The point here is to “flag” bad stuff for investigation; the ultimate decision is up to the cops, right? Except that improper flagging can have horrible, even deadly, consequences. Plus, there’s a huuuuge privacy problem here.
Private Search
Let’s get one thing out of the way first. Under the PROTECT Our Children Act of 2008, companies like ISPs and other online providers (including social media companies) may (but are not required to) search for CSAM on their networks, including a user’s “private” use of the network. If they find evidence of such CSAM, they are required to report it, initially to NCMEC and ultimately to law enforcement. Moreover, if they don’t report it, they run the risk that they will be deemed to unlawfully “possess” the CSAM, since they now have knowledge of its existence on their network. Most courts have ruled that, even with the mandatory reporting requirements and the close cooperation between ISPs, social media and law enforcement, these “searches” are private searches not governed by the Fourth Amendment, but rather by the terms of the End User Agreement or Terms of Service. As the United States Court of Appeals for the Fifth Circuit recently noted, likening a provider’s comparison of hash values to Federal Express inadvertently opening a package and then disclosing its findings to police:
“When [the defendant] uploaded files to SkyDrive, Microsoft’s PhotoDNA program automatically reviewed the hash values of those files and compared them against an existing database of known child pornography hash values. In other words, his “package” (that is, his set of computer files) was inspected and deemed suspicious by a private actor. Accordingly, whatever expectation of privacy [the defendant] might have had in the hash values of his files was frustrated by Microsoft’s private search.”
Once the “private search” finds something, essentially, all bets are off. But in the cited case, the defendant did something with the image: uploaded it to cloud-based storage, Microsoft’s SkyDrive. As the court noted, “SkyDrive uses a program called PhotoDNA to automatically scan the hash values of user-uploaded files and compare them against the hash values of known images of child pornography.” It’s the same thing that other “providers” do when files are uploaded to social media, shared, transmitted or stored online. Your cloud is their cloud.
The idea that private companies have the authority to search users’ data and then turn over the results to law enforcement for a more intrusive (and possibly warranted—in both senses) search is itself alarming. In the fight against CSAM, we have created a gigantic legal loophole. If a company uses a cloud provider for storage, analysis or transmission of its data, that provider considers that it has the legal authority to examine the contents of every file and communication that the company entrusts to the cloud provider, and reserves to itself the authority to report to the cops anything it finds. Well, at least if it’s CSAM. Well, or terrorist activity. We don’t like that. Or tax fraud. Or drugs. Or securities fraud. Or racism. You get the idea. If an automated program can look for “evil” and people who use cloud services in furtherance of evil have abandoned their expectations of privacy (with respect to both private searches and subsequent police searches) then we need to address exactly what companies’ and users’ expectations of privacy are in data they think they are securely uploading to cloud providers. If Google is “reading” my email, then can I still assert attorney-client privilege when I use supposed end-to-end encrypted communications, or is the presence of a third party outside the scope of the privilege a waiver of the privilege? Can I still assert that something is a trade secret if I know that some tech company is examining it, and that, even though I think I am taking reasonable precautions to restrict access, those precautions are dependent upon the platform that reserves the right to examine my files? In for a penny, in for a pound. Privacy is sometimes binary—you either have it or you don’t. If Microsoft or Google or Facebook or Verizon can scan my emails or uploaded documents for CSAM, why can’t they turn their AI to other vices or harms?
OK. All of these issues predate the Apple announcement. What makes the Apple proposal more invasive and more frightening from both a technical and a legal perspective? All of the other scanning technologies depend on the user transmitting the potentially offending material. Verizon scans transmitted files. Facebook scans files uploaded to the social media site. Google scans emails. Others scan text messages, SMS or other communications. Maybe—just maybe—the act of uploading or transmitting something (over what you think is a secure and encrypted channel) constitutes a diminution of privacy. Certainly, by posting something to Facebook, YouTube, TikTok, etc., you give up some privacy, right?
Apple’s proposal is fundamentally different—and in ways that should frighten not only civil libertarians but technologists as well. Apple’s proposed system searches files on devices, not just files stored in the cloud. It also searches supposedly end-to-end encrypted communications. And, according to the announcement, “The new initiative will not be limited to just photos. It will also scan messages sent using Apple’s iMessage service for text and photos that are inappropriate for minors.”
So much scary.
We have long been told that we have the ability to encrypt—with strong encryption that cannot be broken (well, until a new protocol comes around)—the contents of our devices. While governments may be able to force us to decrypt our devices, encryption at rest and encryption in transit are the bedrock principles behind a great deal of data security. We presume that networks, transmission protocols and devices are insecure, and we rely on end-to-end and whole-device encryption to provide a level of security in an insecure world.
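As a reminder of what that bedrock assumption looks like in practice, here is a few-line sketch of symmetric encryption at rest using Python’s cryptography library. Real device encryption (FileVault, iOS Data Protection) is far more elaborate, but the premise is the same: without the key, the stored bytes are opaque to everyone, including the vendor.

```python
from cryptography.fernet import Fernet

# Illustrative only. In real systems the key is derived from a passcode and/or
# protected by hardware, not generated inline like this.
key = Fernet.generate_key()
box = Fernet(key)

plaintext = b"privileged client memo"
stored_on_disk = box.encrypt(plaintext)   # what actually sits in storage or a backup

assert box.decrypt(stored_on_disk) == plaintext  # only the key holder gets this back
```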
This breaks that.
It means that device encryption is fake. It means that Apple can scan what we think are secured devices and data on those devices. And if Apple can, the NSA can. And if the NSA can, the FBI can. And if the FBI can, the IRS can. And if the IRS can, the GRU can. And if the GRU can, then FancyBear can. And so on and so on. Insecure is insecure is insecure. There is no middle ground.
From a legal perspective, it weakens Apple’s argument—asserted in the San Bernardino terrorist case—that it was unable to examine the contents of the terrorists’ phone and that the courts could not force it to develop or deploy a mechanism to “crack” the encryption under the All Writs Act. If you can scan for CSAM, you can scan for other things—especially with a warrant.
The same is true for supposedly end-to-end encrypted communications. If Apple can scan these communications for CSAM, then either they ain’t end-to-end encrypted or the “end” ends with Apple, not the user. Steve Jobs becomes the “man in the middle” of every communication that uses an Apple device or software. (Well, Tim Cook, but you get the idea.) Again, if you can scan for CSAM, you can scan for anything—or you can be forced to scan for it.
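The architectural point is easy to miss, so here is a hedged sketch of why client-side scanning sits “inside” the encryption rather than being defeated by it. The scanner, the blocklist and the reporting hook below are placeholders; the point is where the inspection happens, not how.

```python
from cryptography.fernet import Fernet

KEY = Fernet.generate_key()      # stand-in for the keys negotiated by a messaging app
channel = Fernet(KEY)

def matches_blocklist(plaintext: bytes) -> bool:
    """Placeholder for hash matching or ML classification on the device."""
    return False

def report_to_provider(plaintext: bytes) -> None:
    """Placeholder: in a real design, a report or 'voucher' leaves the device."""
    pass

def send_message(plaintext: bytes) -> bytes:
    # The content is inspected in the clear, on the sender's own device,
    # *before* it is ever encrypted...
    if matches_blocklist(plaintext):
        report_to_provider(plaintext)
    # ...so the ciphertext that travels "end to end" is irrelevant to the scan.
    return channel.encrypt(plaintext)
```

The encryption still works; it just no longer means what the user thinks it means.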
Equally disturbing is the fact that the Apple proposal does not just scan for CSAM. As noted, image detection is notoriously hard to do, but it’s a discrete problem. The announcement noted that Apple will scan messages sent using Apple’s iMessage service for “text[s] … that are inappropriate for minors.”
Wow. Just. Wow.
So if a 17-year-old sends or receives a message that they are “DTF” (an obvious reference to the New York State Department of Taxation and Finance), Apple may determine that this is “inappropriate” for that minor, and flag and report that message to … child protective services?! While a tool to alert parents of dangerous activities by minors would be kinda cool (but also incredibly invasive), this is different. There’s no consent by the minor or an adult to this text analysis. Putting aside the myriad judgments involved in deciding what is “inappropriate” for minors, this puts Apple in the business of reading everyone’s messages and determining propriety. It also puts Apple in the business of knowing who is a minor and who is not. And it creates a powerful tool for offline, client-side, encryption-bypassing, context-specific data analysis. Nothing could go wrong there, right? Oh, and since the company can scan files at rest, from a legal standpoint, the owner of the at-rest (and theoretically encrypted) files has lost their expectation of privacy in those files, so all bets are off.
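To see how blunt automated “appropriateness” screening can be, here is a deliberately naive keyword filter in Python. The word list and the whole approach are invented for illustration; Apple has said nothing about how its classifier works, and it is presumably more sophisticated than this. The failure mode, though, is the same: no context.

```python
# A deliberately naive "inappropriate for minors" text filter. The keyword list
# is invented for illustration; it is not Apple's (undisclosed) method.
FLAGGED_TERMS = {"dtf"}

def inappropriate_for_minor(message: str) -> bool:
    words = {w.strip(".,!?").lower() for w in message.split()}
    return bool(words & FLAGGED_TERMS)

print(inappropriate_for_minor("u dtf later?"))                        # True
print(inappropriate_for_minor("Call the DTF about my tax refund."))   # also True: no context
```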
File this one under “great intentions, terrible idea.” I’m sure that teams of Apple lawyers wanted to deter the scourge of CSAM and child endangerment, which is a great idea. But the solution subjects hundreds of millions of people and tens of billions of files and messages to decryption and scanning by computers and humans, and ultimately by law enforcement agencies. Bad Apple. Bad.