UNIVERSITY of NOTRE DAME

PART I. Scraping Public and Private Data: Does the CFAA Apply?

Data scraping is a method of gathering and automatically downloading targeted data from a data source. Scraping, in itself, isn’t illegal. For instance, you could employ specialized software (“bots”) on your own website. However, scraping using bots is most commonly employed on third-party websites without the owner’s permission. The process can be performed by way of human scrapers; however, bots are most frequently used because they are capable of processing and downloading immense amounts of data at a much higher rate than humans. In either case, the result is the same: Entity A visits Company B’s website and downloads data without B’s permission. In addition to employing bots without authorization, scraping raises further legal questions when bots are unleashed without regard to the owner’s Terms of Service, when the bot operator continues scraping activities after the site owner has issued a cease and desist letter, or when bots are employed in an abusive manner, resulting in disruption of website traffic.

Scraping is also commonly used in an attempt to gain a competitive advantage. After scraping data from someone else’s website, companies can aggregate and then repackage that same data to generate their own products and services. As an example, a web scraper could extract and analyze data from the National Weather Service and then repackage this data as a new service. The scraper could then sell the data to airlines, tourism companies, or other entities whose businesses are largely weather-dependent. The data would be presented as an entirely new product or service even though it was initially generated by (and then taken from) another party. As data scraping becomes increasingly common, lawmakers, lawyers, and judges alike have struggled to conceptualize these issues within the existing legal landscape.

The Computer Fraud and Abuse Act (“CFAA”)1 has been one of the main legal tools used by website owners to challenge unauthorized scraping activities. Section 1030(a)(2)(C) of the CFAA creates civil and criminal liability for any person who “intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains . . . information from any protected computer.”2 As the Supreme Court has explained, the statute creates two forms of improper access: (1) obtaining access without authorization or (2) obtaining access with authorization but then using that access improperly.3

In recent years, an important distinction has emerged in determining whether scrapers are liable under the CFAA: whether the targeted data was public or private. Courts have generally found that certain uses of data-scraping software to gain access to private, password-protected data are actionable under the CFAA. But what about public data? Is the CFAA also applicable to the scraping of data on public websites? According to some, making data publicly available on the internet means that the website owner is impliedly granting web users permission to access the data. Yet some owners have contended that CFAA liability should apply when the owner revokes this implied permission through a cease and desist letter.

Two cases, both from the Ninth Circuit, are frequently cited to support claims of CFAA liability for scraping private data: Facebook, Inc. v. Power Ventures, Inc.4 and United States v. Nosal (Nosal II).5

In Power Ventures, the court held that “a defendant can run afoul of the CFAA when he or she has no permission to access a computer or when such permission has been revoked explicitly.”6 In this case, the defendant operated a site that extracted and aggregated users’ social networking information from Facebook. The defendant had obtained this data by accessing password-protected Facebook member profiles.7 Upon discovering the defendant’s bots, Facebook sent a cease and desist letter demanding that Power Ventures stop accessing information on users’ pages.8 Yet Power Ventures refused to cease its scraping operations.9 Facebook then filed suit in the Northern District of California. On the issue of CFAA applicability, the court found that Power Ventures had, in fact, violated the CFAA by continuing to access Facebook’s servers “without authorization” after receiving written notice from Facebook.10

In United States v. Nosal (Nosal II), the Ninth Circuit attempted to clarify the meaning of “without authorization” within the context of the CFAA.11 The defendant in this case, David Nosal, had resigned from his position at an executive search and recruiting company, Korn/Ferry International.12 In the process of departing from the company, Nosal had agreed not to compete with Korn/Ferry for one year.13 Yet, a few months after leaving, Nosal recruited three of the company’s current employees to help him start a competing business.14 The employees, using their company usernames and passwords, downloaded a significant volume of highly confidential and proprietary data from company computers, including source lists, names, and contact information for executives.15 Nosal then used this information to start his own contracting business. When Korn/Ferry received a tip advising that Nosal was conducting his own business in violation of his non-compete agreement, it filed suit. The Nosal II court ultimately found that Nosal had violated the CFAA when he used login credentials to gain access to his former employer’s computer systems after his “credentials [had been] affirmatively revoked.”16 The court reasoned that because Nosal’s authorization had been revoked when he left the company, his actions therefore constituted accessing a protected computer without authorization. 

Together, Power Ventures and Nosal II are two of the leading cases creating liability for scraping under the CFAA. Yet, in both instances, the targeted data was private data. Until recently, the question of liability under the CFAA for public data remained largely unclear. However, a recent, monumental case speaks to the applicability (or lack thereof) of the CFAA when the targeted data is public.

In hiQ Labs, Inc. v. LinkedIn Corporation, the plaintiff was a data science company that develops tools to help corporate HR departments by “providing information to businesses about their workforces based on statistical analysis of publicly available data.”17 The company’s entire business model depended on scraping data from LinkedIn’s site and building those tools from it. After hiQ had engaged in this practice for years, LinkedIn sent hiQ Labs a cease and desist letter threatening action under the CFAA.18 At the same time, LinkedIn employed “various blocking techniques designed to prevent hiQ’s automated data collection methods.”19 Two weeks after receiving the cease and desist letter, hiQ Labs filed a lawsuit in the Northern District of California, requesting a declaration that its scraping of LinkedIn’s data was lawful.20 hiQ further contended that LinkedIn’s actions “constitute[d] unfair business practices” and “contend[ed] that LinkedIn’s actions constitute[d] a violation of free speech under the California Constitution.”21

The hiQ court first looked at the balance of hardships and then at the question of CFAA applicability.22 In weighing the potential hardships, the court seemed to heavily consider the fact that, in the absence of injunctive relief, hiQ would likely “suffer immediate and irreparable harm because its entire business model depends on access to LinkedIn’s site.”23 The court noted that, if LinkedIn prevailed, hiQ would “simply go out of business . . . breach its agreements with its customers . . . and shutter its operations.”24 For its part, LinkedIn argued that it too would face significant harm as a result of hiQ’s data collection practices, contending that such practices threatened the privacy interests of its users.25 However, after hearing LinkedIn’s argument, the court noted that “there [were] a number of reasons to discount to some extent the harm claimed by LinkedIn.”26 Among other reasons, the court reasoned that LinkedIn’s professed privacy concerns were “somewhat undermined by the fact that LinkedIn allows other third-parties to access user data without its members’ knowledge or consent.”27 In the court’s view, the balance of hardships tipped heavily in hiQ’s favor.28

Next, the court reviewed the key CFAA question at hand: whether hiQ’s continued access to the LinkedIn public profiles violated the CFAA. More specifically, the court considered whether, “by continuing to access public LinkedIn profiles after LinkedIn ha[d] explicitly revoked permission to do so, hiQ ha[d] ‘accessed a computer without authorization’ within the meaning of the CFAA.”29 In arguing that hiQ had, in fact, violated the CFAA, LinkedIn relied primarily on Power Ventures and Nosal II. Although both cases upheld CFAA liability, the fact that those cases involved private data, rather than public data, did not go unnoticed. As the hiQ court put it, both Power Ventures and Nosal II involved “unauthorized intruders reach[ing] into what would fairly be characterized as the private interior of a computer system not visible to the public.”30 The hiQ court distinguished Power Ventures and Nosal II from the case at hand by explaining that LinkedIn was attempting to apply the CFAA to hiQ’s aggregation and downloading of public data.

But how does one determine when the virtual world is considered open versus private? According to Professor Kerr, authorization in the context of the CFAA should be tied to an authentication system, such as password protection.31 In this way, an authentication requirement would act as a gate, creating a barrier that would divide open spaces from closed spaces on the web.32 Kerr specifies that the web is generally perceived as “inherently open,” so CFAA liability should not apply.33 While it is “generally impermissible to enter into a private home without permission in any circumstances,” it is presumably not trespass “to open the unlocked door of a business during daytime hours because [there is a] shared [social] understanding that shop owners are normally open to potential customers.”34 These same social norms can, at least in some sense, be applied to the virtual world as well, where password-protection indicates a desire to keep data locked and private. Notably, this physical-world approach also “square[s] with the results in both Nosal II and Power Ventures while avoiding the negative consequences of an overly broad reading of ‘authorization.’”35

hiQ also pointed to the fact that the CFAA was never intended to be applied to public internet data. That is, “[t]he CFAA was not intended to police traffic to publicly available websites on the Internet – the Internet did not exist in 1984.”36 Instead, the CFAA was intended to deal with “‘hacking’ or ‘trespass’ onto private, often password-protected mainframe computers.”37 hiQ further added that applying the CFAA to the accessing of public data “would have sweeping consequences well beyond anything Congress would have contemplated.”38 On the other hand, according to LinkedIn’s interpretation of the CFAA, a website could revoke authorization with respect to “any person, at any time, for any reason, and invoke the CFAA for enforcement.”39 In this way, merely viewing a website “in contravention of a unilateral directive for a private entity” would constitute a crime, thereby “effectuating the digital equivalence of Medusa.”40

The court, opting for Professor Kerr’s “physical-world” approach, concluded that hiQ had raised serious questions as to the applicability of the CFAA to its conduct. The court ultimately granted hiQ’s motion for a preliminary injunction, enjoining defendant LinkedIn from preventing hiQ’s access, copying, or use of public profiles on LinkedIn’s website. An appeal of this decision is currently pending. Nonetheless, scrapers should not interpret hiQ as a green light for scraping public data. Scrapers should note that the hiQ decision focused specifically on sites that make data “publicly available” and that, without additional case law or legal direction, the exact meaning of “access without authorization” remains somewhat unresolved. The district court expressed clear reluctance in hiQ to apply the CFAA to the scraping of public data, so it will be interesting to see how the circuit courts approach the same issue in coming years.

Note: At the time this post was written, only the Northern District of California had reviewed the case, ruling in favor of hiQ. The Ninth Circuit has since affirmed the district court’s grant of a preliminary injunction in favor of hiQ; a follow-up post discusses that decision.

Notre Dame Journal on Emerging Technologies ©2020  
