Is Scraping Copying?
Karen Lu
A growing dispute in the digital economy concerns whether companies may restrict the large-scale collection of publicly available online data. As professional networking and business activity increasingly move onto online platforms, firms have begun using automated tools to gather and analyze user-generated information. This raises questions about who controls access to that data, and its use, once it has been posted publicly.
This issue arose in a conflict between LinkedIn, the professional networking platform, and hiQ Labs, a small data analytics company. LinkedIn hosts millions of user profiles containing publicly visible information such as job titles, work history, and education. hiQ used automated software to “scrape” this information in bulk and turn it into workforce analytics, which it sold to employers. After discovering this activity, LinkedIn sent hiQ a cease-and-desist letter and attempted to block hiQ’s further access to the site. hiQ nonetheless continued collecting data until it was ultimately stopped through litigation. The dispute was resolved on contractual grounds: scraping directly violated LinkedIn’s terms of service, which bind anyone who creates an account. But setting the contractual questions aside, a larger question remains: by scraping a publicly accessible website, did hiQ violate any rights that LinkedIn held?
The dispute can also be framed as an intellectual property question under copyright doctrine, as illustrated by Feist Publications, Inc. v. Rural Telephone Service Co. Rural provided telephone service and published a directory from which it earned revenue. Feist, which published area-wide directories, used Rural’s white pages listings without consent, prompting Rural to sue for copyright infringement. The Supreme Court held that Rural’s white pages were not copyrightable because they lacked the requisite originality, and that Feist’s use of the listings therefore did not constitute infringement. “To qualify for copyright protection, a work must be original to the author…. Original, as the term is used in copyright, means only that the work was independently created by the author (as opposed to copied from other works), and that it possesses at least some minimal degree of creativity…. [T]he requisite level of creativity is extremely low; even a slight amount will suffice.” The Court concluded that while Rural’s efforts in compiling the directory were “industrious,” copyright rewards originality, not effort. Rural’s selection of listings and their alphabetical arrangement were entirely uncreative and therefore not protected under copyright law.
The doctrine from Feist suggests that LinkedIn has a tenuous copyright claim over the user profiles. A work is protected only if it is original to the author. Pure facts are not protected because they lack creative input, but a compilation of facts may be protected if the author makes original choices in selecting, organizing, or arranging them. Here, LinkedIn would argue that its “work” is not the substantive content of individual profiles, but the platform’s structured compilation of professional information across users. On this view, LinkedIn’s creative contribution lies in designing a standardized profile structure that prompts users to present specific categories of information (such as headline, current position, past experience, education, skills, endorsements, and connections) that together form a coherent picture of professional identity and employability. LinkedIn could contend that these coordinated design decisions represent expressive choices about how professional information should be organized and experienced by viewers, distinguishing the platform from a mere repository of uncopyrightable facts.
However, hiQ would counter that the underlying profile information consists primarily of uncopyrightable facts supplied by the users themselves. Although compilations may receive thin copyright protection, LinkedIn’s arrangement resembles a conventional resume format that largely follows preexisting industry expectations rather than reflecting creative selection or original expression. Moreover, the facts of the dispute do not indicate that hiQ reproduced LinkedIn’s visual layout, formatting, profile architecture, or other expressive design elements. Because copyright does not prohibit the extraction or use of uncopyrightable facts, hiQ would contend that it copied only factual information supplied by users, not LinkedIn’s protected expression. Accordingly, LinkedIn likely lacks a viable copyright claim over its user profiles, because its compilation of factual information is not sufficiently original.
Although I am skeptical of characterizing LinkedIn’s compilation of user profiles as “creative,” since most of the content’s originality stems from the users’ own experiences, I would nonetheless resist that outcome, primarily for public policy reasons. As a LinkedIn user, and a user of many online services that collect personal data, I would prefer that my information be controlled by fewer business entities rather than more. This is especially true when obscure companies scrape personal information without the consent or knowledge of users, or even of the platform hosting it. At the same time, the Court in Feist rejected the “sweat-of-the-brow” theory because rewarding effort alone would effectively create monopolies over facts. In that sense, hiQ could make a compelling policy argument that restricting access to data may undermine the constitutional goal of promoting progress, which depends on keeping facts free for public use.
Karen Lu is a law student at the American University Washington College of Law.
Image: Open Grid Scheduler / Grid Engine, LinkedInOfficeToronto2.