The Usable Privacy Policy Project

Natural language privacy policies have become a de facto standard to address expectations of “notice and choice” on the Web. Yet there is ample evidence that users generally do not read these policies, and that those who occasionally do read them struggle to understand what they read. Initiatives aimed at addressing this problem, whether through machine-implementable standards or through other solutions that require website operators to adhere to more stringent requirements, have run into obstacles, with many website operators reluctant to commit to anything more than what they currently do.

This NSF Frontier project brings together researchers from Carnegie Mellon University, the Center on Law and Information Policy at Fordham University, and the Center for Internet and Society at Stanford University. The project builds on recent advances in natural language processing, privacy preference modeling, crowdsourcing, formal methods, and privacy interfaces to address this problem. It combines fundamental research with the development of scalable technologies to:

  1. Semi-automatically extract key privacy policy features from natural language website privacy policies, and
  2. Present these features to users in an easy-to-digest format that enables them to make more informed privacy decisions as they interact with different websites.

As such, this project offers the prospect of overcoming the limitations of current natural language privacy policies without imposing new requirements on website operators. Work in this project will also involve the systematic collection and analysis of website privacy policies, looking for trends and deficiencies in both the wording and the content of these policies across different sectors, and using this analysis to inform ongoing public policy debates.

You can learn more about the Usable Privacy Policy Project at www.usableprivacy.org.

Analysis of Privacy Policies

Policy Annotation Tool

This website provides the opportunity to explore some of the data and analysis results created in the Usable Privacy Policy Project. You can search for specific websites to see their privacy policies overlaid with highlights for specific data practices. The data practice statements shown are the result of a large-scale effort to annotate privacy policies from U.S.-based websites. For this purpose, we developed a fine-grained annotation scheme to describe and extract privacy practice statements from privacy policies. These data practice statements relate to the collection and use of information by the first party (e.g., a website or mobile app operator), sharing of information with third parties, user access and choice options, and other relevant data practice categories. The results presented on this site were obtained using annotations crowdsourced from law students at Fordham University and the University of Pittsburgh.

This rich corpus of annotations is used in our research on machine learning, natural language processing, and crowdsourcing techniques to semi-automate the extraction of annotations from privacy policies. It is also intended to provide insights into the composition and content of privacy policies, and into how different policies compare with one another. We plan to make more detailed views of our data available in the future and to release larger datasets over time. We are also developing tools and interfaces to make this information more readily accessible to lay web users.

Contact Us

If you have questions concerning our project or are interested in collaborating with us, please contact the project’s lead principal investigator, Prof. Norman Sadeh. Please subscribe to our public mailing list to receive news and updates about the project.