Natural language privacy policies have become a de facto standard for addressing expectations of “notice and choice” on the Web. Yet there is ample evidence that users generally do not read these policies, and that those who occasionally do read them struggle to understand what they read. The Usable Privacy Policy Project builds on recent advances in machine learning, natural language processing, privacy preference modeling, crowdsourcing, formal methods, and privacy interfaces to address this problem.
You can learn more about the Usable Privacy Policy Project, including our approach, affiliated organizations, publications, and recent news, at www.usableprivacy.org.
In addition to the human-annotated OPP-115 Privacy Policy Corpus, you can also explore more than 7,000 policies with annotations created automatically by our supervised machine learning technique. We used the human-annotated policies to train and test our classifiers in order to evaluate the extent to which automatic policy analysis can be scaled with minimal human intervention. Beyond classification, many other parts of the policy analysis pipeline are automated, most notably finding privacy policy links on webpages, downloading the policies, and dividing each policy into smaller segments that can each be labeled with the data practices they describe.
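As a rough illustration of two of the automated steps mentioned above (finding policy links and dividing a policy into segments), here is a minimal Python sketch. It is not the project's actual implementation: the keyword heuristic, the blank-line segmentation rule, and the names `PolicyLinkFinder`, `find_policy_links`, and `split_into_segments` are all assumptions made for illustration.

```python
from html.parser import HTMLParser
import re


class PolicyLinkFinder(HTMLParser):
    """Collects hrefs of <a> tags whose link text or URL mentions 'privacy'.

    Hypothetical heuristic for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).lower()
            if "privacy" in label or "privacy" in self._href.lower():
                self.links.append(self._href)
            self._href = None


def find_policy_links(html):
    """Return candidate privacy policy links found in an HTML string."""
    finder = PolicyLinkFinder()
    finder.feed(html)
    return finder.links


def split_into_segments(policy_text):
    """Split policy text into segments at blank lines (assumed rule)."""
    parts = re.split(r"\n\s*\n", policy_text)
    # Collapse internal whitespace and drop empty pieces.
    return [" ".join(p.split()) for p in parts if p.strip()]
```

For example, `find_policy_links` can be run on a page footer to pick out candidate links, and `split_into_segments` on the downloaded text to produce paragraph-sized units for a classifier. A simple keyword heuristic like this one can also match pages that are not privacy policies, which is one way the download errors described in the next paragraph can arise.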
Please note that the automatic link detection, the download of policies, and the use of machine learning techniques to analyze the policies sometimes produce erroneous annotations. This remains an experimental system, and we are keen to collect feedback and further improve our techniques. You may notice that our system omits some labels rather than risk assigning an inaccurate one. In other words, when you look at our labels, you may find that some statements made in the policy are not reflected in the labels generated by the machine learning techniques. We are aware of this and are more interested in feedback about two kinds of errors: labels that correspond to statements not present anywhere in the text highlighted by our techniques, and similar mistakes such as mislabeling first party collection as third party collection. If you see such errors (we know there are some), please take the time to notify us by email or by using the "Feedback" button. Finally, we realize that the text downloaded by our system is occasionally not a privacy policy (e.g., our system might mistakenly download Terms of Service). If you find text that is not a privacy policy, we would appreciate it if you could report it with the "Feedback" button. Thank you for your assistance!
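The preference for omitting a label over showing a possibly wrong one can be pictured as a confidence cutoff. The sketch below is only an illustration under stated assumptions: the per-label scores, the 0.8 threshold, and the function name `confident_labels` are hypothetical and do not describe the project's actual classifiers or settings.

```python
def confident_labels(scores, threshold=0.8):
    """Keep only labels whose score meets the threshold.

    `scores` maps a label name to a classifier confidence in [0, 1].
    Both the score format and the default threshold are assumptions.
    Labels below the threshold are omitted rather than shown.
    """
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical scores for one policy segment.
segment_scores = {
    "first party collection": 0.93,
    "third party collection": 0.55,
    "data retention": 0.81,
}
shown = confident_labels(segment_scores)
```

Raising the threshold trades missed labels for fewer incorrect ones, which matches the behavior described above: some statements in a policy may go unlabeled, while the labels that are shown are less likely to be wrong.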
People interested in additional details on the performance of our machine learning techniques are invited to read the following two articles:
If you have questions about our project or are interested in collaborating with us, please contact the project’s lead principal investigator, Prof. Norman Sadeh. To receive news and updates about the project, please subscribe to our public mailing list.