Data Labelling – Making Cookies Smart and OpenRTB Safe

by Editor

One of the foundational debates in digital today is around the handling of sensitive or personal data. From the debate around cookies through to issues with sensitive content, everyone from government to the tech platforms has been struggling with how to ensure that data is handled in a responsible and legal fashion.

To date, the solutions have often relied on a top-down approach – for example Google’s failed attempt to block the use of third-party cookies on Chrome – but these approaches are neither competitively fair nor sustainable on a web-wide basis. What is needed is a bottom-up approach that creates a mechanism for data’s privacy or sensitivity status to be easily recognized by all.

Data labels

To answer this requirement, a proposal has been developed that would enable the creation of data labels which clearly set out the lawful purpose under which data is collected, stored, used and restricted, so that recipients can understand their legal obligations with regard to that data.

The project, ‘Data Labels – a universal, open-standard method of signalling the legal basis associated with the data collection, processing, use, and restrictions in cookies and OpenRTB’ adds the option for a label to be attached to each piece of data that can be used by web browsers, OpenRTB recipients, or other entities to simply manage data in accordance with contracts and laws, improving transparency and control over the necessary exchanges of data required to support decentralized websites.

At a technical level, the data labels proposal consists of a data model that attaches a label in the form of an immutable Uniform Resource Identifier (URI) field to web browser storage or OpenRTB messages. The URI links to a document describing the lawful purpose used for the collection, storage and use of the data, and any onward permitted or restricted uses associated with it. This ensures recipients can only legally share or use the data based on rules defined by the sender and facilitates reporting on compliance against those rules to any other entity.
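As a rough illustration of that data model, the Python sketch below pairs a piece of data with an immutable label URI. The class names, the example URI and the rule that a label must be an absolute https URI are all assumptions made for this sketch; the proposal itself does not define them.

```python
from dataclasses import dataclass, FrozenInstanceError
from urllib.parse import urlparse


@dataclass(frozen=True)  # frozen: the label cannot be changed once created
class DataLabel:
    """Hypothetical label: a URI pointing at the lawful-purpose document."""
    uri: str

    def __post_init__(self):
        parts = urlparse(self.uri)
        # Assumption for this sketch: labels are absolute https URIs.
        if parts.scheme != "https" or not parts.netloc:
            raise ValueError(f"label must be an absolute https URI: {self.uri!r}")


@dataclass(frozen=True)
class LabelledData:
    """A piece of data (cookie value, OpenRTB field, ...) plus its label."""
    name: str
    value: str
    label: DataLabel


# Hypothetical label document describing non-personal data.
label = DataLabel("https://labels.example/non-personal/v1")
item = LabelledData(name="prefs", value="theme=dark", label=label)
```

Making both classes frozen is one simple way to model the "immutable" requirement in code: any attempt to reassign `item.label` or `label.uri` raises `FrozenInstanceError`.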

Labelling

The concept is perhaps best illustrated with regard to cookies, the small pieces of text that web browsers store to remember information ranging from preferences to personal details. By adding a label to a cookie indicating the lawful purpose for the collection and use of the data, the cookie ceases to be arbitrary data. For example, a cookie that does not contain personal data can be labelled as such, with a restriction attached that recipients must keep it in its deidentified state so that it remains non-personal data. Likewise, data derived from personal data that might still be personal data in some conditions can be marked and restrictions applied to it.

The cookie will be made smart, capable of communicating the nature of its content.
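One way to picture a labelled cookie is as an ordinary Set-Cookie header carrying one extra attribute. The sketch below assumes a hypothetical `Label` attribute and example URIs; no such attribute exists in today's cookie standards, and the parsing is deliberately simplistic.

```python
from typing import Optional


def build_set_cookie(name: str, value: str, label_uri: str) -> str:
    """Build a Set-Cookie header value carrying a data label."""
    # "Label" is a hypothetical attribute, not part of any cookie standard.
    return f"{name}={value}; Path=/; Secure; Label={label_uri}"


def read_label(set_cookie_header: str) -> Optional[str]:
    """Return the label URI of a Set-Cookie header, or None if unlabelled."""
    # Skip the first segment, which is the name=value pair itself.
    for attribute in set_cookie_header.split(";")[1:]:
        key, _, val = attribute.strip().partition("=")
        if key.lower() == "label":
            return val
    return None


# Hypothetical cookie holding deidentified data, with a matching label.
header = build_set_cookie(
    "visitor_segment", "a1b2", "https://labels.example/deidentified/v1"
)
label_found = read_label(header)
label_missing = read_label("sid=xyz; Path=/")
```

Here `label_found` holds the label URI, while `label_missing` is `None` because the second cookie carries no label and so remains arbitrary data to the recipient.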

The same approach can be used across other data types. For example, within the OpenRTB supply chain data labels would allow simple, nuanced decisions to be made around data handling. The proposal could even be applied to content, labelling it as sensitive/non-sensitive as required.
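For OpenRTB, a label could plausibly travel in one of the `ext` extension objects that the specification already provides. The sketch below assumes a hypothetical `data_label` field under `user.ext` and a recipient whose contracts permit only one label; both are illustrative choices, not part of OpenRTB or of the proposal text.

```python
import copy

# Labels this (hypothetical) recipient's contracts permit it to process.
PERMITTED_LABELS = {"https://labels.example/deidentified/v1"}


def apply_label_policy(bid_request: dict) -> dict:
    """Return a copy of the request, dropping user data with no permitted label."""
    request = copy.deepcopy(bid_request)  # never modify the sender's message
    user = request.get("user", {})
    label = user.get("ext", {}).get("data_label")  # hypothetical field name
    if label not in PERMITTED_LABELS:
        request.pop("user", None)
    return request


labelled = {
    "id": "req-1",
    "imp": [{"id": "1"}],
    "user": {
        "id": "u-123",
        "ext": {"data_label": "https://labels.example/deidentified/v1"},
    },
}
unlabelled = {"id": "req-2", "imp": [{"id": "1"}], "user": {"id": "u-456"}}

kept = apply_label_policy(labelled)       # user object is retained
stripped = apply_label_policy(unlabelled)  # user object is dropped
```

Dropping the whole `user` object is the bluntest possible policy. A real recipient could equally decline to bid, or process the data under tighter restrictions, depending on what its contracts say.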

Data risks

For their own, anti-competitive reasons, the tech platforms, primarily Google and Apple, have attempted to create a false dichotomy between first-party and third-party cookies. First-party cookies – those created by the website that someone is visiting – are, they claim, safe and could never harm a visitor’s interests. Third-party cookies – those created by another website – are, conversely, somehow evil and pernicious and should be destroyed.

Putting aside the fact that the platforms’ primary motivation in this was to increase the value of their own, vast first-party data resources and to shift real-time digital communication from open-web standards to proprietary mechanisms controlled by their app stores, this assumption is completely unsubstantiated. Can you think of any online harm consumers would want a first party to inflict on them? The privacy and sensitivity of data does not depend solely on who first created or stored it. It lies in the risk of harm that the data presents, which may vary based on who possesses it and the restrictions in place.

Non-personal, non-sensitive data is not a privacy risk whether it’s created by Google or an unknown tech firm. Personal or sensitive data is personal or sensitive whoever it’s created by. Indeed, some would argue that the vast pools of highly personal data, linked with historic precise location data, held by the platforms are far more dangerous than the smaller quantities of low-risk data that might be held by an independent small business, for example.

The power of contracts

The power of the data labels proposal is that it returns the handling of data to the scope of contractual law. Purely technical “privacy-enhancing” projects such as the Privacy Sandbox cannot offer a 100% technical solution for responsible data management, as they inevitably require trust in the operator and its contractual protections with senders and receivers of data. In contrast, the Data Labels approach simply assigns labels to the data it describes, which can then be handled in line with the contractual obligations between the senders and receivers involved. If a publisher or advertiser has specific policies about how its customer data should be handled, then its labels will describe that policy and its supplier contracts will spell out the corresponding obligations.

For the web browser vendor, data labels help them to retain their proper role as facilitator of real-time communication and interaction between people and websites. Projects such as Apple’s ITP ceased supporting third-party cookies by default, in effect making the browser the judge, jury and executioner in decisions about online data despite Apple being blind to how any data recipients would be handling the data communicated. Under the Data Labels proposal, the browser simply needs to read the appended label and compare it to ‘allow’ lists decided by its users and other data controllers. Web browser vendors will cease to set the rules or be a player. They will become honest brokers merely keeping score.
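The browser-side check described above can be sketched in a few lines. This version assumes that a label must appear on both the user's and the data controller's allow list, and that unlabelled data is refused; the proposal text does not settle either point, and the function name and URIs are hypothetical.

```python
def browser_allows(label_uri, user_allow_list, controller_allow_list):
    """Return True only if the label appears on both allow lists."""
    if label_uri is None:
        # Assumption: unlabelled data is treated as not allowed.
        return False
    return label_uri in user_allow_list and label_uri in controller_allow_list


# Hypothetical allow lists, expressed as sets of label URIs.
user_allow = {
    "https://labels.example/non-personal/v1",
    "https://labels.example/deidentified/v1",
}
controller_allow = {"https://labels.example/deidentified/v1"}

ok = browser_allows(
    "https://labels.example/deidentified/v1", user_allow, controller_allow
)
blocked = browser_allows(
    "https://labels.example/non-personal/v1", user_allow, controller_allow
)
```

The browser makes no judgement of its own in this sketch: `ok` is true and `blocked` is false purely because of what the two lists contain, which is the "keeping score" role the article describes.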

Bad actors

But what of bad actors? If data is labelled by those who are creating it, then surely a nefarious organisation could inaccurately label data to enable them to misuse personal information? Yes, they could – and some surely will. No solution can rid the world of bad actors – short of banning interoperability, some actors will lie. But data labelling will make identifying and pursuing those bad actors far simpler. If data is inaccurately labelled, it will become apparent and the actor who mislabelled it can be identified and brought to justice under the terms of their contract or applicable laws.

The future

The data labels proposal is at an early stage and needs to be presented to the relevant standards bodies (W3C, IAB, IETF) to become a reality. However, it shows a route through which easily interoperable data such as cookies, OpenRTB messages, or any content can be classified in an intelligent, decentralized, and nuanced way.

By making previously ‘dumb’ data somewhat smarter, the data labels proposal offers a simple solution to unlock the power of the open web whilst ensuring a safer and more compliant experience for users and businesses. It might just be the least worst option reasonable people can support!
