A data schema is a description of a set of data. P3P includes a way to describe data schemas so that services can communicate to user agents about the data they collect. A data schema is built from a number of data elements, which are specific items of data a service might collect.
Data elements in a data schema can have the following properties:
Data elements are organized into a hierarchy. A data element automatically includes all of the data elements below it in the hierarchy. For example, the data element representing "the user's name" includes the data elements representing "the user's given name," "the user's family name," and so on. The hierarchy is based on the data element name. Thus the data elements user.name.given, user.name.family, and user.name.nickname are all children of the data element user.name, which is in turn a child of the data element user.
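The name-based hierarchy can be sketched in a few lines of Python. This is only an illustration, not part of P3P: the helper names is_descendant and covered_by and the sample list NAMES are assumptions made for this example.

```python
def is_descendant(name: str, ancestor: str) -> bool:
    """True if `name` lies strictly below `ancestor` in the dotted hierarchy."""
    # The trailing "." prevents "user.namespace" from matching "user.name".
    return name.startswith(ancestor + ".")


def covered_by(declared: str, names: list) -> list:
    """Names included by declaring `declared`: itself plus all descendants."""
    return [n for n in names if n == declared or is_descendant(n, declared)]


# Hypothetical sample of element names, for illustration only.
NAMES = [
    "user",
    "user.name",
    "user.name.given",
    "user.name.family",
    "user.name.nickname",
]

print(covered_by("user.name", NAMES))
```

Declaring user.name therefore covers user.name itself and its three children, while user stays outside the selection.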
P3P has defined a data schema called the P3P base data schema that includes a large number of data elements commonly used by services.
Services may declare new data elements by creating and publishing their own data schemas. This is done with the <DATASCHEMA> element. Schemas can either be published in standalone XML files, which are then referenced by the policies that use them, or be embedded in the policy files that reference them. The <DATASCHEMA> element is defined as follows:
A standalone data schema has the <DATASCHEMA> element as the first XML element in the file. It must have the appropriate namespace defined in the xmlns attribute to identify it as a P3P data schema, as follows:
<DATASCHEMA xmlns="http://www.w3.org/2001/09/P3Pv1"> <DATA-STRUCT ... /> ... <DATA-DEF ... /> </DATASCHEMA>
When a data schema is declared inside a policy file, then the <DATASCHEMA> element is still used (as described in Section 3.2.1, "The <POLICIES> Element"), but no namespace attribute is given.
5.1 Natural Language Support for Data Schemas
Data schemas contain a number of fields in natural language. Services publishing a data schema MAY wish to translate these fields into multiple languages. The data element short and long names MAY be translated, but the data element name MUST NOT be translated: this field needs to stay constant across translations of a data schema.
If a service is going to provide a data schema in multiple natural languages, then it SHOULD examine the Accept-Language HTTP request-header on requests for that data schema to pick the best available alternative.
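A server-side selection based on Accept-Language might look like the following Python sketch. It is a simplified illustration, not a full implementation of HTTP language negotiation: the function name pick_language is an assumption, wildcard ("*") entries are not handled, and a language range only falls back to its primary subtag.

```python
def pick_language(accept_language: str, available: list, default: str) -> str:
    """Pick the best available schema translation for an Accept-Language value."""
    prefs = []
    for position, item in enumerate(accept_language.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((-quality, position, tag.strip().lower()))
    prefs.sort()  # highest quality first, header order breaks ties

    by_lowercase = {lang.lower(): lang for lang in available}
    for negative_quality, _, tag in prefs:
        if negative_quality == 0:
            continue  # q=0 marks a language the client does not accept
        if tag in by_lowercase:
            return by_lowercase[tag]
        primary = tag.split("-")[0]  # e.g. "fr-ch" falls back to "fr"
        if primary in by_lowercase:
            return by_lowercase[primary]
    return default


print(pick_language("fr-CH, fr;q=0.9, en;q=0.8", ["en", "fr"], "en"))
```

Whatever translation is chosen, only the short and long descriptions differ; the data element names in the returned schema stay the same.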
5.2 Data Structures
Data schemas often need to reuse a common group of data elements. P3P data schemas support this through data structures. A data structure is a named, abstract definition of a group of data elements. When a data element is defined, it can be defined as being of an unstructured type, in which case it has no child elements. The data element can also be defined as being of a specific structured type, in which case the data element will be automatically expanded to include as sub-elements all of the elements defined in the data structure. For example, the following structure is used to represent a date and time:
<!-- "date" Data Structure -->
<DATA-STRUCT name="date.ymd.year" short-description="Year"/>
<DATA-STRUCT name="date.ymd.month" short-description="Month"/>
<DATA-STRUCT name="date.ymd.day" short-description="Day"/>
<DATA-STRUCT name="date.hms.hour" short-description="Hour"/>
<DATA-STRUCT name="date.hms.minute" short-description="Minute"/>
<DATA-STRUCT name="date.hms.second" short-description="Second"/>
Now we shall define a "meeting" data element, which has a time and place for the meeting:
<DATA-DEF name="meeting.time" short-description="Meeting time" structref="#date"/>
<DATA-DEF name="meeting.place" short-description="Meeting place"/>
Since meeting.place does not reference a structure, it is of an unstructured type, and has no child elements. The meeting.time element uses the date structure. By declaring this, the following sub-elements are created:
meeting.time.ymd.year
meeting.time.ymd.month
meeting.time.ymd.day
meeting.time.hms.hour
meeting.time.hms.minute
meeting.time.hms.second
A P3P policy can now declare that it collects the meeting data element, which implies that it collects all of the sub-elements of meeting, or it can use data elements lower down the hierarchy, such as meeting.time or meeting.time.ymd.day.
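The expansion of a structured element into its sub-elements can be sketched in Python as follows. The function name expand and the list DATE_FIELDS are assumptions made for this illustration; the field names themselves come from the "date" structure above.

```python
def expand(element_name: str, struct_name: str, struct_fields: list) -> list:
    """Replace the structure's own name with the element's name in every field."""
    prefix = struct_name + "."
    return [
        element_name + "." + field[len(prefix):]
        for field in struct_fields
        if field.startswith(prefix)
    ]


# Fields of the "date" structure, as declared with DATA-STRUCT above.
DATE_FIELDS = [
    "date.ymd.year", "date.ymd.month", "date.ymd.day",
    "date.hms.hour", "date.hms.minute", "date.hms.second",
]

for sub_element in expand("meeting.time", "date", DATE_FIELDS):
    print(sub_element)
```

An unstructured element such as meeting.place has no structref, so nothing is expanded for it.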
5.3 The DATA-DEF and DATA-STRUCT Elements
<DATA-DEF> and <DATA-STRUCT>
Define a data element or a data structure, respectively. Data structures are reusable structured type definitions that can be used to build data elements. Data elements are declared within a <STATEMENT> in a P3P policy to describe data covered by that statement.
The following attributes are common to these two elements:
Data elements can be structured, much like in common programming languages: structures are hierarchical (tree-like) descriptions of data elements: this hierarchical description is performed in the name attribute using a full stop (".") character as separator.
P3P provides the P3P base data schema, which has built-in definitions of a number of widely used structures and data elements. All P3P implementations are required to understand the P3P base data schema, so the structures and elements it defines are always available to P3P implementers.
5.3.1 Categories in P3P Data Schemas
Categories can be assigned to data structures or data elements. The following rules define how those category definitions are meant to be used:
5.3.2 P3P Data Schema Example
Consider the case where the company HyperSpeedExample wishes to describe the features of a vehicle, using a structure called vehicle. This structure includes:
If HyperSpeedExample also wants to include in the definition of a vehicle the location of manufacture, it could add other fields to the structure with all the relevant data like country, street address, postal code, and so on. But each part of a structure can use other structures as well: structures can be composed. In this case, the P3P base data schema already provides a structure postal, describing all the postal information of a location. So, the final definition of the structure vehicle is
The structure postal has fields postal.street, postal.city, and so on. Since we have applied the structure postal to vehicle.built.where, we can access the street and city of a vehicle using the descriptions vehicle.built.where.street and vehicle.built.where.city respectively. So, by applying a structure (in this case, postal) we can build very complex descriptions in a modular way.
HyperSpeedExample wants to declare that all of the vehicle information will be in the <preference/> category. The vehicle.model, vehicle.color, vehicle.price, and vehicle.built.year fields are all unstructured types, so assigning them to the <preference/> category accomplishes this for those fields. Since vehicle is a structure definition, assigning the <preference/> category to vehicle.built.where will override (replace) the categories defined on all of the sub-elements of vehicle.built.where, placing all of them in the <preference/> category, even though the postal structure was originally defined as being in other categories.
As said, structures do not contain data elements; they are just abstract data types. We can use them to rapidly build structured collections of data elements. Going on with the example, HyperSpeedExample needs this abstract description of the features of a vehicle because it wants to actually exchange data about cars and motorcycles. So, it could define two data elements called car and motorcycle, both with the above structure vehicle.
This description of the data elements and data structures is encoded in XML using a data schema. In the HyperSpeedExample case, it would be something like:
<DATASCHEMA xmlns="http://www.w3.org/2001/09/P3Pv1">
  <DATA-STRUCT name="vehicle.model" short-description="Model">
    <CATEGORIES><preference/></CATEGORIES>
  </DATA-STRUCT>
  <DATA-STRUCT name="vehicle.color" short-description="Color">
    <CATEGORIES><preference/></CATEGORIES>
  </DATA-STRUCT>
  <DATA-STRUCT name="vehicle.built.year" short-description="Construction Year">
    <CATEGORIES><preference/></CATEGORIES>
  </DATA-STRUCT>
  <DATA-STRUCT name="vehicle.built.where" structref="http://www.w3.org/TR/P3P/base#postal" short-description="Construction Place">
    <CATEGORIES><preference/></CATEGORIES>
  </DATA-STRUCT>
  <DATA-DEF name="car" structref="#vehicle"/>
  <DATA-DEF name="motorcycle" structref="#vehicle"/>
</DATASCHEMA>
Continuing with the example, in order to reference a car model and construction year, Hyperspeed or any other service could send the following references inside a P3P policy:
<DATA-GROUP>
  <!-- First, the "car.model" data element, whose definition is in the
       data schema at http://www.HyperSpeed.example.com/models-schema -->
  <DATA ref="http://www.HyperSpeed.example.com/models-schema#car.model"/>
  <!-- And second, the "car.built.year" data element, whose definition is in the
       data schema at http://www.HyperSpeed.example.com/models-schema -->
  <DATA ref="http://www.HyperSpeed.example.com/models-schema#car.built.year"/>
</DATA-GROUP>
Using the base attribute, the above references can be written in an even more compact way:
<DATA-GROUP base="http://www.HyperSpeed.example.com/models-schema"> <DATA ref="#car.model"/> <DATA ref="#car.built.year"/> </DATA-GROUP>
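How a reader of a policy might combine the base attribute with a fragment-only ref can be sketched in Python. This is an illustrative assumption about the processing, limited to the case shown above (a non-empty base and refs beginning with "#"); the function names resolve_ref and split_ref are hypothetical.

```python
def resolve_ref(base: str, ref: str) -> str:
    """Prefix a fragment-only reference with the base URI; leave full URIs alone."""
    return base + ref if ref.startswith("#") else ref


def split_ref(full_ref: str) -> tuple:
    """Split a full reference into (data schema URI, data element name)."""
    schema_uri, _, element_name = full_ref.partition("#")
    return schema_uri, element_name


BASE = "http://www.HyperSpeed.example.com/models-schema"

for ref in ("#car.model", "#car.built.year"):
    full = resolve_ref(BASE, ref)
    print(full, split_ref(full))
```

Both forms of the DATA-GROUP above then name the same two data elements in the same data schema.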
Alternatively, the data schema could be embedded directly into a policy file. In this case, the policy file could look like:
<POLICIES xmlns="http://www.w3.org/2001/09/P3Pv1">
  <!-- Embedded data schema -->
  <DATASCHEMA>
    <DATA-STRUCT name="vehicle.model" short-description="Model">
      <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.color" short-description="Color">
      <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.built.year" short-description="Construction Year">
      <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.built.where" structref="http://www.w3.org/TR/P3P/base#postal" short-description="Construction Place">
      <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-DEF name="car" structref="#vehicle"/>
    <DATA-DEF name="motorcycle" structref="#vehicle"/>
  </DATASCHEMA>
  <!-- end of embedded data schema -->
  <POLICY name="policy1" discuri="http://www.example.com/disc1">
    ...
    <DATA-GROUP base="">
      <DATA ref="#car.model"/>
      <DATA ref="#car.built.year"/>
    </DATA-GROUP>
    ...
  </POLICY>
  <POLICY name="policy2" discuri="http://www.example.com/disc2">
    ....
  </POLICY>
  <POLICY name="policy3" discuri="http://www.example.com/disc3">
    ....
  </POLICY>
</POLICIES>
Note that in any case there MUST NOT be more than one data schema per file.
5.3.3 Use of Data Element Names
Note that the data element names specified in the base data schema or in extension data schemas may be used for purposes other than P3P policies. For example, Web sites may use these names to label HTML form fields. By referring to data the same way in P3P policies and forms, automated form-filling tools can be better integrated with P3P user agents.
5.4 Persistence of Data Schemas
An essential requirement on data schemas is persistence: a data schema that can be fetched at a certain URI can only be changed by extending it in a backward-compatible way (that is to say, changing the data schema does not change the meaning of any policy using that schema). This way, the URI of a data schema acts in a sense like a unique identifier for the data elements and structures contained therein: any data schema that is not backward-compatible must therefore use a new, different URI.
Note that persistence is useful, for example, for multilingual sites: the server can offer multiple language versions (translations) of the same data schema, using the HTTP "Content-Language" response header field to indicate which language a particular version uses.
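A publisher could approximate the backward-compatibility requirement with a check like the Python sketch below. The representation (a mapping from element name to a set of categories) and the function name is_backward_compatible are assumptions for illustration; real compatibility depends on the meaning of policies, which this simple check cannot fully capture.

```python
def is_backward_compatible(old: dict, new: dict) -> bool:
    """Approximation: every old element must survive with unchanged categories."""
    return all(name in new and new[name] == cats for name, cats in old.items())


# Hypothetical schema versions: element name -> set of category names.
V1 = {
    "car.model": frozenset({"preference"}),
    "car.built.year": frozenset({"preference"}),
}
V2_EXTENDED = dict(V1, **{"car.color": frozenset({"preference"})})
V2_CHANGED = dict(V1, **{"car.model": frozenset({"purchase"})})

print(is_backward_compatible(V1, V2_EXTENDED))
print(is_backward_compatible(V1, V2_CHANGED))
```

Under this approximation the extended version could stay at the same URI, while the changed version would need to be published at a new one.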
5.5 Basic Data Structures
The Basic Data Structures are structures used by the P3P base data schema (because they are so basic, other data schemas should reuse them wherever possible). All P3P-compliant user agent implementations MUST be aware of the Basic Data Structures. Each table below specifies the elements of a basic data structure, the categories associated, their structures, and the display names shown to users. More than one category may be associated with a fixed data element. However, each base data element is assigned to only one category whenever possible. Data schema designers are recommended to do the same.
The date structure specifies a date. Since date information can be used in different ways, depending on the context, all date information is tagged as being of "variable" category (see Section 5.7.2). For example, schema definitions can explicitly set the corresponding category in the element referencing this data structure, where soliciting the birthday of a user might be "Demographic and Socioeconomic Data," while the expiration date of a credit card might belong to the "Purchase Information" category.
The "time zone" information is described, for example, in the time standard [ISO8601]. Note that "date.ymd" and "date.hms" can be used as shorthand references to the year/month/day and hour/minute/second blocks, respectively.
The personname structure specifies information about the naming of a person.
The login structure specifies information (IDs and passwords) for computer systems and Web sites which require authentication. Note that this data element should not be used for computer systems or Web sites which use digital certificates for authentication: in those cases, the certificate structure should be used.
The "id" field represents the ID portion of the login information for a computer system. Often, user IDs are made public, while passwords are kept secret. This does not include any type of biometric authentication mechanisms.
The "password" field represents the password portion of the login information for a computer system. This is a secret data value, usually a character string, that is used in authenticating a user. Passwords are typically kept secret, and are generally considered to be sensitive information.
The certificate structure is used to specify identity certificates (like, for example, X.509).
The "format" field is used to represent the information of an IANA registered public key or authentication certificate format, while the "key" field is used to represent the corresponding certificate key.
The telephonenum structure specifies the characteristics of a telephone number.
5.5.6 Contact Information
The contact structure is used to specify contact information. Services can specify precisely which set of data they need, postal, telecommunication, or online address information.
The postal structure specifies a postal mailing address.
The "country" field represents the information of the name of the country (for example, one among the countries listed in [ISO3166]).
The telecom structure specifies telecommunication information about a person.
The online structure specifies online information about a person.
5.5.7 Access Logs and Internet Addresses
Two structures used for representing forms of Internet addresses are provided. The uri structure covers Uniform Resource Identifiers (URI), which are defined in more detail in [URI]. The ipaddr structure represents IP addresses and Domain Name System (DNS) hostnames.
The authority of a URI is defined as the authority component in [URI]. The stem of a URI is defined as the information contained in the portion of the URI after the authority and up to (and including) the first "?" character in the URI, and the querystring is the information contained in the portion of the URI after the first "?" character. For URIs which do not contain a "?" character, the stem is the entire URI, and the querystring is empty.
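The split into authority, stem, and querystring can be sketched with Python's standard urllib.parse module. The function name split_uri is an assumption. For URIs without a "?", the sketch also assumes that the stem means everything after the authority, which is one reading of the definition above.

```python
from urllib.parse import urlsplit


def split_uri(uri: str) -> tuple:
    """Return (authority, stem, querystring) for a URI."""
    authority = urlsplit(uri).netloc
    if authority:
        # The authority starts right after the first "//" in the URI.
        rest = uri[uri.index("//") + 2 + len(authority):]
    else:
        rest = uri
    head, question_mark, querystring = rest.partition("?")
    # The stem includes the "?" itself when one is present.
    return authority, head + question_mark, querystring


print(split_uri("http://www.example.com/search?q=p3p&lang=en"))
print(split_uri("http://www.example.com/a/b"))
```

A site that logs only the stem, for example, would record the path of a search page but not the search terms carried in the querystring.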
Since URI information can be used in different ways, depending on the context, all the fields in the uri structure are tagged as being of "variable" category. Schema definitions MUST explicitly set the corresponding category in the element referencing this data structure.
The ipaddr structure represents the hostname and IP address of a system.
The hostname element is used to represent collection of either the simple hostname of a system, or the full hostname including domain name. The partialhostname element represents the information of a fully-qualified hostname which has had at least the host portion removed from the hostname. In other words, everything up to the first "." in the fully-qualified hostname MUST be removed for an address to qualify as a "partial hostname."
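Reducing a fully-qualified hostname to a partial hostname can be sketched in Python as follows. The function name partial_hostname is hypothetical, and dropping the separating "." along with the host portion is an assumption of this sketch.

```python
def partial_hostname(fully_qualified: str) -> str:
    """Remove everything up to the first "." (the host portion) of a hostname."""
    _host, _dot, remainder = fully_qualified.partition(".")
    # A name without any "." has no domain part left once the host is removed.
    return remainder


print(partial_hostname("www.example.com"))
```

More labels may be removed than this minimum; the requirement is only that at least the host portion is gone.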
The fullip element represents the information of a full IP version 4 or IP version 6 address. The partialip element represents an IP version 4 address (only, not a version 6 address) which has had at least the last 7 bits of information removed. This removal MUST be done by replacing those bits with a fixed pattern for all visitors (for example, all 0s or all 1s).
Certain Web sites are known to make use not of the visitor's entire IP address or hostname, but rather make use of a reduced form of that information. By collecting only a subset of the address information, the site visitor is given some measure of anonymity. It is certainly not the intent of this specification to claim that these "stripped" IP addresses or hostnames are impossible to associate with an individual user, but rather that it is significantly more difficult to do so. Sites which perform this data reduction MAY wish to declare this practice in order to more accurately reflect their practices.
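The bit removal for partialip can be sketched with Python's standard ipaddress module. The function name partial_ip and the default of 8 removed bits are assumptions; the sketch uses all 0s as the fixed replacement pattern.

```python
import ipaddress


def partial_ip(address: str, removed_bits: int = 8) -> str:
    """Replace the last `removed_bits` bits of an IPv4 address with 0s."""
    if not 7 <= removed_bits <= 32:
        raise ValueError("at least 7 and at most 32 bits must be removed")
    # IPv4Address rejects IPv6 input, matching the IPv4-only definition.
    ip = ipaddress.IPv4Address(address)
    mask = (0xFFFFFFFF << removed_bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))


print(partial_ip("192.0.2.201"))
print(partial_ip("192.0.2.201", removed_bits=7))
```

Because the same pattern is applied to every visitor, many distinct addresses collapse onto one stored value.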
5.5.7.3 Access Log Information
The loginfo structure is used to represent information typically stored in Web-server access logs.
The resource in the HTTP request is captured by the uri field. The time at which the server processes the request is represented by the timestamp field. Server implementations are free to define this field as the time the request was received, the time that the server began sending the response, the time that sending the response was complete, or some other convenient representation of the time the request was processed. The IP address of the client system making the request is given by the clientip field.
The other data fields represent other information commonly stored in Web server access logs. other.httpmethod is the HTTP method (such as GET, POST, etc.) in the client's request. other.bytes indicates the number of bytes in the response-body sent by the server. other.statuscode is the HTTP status code on the request, such as 200, 302, or 404 (see section 6.1.1 of [HTTP1.1] for details).
5.5.7.4 Other HTTP Protocol Information
The httpinfo structure represents information carried by the HTTP protocol which is not covered by the loginfo structure.
The useragent field represents the information in the HTTP User-Agent header (which gives information about the type and version of the user's Web browser), and/or the HTTP accept* headers.
The referer field represents the information in the HTTP Referer header, which gives information about the previous page visited by the user. Note that this field is misspelled in exactly the same way as the corresponding HTTP header.
5.6 The Base Data Schema
All P3P-compliant user agent implementations MUST be aware of the data elements in the P3P base data schema. The P3P base data schema includes the definition of the basic data structures, and four data element sets: user, thirdparty, business and dynamic. The user, thirdparty and business sets include elements that users and/or businesses might provide values for, while the dynamic set includes elements that are dynamically generated in the course of a user's browsing session. User agents may support a variety of mechanisms that allow users to provide values for the elements in the user set and store them in a data repository, including mechanisms that support multiple personae. Users may choose not to provide values for these data elements.
The formal XML definition of the P3P base data schema is given in Appendix 3. In the following sections, the base data elements and sets are explained one by one. In the future there will in all likelihood be demand for the creation of other data sets and elements. Obvious applications include catalogue, payment, and agent/system attribute schemas (an extensive set of system elements is provided for example in http://www.w3.org/TR/NOTE-agent-attributes).
Each table below specifies a set, the elements within the set, the category associated with the element, its structure, and the display name shown to users. More than one category may be associated with a fixed data element. However, each base data element is assigned to only one category whenever possible. It is recommended that data schema designers do the same.
5.6.1 User Data
The user data set includes general information about the user.
Note that this data set includes elements that are actually sets of data themselves. These sets are defined in the Data Structures subsection of this document. The short display name for an individual element contained within a data set is defined as the concatenation of the short display names that have been defined for the set and the element, separated by a separator appropriate for the language/script in question, e.g., a comma for English. For example, the short display name for user.home-info.postal.postalcode could be "User's Home Contact Information, Postal Address Information, Postal code." User agent implementations may prefer to develop their own short display names rather than using the concatenated names when displaying information for the user.
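The concatenation rule can be sketched in Python. The function name short_display_name and the lookup table DESCRIPTIONS are assumptions for illustration; the three labels are taken from the example in the paragraph above, and name prefixes without a label (here, user) are simply skipped.

```python
def short_display_name(name: str, descriptions: dict, separator: str = ", ") -> str:
    """Join the short display names found along the path to `name`."""
    parts = name.split(".")
    labels = []
    for depth in range(1, len(parts) + 1):
        prefix = ".".join(parts[:depth])
        if prefix in descriptions:
            labels.append(descriptions[prefix])
    return separator.join(labels)


# Hypothetical lookup table built from the example in the text.
DESCRIPTIONS = {
    "user.home-info": "User's Home Contact Information",
    "user.home-info.postal": "Postal Address Information",
    "user.home-info.postal.postalcode": "Postal code",
}

print(short_display_name("user.home-info.postal.postalcode", DESCRIPTIONS))
```

The separator parameter is where a user agent would substitute a separator suited to another language or script.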
5.6.2 Third Party Data
The thirdparty data set allows users and businesses to provide values for a related third party. This can be useful whenever third party information needs to be exchanged, for example when ordering a present online that should be sent to another person, or when providing information about one's spouse or business partner. Such information could be stored in a user repository alongside the user data set. User agents may offer to store multiple such thirdparty data sets and allow users to select the appropriate values from a list when necessary.
The thirdparty data set is identical with the user data set. See section 5.6.1 User Data for details.
5.6.3 Business Data
The business data set features a subset of user data relevant for organizations. In P3P1.0, this data set is primarily used for declaring the policy entity, though it should also be applicable to business-to-business interactions.
5.6.4 Dynamic Data
In some cases, there is a need to specify data elements that do not have fixed values that a user might type in or store in a repository. In the P3P base data schema, all such elements are grouped under the dynamic data set. Sites may refer to the types of data they collect using the dynamic data set only, rather than enumerating all of the specific data elements.
These elements are often implicit in navigation or Web interactions. They should be used with categories to describe the type of information collected through these methods. A brief description of each element follows.
The clickstream element is expected to apply to practically all Web sites. It represents the combination of information typically found in Web server access logs: the IP address or hostname of the user's computer, the URI of the resource requested, the time the request was made, the HTTP method used in the request, the size of the response, and the HTTP status code in the response. Web sites that collect standard server access logs as well as sites which do URI path analysis can use this data element to describe how that data will be used. Web sites that collect only some of the data elements listed for the clickstream element MAY choose to list those specific elements rather than the entire dynamic.clickstream element. This allows sites with more limited data-collection practices to accurately present those practices to their visitors.
The http element contains additional information contained in the HTTP protocol. See the definition of the httpinfo structure for descriptions of specific elements. Sites MAY use the dynamic.http field as a shorthand to cover all the elements in the httpinfo structure if they wish, or they MAY reference the specific elements in the httpinfo structure.
The cookies element should be used whenever HTTP cookies are set or retrieved by a site. Please note that cookies is a variable data element and requires the explicit declaration of usage categories in a policy.
The miscdata element references information collected by the service that the service does not reference using a specific data element. Categories have to be used to better describe these data: sites MUST reference a separate miscdata element in their policies for each category of miscellaneous data they collect.
The searchtext element references a specific type of solicitation used for searching and indexing sites. For example, if the only fields on a search engine page are search fields, the site only needs to disclose that data element.
The interactionrecord element should be used if the server is keeping track of the interaction it has with the user (i.e., information other than clickstream data, for example account transactions, etc.).
5.7 Categories and Data Elements/Structures
5.7.1 Fixed-Category Data Elements/Structures
Most of the elements in the base data schema are so-called "fixed" data elements: they belong to one or at most two category classes. By assigning a category invariably to elements or structures in the base data schema, services and users are able to refer to entire groups of elements simply by referencing the corresponding category. For example, using [APPEL], the privacy preferences exchange language, users can write rules that warn them when they visit a site that collects any data element in a certain category.
When creating data schemas for fixed data elements, schema creators have to explicitly enumerate the categories that these elements belong to. For example:
<DATA-STRUCT name="postal.street" structref="#text" short-description="Street Address"> <CATEGORIES><physical/></CATEGORIES> </DATA-STRUCT>
If an element or structure belongs to multiple categories, multiple elements referencing the appropriate categories can be used. For example, the following piece of XML can be used to declare that the data elements in user.name have both category "physical" and "demographic":
<DATA-STRUCT name="user.name" structref="#personname" short-description="User's Name"> <CATEGORIES><physical/><demographic/></CATEGORIES> </DATA-STRUCT>
Please note that the category classes of fixed data elements/structures cannot be overridden, for example by writing rules or policies that assign a different category to a known fixed base data element. User Agents MUST ignore such categories and instead use the original category (or set of categories) listed in the schema definition. User Agents MAY preferably alert the user that a fixed data element is used together with a non-standard category class.
5.7.2 Variable-Category Data Elements/Structures
Not all data elements/structures in the base data schema belong to a pre-determined category class. Some can contain information from a range of categories, depending on a particular situation. Such elements/structures are called variable-category data elements/structures (or "variable data element/structure" for short). Although most variable data elements in the P3P base data schema are combined in the dynamic element set, they can appear in any data set, even mixed with fixed-category data elements.
When creating a schema definition for such elements and/or structures, schema authors MUST NOT list an explicit category attribute, otherwise the element/structure becomes fixed. For example, when specifying the "Year" Data Structure, which can take various categories depending on the situation (e.g., when used for a credit card expiration date vs. for a birth date), the following schema definition can be used:
<DATA-STRUCT name="date.ymd.year" short-description="Year"/> <!-- Variable Data Structure-->
This allows new schema extensions that reference such variable-category Data Structures to assign a specific category to derived elements, depending on their usage in that extension. For example, an e-commerce schema extension could thus define a credit card expiration date as follows:
<DATA-STRUCT name="Card.ExpDate" structref="#date.ymd" short-description="Card Expiration Date"> <CATEGORIES><purchase/></CATEGORIES> </DATA-STRUCT>
Under these conditions, the variable Data Structure date is assigned a fixed category "Purchase Information" when being used for specifying a credit card expiration date.
Note that while user preferences can list such variable data elements without any additional category information (effectively expressing preferences over any usage of this element), services MUST always explicitly specify the categories that apply to the usage of a variable data element in their particular policy. This information has to appear as a category element in the corresponding DATA element listed in the policy, for example as in:
<POLICY ... > ... <DATA ref="#dynamic.cookies"><CATEGORIES><uniqueid/></CATEGORIES></DATA> ... </POLICY>
where a service declares that cookies are used to recognize the user at this site (i.e., category Unique Identifiers).
If a service wants to declare a data element that is in multiple categories, it simply declares the corresponding categories (as shown in the above section):
<POLICY ... > ... <DATA ref="#dynamic.cookies"><CATEGORIES><uniqueid/><preference/> </CATEGORIES></DATA> ... </POLICY>
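A policy author could check this requirement mechanically. The Python sketch below uses the standard xml.etree.ElementTree module; the function name missing_categories and the set VARIABLE_ELEMENTS are assumptions for illustration, and the sample fragments omit the P3P namespace declaration for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of variable-category elements a checker might know about.
VARIABLE_ELEMENTS = {"#dynamic.cookies", "#dynamic.miscdata"}


def missing_categories(policy_xml: str) -> list:
    """Return refs of variable data elements declared without any category."""
    root = ET.fromstring(policy_xml)
    problems = []
    for data in root.iter("DATA"):
        ref = data.get("ref", "")
        if ref in VARIABLE_ELEMENTS:
            categories = data.find("CATEGORIES")
            # len() of an element counts its children, i.e. the category tags.
            if categories is None or len(categories) == 0:
                problems.append(ref)
    return problems


GOOD = ('<POLICY><DATA ref="#dynamic.cookies">'
        '<CATEGORIES><uniqueid/><preference/></CATEGORIES></DATA></POLICY>')
BAD = '<POLICY><DATA ref="#dynamic.cookies"/></POLICY>'

print(missing_categories(GOOD))
print(missing_categories(BAD))
```

A real policy file declares the P3P namespace, so a production check would need to match namespace-qualified tag names instead of the bare names used here.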
Finally, note that categories can be inherited as well: categories inherit downward when a field is structured, but only into fields which have no predefined category. Therefore, we suggest that schema authors do their best to ensure that all applicable categories are applied to new data elements they create.
5.8 Using Data Elements
P3P offers Web sites a great deal of flexibility in how they describe the types of data they collect.
Any of these three methods may be combined within a single policy.
By using the dynamic.miscdata element, sites can specify the types of data they collect without having to enumerate every individual data element. This may be convenient for sites that collect a lot of data or sites belonging to large organizations that want to offer a single P3P policy covering the entire organization. However, the disadvantage of this approach is that user agents will have to assume that the site might collect any data element belonging to the categories referenced by the site. So, for example, if a site's policy states that it collects dynamic.miscdata of the physical contact information category, but the only physical contact information it collects is business address, user agents will nonetheless assume that the site might also collect telephone numbers. If the site wishes to be clear that it does not collect telephone numbers or any other physical contact information other than business address, then it should disclose that it collects user.business-info.contact.postal. Furthermore, as user agents are developed with automatic form-filling capabilities, it is likely that sites that enumerate the data they collect will be able to better integrate with these tools.
By defining new data schemas, sites can precisely specify the data they collect beyond the base data set. However, if user agents are unfamiliar with the elements defined in these schemas, they will be able to provide only minimal information to the user about these new elements. The information they provide will be based on the category and display names specified for each element.