5 DATA SCHEMAS

A data schema is a description of a set of data. P3P includes a way to describe data schemas so that services can communicate to user agents about the data they collect. A data schema is built from a number of data elements, which are specific items of data a service might collect.

Data elements in a data schema can have the following properties:

  • Data element name. The name of the data element is used when a P3P policy includes this data element in a <DATA> element. This is required on all data elements.

  • Descriptive name or short name. A data element's short name provides a short, human-understandable name for the data element. The short name is not required, but it is strongly recommended.

  • Long description. The long description of a data element provides a more detailed, human-understandable definition of the data element. Like the short name, the long description is not required, but it is strongly recommended.

  • Category or categories. Most data elements have categories assigned to them when they are defined in a data schema. See Categories for more information on categories.

Data elements are organized into a hierarchy. A data element automatically includes all of the data elements below it in the hierarchy. For example, the data element representing "the user's name" includes the data elements representing "the user's given name," "the user's family name," and so on. The hierarchy is based on the data element name. Thus the data elements user.name.given, user.name.family, and user.name.nickname are all children of the data element user.name, which is in turn a child of the data element user.
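The name-based hierarchy can be illustrated with a short Python sketch. It is not part of P3P; the helper names ancestors and descendants, and the sample list of names, are assumptions made for this illustration only.

```python
def ancestors(name):
    """Return the ancestors of a dotted data element name, nearest first."""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def descendants(name, all_names):
    """Return every name in all_names that lies below `name` in the hierarchy."""
    prefix = name + "."
    return sorted(n for n in all_names if n.startswith(prefix))


# Illustrative subset of element names (not a complete schema).
names = ["user.name.given", "user.name.family", "user.name.nickname", "user.gender"]

print(ancestors("user.name.given"))
print(descendants("user.name", names))
```

Under these assumptions, referencing user.name covers exactly the names returned by descendants("user.name", names).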

P3P has defined a data schema called the P3P base data schema that includes a large number of data elements commonly used by services.

Services may declare new data elements by creating and publishing their own data schemas. This is done with the <DATASCHEMA> element. Data schemas can either be published in standalone XML files, which are then referenced by the policies that use them, or be embedded in the policy files that reference them. The <DATASCHEMA> element is defined as follows:

[62]  dataschema  =  "<DATASCHEMA" [` xmlns="http://www.w3.org/2001/09/P3Pv1"`] ">"
                     *(datadef|datastruct|extension)
                     "</DATASCHEMA>"

A standalone data schema has the <DATASCHEMA> element as the first XML element in the file. It must have the appropriate namespace defined in the xmlns attribute to identify it as a P3P data schema, as follows:

    <DATASCHEMA xmlns="http://www.w3.org/2001/09/P3Pv1">
    <DATA-STRUCT ... />
    ...
    <DATA-DEF ... />
    </DATASCHEMA>

When a data schema is declared inside a policy file, then the <DATASCHEMA> element is still used (as described in Section 3.2.1, "The <POLICIES> Element"), but no namespace attribute is given.
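As an illustration of the standalone form, the following Python sketch reads a data schema with the standard xml.etree.ElementTree module and lists the definitions it contains. The function name list_definitions and the sample document are assumptions for this example; a real processor would also need to handle embedded schemas and extensions.

```python
import xml.etree.ElementTree as ET

P3P_NS = "http://www.w3.org/2001/09/P3Pv1"

# Hypothetical standalone schema used only for this illustration.
SCHEMA_XML = """\
<DATASCHEMA xmlns="http://www.w3.org/2001/09/P3Pv1">
  <DATA-STRUCT name="date.ymd.year" short-description="Year"/>
  <DATA-DEF name="meeting.time" short-description="Meeting time" structref="#date"/>
</DATASCHEMA>
"""


def list_definitions(xml_text):
    """Return (kind, name) pairs for each DATA-DEF and DATA-STRUCT child."""
    root = ET.fromstring(xml_text)
    # A standalone schema must have DATASCHEMA, in the P3P namespace, as its root.
    if root.tag != f"{{{P3P_NS}}}DATASCHEMA":
        raise ValueError("not a standalone P3P data schema")
    result = []
    for child in root:
        kind = child.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
        if kind in ("DATA-DEF", "DATA-STRUCT"):
            result.append((kind, child.get("name")))
    return result


print(list_definitions(SCHEMA_XML))
```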

5.1 Natural Language Support for Data Schemas

Data schemas contain a number of fields in natural language. Services publishing a data schema MAY wish to translate these fields into multiple languages. The data element short and long names MAY be translated, but the data element name MUST NOT be translated: this field needs to stay constant across translations of a data schema.

If a service is going to provide a data schema in multiple natural languages, then it SHOULD examine the Accept-Language HTTP request-header on requests for that data schema to pick the best available alternative.
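A server-side selection of this kind might look like the following Python sketch. It is a deliberately simplified reading of the Accept-Language header (quality values plus prefix matching) and not a complete implementation of HTTP language negotiation; the function name pick_language and its parameters are assumptions for this example.

```python
def pick_language(accept_language, available, default):
    """Pick the best available translation for an Accept-Language header value."""
    prefs = []
    for position, item in enumerate(accept_language.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a language range without a q parameter has quality 1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((-q, position, tag))
    # Highest quality first; ties keep the order given in the header.
    for neg_q, _, tag in sorted(prefs):
        if neg_q == 0:
            continue  # q=0 means "not acceptable"
        for lang in available:
            candidate = lang.lower()
            if tag == "*" or candidate == tag or candidate.startswith(tag + "-"):
                return lang
    return default


print(pick_language("fr-CH, fr;q=0.9, en;q=0.8, *;q=0.5", ["en", "de"], "en"))
```

The element names in the selected translation are the same as in every other translation; only the short and long descriptions differ.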

5.2 Data Structures

Data schemas often need to reuse a common group of data elements. P3P data schemas support this through data structures. A data structure is a named, abstract definition of a group of data elements. When a data element is defined, it can be defined as being of an unstructured type, in which case it has no child elements. The data element can also be defined as being of a specific structured type, in which case the data element will be automatically expanded to include as sub-elements all of the elements defined in the data structure. For example, the following structure is used to represent a date and time:

    <!-- "date" Data Structure -->
    <DATA-STRUCT name="date.ymd.year"
        short-description="Year"/>
    <DATA-STRUCT name="date.ymd.month"
        short-description="Month"/>
    <DATA-STRUCT name="date.ymd.day"
        short-description="Day"/>
    <DATA-STRUCT name="date.hms.hour"
        short-description="Hour"/>
    <DATA-STRUCT name="date.hms.minute"
        short-description="Minute"/>
    <DATA-STRUCT name="date.hms.second"
        short-description="Second"/>

Now we shall define a "meeting" data element, which has a time and place for the meeting:

    <DATA-DEF name="meeting.time"
        short-description="Meeting time"
        structref="#date"/>
    <DATA-DEF name="meeting.place"
        short-description="Meeting place"/>

Since meeting.place does not reference a structure, it is of an unstructured type, and has no child elements. The meeting.time element uses the date structure. By declaring this, the following sub-elements are created:

    meeting.time.ymd.year
    meeting.time.ymd.month
    meeting.time.ymd.day
    meeting.time.hms.hour
    meeting.time.hms.minute
    meeting.time.hms.second
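The expansion described above can be sketched in Python as follows. The STRUCTS table and the expand function are assumptions made for this illustration; the sketch handles only a same-document structref and does not follow structures nested inside other structures.

```python
# Fields of the "date" structure, written relative to the structure name.
STRUCTS = {
    "date": ["ymd.year", "ymd.month", "ymd.day",
             "hms.hour", "hms.minute", "hms.second"],
}


def expand(element_name, structref=None):
    """Return the sub-elements created by giving element_name a structure."""
    if structref is None:
        return []  # unstructured: no child elements
    struct_name = structref.rsplit("#", 1)[-1]  # keep the fragment identifier
    return [f"{element_name}.{field}" for field in STRUCTS[struct_name]]


print(expand("meeting.time", "#date"))
print(expand("meeting.place"))
```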

A P3P policy can now declare that it collects the meeting data element, which implies that it collects all of the sub-elements of meeting, or it can use data elements lower down the hierarchy: meeting.time, for example, or meeting.time.ymd.day.

5.3 The DATA-DEF and DATA-STRUCT Elements

<DATA-DEF> and <DATA-STRUCT>

Define a data element or a data structure, respectively. Data structures are reusable structured type definitions that can be used to build data elements. Data elements are declared within a <STATEMENT> in a P3P policy to describe data covered by that statement.

The following attributes are common to these two elements:

  • name (mandatory attribute)

    Indicates the name of the data element or data structure. Remember that names of data elements and data structures are case-sensitive, so, for example, user.gender is different from USER.GENDER or User.Gender. Furthermore, in names of data elements and structures, no digit can appear immediately following a dot.

  • structref

    URI reference ([URI]), where the fragment identifier part denotes the structure, and the URI part denotes the corresponding data schema where it is defined. The default base URI is a same-document reference ([URI]). Data elements or data structures without a structref attribute (and, so, without an associated structure) are called unstructured.

  • short-description

    A string denoting the short display name of the data element or structure, no more than 255 characters.

    The DATA-DEF and DATA-STRUCT elements can also contain a long description of the data element or structure, using the LONG-DESCRIPTION element.
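The constraints on the name and short-description attributes can be checked mechanically. The Python sketch below is one possible check under the rules stated above; the function names are hypothetical, and the sketch does not validate anything beyond the "no digit after a dot" rule and the 255-character limit.

```python
import re

# A digit immediately following a dot is not allowed in a name.
_DIGIT_AFTER_DOT = re.compile(r"\.[0-9]")


def is_valid_name(name):
    """Check the 'no digit immediately after a dot' rule for a name."""
    return bool(name) and _DIGIT_AFTER_DOT.search(name) is None


def is_valid_short_description(text):
    """Check the 255-character limit on short-description."""
    return len(text) <= 255


print(is_valid_name("user.gender"))    # no digit follows a dot
print(is_valid_name("user.2ndphone"))  # digit directly after a dot
# Names are case-sensitive, so these denote different elements.
print("user.gender" == "USER.GENDER")
```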

[63]  datadef  =  "<DATA-DEF name=" quotedstring
                  [` structref="` URI-reference `"`]
                  [" short-description=" quotedstring]
                  ">"
                  [categories]      ; the categories of the data element
                  [longdescription] ; the long description of the data element
                  "</DATA-DEF>"

[64]  datastruct  =  "<DATA-STRUCT name=" quotedstring
                     [` structref="` URI-reference `"`]
                     [" short-description=" quotedstring]
                     ">"
                     [categories]      ; the categories of the data structure
                     [longdescription] ; the long description of the data structure
                     "</DATA-STRUCT>"

Here, URI-reference is defined as in [URI].

Data elements can be structured, much as in common programming languages. Structures are hierarchical (tree-like) descriptions of data elements, and the hierarchy is expressed in the name attribute, using a full stop (".") character as the separator.

P3P provides the P3P base data schema, which has built-in definitions of a number of widely used structures and data elements. All P3P implementations are required to understand the P3P base data schema, so the structures and elements it defines are always available to P3P implementers.

5.3.1 Categories in P3P Data Schemas

Categories can be assigned to data structures or data elements. The following rules define how those category definitions are meant to be used:

  1. <DATA-STRUCT> elements MAY include category definitions with them. If a structure definition includes categories, then all uses of those structures in data definitions and data structures pick up those categories. If a structure contains no categories, then the categories for that structure MAY be defined when it is used in another structure or data element. Otherwise, a data element using this structure is a variable-category element. Any uses of a variable-category data element in a policy require that its categories be listed in the policy.

  2. A <DATA-DEF> with an unstructured type is a variable-category data element if no categories are defined in the <DATA-DEF>, and has exactly those categories listed in the <DATA-DEF> if any categories are included.

  3. A <DATA-DEF> or <DATA-STRUCT> with a structured type which has no categories defined on that structure produces a variable-category data element/structure if no categories are defined in the <DATA-DEF> or <DATA-STRUCT>. If the <DATA-DEF> or <DATA-STRUCT> does have categories listed, then those categories are applied to that data element, and all of its sub-elements. In other words, categories are pushed down into sub-elements when defining a data element to be of a structured type, and the structured type does not define any categories.

  4. A <DATA-DEF> using a structured type which has categories defined on that structure picks up all the categories listed on the structure. In addition, categories may be listed in the <DATA-DEF>, and these are added to the categories defined in the structure. These categories are defined only at the level of that data element, and are not "pushed down" to any sub-elements.

  5. A <DATA-STRUCT> that has no categories assigned to it, and which is using a structured subtype which has categories defined on the subtype picks up all the categories listed on the subtype.

  6. A <DATA-STRUCT> that has categories assigned to it, and which is using a structured subtype replaces all of the categories listed on the subtype.

  7. There is a "bubble-up" rule for categories when referencing data elements: a data element must, at a minimum, include all categories defined by any of its children. This rule applies recursively, so, for example, all categories defined by data elements foo.a.w, foo.a.y, and foo.b.z MUST be considered to apply to data element foo.

  8. A <DATA-STRUCT> cannot be defined with some variable-category elements and some fixed-category elements. Either all of the sub-elements of a structure must be in the variable category, or else all of them must have one or more assigned categories.
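The "bubble-up" rule (rule 7) lends itself to a short illustration. The Python sketch below computes the categories that apply to an element as the union of its own categories and those of everything beneath it. The function name effective_categories and the category assignments given to foo.a.w, foo.a.y, and foo.b.z are hypothetical; the sketch does not model the other rules in this list.

```python
def effective_categories(name, declared):
    """Union of the categories declared on `name` and on all of its descendants."""
    prefix = name + "."
    result = set(declared.get(name, ()))
    for other, categories in declared.items():
        if other.startswith(prefix):
            result |= set(categories)
    return result


# Hypothetical category assignments for the leaf elements.
declared = {
    "foo.a.w": {"online"},
    "foo.a.y": {"physical"},
    "foo.b.z": {"demographic"},
}

print(sorted(effective_categories("foo", declared)))
print(sorted(effective_categories("foo.a", declared)))
```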

5.3.2 P3P Data Schema Example

Consider the case where the company HyperSpeedExample wishes to describe the features of a vehicle, using a structure called vehicle. This structure includes:

  • The vehicle's model type (vehicle.model),

  • The vehicle's color (vehicle.color),

  • The vehicle's year of manufacture (vehicle.built.year), and

  • The vehicle's price (vehicle.price).

If HyperSpeedExample also wants to include in the definition of a vehicle the location of manufacture, it could add other fields to the structure with all the relevant data like country, street address, postal code, and so on. But each part of a structure can use other structures as well: structures can be composed. In this case, the P3P base data schema already provides a structure postal, describing all the postal information of a location. So, the final definition of the structure vehicle is:

  • vehicle.model (unstructured)

  • vehicle.color (unstructured)

  • vehicle.price (unstructured)

  • vehicle.built.year (unstructured)

  • vehicle.built.where (with structure postal from the base data schema)

The structure postal has fields postal.street, postal.city, and so on. Since we have applied the structure postal to vehicle.built.where, it means that we can access the street and city of a vehicle using the descriptions vehicle.built.where.street and vehicle.built.where.city respectively. So, by applying a structure (in this case, postal) we can build very complex descriptions in a modular way.

HyperSpeedExample wants to declare that all of the vehicle information will be in the <preference/> category. The vehicle.model, vehicle.color, vehicle.price, and vehicle.built.year fields are all unstructured types, so assigning them to the <preference/> category accomplishes this for those fields. Since vehicle is a structure definition, assigning the <preference/> category to vehicle.built.where will override (replace) the categories defined on all of the sub-elements of vehicle.built.where, placing all of them in the <preference/> category, even though the postal structure was originally defined as being in other categories.

As said, structures do not contain data elements; they are just abstract data types. We can use them to rapidly build structured collections of data elements. Going on with the example, HyperSpeedExample needs this abstract description of the features of a vehicle because it wants to actually exchange data about cars and motorcycles. So, it could define two data elements called car and motorcycle, both with the above structure vehicle.

This description of the data elements and data structures is encoded in XML using a data schema. In the HyperSpeedExample case, it would be something like:

    <DATASCHEMA xmlns="http://www.w3.org/2001/09/P3Pv1">
    <DATA-STRUCT name="vehicle.model"
        short-description="Model">
        <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.color"
        short-description="Color">
        <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.built.year"
        short-description="Construction Year">
        <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.built.where"
        structref="http://www.w3.org/TR/P3P/base#postal"
        short-description="Construction Place">
        <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-DEF name="car" structref="#vehicle"/>
    <DATA-DEF name="motorcycle" structref="#vehicle"/>
    </DATASCHEMA>

Continuing with the example, in order to reference a car model and construction year, Hyperspeed or any other service could send the following references inside a P3P policy:

    <DATA-GROUP>
      <!-- First, the "car.model" data element, whose definition is in
           the data schema at http://www.HyperSpeed.example.com/models-schema
      -->
      <DATA ref="http://www.HyperSpeed.example.com/models-schema#car.model"/>
      <!-- And second, the "car.built.year" data element, whose definition
           is in the data schema at http://www.HyperSpeed.example.com/models-schema
      -->
      <DATA ref="http://www.HyperSpeed.example.com/models-schema#car.built.year"/>
    </DATA-GROUP>

Using the base attribute, the above references can be written in an even more compact way:

    <DATA-GROUP base="http://www.HyperSpeed.example.com/models-schema">
        <DATA ref="#car.model"/>
        <DATA ref="#car.built.year"/>
    </DATA-GROUP>
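How a ref value combines with the base attribute can be sketched with Python's standard urllib.parse module. This is an illustration of ordinary relative-URI resolution, not a normative P3P algorithm; the function name resolve_data_ref is an assumption for this example.

```python
from urllib.parse import urldefrag, urljoin


def resolve_data_ref(base, ref):
    """Resolve a DATA ref against a DATA-GROUP base.

    Returns (schema_uri, element_name): the URI part identifies the data
    schema, and the fragment identifier names the data element.
    """
    absolute = urljoin(base, ref)
    schema_uri, element_name = urldefrag(absolute)
    return schema_uri, element_name


base = "http://www.HyperSpeed.example.com/models-schema"
print(resolve_data_ref(base, "#car.model"))
print(resolve_data_ref(base, "#car.built.year"))
# An empty base leaves a same-document reference.
print(resolve_data_ref("", "#car.model"))
```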

Alternatively, the data schema could be embedded directly into a policy file. In this case, the policy file could look like:

    <POLICIES xmlns="http://www.w3.org/2001/09/P3Pv1">
    <!-- Embedded data schema -->
    <DATASCHEMA>
    <DATA-STRUCT name="vehicle.model"
        short-description="Model">
        <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.color"
        short-description="Color">
        <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.built.year"
        short-description="Construction Year">
        <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.built.where"
        structref="http://www.w3.org/TR/P3P/base#postal"
        short-description="Construction Place">
        <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-DEF name="car" structref="#vehicle"/>
    <DATA-DEF name="motorcycle" structref="#vehicle"/>
    </DATASCHEMA>
    <!-- end of embedded data schema -->
    <POLICY name="policy1" discuri="http://www.example.com/disc1">
      ...
      <DATA-GROUP base="">
        <DATA ref="#car.model"/>
        <DATA ref="#car.built.year"/>
      </DATA-GROUP>
      ...
    </POLICY>
    <POLICY name="policy2" discuri="http://www.example.com/disc2">
      ....
    </POLICY>
    <POLICY name="policy3" discuri="http://www.example.com/disc3">
      ....
    </POLICY>
    </POLICIES>

Note that in any case there MUST NOT be more than one data schema per file.

5.3.3 Use of Data Element Names

Note that the data element names specified in the base data schema or in extension data schemas may be used for purposes other than P3P policies. For example, Web sites may use these names to label HTML form fields. By referring to data the same way in P3P policies and forms, automated form-filling tools can be better integrated with P3P user agents.

5.4 Persistence of Data Schemas

An essential requirement on data schemas is persistence: a data schema that can be fetched at a certain URI can only be changed by extending it in a backward-compatible way (that is to say, changing the data schema does not change the meaning of any policy using that schema). This way, the URI of a data schema acts in a sense like a unique identifier for the data elements and structures contained therein: any data schema that is not backward-compatible must therefore use a new, different URI.

A useful application of the persistence of data schemas arises, for example, on multilingual sites: multiple language versions (translations) of the same data schema can be offered by the server, using the HTTP "Content-Language" response header field to indicate the language used for the data schema.

5.5 Basic Data Structures

The Basic Data Structures are structures used by the P3P base data schema (and, because of their basic nature, they should be reused as much as possible by other data schemas). All P3P-compliant user agent implementations MUST be aware of the Basic Data Structures. Each table below specifies the elements of a basic data structure, the categories associated, their structures, and the display names shown to users. More than one category may be associated with a fixed data element. However, each base data element is assigned to only one category whenever possible. It is recommended that data schema designers do the same.

5.5.1 Dates

The date structure specifies a date. Since date information can be used in different ways, depending on the context, all date information is tagged as being of "variable" category (see Section 5.7.2). For example, schema definitions can explicitly set the corresponding category in the element referencing this data structure, where soliciting the birthday of a user might be "Demographic and Socioeconomic Data," while the expiration date of a credit card might belong to the "Purchase Information" category.

date            Category             Structure     Short Display Name
--------------  -------------------  ------------  ------------------
ymd.year        (variable-category)  unstructured  Year
ymd.month       (variable-category)  unstructured  Month
ymd.day         (variable-category)  unstructured  Day
hms.hour        (variable-category)  unstructured  Hour
hms.minute      (variable-category)  unstructured  Minute
hms.second      (variable-category)  unstructured  Second
fractionsecond  (variable-category)  unstructured  Fraction of Second
timezone        (variable-category)  unstructured  Time Zone

The "time zone" information is described, for example, in the time standard [ISO8601]. Note that "date.ymd" and "date.hms" can be used as shorthand references to the year/month/day and hour/minute/second blocks respectively.

5.5.2 Names

The personname structure specifies information about the naming of a person.

personname  Category                            Structure     Short Display Name
----------  ----------------------------------  ------------  -----------------------
prefix      Demographic and Socioeconomic Data  unstructured  Name Prefix
given       Physical Contact Information        unstructured  Given Name (First Name)
family      Physical Contact Information        unstructured  Family Name (Last Name)
middle      Physical Contact Information        unstructured  Middle Name
suffix      Demographic and Socioeconomic Data  unstructured  Name Suffix
nickname    Demographic and Socioeconomic Data  unstructured  Nickname

5.5.3 Logins

The login structure specifies information (IDs and passwords) for computer systems and Web sites which require authentication. Note that this data element should not be used for computer systems or Web sites which use digital certificates for authentication: in those cases, the certificate structure should be used.

login     Category            Structure     Short Display Name
--------  ------------------  ------------  ------------------
id        Unique Identifiers  unstructured  Login ID
password  Unique Identifiers  unstructured  Login Password

The "id" field represents the ID portion of the login information for a computer system. Often, user IDs are made public, while passwords are kept secret. This does not include any type of biometric authentication mechanisms.

The "password" field represents the password portion of the login information for a computer system. This is a secret data value, usually a character string, that is used in authenticating a user. Passwords are typically kept secret, and are generally considered to be sensitive information.

5.5.4 Certificates

The certificate structure is used to specify identity certificates (like, for example, X.509).

certificate  Category            Structure     Short Display Name
-----------  ------------------  ------------  ------------------
key          Unique Identifiers  unstructured  Certificate Key
format       Unique Identifiers  unstructured  Certificate Format

The "format" field is used to represent the information of an IANA registered public key or authentication certificate format, while the "key" field is used to represent the corresponding certificate key.

5.5.5 Telephones

The telephonenum structure specifies the characteristics of a telephone number.

telephonenum  Category                      Structure     Short Display Name
------------  ----------------------------  ------------  ----------------------------
intcode       Physical Contact Information  unstructured  International Telephone Code
loccode       Physical Contact Information  unstructured  Local Telephone Area Code
number        Physical Contact Information  unstructured  Telephone Number
ext           Physical Contact Information  unstructured  Telephone Extension
comment       Physical Contact Information  unstructured  Telephone Optional Comments

5.5.6 Contact Information

The contact structure is used to specify contact information. Services can specify precisely which set of data they need, postal, telecommunication, or online address information.

contact  Category                                                          Structure  Short Display Name
-------  ----------------------------------------------------------------  ---------  ------------------------------
postal   Physical Contact Information, Demographic and Socioeconomic Data  postal     Postal Address Information
telecom  Physical Contact Information                                      telecom    Telecommunications Information
online   Online Contact Information                                        online     Online Address Information

5.5.6.1 Postal

The postal structure specifies a postal mailing address.

postal        Category                                                          Structure     Short Display Name
------------  ----------------------------------------------------------------  ------------  ------------------
name          Physical Contact Information, Demographic and Socioeconomic Data  personname    Name
street        Physical Contact Information                                      unstructured  Street Address
city          Demographic and Socioeconomic Data                                unstructured  City
stateprov     Demographic and Socioeconomic Data                                unstructured  State or Province
postalcode    Demographic and Socioeconomic Data                                unstructured  Postal Code
country       Demographic and Socioeconomic Data                                unstructured  Country Name
organization  Demographic and Socioeconomic Data                                unstructured  Organization Name

The "country" field represents the information of the name of the country (for example, one among the countries listed in [ISO3166]).

5.5.6.2 Telecommunication

The telecom structure specifies telecommunication information about a person.

telecom    Category                      Structure     Short Display Name
---------  ----------------------------  ------------  -----------------------
telephone  Physical Contact Information  telephonenum  Telephone Number
fax        Physical Contact Information  telephonenum  Fax Number
mobile     Physical Contact Information  telephonenum  Mobile Telephone Number
pager      Physical Contact Information  telephonenum  Pager Number

5.5.6.3 Online

The online structure specifies online information about a person.

online  Category                    Structure     Short Display Name
------  --------------------------  ------------  ------------------
email   Online Contact Information  unstructured  Email Address
uri     Online Contact Information  unstructured  Home Page Address

5.5.7 Access Logs and Internet Addresses

Two structures used for representing forms of Internet addresses are provided. The uri structure covers Universal Resource Identifiers (URI), which are defined in more detail in [URI]. The ipaddr structure represents IP addresses and Domain Name System (DNS) hostnames.

5.5.7.1 URI

uri          Category             Structure     Short Display Name
-----------  -------------------  ------------  ---------------------------
authority    (variable-category)  unstructured  URI Authority
stem         (variable-category)  unstructured  URI Stem
querystring  (variable-category)  unstructured  Query-string Portion of URI

The authority of a URI is defined as the authority component in [URI]. The stem of a URI is defined as the information contained in the portion of the URI after the authority and up to (and including) the first "?" character in the URI, and the querystring is the information contained in the portion of the URI after the first "?" character. For URIs which do not contain a "?" character, the stem is the entire URI, and the querystring is empty.
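The split into authority, stem, and querystring can be sketched in Python as follows. This is one literal reading of the definitions above, including the statement that a URI without a "?" character has the entire URI as its stem; the function name split_uri and the sample URIs are assumptions for this example, and fragment identifiers are not treated specially.

```python
from urllib.parse import urlsplit


def split_uri(uri):
    """Return (authority, stem, querystring) for a URI."""
    authority = urlsplit(uri).netloc
    if "?" not in uri:
        # Literal reading of the text: the stem is the entire URI.
        return authority, uri, ""
    # Skip past "//" and the authority, if the URI has one.
    start = uri.index("//") + 2 + len(authority) if authority else 0
    stem, _, querystring = uri[start:].partition("?")
    # The stem runs up to and including the first "?".
    return authority, stem + "?", querystring


print(split_uri("http://www.example.com/catalog/search?item=42&color=red"))
print(split_uri("http://www.example.com/index.html"))
```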

Since URI information can be used in different ways, depending on the context, all the fields in the uri structure are tagged as being of "variable" category. Schema definitions MUST explicitly set the corresponding category in the element referencing this data structure.

5.5.7.2 ipaddr

The ipaddr structure represents the hostname and IP address of a system.

ipaddr           Category              Structure     Short Display Name
---------------  --------------------  ------------  -----------------------------
hostname         Computer Information  unstructured  Complete Host and Domain Name
partialhostname  Demographic           unstructured  Partial Hostname
fullip           Computer Information  unstructured  Full IP Address
partialip        Demographic           unstructured  Partial IP Address

The hostname element is used to represent collection of either the simple hostname of a system, or the full hostname including domain name. The partialhostname element represents the information of a fully-qualified hostname which has had at least the host portion removed from the hostname. In other words, everything up to the first "." in the fully-qualified hostname MUST be removed for an address to qualify as a "partial hostname."

The fullip element represents the information of a full IP version 4 or IP version 6 address. The partialip element represents an IP version 4 address (version 4 only, not a version 6 address) which has had at least the last 7 bits of information removed. This removal MUST be done by replacing those bits with a fixed pattern for all visitors (for example, all 0s or all 1s).
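The two reductions described above can be sketched in Python with the standard ipaddress module. The function names partial_ip and partial_hostname are hypothetical, the fixed pattern chosen here is all 0s, and the addresses shown are documentation examples rather than real visitors.

```python
import ipaddress


def partial_ip(address, bits_removed=7):
    """Zero out the last `bits_removed` bits of an IPv4 address (at least 7)."""
    if not 7 <= bits_removed <= 32:
        raise ValueError("at least the last 7 bits must be removed")
    ip = ipaddress.IPv4Address(address)  # raises ValueError for IPv6 input
    mask = (0xFFFFFFFF << bits_removed) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))


def partial_hostname(fqdn):
    """Remove everything up to and including the first '.' of a hostname."""
    _host, dot, rest = fqdn.partition(".")
    if not dot or not rest:
        raise ValueError("hostname has no domain portion to keep")
    return rest


print(partial_ip("192.0.2.201"))
print(partial_hostname("host.example.com"))
```

Replacing the removed bits with the same pattern for every visitor is what keeps the reduced address from carrying any per-visitor information in those bits.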

Certain Web sites are known to make use not of the visitor's entire IP address or hostname, but rather make use of a reduced form of that information. By collecting only a subset of the address information, the site visitor is given some measure of anonymity. It is certainly not the intent of this specification to claim that these "stripped" IP addresses or hostnames are impossible to associate with an individual user, but rather that it is significantly more difficult to do so. Sites which perform this data reduction MAY wish to declare this practice in order to more-accurately reflect their practices.

5.5.7.3 Access Log Information

The loginfo structure is used to represent information typically stored in Web-server access logs.

loginfo           Category                                                  Structure     Short Display Name
----------------  --------------------------------------------------------  ------------  -------------------------------
uri               Navigation and click-stream data                          uri           URI of Requested Resource
timestamp         Navigation and click-stream data                          date          Request Timestamp
clientip          Computer Information, Demographic and Socioeconomic Data  ipaddr        Client's IP Address or Hostname
other.httpmethod  Navigation and click-stream data                          unstructured  HTTP Request Method
other.bytes       Navigation and click-stream data                          unstructured  Data Bytes in Response
other.statuscode  Navigation and click-stream data                          unstructured  Response Status Code

The resource in the HTTP request is captured by the uri field. The time at which the server processes the request is represented by the timestamp field. Server implementations are free to define this field as the time the request was received, the time that the server began sending the response, the time that sending the response was complete, or some other convenient representation of the time the request was processed. The IP address of the client system making the request is given by the clientip field.

The other data fields represent other information commonly stored in Web server access logs. other.httpmethod is the HTTP method (such as GET, POST, etc.) in the client's request. other.bytes indicates the number of bytes in the response-body sent by the server. other.statuscode is the HTTP status code on the request, such as 200, 302, or 404 (see section 6.1.1 of [HTTP1.1] for details).

5.5.7.4 Other HTTP Protocol Information

The httpinfo structure represents information carried by the HTTP protocol which is not covered by the loginfo structure.

httpinfo   Category                          Structure     Short Display Name
---------  --------------------------------  ------------  ------------------------------
referer    Navigation and click-stream data  uri           Last URI Requested by the User
useragent  Computer Information              unstructured  User Agent Information

The useragent field represents the information in the HTTP User-Agent header (which gives information about the type and version of the user's Web browser), and/or the HTTP accept* headers.

The referer field represents the information in the HTTP Referer header, which gives information about the previous page visited by the user. Note that this field is misspelled in exactly the same way as the corresponding HTTP header.

5.6 The Base Data Schema

All P3P-compliant user agent implementations MUST be aware of the data elements in the P3P base data schema. The P3P base data schema includes the definition of the basic data structures, and four data element sets: user, thirdparty, business, and dynamic. The user, thirdparty, and business sets include elements that users and/or businesses might provide values for, while the dynamic set includes elements that are dynamically generated in the course of a user's browsing session. User agents may support a variety of mechanisms that allow users to provide values for the elements in the user set and store them in a data repository, including mechanisms that support multiple personae. Users may choose not to provide values for these data elements.

The formal XML definition of the P3P base data schema is given in Appendix 3. In the following sections, the base data elements and sets are explained one by one. In the future there will in all likelihood be demand for the creation of other data sets and elements. Obvious applications include catalogue, payment, and agent/system attribute schemas (an extensive set of system elements is provided for example in http://www.w3.org/TR/NOTE-agent-attributes).

Each table below specifies a set, the elements within the set, the category associated with the element, its structure, and the display name shown to users. More than one category may be associated with a fixed data element. However, each base data element is assigned to only one category whenever possible. It is recommended that data schema designers do the same.

5.6.1 User Data

The user data set includes general information about the user.

user           Category                                                                                      Structure     Short Display Name
-------------  --------------------------------------------------------------------------------------------  ------------  -------------------------------------------------------------
name           Physical Contact Information, Demographic and Socioeconomic Data                              personname    User's Name
bdate          Demographic and Socioeconomic Data                                                            date          User's Birth Date
login          Unique Identifiers                                                                            login         User's Login Information
cert           Unique Identifiers                                                                            certificate   User's Identity Certificate
gender         Demographic and Socioeconomic Data                                                            unstructured  User's Gender (Male or Female)
employer       Demographic and Socioeconomic Data                                                            unstructured  User's Employer
department     Demographic and Socioeconomic Data                                                            unstructured  Department or Division of Organization Where User is Employed
jobtitle       Demographic and Socioeconomic Data                                                            unstructured  User's Job Title
home-info      Physical Contact Information, Online Contact Information, Demographic and Socioeconomic Data  contact       User's Home Contact Information
business-info  Physical Contact Information, Online Contact Information, Demographic and Socioeconomic Data  contact       User's Business Contact Information

Note that this data set includes elements that are themselves sets of data. These sets are defined in the Data Structures subsection of this document. The short display name for an individual element contained within a data set is defined as the concatenation of the short display names that have been defined for the set and the element, separated by a separator appropriate for the language/script in question, e.g., a comma for English. For example, the short display name for user.home-info.postal.postalcode could be "User's Home Contact Information, Postal Address Information, Postal code." User agent implementations may prefer to develop their own short display names rather than using the concatenated names when displaying information for the user.
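The concatenation rule can be sketched in a few lines of Python. This is only an illustration, not part of P3P: the `ELEMENTS` and `STRUCTS` tables and the function name `short_display_name` are assumptions made for the example, and only the three display names quoted in the text are filled in.

```python
# Hypothetical lookup tables holding only the names quoted in the text.
# element name -> (short display name, structure it refers to)
ELEMENTS = {"user.home-info": ("User's Home Contact Information", "contact")}
# "structure.field" -> (short display name, structure of that field, if any)
STRUCTS = {
    "contact.postal": ("Postal Address Information", "postal"),
    "postal.postalcode": ("Postal code", None),
}


def short_display_name(name, separator=", "):
    """Concatenate the short display names along a dotted element name."""
    parts = name.split(".")
    # Find the longest prefix of the name that is a declared element.
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in ELEMENTS:
            break
    else:
        raise KeyError(name)
    label, struct = ELEMENTS[prefix]
    labels = [label]
    # Walk the remaining fields through the referenced structures.
    for field in parts[i:]:
        label, struct = STRUCTS[f"{struct}.{field}"]
        labels.append(label)
    return separator.join(labels)


print(short_display_name("user.home-info.postal.postalcode"))
```

The separator is a parameter because the text leaves its choice to the language or script in question; the comma default matches the English example.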

5.6.2 Third Party Data

The thirdparty data set allows users and businesses to provide values for a related third party. This can be useful whenever third party information needs to be exchanged, for example when ordering a present online that should be sent to another person, or when providing information about one's spouse or business partner. Such information could be stored in a user repository alongside the user data set. User agents may offer to store multiple such thirdparty data sets and allow users to select the appropriate values from a list when necessary.

The thirdparty data set is identical to the user data set. See section 5.6.1 User Data for details.

5.6.3 Business Data

The business data set features a subset of user data relevant for organizations. In P3P1.0, this data set is primarily used for declaring the policy entity, though it should also be applicable to business-to-business interactions.

| business | Category | Structure | Short Display Name |
|---|---|---|---|
| name | Demographic and Socioeconomic Data | unstructured | Organization Name |
| department | Demographic and Socioeconomic Data | unstructured | Department or Division of Organization |
| cert | Unique Identifiers | certificate | Organization Identity Certificate |
| contact-info | Physical Contact Information, Online Contact Information, Demographic and Socioeconomic Data | contact | Contact Information for the Organization |

5.6.4 Dynamic Data

In some cases, there is a need to specify data elements that do not have fixed values that a user might type in or store in a repository. In the P3P base data schema, all such elements are grouped under the dynamic data set. Sites may refer to the types of data they collect using the dynamic data set only, rather than enumerating all of the specific data elements.

| dynamic | Category | Structure | Short Display Name |
|---|---|---|---|
| clickstream | Navigation and Click-stream Data, Computer Information | loginfo | Click-stream Information |
| http | Navigation and Click-stream Data, Computer Information | httpinfo | HTTP Protocol Information |
| clientevents | Navigation and Click-stream Data | unstructured | User's Interaction with a Resource |
| cookies | (variable-category) | unstructured | Use of HTTP Cookies |
| miscdata | (variable-category) | unstructured | Miscellaneous Non-base Data Schema Information |
| searchtext | Interactive Data | unstructured | Search Terms |
| interactionrecord | Interactive Data | unstructured | Server Stores the Transaction History |

These elements are often implicit in navigation or Web interactions. They should be used with categories to describe the type of information collected through these methods. A brief description of each element follows.

clickstream

The clickstream element is expected to apply to practically all Web sites. It represents the combination of information typically found in Web server access logs: the IP address or hostname of the user's computer, the URI of the resource requested, the time the request was made, the HTTP method used in the request, the size of the response, and the HTTP status code in the response. Web sites that collect standard server access logs as well as sites which do URI path analysis can use this data element to describe how that data will be used. Web sites that collect only some of the data elements listed for the clickstream element MAY choose to list those specific elements rather than the entire dynamic.clickstream element. This allows sites with more limited data-collection practices to accurately present those practices to their visitors.

http

The http element contains additional information contained in the HTTP protocol. See the definition of the httpinfo structure for descriptions of specific elements. Sites MAY use the dynamic.http field as a shorthand to cover all the elements in the httpinfo structure if they wish, or they MAY reference the specific elements in the httpinfo structure.

clientevents

The clientevents element represents data about how the user interacts with their Web browser while interacting with a resource. For example, an application may wish to collect information about whether the user moved their mouse over a certain image on a page, or whether the user ever brought up the help window in a Java applet. This kind of information is represented by the dynamic.clientevents data element. Much of this interaction record is represented by the events and data defined by the Document Object Model (DOM) Level 2 Events [DOM2-Events]. The clientevents data element also covers any other data regarding the user's interaction with their browser while the browser is displaying a resource. The exception is events which are covered by other elements in the base data schema. For example, requesting a page by clicking on a link is part of the user's interaction with their browser while viewing a page, but merely collecting the URL the user has clicked on does not require declaring this data element; clickstream covers that event. However, the DOM event DOMFocusIn (representing the user moving their mouse over an object on a page) is not covered by any other existing element, so if a site is collecting the occurrence of this event, then it needs to state that it collects the dynamic.clientevents element. Items covered by this data element are typically collected by client-side scripting languages, such as JavaScript, or by client-side applets, such as ActiveX or Java applets. Note that while the previous discussion has been in terms of a user viewing a resource, this data element also applies to Web applications which do not display resources visually, for example audio-based Web browsers.

cookies

The cookies element should be used whenever HTTP cookies are set or retrieved by a site. Please note that cookies is a variable data element and requires the explicit declaration of usage categories in a policy.

miscdata

The miscdata element references information collected by the service that the service does not reference using a specific data element. Categories have to be used to better describe these data: sites MUST reference a separate miscdata element in their policies for each category of miscellaneous data they collect.
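The one-element-per-category rule for miscdata can be illustrated with a small Python sketch that generates the corresponding DATA elements. The helper name `miscdata_elements` is an assumption made for this example; the category element names `physical` and `purchase` are the ones used in the XML examples of this chapter.

```python
import xml.etree.ElementTree as ET


def miscdata_elements(categories):
    """Build one <DATA ref="#dynamic.miscdata"> element per category."""
    elements = []
    for category in categories:
        data = ET.Element("DATA", {"ref": "#dynamic.miscdata"})
        cats = ET.SubElement(data, "CATEGORIES")
        # Exactly one category per miscdata reference.
        ET.SubElement(cats, category)
        elements.append(data)
    return elements


# A site collecting miscellaneous data in two categories declares
# two separate miscdata elements.
for element in miscdata_elements(["physical", "purchase"]):
    print(ET.tostring(element, encoding="unicode"))
```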

searchtext

The searchtext element references a specific type of solicitation used for searching and indexing sites. For example, if the only fields on a search engine page are search fields, the site only needs to disclose that data element.

interactionrecord

The interactionrecord element should be used if the server is keeping track of the interaction it has with the user (i.e., information other than clickstream data, for example account transactions, etc.).

5.7 Categories and Data Elements/Structures

5.7.1 Fixed-Category Data Elements/Structures

Most of the elements in the base data schema are so-called "fixed" data elements: they belong to one or at most two category classes. By assigning a category invariably to elements or structures in the base data schema, services and users are able to refer to entire groups of elements simply by referencing the corresponding category. For example, using [APPEL], the privacy preferences exchange language, users can write rules that warn them when they visit a site that collects any data element in a certain category.

When creating data schemas for fixed data elements, schema creators have to explicitly enumerate the categories that these elements belong to. For example:

  <DATA-STRUCT name="postal.street" structref="#text"
               short-description="Street Address">
    <CATEGORIES><physical/></CATEGORIES>
  </DATA-STRUCT>

If an element or structure belongs to multiple categories, multiple elements referencing the appropriate categories can be used. For example, the following piece of XML can be used to declare that the data elements in user.name have both category "physical" and "demographic":

  <DATA-STRUCT name="user.name" structref="#personname"
               short-description="User's Name">
    <CATEGORIES><physical/><demographic/></CATEGORIES>
  </DATA-STRUCT>

Please note that the category classes of fixed data elements/structures cannot be overridden, for example by writing rules or policies that assign a different category to a known fixed base data element. User Agents MUST ignore such categories and instead use the original category (or set of categories) listed in the schema definition. User Agents MAY (and preferably do) alert the user that a fixed data element is used together with a non-standard category class.
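A user agent might apply this rule roughly as in the following Python sketch. It is an assumption-laden illustration, not a prescribed algorithm: the `FIXED_CATEGORIES` table and the function `effective_categories` are hypothetical names, and the table holds only the user.name example from above.

```python
# Hypothetical table of fixed elements, filled with the user.name example.
FIXED_CATEGORIES = {
    "user.name": {"physical", "demographic"},
}


def effective_categories(element, declared):
    """Return (categories to use, whether to alert the user)."""
    declared = set(declared)
    fixed = FIXED_CATEGORIES.get(element)
    if fixed is None:
        # Not a known fixed element: keep what the policy declared.
        return declared, False
    # Fixed element: ignore declared categories, use the schema's own.
    # Alert if the policy named a category the schema does not list.
    alert = bool(declared - fixed)
    return set(fixed), alert


print(effective_categories("user.name", {"purchase"}))
```

Here a policy that tags user.name with `purchase` still yields the schema's two categories, and the second value signals that the agent may warn the user.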

5.7.2 Variable-Category Data Elements/Structures

Not all data elements/structures in the base data schema belong to a pre-determined category class. Some can contain information from a range of categories, depending on a particular situation. Such elements/structures are called variable-category data elements/structures (or "variable data element/structure" for short). Although most variable data elements in the P3P base data schema are combined in the dynamic element set, they can appear in any data set, even mixed with fixed-category data elements.

When creating a schema definition for such elements and/or structures, schema authors MUST NOT list an explicit category attribute; otherwise the element/structure becomes fixed. For example, when specifying the "Year" Data Structure, which can take various categories depending on the situation (e.g., when used for a credit card expiration date vs. for a birth date), the following schema definition can be used:

  <DATA-STRUCT name="date.ymd.year"
               short-description="Year"/>
  <!-- Variable Data Structure -->

This allows new schema extensions that reference such variable-category Data Structures to assign a specific category to derived elements, depending on their usage in that extension. For example, an e-commerce schema extension could thus define a credit card expiration date as follows:

  <DATA-STRUCT name="Card.ExpDate" structref="#date.ymd"
               short-description="Card Expiration Date">
    <CATEGORIES><purchase/></CATEGORIES>
  </DATA-STRUCT>

Under these conditions, the variable Data Structure date is assigned a fixed category "Purchase Information" when being used for specifying a credit card expiration date.

Note that while user preferences can list such variable data elements without any additional category information (effectively expressing preferences over any usage of this element), services MUST always explicitly specify the categories that apply to the usage of a variable data element in their particular policy. This information has to appear as a category element in the corresponding DATA element listed in the policy, for example as in:

  <POLICY ... >
    ...
    <DATA ref="#dynamic.cookies">
      <CATEGORIES><uniqueid/></CATEGORIES>
    </DATA>
    ...
  </POLICY>

where a service declares that cookies are used to recognize the user at this site (i.e., category Unique Identifiers).

If a service wants to declare a data element that is in multiple categories, it simply declares the corresponding categories (as shown in the above section):

  <POLICY ... >
    ...
    <DATA ref="#dynamic.cookies">
      <CATEGORIES><uniqueid/><preference/></CATEGORIES>
    </DATA>
    ...
  </POLICY>

With the above declaration a service announces that it uses cookies both to recognize the user at this site and for storing user preference data. Note that for the purpose of P3P there is no difference whether this information is stored in two separate cookies or in a single one.
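The requirement that services always state categories for variable data elements lends itself to a simple check. The Python sketch below is one possible way to do it, under stated assumptions: the function name `missing_categories` is hypothetical, the policy is parsed without an XML namespace for brevity, and `VARIABLE_ELEMENTS` lists only the two base elements marked (variable-category) in the dynamic table.

```python
import xml.etree.ElementTree as ET

# Only the two elements marked (variable-category) in the dynamic table.
VARIABLE_ELEMENTS = {"dynamic.cookies", "dynamic.miscdata"}


def missing_categories(policy_xml):
    """List variable elements referenced without explicit categories."""
    root = ET.fromstring(policy_xml)
    problems = []
    for data in root.iter("DATA"):
        # Keep the part after '#', e.g. "#dynamic.cookies".
        name = data.get("ref", "").rpartition("#")[2]
        cats = data.find("CATEGORIES")
        if name in VARIABLE_ELEMENTS and (cats is None or len(cats) == 0):
            problems.append(name)
    return problems


policy = """
<POLICY>
  <DATA ref="#dynamic.cookies">
    <CATEGORIES><uniqueid/><preference/></CATEGORIES>
  </DATA>
  <DATA ref="#dynamic.miscdata"/>
</POLICY>
"""
print(missing_categories(policy))
```

In this sample the cookies reference passes, while the bare miscdata reference is reported because it carries no category.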

Finally, note that categories can be inherited as well: categories inherit downward when a field is structured, but only into fields which have no predefined category. Therefore, we suggest that schema authors do their best to ensure that all applicable categories are applied to new data elements they create.
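One way to read the inheritance rule is sketched below in Python. This is an interpretation, not a normative algorithm: the function `resolve_categories` and the `declared` mapping are hypothetical, and the sketch assumes that a field with its own predefined category keeps it, while a field without one takes the categories of its nearest ancestor that has some.

```python
def resolve_categories(name, declared):
    """Categories for a dotted name, inheriting downward from ancestors.

    `declared` maps element names to their predefined category sets.
    """
    parts = name.split(".")
    # Check the element itself first, then each ancestor in turn.
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if declared.get(prefix):
            return set(declared[prefix])
    return set()


# The Card.ExpDate example: the variable "year" field carries no category
# of its own, so it inherits "purchase" from the element that uses it.
declared = {"Card.ExpDate": {"purchase"}}
print(resolve_categories("Card.ExpDate.ymd.year", declared))
```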

5.8 Using Data Elements

P3P offers Web sites a great deal of flexibility in how they describe the types of data they collect.

  • Sites may describe data generally using the dynamic.miscdata element and the appropriate categories.

  • Sites may describe data specifically using the data elements defined in the base data schema.

  • Sites may describe data specifically using data elements defined in new data schemas.

Any of these three methods may be combined within a single policy.

By using the dynamic.miscdata element, sites can specify the types of data they collect without having to enumerate every individual data element. This may be convenient for sites that collect a lot of data or sites belonging to large organizations that want to offer a single P3P policy covering the entire organization. However, the disadvantage of this approach is that user agents will have to assume that the site might collect any data element belonging to the categories referenced by the site. So, for example, if a site's policy states that it collects dynamic.miscdata of the physical contact information category, but the only physical contact information it collects is business address, user agents will nonetheless assume that the site might also collect telephone numbers. If the site wishes to be clear that it does not collect telephone numbers or any other physical contact information other than business address, then it should disclose that it collects user.business-info.contact.postal . Furthermore, as user agents are developed with automatic form-filling capabilities, it is likely that sites that enumerate the data they collect will be able to better integrate with these tools.
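The difference between a generic and a specific disclosure can be made concrete with a short Python sketch of the assumption a user agent has to make. Everything here is illustrative: `may_collect` is a hypothetical helper, disclosures are modeled as simple (reference, categories) pairs, and the telephone element name `PHONE` is made up for the example.

```python
def may_collect(element, element_categories, disclosures):
    """True if an agent must assume `element` might be collected."""
    for ref, categories in disclosures:
        if ref == "dynamic.miscdata":
            # Generic disclosure: covers anything in the named categories.
            if set(element_categories) & set(categories):
                return True
        elif element == ref or element.startswith(ref + "."):
            # Specific disclosure: covers the element and its children.
            return True
    return False


PHONE = "user.business-info.contact.telecom"  # illustrative name only
generic = [("dynamic.miscdata", {"physical"})]
specific = [("user.business-info.contact.postal", set())]

print(may_collect(PHONE, {"physical"}, generic))
print(may_collect(PHONE, {"physical"}, specific))
```

With the generic miscdata disclosure the agent has to treat telephone data as possibly collected; with the specific postal disclosure it does not.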

By defining new data schemas, sites can precisely specify the data they collect beyond the base data set. However, if user agents are unfamiliar with the elements defined in these schemas, they will be able to provide only minimal information to the user about these new elements. The information they provide will be based on the category and display names specified for each element.

Regardless of whether a site wishes to make general or specific data disclosures, there are additional advantages to disclosing specific elements from the dynamic data set. For example, by disclosing dynamic.cookies a site can indicate that it uses cookies and explain the purpose of this use. User agent implementations that offer users cookie control interfaces based on this information are encouraged. Likewise, user agents that by default do not send the HTTP_REFERER header might look for the dynamic.http.referer element in P3P policies and send the header if it will be used for a purpose the user finds acceptable.


