

Consider a population I of individuals and a database {r_{i}: i ∊ I} of individual records, each containing identifying (or key) attributes (e.g., name, SSN, tax code) and descriptive attributes, which may be either qualitative (e.g., sex) or quantitative (e.g., salary). A category predicate is an arbitrary condition on descriptive attributes built up with the logical connectives (conjunction, disjunction and negation).
A count query Q is a question such as:
What is the number of individuals whose descriptive attributes satisfy condition P?
where P is a category predicate. If by I[Q] we denote the subset of I selected by P, the value of Q is the number of individuals in I[Q]; that is, |I[Q]|.
A sum query Q on a quantitative descriptive attribute a is a question such as:
What is the sum of a over the individuals whose descriptive attributes satisfy condition P?
where P is a category predicate involving descriptive attributes other than a. The attribute a is called the summary attribute of the query Q. If by I[Q] we denote the subset of I selected by P, the value of Q, denoted by q, is the sum of the values a_{i} of a for all individuals i in I[Q]; that is,
q = \sum_{i \in I[Q]} a_{i}.
Note that, if the category predicate of an additive query Q is logically inconsistent, then I[Q] = Ø and the value of Q is taken to be zero. Such a query will be referred to as a null query.
Example 1: Consider the following database:
name     sex   age   salary
John     M     25    2.0
Andrea   M     30    3.0
Mike     M     30    2.5
Mary     F     35    3.7
Helen    F     40    3.0
Anna     F     55    3.8
The following are four examples of sum queries on salary:
Q1: What is the sum of the salaries of individuals with (sex = M)?
Answer: 7.5
Q2: What is the sum of the salaries of individuals with (sex = F)?
Answer: 10.5
Q3: What is the sum of the salaries of individuals with (age < 50)?
Answer: 14.2
Q4: What is the sum of the salaries of individuals with (age > 50)?
Answer: 3.8
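The four answers above can be reproduced with a minimal sketch. The names `DATABASE`, `query_set`, `count_query` and `sum_query` are illustrative choices, not part of the original text; category predicates are modeled as Python functions on a record, and salaries are stored as `Decimal` so that the sums are exact.

```python
from decimal import Decimal

# Records of Example 1. The field layout is an illustrative assumption.
DATABASE = [
    {"name": "John",   "sex": "M", "age": 25, "salary": Decimal("2.0")},
    {"name": "Andrea", "sex": "M", "age": 30, "salary": Decimal("3.0")},
    {"name": "Mike",   "sex": "M", "age": 30, "salary": Decimal("2.5")},
    {"name": "Mary",   "sex": "F", "age": 35, "salary": Decimal("3.7")},
    {"name": "Helen",  "sex": "F", "age": 40, "salary": Decimal("3.0")},
    {"name": "Anna",   "sex": "F", "age": 55, "salary": Decimal("3.8")},
]


def query_set(predicate):
    """Return I[Q]: the records selected by the category predicate."""
    return [r for r in DATABASE if predicate(r)]


def count_query(predicate):
    """Value of a count query: |I[Q]|."""
    return len(query_set(predicate))


def sum_query(predicate, attribute="salary"):
    """Value of a sum query: the sum of the summary attribute over I[Q].

    A null query (empty I[Q]) evaluates to zero.
    """
    return sum((r[attribute] for r in query_set(predicate)), Decimal("0"))


q1 = sum_query(lambda r: r["sex"] == "M")
q2 = sum_query(lambda r: r["sex"] == "F")
q3 = sum_query(lambda r: r["age"] < 50)
q4 = sum_query(lambda r: r["age"] > 50)
print(q1, q2, q3, q4)  # 7.5 10.5 14.2 3.8
```

A logically inconsistent predicate, such as `lambda r: r["sex"] == "M" and r["sex"] == "F"`, selects no record and yields the value zero, matching the definition of a null query.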
Suppose that our database contains a descriptive attribute a which is confidential. We saw in the previous section some sensitivity criteria; we now apply some of them to state precisely which queries are considered sensitive. A count query Q whose category predicate contains a is considered sensitive according to the n threshold rule if |I[Q]| ≤ n. Consider now a sum query Q whose summary attribute a is of numeric type. We distinguish two cases, depending on whether the domain of a is the set of all reals (or integers) or the set of nonnegative reals (or integers). In the former case, we apply the n threshold criterion, so that Q is sensitive if |I[Q]| ≤ n. In the latter case, we can apply either the n threshold criterion or the (n, k%) dominance rule. The security problem for additive queries can be stated as follows: Given a sensitivity criterion, what measures suffice to avoid disclosing the values of sensitive queries? The following two examples on sum queries show that memoryless control methods (which only refuse to answer sensitive queries) are not secure.
Example 1 (continued): Assume that salary is a confidential attribute and that the sensitivity criterion in use is the n threshold criterion with n = 2. Thus, a sum query Q on salary is considered sensitive if |I[Q]| ≤ 2. Consider now the above-mentioned queries Q_{1}, Q_{2}, Q_{3} and Q_{4}: Q_{1}, Q_{2} and Q_{3} are not sensitive, whereas Q_{4} is sensitive. Suppose that Q_{1}, Q_{2} and Q_{3} were answered and Q_{4} was left unanswered. The value of Q_{4} can nevertheless be computed from the values of Q_{1}, Q_{2} and Q_{3} by subtracting the answer to Q_{3} (14.2) from the sum of the answers to Q_{1} and Q_{2} (18.0), which gives 3.8.
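The attack in Example 1 can be sketched as follows. The function `answer` is a hypothetical memoryless control that refuses a sum query (returning `None`) whenever its query set has at most `N` records; the record layout and all names are illustrative assumptions.

```python
from decimal import Decimal

N = 2  # threshold of the n threshold criterion in Example 1

# (name, sex, age, salary) records of Example 1.
RECORDS = [
    ("John", "M", 25, Decimal("2.0")),
    ("Andrea", "M", 30, Decimal("3.0")),
    ("Mike", "M", 30, Decimal("2.5")),
    ("Mary", "F", 35, Decimal("3.7")),
    ("Helen", "F", 40, Decimal("3.0")),
    ("Anna", "F", 55, Decimal("3.8")),
]


def answer(predicate):
    """Memoryless control: refuse (None) if |I[Q]| <= N, else return the sum."""
    selected = [r for r in RECORDS if predicate(r)]
    if len(selected) <= N:
        return None
    return sum(r[3] for r in selected)


q1 = answer(lambda r: r[1] == "M")   # answered
q2 = answer(lambda r: r[1] == "F")   # answered
q3 = answer(lambda r: r[2] < 50)     # answered
q4 = answer(lambda r: r[2] > 50)     # refused: only one record is selected

# q1 + q2 is the total over the whole population, so the difference below
# is the sum over age >= 50; it coincides with Q4 because no record has
# age exactly 50.
q4_inferred = q1 + q2 - q3
print(q4, q4_inferred)  # None 3.8
```

Each query is judged in isolation, so the refusal of Q_{4} does not prevent its value from being derived from the three released answers.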
Example 2: Consider the following database:
name   department       salary
e1     Direction        10.05
e2     Direction        3.0
e3     Direction        1.95
e4     Administration   4.0
e5     Administration   2.55
e6     Administration   2.45
e7     Services         3.0
e8     Services         2.0
e9     Services         1.0
e10    Marketing        1.5
Again assume that salary is confidential and that the (2, 80%) dominance criterion is in use. Consider now the following four sum queries on salary:
Q1: What is the sum of the salaries of employees with (department = Direction or department = Administration)?
Answer: 24.0
Q2: What is the sum of the salaries of employees with (department = Administration or department = Services)?
Answer: 15.0
Q3: What is the sum of the salaries of employees with (department = Direction or department = Services)?
Answer: 21.0
Q4: What is the sum of the salaries of employees with (department = Direction)?
Answer: 15.0
The queries Q_{1}, Q_{2} and Q_{3} are not sensitive, and Q_{4} is sensitive. Suppose that Q_{1}, Q_{2} and Q_{3} have been answered; then, even if Q_{4} was left unanswered, its value is uniquely determined from the values of Q_{1}, Q_{2} and Q_{3}, since it can be computed as
q_{4} = (q_{1} - q_{2} + q_{3}) / 2 = (24.0 - 15.0 + 21.0) / 2 = 15.0,
where q_{j} denotes the value of Q_{j}.
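The claims of Example 2 can be checked with a short sketch. The precise form of the (n, k%) dominance rule is given in the previous section and is not restated here, so the function `dominated` encodes an assumed, commonly used reading: a sum query is sensitive when its n largest contributions exceed k% of its value. All names are illustrative.

```python
from decimal import Decimal

# Salaries of Example 2, grouped by department.
SALARIES = {
    "Direction": [Decimal("10.05"), Decimal("3.0"), Decimal("1.95")],
    "Administration": [Decimal("4.0"), Decimal("2.55"), Decimal("2.45")],
    "Services": [Decimal("3.0"), Decimal("2.0"), Decimal("1.0")],
    "Marketing": [Decimal("1.5")],
}


def contributions(departments):
    """Salaries of all employees in the given departments."""
    return [s for d in departments for s in SALARIES[d]]


def dominated(values, n=2, k_percent=80):
    """Assumed (n, k%) dominance test: top-n share strictly above k%."""
    top = sum(sorted(values, reverse=True)[:n])
    return top * 100 > k_percent * sum(values)


c1 = contributions(["Direction", "Administration"])
c2 = contributions(["Administration", "Services"])
c3 = contributions(["Direction", "Services"])
c4 = contributions(["Direction"])

q1, q2, q3 = sum(c1), sum(c2), sum(c3)

# Q1, Q2 and Q3 pass the assumed dominance test; Q4 does not.
print([dominated(c) for c in (c1, c2, c3, c4)])

# With D, A, S the department totals: q1 = D + A, q2 = A + S, q3 = D + S,
# hence D = (q1 - q2 + q3) / 2, which is the value of the refused query Q4.
q4_inferred = (q1 - q2 + q3) / 2
assert q4_inferred == sum(c4) == Decimal("15.0")
```

Under this reading, the two largest salaries in Direction (10.05 and 3.0) account for 87% of the departmental total, which is why Q_{4} is sensitive while the three answered queries are not.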
From the foregoing it follows that an effective control method should also take previously answered queries into account before deciding whether a new query can be answered. Such a control method is usually referred to as "auditing" (Chin & Ozsoyoglu, 1982; Malvestuto & Moscarini, 1999). To obtain a secure control method, we need to identify those queries which, given the values of a set of answered queries, are implicitly answered in an exact or approximate way. To achieve this, in the next section we present a formal model of inference.
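As a rough illustration of what an exact-inference check might look like, the sketch below represents each sum query by the 0/1 indicator vector of its query set and tests whether a new query's vector is a linear combination of the vectors of the answered queries. This is an assumption made for illustration and not necessarily the model of the next section: it covers only exact inference for a real-valued summary attribute with no further constraints (such as nonnegativity), and the function names are hypothetical.

```python
from fractions import Fraction


def reduce_against(basis, vector):
    """Cancel, in order, the pivot coordinate of each basis row in `vector`."""
    v = list(vector)
    for pivot, row in basis:
        if v[pivot] != 0:
            factor = v[pivot] / row[pivot]
            v = [x - factor * y for x, y in zip(v, row)]
    return v


def is_determined(answered, new):
    """True if `new` lies in the row space of the `answered` vectors."""
    basis = []  # list of (pivot index, reduced row)
    for vec in answered:
        v = reduce_against(basis, [Fraction(x) for x in vec])
        pivot = next((j for j, x in enumerate(v) if x != 0), None)
        if pivot is not None:
            basis.append((pivot, v))
    residual = reduce_against(basis, [Fraction(x) for x in new])
    return all(x == 0 for x in residual)


# Example 2, with one coordinate per employee e1..e10.
Q1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # Direction or Administration
Q2 = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]  # Administration or Services
Q3 = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0]  # Direction or Services
Q4 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # Direction
MARKETING = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(is_determined([Q1, Q2, Q3], Q4))         # True
print(is_determined([Q1, Q2, Q3], MARKETING))  # False
```

An auditor built along these lines would refuse a non-sensitive query whenever answering it would make some sensitive query determined; approximate inference requires a different treatment and is not captured by this check.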

