Database Problems | Programming Interviews Exposed: Secrets to Landing Your Next Job, 2nd Edition (Programmer to Programmer)

If you indicate on your résumé that you have some database experience, an interviewer will probably ask you a few questions to determine the depth of your knowledge in this area.

Simple SQL

Important

Given a database with the table

 Olympics(     city CHAR(16),     year INT(4) )

write a SQL statement to insert Montreal and 1976 into the database.

This is an extremely easy problem that an interviewer might use to determine whether you have ever used SQL before or whether you were padding your résumé when you mentioned it. If you know SQL, you’re all set. It’s a straightforward SQL INSERT statement; no tricks at all. If you don’t really know SQL, you’re in trouble. The correct answer is

 INSERT INTO Olympics VALUES( 'Montreal', 1976 )

Company and Employee Database

Important

You are given a database with the following tables:

 Company (     companyName CHAR(30),     id          INT(4) PRIMARY KEY ) EmployeesHired (     id            INT(4) PRIMARY KEY,     numHired      INT(4),     fiscalQuarter INT(4),     FOREIGN KEY id REFERENCES Company )

You may make the assumption that the only possible fiscal quarters are 1 through 4. Sample data for this schema is presented in Table 10-3.

Table 10-3: Company and Employee Sample Data
Open table as spreadsheet
companyName	id
Hillary Plumbing	6
John Lawn Company	9
Dave Cookie Company	19
Jane Electricity	3

Open table as spreadsheet

Id	numHired	fiscalQuarter
3	3	3
3	2	4
19	4	1
6	2	1

Write a SQL statement that returns the names of all the companies that hired employees in fiscal quarter 4.

This problem involves retrieving data from two tables. You will have to join the two tables to get all of the needed information. id is the only key common to both tables so you want to join on the value id. Once you have joined the two tables, you can select the company name where the fiscal quarter is 4. This SQL statement looks like this:

 SELECT companyName FROM Company, EmployeesHired WHERE Company.id = EmpolyeesHired.id AND fiscalQuarter = 4

There is a small problem with this SQL statement. Consider what might happen if a company did not hire anyone in Q4. There could still be a tuple (a row of data) such as EmployeesHired(6, 0, 4). The company with id 6 would be returned by the preceding query even though it didn’t hire anyone during fiscal quarter 4. To fix this bug, you need to ensure that numHired is greater than 0. The revised SQL statement looks like this:

 SELECT companyName FROM Company, EmployeesHired WHERE Company.id = EmpolyeesHired.id AND fiscalQuarter = 4 AND numHired > 0

Important

Now, using the same schema, write a SQL statement that returns the names of all companies that did not hire anyone in fiscal quarters 1 through 4.

The best way to start this problem is by looking at the previous answer. You know how to get the names of all of the companies that hired an employee in quarter 4. If you remove the WHERE condition that fiscalQuarter = 4, you will have a query that returns the names of all companies that hired employees during all fiscal quarters. If you use this query as a subquery and select all of the companies that are not in the result, you will get all of the companies that did not hire anyone in fiscal quarters 1 through 4. As a slight optimization, you can select just the id from the EmployeesHired table and print out the companies that do not have an id returned. The query looks like this:

 SELECT companyName FROM Company WHERE id NOT IN (SELECT id from EmployeesHired WHERE numHired > 0)

Important

Finally, return the names of all companies and the total number of employees that each company hired during fiscal quarters 1 through 4.

You’re asked to retrieve the totals of some sets of values, which indicates that you will have to use the SUM aggregate. In this problem, you don’t want the sum of the entire column, you want only a sum of the values that have the same id. To accomplish this task, you will need to use the GROUP BY feature. This feature enables you to apply SUM over grouped values of data. Other than the GROUP BY feature, this query is very similar to the first query except you omit the fiscalQuarter = 4 in the WHERE clause. The query looks like this:

 SELECT companyName, SUM(numHired) FROM Company, EmployeesHired WHERE Company.id = EmployeesHired.id GROUP BY companyName

Max, No Aggregates

Important

Given the following SQL database schema

 Test (     num INT(4) )

write a SQL statement that returns the maximum value from num without using an aggregate (MAX, MIN, etc.).

In this problem, your hands are tied behind your back - you have to find a maximum without using the feature designed for finding the maximum. A good way to start is by drawing a table with some sample data as shown in Table 10-4.

Table 10-4: Sample Values for num
num
5
23
-6
7

In this sample data, you want to print out the value 23. 23 has the property that all other numbers are less than it. Though true, this way of looking at things doesn’t offer much help with constructing the SQL statement. A similar but more useful way to say the same thing is that 23 is the only number that does not have a number that is greater than it. If you could return every value that does not have a number greater than it, you would return only 23, and you would have solved the problem. Try designing a SQL statement to print out every number that does not have a number greater than it.

First, you will want to figure out which numbers do have numbers greater than themselves. This is a more manageable query. Begin by joining the table with itself to create all possible pairs for which each value in one column is greater than the corresponding value in the other column, as in the following query:

 SELECT Lesser.num, Greater.num FROM Test AS Greater, Test AS Lesser WHERE Lesser.num < Greater.num

Using the sample data, this yields the results shown in Table 10-5.

Table 10-5: Temporary Table Formed after Join
Open table as spreadsheet
Lesser	Greater
-6	23
5	23
7	23
-6	7
5	7
-6	5

As desired, every value is in the lesser column except the maximum value of 23. Thus, if you use the previous query as a subquery and select every value not in it, you will get the maximum value. This query would look like the following:

 SELECT num FROM Test WHERE num NOT IN (SELECT Lesser.num FROM Test AS Greater, Test AS Lesser WHERE Lesser.num < Greater.num)

There is one minor bug in this query. If the maximum value is repeated in the Test table, it will be returned twice. To prevent this, use the DISTINCT keyword. This changes the query to the following:

 SELECT DISTINCT num FROM Test WHERE num NOT IN (SELECT Lesser.num FROM Test AS Greater, Test AS Lesser WHERE Lesser.num < Greater.num)

Three-Valued Logic

Important

Given the following table

 Address (     street CHAR(30) NOT NULL,     apartment CHAR(10),     city CHAR(40) NOT NULL, )

write a SQL statement that returns nonapartment addresses only.

This problem seems very simple. The solution that comes to mind is most probably this query, which the interviewer might even supply unprompted:

 SELECT * FROM Address WHERE apartment = null

This won’t return any addresses, however, and the reason has to do with SQL’s use of ternary, or three-valued, logic. Ternary logic is not familiar to many programmers, but it’s an important concept to grasp in order to write effective queries.

Logical operations in SQL have three possible values, not two. Those three values are TRUE, FALSE and UNKNOWN. As you might expect, UNKNOWN means that a value is unknown or unrepresentable. The familiar AND, OR, and NOT operations, for example, return different values in the presence of an UNKNOWN value, as shown in Tables 10-6, 10-7, and 10-8.

Table 10-6: Ternary AND Operations
Open table as spreadsheet
AND	TRUE	FALSE	UNKNOWN
TRUE	TRUE	FALSE	UNKNOWN
FALSE	FALSE	FALSE	FALSE
UNKNOWN	UNKNOWN	FALSE	UNKNOWN

Table 10-7: Ternary OR Operations
Open table as spreadsheet
OR	TRUE	FALSE	UNKNOWN
TRUE	TRUE	TRUE	TRUE
FALSE	TRUE	FALSE	UNKNOWN
UNKNOWN	TRUE	UNKNOWN	UNKNOWN

Table 10-8: Ternary NOT Operations
Open table as spreadsheet
NOT
TRUE	FALSE
FALSE	TRUE
UNKNOWN	UNKNOWN

The trick is in the use of the equality operator (‘=’) to test for a NULL column value. In most databases, a comparison to NULL returns UNKNOWN - even when comparing NULL to NULL. The proper way to check for a NULL or non-NULL column is to use the IS NULL or IS NOT NULL syntax. Thus, the original query should be restated as follows:

 SELECT * FROM Address WHERE apartment IS NULL

Not accounting for UNKNOWN values in WHERE clause conditions is a common error.