Sunday, 7 December 2014

SQL Theory: UNPIVOT operator to help normalize output

Problem
I often see people struggle with poorly normalized schemas, where they have to transpose a query against a wide table into skinnier, more normalized output.
For example, imagine a table with data like this:
SELECT * FROM dbo.CustomerPhones
For a certain report, the desired output is actually as follows (apologies for the messy arrows):
Desired output
Ideally, you would just fix the schema, but I do acknowledge that this is not a realistic possibility in a lot of scenarios.
Solution
So, we need to code our way around this to present the data as it should be stored, in spite of how it is being stored. Here is some sample data:
CREATE TABLE dbo.CustomerPhones
(
  CustomerID INT PRIMARY KEY, -- FK
  Phone1 VARCHAR(32),
  Phone2 VARCHAR(32),
  Phone3 VARCHAR(32)
);
INSERT dbo.CustomerPhones
  (CustomerID, Phone1, Phone2, Phone3)
VALUES
  (1,'705-491-1111', '705-491-1110', NULL),
  (2,'613-492-2222', NULL, NULL),
  (3,'416-493-3333', '416-493-3330', '416-493-3339');
One way that might seem like a good idea is to perform a UNION:
SELECT CustomerID, Phone = Phone1 
  FROM dbo.CustomerPhones WHERE Phone1 IS NOT NULL
UNION ALL
SELECT CustomerID, Phone = Phone2 
  FROM dbo.CustomerPhones WHERE Phone2 IS NOT NULL
UNION ALL
SELECT CustomerID, Phone = Phone3 
  FROM dbo.CustomerPhones WHERE Phone3 IS NOT NULL;
The problem with this solution is that it does not scale - for every phone column you have, you add an additional scan, and you also have to add an entirely new UNION ALL query for every phone column.
Instead, we can use SQL Server's UNPIVOT operator. Which, contrary to popular belief, is not quite the opposite of PIVOT. An example using the above sample data:
SELECT CustomerID, Phone
FROM
(
  SELECT CustomerID, Phone1, Phone2, Phone3 
  FROM dbo.CustomerPhones
) AS cp
UNPIVOT 
(
  Phone FOR Phones IN (Phone1, Phone2, Phone3)
) AS up;
Results:
unpivot output
Now, I'll admit, the syntax is not as intuitive as you might expect. Most notably, that "Phones" alias is completely made up, and you could put anything there (except "Phone"). Basically what it is saying is, "extract a new row for every value you find in the columns Phone1, Phone2, and Phone3."

Another case

Another situation you might find is when you have multiple sets of related columns. So, for example, let's say you have a column for the type of each phone number:
CREATE TABLE dbo.CustomerPhones2
(
  CustomerID INT PRIMARY KEY, -- FK
  Phone1 VARCHAR(32),
  PhoneType1 CHAR(4),
  Phone2 VARCHAR(32),
  PhoneType2 CHAR(4),
  Phone3 VARCHAR(32),
  PhoneType3 CHAR(4)
);
INSERT dbo.CustomerPhones2 VALUES
  (1,'705-491-1111', 'cell', '705-491-1110', 'home', NULL,NULL),
  (2,'613-492-2222', 'home', NULL, NULL, NULL, NULL),
  (3,'416-493-3333', 'work', '416-493-3330', 'cell',
     '416-493-3339', 'home');
Now, we could also solve this with a UNION query, but let's take a look at another capability we have: using multiple UNPIVOTs. The only complication here is matching the output phone to the corresponding phone type - for this we need to do some string interrogation to ensure that Phone1 matches to PhoneType1, Phone2 matches to PhoneType2, etc.
SELECT CustomerID, Phone, PhoneType
FROM 
(
  SELECT CustomerID, Phone, PhoneType,
    idp = SUBSTRING(Phones, LEN(Phones) - PATINDEX('%[^0-9]%', REVERSE(Phones)) + 2, 32),
    idpt = SUBSTRING(PhoneTypes, LEN(PhoneTypes) - PATINDEX('%[^0-9]%', REVERSE(PhoneTypes)) + 2, 32)
  FROM
  (
    SELECT CustomerID, Phone1, Phone2, Phone3,
           PhoneType1, PhoneType2, PhoneType3
    FROM dbo.CustomerPhones2
  ) AS cp
  UNPIVOT 
  (
    Phone FOR Phones IN (Phone1, Phone2, Phone3)
  ) AS p
  UNPIVOT
  (
    PhoneType FOR PhoneTypes IN (PhoneType1, PhoneType2, PhoneType3)
  ) AS pt
) AS x
WHERE idp = idpt;
Results:
In my next tip, I'll show how to derive these UNPIVOT queries dynamically, without having advanced knowledge of how many Phone/PhoneType columns there will be. Since a weakness of this particular design pattern is that when a 4th phone is added, you add new columns and change all the queries, and when a 5th phone is added, you add new columns and change all the queries, a more dynamic query can be useful to minimize or eliminate queries of this type.

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home