Beware: collation database_default

I have seen in so many blog posts a proposed solution to the ‘cannot resolve the collation conflict’ problem; many of the blog posts and forums suggest that the collate database_default will solve the problem.

As I got a similar problem a few days ago, I stopped for a second and I though ‘hm, can this be the optimal solution and what problems can it cause?’

 

If you need to read about the collation concept, its levels, settings and restrictions, read it here.

 

Here is the simple test I performed today:

 

I created a database with a default collation Latin1_General_CI_AI.

-- first create a folder and specify it in the script
USE [master]
GO
 
CREATE DATABASE [CollationTest] ON  PRIMARY
( NAME = N'CollationTest', FILENAME = N'D:\Databases\CollationTest.mdf' , SIZE = 3072KB , FILEGROWTH = 1024KB )
 LOG ON
( NAME = N'CollationTest_log', FILENAME = N'D:\Databases\CollationTest_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)
 COLLATE Latin1_General_CI_AI
GO
 
USE [CollationTest]
GO
IF NOT EXISTS (SELECT name FROM sys.filegroups WHERE is_default=1 AND name = N'PRIMARY') ALTER DATABASE [CollationTest] MODIFY FILEGROUP [PRIMARY] DEFAULT
GO

As a next step I create the following two tables, with 2 columns each: one nvarchar and one int. One of the nvarchar columns picks up the collation setting from the database collation, and the other one has a finnishswedish cs as collation explicitly specified by me.

-- create the tables
/****** Object:  Table [dbo].[Table_Cyr_CI_AI]    ******/
USE [CollationTest]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Table_Cyr_CI_AI]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Table_Cyr_CI_AI](
        [Name] [nvarchar](50) NULL,
        [ID] [int] NULL
) ON [PRIMARY]
END
GO
 
/****** Object:  Table [dbo].[Table_FinnishSwedish_CS_AS]   ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Table_FinnishSwedish_CS_AS]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Table_FinnishSwedish_CS_AS](
        [Name] [nvarchar](50) COLLATE Finnish_Swedish_CS_AS NULL,
        [ID] [int] NULL
) ON [PRIMARY]
END
GO

Let’s insert some date in the tables:

-- Insert some test data
USE [CollationTest]
GO
INSERT INTO [CollationTest].[dbo].[Table_Cyr_CI_AI] ([Name],[ID])
SELECT N'Göran Lindén' ,1
UNION ALL
SELECT N'Göran Linden' ,2
GO
 
USE [CollationTest]
GO
INSERT INTO [CollationTest].[dbo].[Table_FinnishSwedish_CS_AS] ([Name],[ID])
SELECT N'Göran Lindén' ,3
UNION ALL
SELECT N'Göran Linden' ,4
GO
 
-- query the data
select * from [CollationTest].[dbo].[Table_Cyr_CI_AI]
select * from [CollationTest].[dbo].[Table_FinnishSwedish_CS_AS]

After this, I wrote some queries to join the two tables on the nvarchar columns.

-- test the queries with different collations
 
-- test 1: join cyr and fin, collate database_default on the join predicates
select * from [Table_Cyr_CI_AI] c
join [CollationTest].[dbo].[Table_FinnishSwedish_CS_AS] f
on c.Name collate database_default = f.Name collate database_default
where c.Name = N'Göran Lindén'
GO
 
-- test 2: join cyr and fin, collate Finnish_Swedish_CS_AS on the join predicates
select * from [Table_Cyr_CI_AI] c
join [CollationTest].[dbo].[Table_FinnishSwedish_CS_AS] f
on c.Name collate Finnish_Swedish_CS_AS = f.Name collate Finnish_Swedish_CS_AS
where c.Name = N'Göran Lindén'
GO
-- test 3: join cyr and fin, collate database_default, and specify the collation database_default for the search predicate
select * from [Table_Cyr_CI_AI] c
join [CollationTest].[dbo].[Table_FinnishSwedish_CS_AS] f
on c.Name collate database_default = f.Name collate database_default
where c.Name collate database_default = N'Göran Lindén'
GO
-- test 4: join cyr and fin, collate Finnish_Swedish_CS_AS, and specify the collation Finnish_Swedish_CS_AS for the search predicate
select * from [Table_Cyr_CI_AI] c
join [CollationTest].[dbo].[Table_FinnishSwedish_CS_AS] f
on c.Name collate Finnish_Swedish_CS_AS = f.Name collate Finnish_Swedish_CS_AS
where c.Name collate Finnish_Swedish_CS_AS = N'Göran Lindén'
GO

First things first – if I do not specify the collation setting for the columns of the join predicates, then the query fails with the following error:

Msg 468, Level 16, State 9, Line 3
Cannot resolve the collation conflict between "Finnish_Swedish_CS_AS" and "Latin1_General_CI_AI" in the equal to operation.

 

Let’s look at the results now. To keep this post compact, I will just post a screenshot of the results of the 4 queries above:

The results of the queries are different

Hm, so the COLLATION does really affect the results...

 

So, what is the bottom line? The reality is that we get different resultsets, depending on the collation settings expressed in our query. We are looking for Göran Lindén – the guy that has that funny [´] sígn in his name, and only for him. (We do not want any other boring records without [´] sígns in our resultset! :) )

 

It turns out that the collate database_default is not an ultimate way to solve the problem of the collation resolution when joining tables. It really depends on what records we are looking for, and what character encoding we want to acknowledge or ignore.

 

After all, think about it this way: if there is a column with a different collation in your database, the chances are that maybe there is a good reason for it. :)

Work on itWell, now...OKGoodGood job! (8 votes, average: 4.75 out of 5)

If you gave a rating less than the maximum, you must leave a comment with suggestions on how to improve the post. :)

Loading ... Loading ...

 

Comments are closed.