Collation Sequences

Background

A collation sequence is a database setting that controls how string values are ordered.

In Synergetic V67 and all prior versions the collation sequence was required to be the SQL collation 'SQL_Latin1_General_CP1_CI_AS'. The SQL server must have been installed with that specific collation sequence and that determined the collation sequence of the master and temp databases of that SQL Server instance.

This caused the following limitations at a client organisation:

The Synergetic databases had to be installed on a SQL instance configured with the specific SQL collation sequence required by Synergetic.
If the host SQL Server instance was installed with a different collation sequence the instance could not be used and had to be reinstalled.
The client organisation could not change to an accent insensitive collation sequence to change search functionality.

Alongside these limitations, it seems that SQL_ collation sequences are no longer being updated and will eventually be phased out in favour of Windows collation sequences. Some of the newer Unicode characters are not in the SQL collation but do exist in Windows ones.

Changes in V68

Changes have been made to support different collation sequences to remove these limitations and future proof Synergetic.

In V68, the collation sequence of the Synergetic database can be changed to any Case Insensitive (CI) collation sequence and is also permitted to be different to the collation sequence of the SQL Server instance.

By default, the Synergetic database in V68 will be changed to the Windows collation sequence Latin1_General_CI_AS

For most client organisations this change will be invisible. However, if the organisation has created their own user triggers or user stored procedures that use the tempdb and the Synergetic database uses a different collation sequence to tempdb, they may encounter an error such as the below:

Collation error

Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_CI_AS" in the equal to operation.

If this type of error is encountered the collation sequence will need to be explicitly specified as outlined in the next section.

SQL code across databases with different collation sequences

The main issue with collation sequences is when you are comparing table string fields across databases.

The below example assumes the tempdb database uses a different collation sequence to the current database.

Collation example

CREATE TABLE #test
(
  ID        INTEGER,
  Surname   VARCHAR(50),
  Given1    VARCHAR(30)
)
SELECT * FROM #test WHERE Surname = 'rose' -- this will work as we are not comparing with data from a table in the current database
 
-- however if we try
SELECT * FROM #test WHERE Surname IN (SELECT Surname FROM community)
-- this causes a collation conflict as we are comparing string values across databases with different collation sequences

The resolution is to specify the collation on each column that will be compared with a field in the source database.

Correct collation order handling

CREATE TABLE #test
(
  ID        INTEGER,
  Surname   VARCHAR(50) COLLATE DATABASE_DEFAULT,
  Given1    VARCHAR(30)
)

SELECT * FROM #test WHERE Surname IN (SELECT Surname FROM community) -- now it works

Note that you only need to specify collation sequence on fields you are doing cross database comparisons on. The Given1 field in the example above does not need it as we are only displaying it.

It is also good practice to specify the collation with table variables too as they may utilise the temp database with larger volumes of data.

Note also if the temp table is created by a SELECT INTO there are no collation issues as it copies the table structure including collation sequences. The issue occurs only when we CREATE and then INSERT.

Collation example using system tables

-- In this example we are using system tables therefore we cannot change the collation sequence of the table itself
-- We need to override the collation sequence on the actual field comparison
SELECT
    sd.DatabaseName
FROM SynDatabases sd
WHERE NOT EXISTS
    (
        SELECT
            *
        FROM master.sys.databases md
        WHERE md.name = sd.DatabaseName COLLATE DATABASE_DEFAULT
    )

For most client organisations this change will be invisible. However, if the organisation has created their own user triggers or user stored procedures that use the tempdb and the Synergetic database uses a different collation sequence to tempdb, they may encounter an error such as the below:

Collation error

Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_CI_AS" in the equal to operation.

If this type of error is encountered the collation sequence will need to be explicitly specified as outlined in the next section.

Contained Databases

Further issues are apparent when using contained databases with OPENXML and getting errors along the line of Cannot resolve the collation conflict between "Latin1_General_100_CI_AS_KS_WS_SC" and "Latin1_General_CI_AS" in the equal to operation.

OPENXML Sample

-- THE FOLLOWING CODE GETS COLLATION ERRORS IN A CONTAINED DATABASE

    SELECT
        *
      INTO #xmlparse     
    FROM OPENXML(@hDoc, '/InputXML/', 1) x


-- THIS IS THE WAY YOU NEED TO WRITE IT INSTEAD (note I have only included the fields we actually use)
    SELECT
        x.id,
        x.parentid,
        x.nodetype,
        localname = x.localname COLLATE DATABASE_DEFAULT,
        text      = x.text      COLLATE DATABASE_DEFAULT
      INTO #xmlparse      
    FROM OPENXML(@hDoc, '/InputXML/', 1) x