Friday, March 8, 2019

My Top Ten+ Favorite Replication Tricks

This post has been a long time in coming. It started as a SQL Saturday presentation I gave three times in 2016 and 2017. Slides were posted here for each SQL Saturday presentation, but I have not gotten around to writing an actual blog post...until now. Hopefully I will get a blog post for my 2018 SQL Saturday presentation on encryption written more quickly than I completed this one.

Disclaimer!! These are NOT best practices. Rather, these are a few things that I have stumbled across in my job as a full-time production DBA that have helped me resolve real-world situations, and I am sharing this information in case someone runs into a similar situation. As always, thoroughly test any solution in a non-production environment. I am not responsible for any mishaps that you may encounter by using these tricks; it is up to you to make sure you are doing what is best for your situation and data.

In my environment I only have transactional and snapshot replication. I don't have any merge replication, so I don't have any tricks specifically for that. Also, this is not a replication-basics or how-to-create-replication post. This post is intended for the DBA who is familiar with replication and may need a few tricks to overcome some unique situations.

Trick #1 - Replicating from multiple publications to multiple subscribers

Replication is not just a one-publisher-to-one-subscriber architecture. A publication database can have multiple publications. Each publication can have multiple subscribers. A subscriber database can receive multiple publications. A database can even be both a subscriber and a publisher, although I would not publish tables that are populated through replication from another SQL Server publisher. The distribution of the publications can run on different schedules for different subscribers.

In my environment there is a central business intelligence server where data from production OLTP systems is replicated. This server is used by various SSRS, SSAS (both tabular and multidimensional) and Power BI solutions. Some solutions require near real-time data and others need more static day-old data. The ERP system has an Oracle back end. To replicate this data a tool called Attunity Replicate is used. This data is continuously replicated to the database that hosts the near real-time data. To get the data that is needed into the database that hosts the day-old data, SQL Server transactional and snapshot replication is used (more on when to use one versus the other in a bit). This is the rare circumstance where I must publish tables that are populated through replication...but they are not populated by SQL Server replication, and that is the difference. The one thing that is a bit of a problem with this is that when I need to reload data from Oracle into the near real-time database, I must drop one or more tables out of SQL Server replication. Sometimes this requires dropping an entire publication. I have a few tricks to make this easier.

All other systems have SQL Server back ends and the data is replicated to the central business intelligence server via SQL Server replication. The distribution to the near real-time database runs continuously but distributes to the day-old database on a schedule. The interesting part of this setup is that the OLTP systems are generally running on a newer version of SQL Server and replicating to an older version. This can present some challenges, but I have tricks for that as well.

Trick #2 - Know when to use Snapshot Replication vs. Transactional Replication

Earlier in this post I mentioned that I use a combination of Snapshot and Transactional replication to replicate data from the near real-time database to the day-old database. Several years ago, when replication from Oracle to the near real-time database and on to the day-old database was set up, it was just one large Attunity task replicating data to the near real-time database and then one transactional publication that was distributed to the day-old database. This was rather problematic in that any issue for a table would cause the Attunity replication to fatal error, requiring a complete reload of all the tables in the near real-time database and then resynchronizing the data to the day-old database. To help reduce the impact of any fatal errors* I split the Attunity replication from one giant task into ten smaller, but still sizable, tasks with tables grouped by functional use. I also broke the SQL Server replication into multiple publications to align with the Attunity tasks. This helped; however, there were some publications that pushed data from the near real-time database to the day-old database that had issues. A little research into the data in the problematic tables showed that although the tables were quite small they had a TON of transactions throughout the day. For the day-old data all that mattered was the data at a given point in time, not all the changes throughout the day. Moving smaller sized tables to Snapshot replication alleviated the issues. As with anything SQL Server related there isn't a "one size fits all" solution. Below are some of the considerations I use when deciding to use Snapshot or Transactional replication.

Snapshot

  • Need point in time data
  • Relatively small table
  • High volume of transactions
  • You need to perform bulk operations on the table at the publisher (does not mark the table as replicated)
  • Primary key not required

Transactional

  • Need near real-time data
  • Large table size
  • Low volume of transactions
  • You do not need to perform bulk operations on the table at the publisher (marks the table as replicated)
  • Primary key required

*Upgrading the Attunity application also helped reduce the occurrences of fatal errors.
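
If you need a quick way to see which tables in the publication database are already part of a publication (and therefore subject to the bulk-operation restriction above), sys.tables exposes a couple of replication flags. A minimal sketch, run against the publication database:

-- Run against the publication database: tables currently flagged as published/replicated
SELECT SCHEMA_NAME(t.schema_id) AS schema_name, t.name AS table_name,
t.is_published, t.is_replicated
FROM sys.tables t
WHERE t.is_published = 1 OR t.is_replicated = 1
ORDER BY schema_name, table_name;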

Trick #3 - Putting data in different schemas and collations at the subscriber than at the publisher

These fall into the category of "not necessarily best practice" however they can be useful so test thoroughly before implementing.

Many of the source systems that replicate to our business intelligence server have all their tables in the dbo schema. This would make things very messy, so I have them broken out into separate schemas on the business intelligence server by source system name. Since everything that reads from this server is developed in-house this is not an issue, and it clarifies things for the developers.

Additionally, not all source systems use our standard collation. Sometimes it is very close, like using Latin1_General_CI_AS rather than SQL_Latin1_General_CP1_CI_AS. Other times it may be case sensitive (CS), or in one case it needed the supplementary characters (SC) option. One of the reasons for consolidating all the data was to alleviate collation conflicts, so the decision was made early on to have all incoming data use the collation of the business intelligence server rather than the source server. For the most part this has worked well, but there are cases where the case-sensitive data replicated from Oracle into the case-insensitive business intelligence server has had some issues with "duplicates" that are not duplicates when the data is case sensitive. Again, this is not necessarily a best practice.

To put the data in a different schema or collation on the subscriber than on the publisher you should get familiar with your publication properties.

Destination object owner is the schema name on the subscriber. Default is to use the same owner as is on the publisher.

Copy collation is a boolean property. True means the tables will retain the same collation they had on the publisher; false means they will use the subscriber collation. Default is true.
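
Both settings live in the publication properties GUI, but the schema change can also be set per article with sp_changearticle. A hedged sketch, using hypothetical publication, article, and schema names; changing the property on an existing article may require invalidating the snapshot:

-- Hypothetical names; places the subscriber copy of the article in the erp schema
EXEC sp_changearticle
@publication = N'PublicationName',
@article = N'MyTable',
@property = N'destination_owner',
@value = N'erp',
@force_invalidate_snapshot = 1;
GO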

Trick #4 - Truncate your subscriber tables rather than drop and recreate them

As I have mentioned, the replicated data is mainly used for reporting or other business intelligence activities. The query patterns for this can be quite different than on the OLTP servers, so many times indexes are added to the subscriber tables to improve performance. When I first started working with replication I left all the properties defaulted and was just happy it worked. When I did have to reload data, I would get questions like "where did our indexes go?" I soon discovered another publication property, "Action if name is in use". The default is to drop the existing object and create a new one. If you have added indexes on the table at the subscriber this can be a problem. I now change this setting to truncate data in the existing object for all my publications.
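
For what it's worth, this GUI setting maps to the article's pre_creation_cmd property, so the same change can be scripted. A sketch with hypothetical names:

-- Hypothetical names; have the distribution agent truncate the existing
-- subscriber table rather than drop and recreate it
EXEC sp_changearticle
@publication = N'PublicationName',
@article = N'MyTable',
@property = N'pre_creation_cmd',
@value = N'truncate',
@force_invalidate_snapshot = 1;
GO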

Here is a little bonus that wasn't in the SQL Saturday presentations.

Semi-annually we reload our non-production systems with production data. We do this by restoring a backup of the production source system database on the corresponding non-production source systems. After we do that we have to resynchronize all the replication. Once in a while I will get a "cannot bulk copy into column [column_name]" message. This is usually due to a DDL change getting made in production--remember, these are third-party vendor systems that are administered by someone other than the DBAs, so they have a bad habit of skipping non-production systems sometimes. This isn't typically an issue when replication is running, since DDL changes replicate from the publisher to the subscriber database just fine. When this occurs, I have to go find the table in production and then alter the non-production table as needed. Once I do this the synchronization agent will automatically start up again in a few moments. The problem is often identifying the table, since the message only gives the column name. Below is a script I run to help find the table or at least narrow down the possible tables. The script should be run against the publication database.

-- Find the table(s) that contain the column named in the bulk copy error
SELECT t.name AS table_name, c.name AS column_name
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
WHERE c.name = 'column_name';  -- column name from the error message

Trick #5 - Distributing a snapshot as soon as it is available

This is specifically for snapshot replication. When configuring the publication, I will set the schedule that I want for the snapshot.

The problem I used to have was how to get the snapshot distributed right away. I could have the distributor run continuously; however, I don't like having an agent running when it isn't needed, plus if I get asked "has that finished distributing yet?" I would have no quick way to tell. I could define a schedule, but that involves guesswork and hoping the snapshot agent doesn't run long. My solution was to have the distributor run on demand only.

My previous statement about having it run on demand only seems counterintuitive at first glance. Snapshot replication uses two SQL Agent jobs: one for the snapshot and one for distribution. What I do is get into SQL Agent and modify the Agent job for the snapshot.

The SQL Agent job created by SQL Server when setting up the publication has three steps. I add an additional step.

The new step will call sp_start_job in msdb and I give it the distribution SQL Agent job name for the @job_name parameter. I grab the job name from the General page of the SQL Agent job properties for the distribution job.

Here is the code.

EXEC [msdb].[dbo].[sp_start_job] @job_name = '[distribution job name]';

Next I move the new step to be the third step and modify the On Success and On Failure actions so that the Run Agent step will go to the next step on success rather than quit reporting success. I have the new step quit reporting success on success, and have both go to the Detect nonlogged agent shutdown step on failure.
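
If you would rather script the new step than click through the SQL Agent GUI, sp_add_jobstep can do the same thing. This is only a sketch with hypothetical job names, and it assumes the Detect nonlogged agent shutdown step ends up as step 4 after the insert:

-- Hypothetical job names; adds a third step to the snapshot agent job that
-- starts the distribution agent job as soon as the snapshot completes
EXEC msdb.dbo.sp_add_jobstep
@job_name = N'SERVER-PubDB-SnapshotPub-1', -- snapshot agent job (hypothetical name)
@step_id = 3, -- insert as the third step; later steps are renumbered
@step_name = N'Start distribution agent',
@subsystem = N'TSQL',
@database_name = N'msdb',
@command = N'EXEC dbo.sp_start_job @job_name = N''SERVER-PubDB-SnapshotPub-SUBSCRIBER-1'';',
@on_success_action = 1, -- quit reporting success
@on_fail_action = 4, -- go to the step below on failure
@on_fail_step_id = 4; -- Detect nonlogged agent shutdown
GO

You would still need to change the Run agent step's On Success action to Go to the next step (sp_update_jobstep can do that) so it falls through to the new step, matching the GUI change described above.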

At one SQL Saturday an attendee was shocked and questioned whether modifying something that SQL Server created was a good idea. This is simply adding a step to a SQL Agent job, not altering a system stored proc, so I don't have any issue doing this, and it works just fine for all the snapshot replication I have running.

Trick #6 - Changing the transactional replication distribution schedule from continuous to scheduled and vice versa

This trick also involves getting into the SQL Agent jobs. Transactional replication uses three jobs: one for the snapshot, one for distribution, and one for the log reader. I will modify the schedule of the Agent job for distribution.

For either going from continuous to scheduled or vice versa, you make the change by changing the schedule type. If going from continuous to scheduled, you will need to first stop the distribution agent job, then define a schedule. If going from scheduled to running continuously, you will change the schedule type to Start automatically when SQL Server Agent starts, then manually start the SQL Agent job for distribution since you probably won't want to cycle the SQL Agent service.
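
The schedule change can also be scripted instead of done through the GUI. A rough sketch, assuming a hypothetical distribution agent job name and that the job still uses the default schedule name of "Replication agent schedule.":

-- Hypothetical job name; switch the distribution agent job from running
-- continuously (freq_type 64 = start when SQL Server Agent starts) to a daily 6:00 AM run
EXEC msdb.dbo.sp_update_jobschedule
@job_name = N'SERVER-PubDB-Publication-SUBSCRIBER-1',
@name = N'Replication agent schedule.',
@freq_type = 4, -- daily
@freq_interval = 1, -- every day
@freq_subday_type = 1, -- at the specified time
@active_start_time = 060000; -- 6:00 AM, HHMMSS
GO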

Trick #7 - Finding your articles

If you have lots of articles (tables) in a bunch of different publications it can be a challenge to keep track of what article is in which publication. The script below will show you what is where. It is also useful for checking to make sure you don't have the same article in multiple publications.

USE distribution
GO
-- Lists which article is in which publication
SELECT p.publication, a.article
FROM dbo.MSpublications p
JOIN dbo.MSarticles a ON p.publication_id = a.publication_id
ORDER BY a.article;
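
A small variation on the same query (just a sketch) flags any article name that shows up in more than one publication:

-- Articles that appear in more than one publication
SELECT a.article, COUNT(DISTINCT p.publication) AS publication_count
FROM dbo.MSarticles a
JOIN dbo.MSpublications p ON a.publication_id = p.publication_id
GROUP BY a.article
HAVING COUNT(DISTINCT p.publication) > 1;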

Trick #8 - Painlessly add articles to your publication

First off, this trick is one I found online that helped reduce the pain of adding new articles to my transactional replication publications. For years I hated adding a new article since it forced me to reinitialize the entire subscription, which could take a couple of hours. Finally, I stumbled across the article "Adding an Article Without a Full Snapshot" on SQL Server Central, and things got much easier.

There are a few key points for this trick to be successful. First, you have to run the following scripts against the publication database before adding the article, and second, you have to add the article using the GUI; if you use the stored procs it will cause a snapshot of the entire publication to be generated.

EXEC sp_changepublication
@publication = N'PublicationName',
@property = N'allow_anonymous',
@value = 'false';
GO

EXEC sp_changepublication
@publication = N'PublicationName',
@property = N'immediate_sync',
@value = 'false';
GO
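
You can verify both properties took effect by checking the publication metadata; a quick sketch, run in the publication database:

-- Both flags should now be 0 (off) for the publication
SELECT name, allow_anonymous, immediate_sync
FROM dbo.syspublications
WHERE name = N'PublicationName';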

If you need to set properties, like those in tricks #3 and #4, for this article you will need to set the properties for just the article(s) added. If you set the properties for All Table Articles, it will cause a snapshot of the entire publication to be generated.

One small disclaimer: this trick works most of the time; however, there are many things that may not be apparent that can still cause a snapshot of the entire publication to be generated.

Trick #9 - Changing the snapshot location

Occasionally the location you designate for your snapshots when you configure distribution can become unsuitable for several reasons. Perhaps your snapshots have grown large enough they no longer fit and you need to point to a location with more space. There are two ways of doing this. You can use a system stored procedure or you can use the GUI.

-- Run at the distributor
EXEC sp_changedistpublisher
@publisher = 'YourServerName',
@property = 'working_directory',
@value = 'S:\ReplData\';

Or you can use the GUI: right-click the Replication folder in SSMS, choose Distributor Properties, go to the Publishers page, open the properties for the publisher, and change the default snapshot folder there.
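
Either way, you can confirm the change afterwards; sp_helpdistpublisher, run at the distributor, returns a working_directory column showing the current snapshot folder for each publisher:

-- Run at the distributor; check the working_directory column in the results
EXEC sp_helpdistpublisher @publisher = 'YourServerName';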

The next two tricks were quite controversial at one of the SQL Saturdays where I gave this presentation, especially trick #10. What these tricks show is how to overcome a problem that I ran into in my environment and others may run into as well. I am by no means suggesting that this is how you should do things, but rather that, when you have no other feasible options, you can make things work by using these tricks. Again, these are not necessarily best practices but rather tricks that may be useful in the real world.

Trick #10 - Overcoming version differences

When your publisher is an older version of SQL Server than your subscriber, you may get a strange message telling you the subscriber does not satisfy the minimum version level if you are doing all the work from the publisher server.

It does seem strange that if you are trying to replicate from SQL Server 2012 to SQL Server 2016 it would tell you that SQL Server 2016 does not satisfy the minimum version level. To get around this you can create the subscription from the subscriber, after you have created the publication, rather than from the publisher. Or you can use SSMS 2016, or higher, to connect to both the publisher and subscriber.
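
For reference, creating the subscription from the subscriber side in T-SQL looks roughly like this. This is only a sketch with hypothetical server, database, and publication names; the first two calls run at the subscriber in the subscription database and the last runs at the publisher in the publication database:

-- At the subscriber, in the subscription database (hypothetical names)
EXEC sp_addpullsubscription
@publisher = N'PUBLISHERSERVER',
@publisher_db = N'PublicationDB',
@publication = N'PublicationName';

-- Creates the pull distribution agent job at the subscriber
EXEC sp_addpullsubscription_agent
@publisher = N'PUBLISHERSERVER',
@publisher_db = N'PublicationDB',
@publication = N'PublicationName',
@job_login = N'DOMAIN\AgentAccount',
@job_password = N'StrongPasswordHere';

-- At the publisher, in the publication database, register the pull subscription
EXEC sp_addsubscription
@publication = N'PublicationName',
@subscriber = N'SUBSCRIBERSERVER',
@destination_db = N'SubscriptionDB',
@subscription_type = N'pull';
GO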

Warning...the following is not supported by Microsoft

Deprecated Features in SQL Server Replication.

Even though it is not officially supported by Microsoft I was able to replicate from SQL Server 2008 R2 to SQL Server 2016 but had to use a pull subscription. I was also able to replicate from SQL Server 2016 to SQL Server 2008 R2 using both push and pull subscriptions. This can be useful if you have legacy systems that you are not able to upgrade but still need to use in a replication solution.

After the controversy at the SQL Saturday I contacted the Microsoft TAM for the company I work for and asked about this. He told me that even though it isn't supported it may work however just be aware that future updates to either version of SQL Server could break your replication as Microsoft won't test this functionality prior to release.

Trick #11 - Crossing domains

If you are replicating across domains, you can get a few different messages.

If you just put the server name without the domain, it will tell you the server can’t be found.

Try to fully qualify the name or use the IP address and it tells you that won’t work either.

To get around this...create an alias.

Yes, I realize the second error message says aliases aren't supported...however they work. Again, these aren't best practices, but tricks that I use to get out of tough spots.

Additionally, if you don’t have trust between the domains you may need to use a SQL Login to connect to the subscriber.
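
For a push subscription, the SQL login goes on the distribution agent's connection to the subscriber. A hedged sketch with hypothetical names; the agent job itself still runs under a Windows account:

-- Run at the publisher, in the publication database (names and logins are hypothetical)
EXEC sp_addpushsubscription_agent
@publication = N'PublicationName',
@subscriber = N'SUBSCRIBERSERVER',
@subscriber_db = N'SubscriptionDB',
@subscriber_security_mode = 0, -- 0 = use SQL Server authentication at the subscriber
@subscriber_login = N'repl_subscriber',
@subscriber_password = N'StrongPasswordHere',
@job_login = N'DOMAIN\AgentAccount', -- Windows account the agent job runs under
@job_password = N'StrongPasswordHere';
GO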

Around this point I usually get asked what permissions are needed for these accounts.

Security Role Requirements for Replication Setup

Security Role Requirements for Replication Maintenance

Trick #12 - Script your publications

Back in trick #4 I mentioned we do semi-annual refreshes of our non-production environments by restoring a backup from production. Before restoring the databases, I first drop the publications (on the non-production servers). Once the restore is complete I need to recreate the replication architecture. If I were to manually recreate it from scratch every time it would be very tedious, and the chances that I would forget something would be very high. So, to make sure that things are put back exactly as before, I make sure I have all my publications scripted. This is also useful for DR, as well as for setting up a new publication: create it on, say, a test server, copy the script to a UAT server, and with a quick update to server names and credentials in the script I can create the matching UAT publication with very little effort.

To script a publication, right-click on the publication name and select Generate Scripts.

From the Generate Script dropdown I prefer to Open in New Query Window then in the script I make a few modifications before saving the script.

Here are a few things you may want to update in your script before saving.

  • Clean up the publication access section of the script removing unneeded access.
  • I prefer to comment out the create subscription section and add subscriptions manually.

For multiple transactional publications on a database you may want to comment out the create log reader section on all but the first script you would run. If you don't, it won't cause a problem, but the script will keep telling you it can't create a log reader since one already exists.

As you add new articles to your publication, you can update the script to add new articles...or just generate a new script. I like keeping my scripts up to date for DR.

One good thing to know is that if you have configured article/publication properties, like were mentioned in tricks #3 and #4, these will be persisted when you script your publication.

Bonus Trick - Completely removing replication

As mentioned above I delete replication before restoring a database then recreate replication. Sometimes there are replication artifacts that come over from the server where the backup was taken, and this can cause issues with recreating publications. The fix I have found for this is to completely remove replication on the database, and then start from scratch. This works for transactional and merge replication.

EXEC sp_removedbreplication @dbname = '[DatabaseName]', @type = 'tran'; -- for merge replication you would change 'tran' to 'merge'

Bonus Tip - Replication Monitor is your friend

There are many times where the only way to get a good error message is to dig it out of Replication Monitor. If you encounter issues with your replication such as seeing the SQL Agent jobs failing, and you don't see a useful message in the job history in SQL Agent, look in Replication Monitor.

I hope that you find some of these tricks useful.