Thursday, November 21, 2013

Owning it

In SQL Server when you create a SQL Agent job it will automatically list the creator of the job as the owner unless you explicitly change the owner. The owner of the job does not determine the credentials the job runs under in fact you can have a disabled SQL Server account own your SQL Agent jobs and they will run just fine however if you have them owned by a Windows account that is removed from the domain you have a problem.

A couple years ago another DBA in my company was primary on certain servers and I was primary on others. When he left the company I assumed duties on all the servers. This was a very busy time for me as I had to adjust to the added workload. Just as I thought I had things under control the SQL Agent jobs on what had been the other DBA’s servers started failing. It wasn’t panic time and I quickly identified the issue, the trouble was that there were a LOT of SQL Agent jobs that he owned and they all were starting to fail. Why didn’t they fail right away? Corporate policy is to disable the account and mark it for deletion then, after a few weeks to a month, the account is removed from Active Directory so until the account was actually deleted the jobs ran just fine. I changed a few jobs manually but the next day discovered more failed jobs. I needed a way to find all the jobs he owned quickly so examining each job individually was out. Where to turn? Why the system views of course. The following scripts will allow you to find out who owns your SQL Agent jobs:.

-- For SQL Server 2005
select ma.name as JobOwner, ms.name as JobName
from master.sys.syslogins ma
join msdb.dbo.sysdtspackages90 ms on ms.ownersid = ma.sid

--For SQL Server 2008 and newer
select ma.name as JobOwner, ms.name as JobName
from master.sys.syslogins ma
join msdb.dbo.sysjobs ms on ms.owner_sid = ma.sid

Finding the jobs was only half the battle, now I needed to change them to an account that would allow them to continue to run. I am not going to go into a discussion of what is the correct account to own your SQL Agent jobs in this post I will simply tell you what I do and why. As part of my server set up procedure I give the ‘sa’ account a strong password then disable it. I also go the additional step of renaming the account. The only servers that I don’t do this on are vendor systems where they require ‘sa’ to be used. Why? A quick Bing, or Google, search will find you many articles on the dangers of using ‘sa’. It is better since Microsoft removed the option of leaving the ‘sa’ password blank however when our security team showed me all the attempted attacks that hit our firewall trying to login using ‘sa’ it was an easy decision to make a policy to disable and rename the account. Why not just delete it? You can’t.

So now I have this disabled account just sitting there...why not put it to use. I use the disabled and renamed ‘sa’ account to own my SQL Agent jobs. Back to the story…after finding all the jobs that needed to be renamed I created another script to change the job owner from the deleted Windows account to the disabled/renamed ‘sa’ account:

--For SQL Server 2005
begin tran
update msdb.dbo.sysdtspackages90
set [ownersid]=0x01
from master.sys.syslogins ma
join msdb.dbo.sysdtspackages90 ms on ms.ownersid = ma.sid
where ma.name = 'DOMAIN\old_account'
-- commit

--For SQL Server 2008 and newer
begin tran
update msdb.dbo.sysjobs
set [owner_sid]=0x01
from master.sys.syslogins ma
join msdb.dbo.sysjobs ms on ms.owner_sid = ma.sid
where ma.name = 'DOMAIN\old_account'
-- commit

I like to wrap these in a transaction then after the update I run my select again to verify all jobs are owned by the correct account then commit my changes…YMMV depending on your snapshot isolation level.

You will also note that I used the owner_sid in my update. This is one other advantage of the disabled/renamed ‘sa’ in that the SID is always the same and made creating the script easy. You can use another user but the SID would be much more complicated and vary from instance to instance so you would want to modify the script to join in the syslogins view to pull the correct SID for the account you want to use.

Monday, November 11, 2013

What Goes Up Must Come Down

In honor of Veterans Day I thought I would share a war story.

I used to be an officer in the Army National Guard. My branch was Engineers but when my battalion deployed for Operation Iraqi Freedom II from February 20th, 2004 through February 20th, 2005 I was in a Communications Officer position which in active duty battalions would be filled with a Signal Corp officer. Luckily the Communications Officer job was mainly involved in computer networks and I had finished my Master of Science degree in Information Systems in December of 2002 and was employed full time in the IT department of my current employer so I was up for the job.

One of my duties was running an “internet café” for the troops on our base. This consisted of a large air conditioned tent with 20 laptop computers and eight VoIP phones and it had its own VSAT (Very Small Aperture Terminal) satellite internet system. I had received training on how to tear this whole system down, load it on a truck then reassemble it at a new base and point it at the satellite. My section and I had done this a couple times and were pretty good at getting a strong signal with the satellite as well as knowing how to trouble shoot most issues that we ran into. One bright sunny day, like most days in the desert, one of my section sergeants came to the battalion headquarters to get me. The internet café was completely down.

When I got to the tent we ran through our checklist, reboot the laptops, phones and routers; HyperTerminal into the modem and check the signal strength. Everything checked out. There was only one more thing to do; go get one of the few Thuraya satellite phones and call the NOC back in the USA. I had called the NOC before when I had to get a new configuration file for the modem when we relocated the system, or when a VoIP phone needed a new configuration file pushed out to it but this was the first time the whole system was down. They were right on the problem and had the cause right away “your down link in Amsterdam has lost its connection to the satellite because of a severe rainstorm.” I was incredulous, I asked them to confirm that I had to tell the hundred waiting soldiers and marines that they couldn’t email or phone home today because it was raining in Amsterdam. They confirmed that “yes, until the rainstorm passes” our internet café in the sunny desert would be closed.

Passing on that bit of information was one of the harder things I had to do during my time in Iraq.

Friday, November 8, 2013

Joins, Indexes And Our Friend The Where Clause

This is a presentation I gave, just over a year ago, for one of our lunchtime “Byte Share” learning sessions for our developers, both Business Intelligence and Software, many of them new to writing queries. I have modified this from the original PowerPoint to something that is a bit more readable...I hope.



Joins, Indexes And Our Friend The Where Clause
(no relation to Santa)

Joins...Setting the Stage

Joins have the basic structure of-- FROM [Left_Table] JoinType [Right_Table] ON JoinCondition

There are three basic types of Joins: Inner Joins, Outer Joins and Cross Joins

We will be working with a very simple “notional” dataset



Let’s also take a moment to define a term, alias. This is used to reference database tables by short names or abbreviations within a query.

Inner Join

This type of join takes every record in the left table and looks for matches in the right table. For each match that is found you will have one row in the results set, if more than one match is found you will have multiple rows in the results set, if no match is found there will not be a row returned for that record.

SELECT REP.Name AS SalesRep, REG.RegionDesc AS Region
FROM SALES_REGION REG INNER JOIN SALES_REPRESENTATIVE REP ON REG.RegionID = REP.SalesRegion


This could also be written as:

SELECT REP.Name AS SalesRep, REG.RegionDesc AS Region
FROM SALES_REGION REG JOIN SALES_REPRESENTATIVE REP ON REG.RegionID = REP.SalesRegion


The INNER descriptor is optional but may add clarity when reading your query.

This query yields the following results from our notional dataset:



Notice there are no records for the North sales region since no sales person is assigned to that region.

Outer Join

Three variations:
Left Outer Join (aka Left Join)
Takes every record in the left table and looks for matches in the right table. For each match that is found you will have one row in the results set, if more than one match is found you will have multiple rows in the results set, if no match is found the column in the results set for the right table will have a NULL.
Right Outer Join (aka Right Join)
Same as a Left Outer Join but bases search on the right table rather than the left.
Full Outer Join (aka Full Join)
This is like a combination of Left and Right Outer Joins. All records from both tables are in the results set and matched where they meet the join condition and NULL in the results set columns where matches are not found.

SELECT REP.Name AS SalesRep, REG.RegionDesc AS Region
FROM SALES_REGION REG LEFT OUTER JOIN SALES_REPRESENTATIVE REP ON REG.RegionID = REP.SalesRegion




With our small minuscule notional dataset the results for a Right Outer Join are the same as for an Inner Join and the results for a Full Join are the same as the Left Outer Join.

Cross Join

Slightly different structure than an Inner or Outer join in that it has no ON condition. Instead it joins every row in the left table to every row in the right table creating a Cartesian product.

SELECT REP.Name AS SalesRep, REG.RegionDesc AS Region
FROM SALES_REGION REG CROSS JOIN SALES_REPRESENTATIVE REP




When would you use this? For this dataset the results don’t give us anything meaningful however there are circumstances that call for a Cross Join. For example, if you run an online business that prints books on order and you offer every title in a variety of languages doing a cross join of your TITLES table with your LANGUAGES table could give you all the product possibilities that you sell. Keep in mind that cross joins can be very dangerous if used improperly especially on tables with millions of rows. This would definitely impact performance if not completely bring the server to a screeching halt.

Now that we have covered the basics lets mix it up a bit with some complex Joins

Multiple Join Conditions

AND Join Conditions

FROM QUOTE Q JOIN EQUIPMENT E ON
Q.PROJECT_NO = E.PROJECT_NO AND
Q.QUOTE_NO = E.QUOTE_NO AND
Q.REV_NO = E.REV_NO


OR Join Conditions

FROM TABLE_A A JOIN TABLE_B B ON
A.ORDER_NO = B.CONTRACT_NO OR A.ORDER_NO = B.PURCHASEORDER_NO


Multiple tables, cross database mixing Inner and Outer joins

FROM QUOTE Q JOIN PROJECT P ON Q.PROJECT_NO = P.PROJECT_NO
LEFT JOIN EMPLOYEE.dbo.EMPLOYEE_INFO E ON P.SALESPERSON_ID = E.EMPLOYEE_ID

You can even use joins in an UPDATE query

UPDATE PROJECT
SET CONTRACTOR_ID = 12345
FROM PROJECT P JOIN PROJECT_STATUS S
ON P.STATUS_CODE = S.STATUS_CODE
WHERE S.STATUS_PROJECT_IS_ACTIVE = 1 AND P.SALESPERSON_ID = 54321

I bet you are wondering why is all the data split into all these tables that we need to join? This is done through a database design process called Normalization. Does this help with performance? No, it does not necessarily help with performance nor is it designed for that purpose. So why do we normalize the data? It is done to prevent data anomalies…but that is a discussion for another day.

Know thy data

Make sure that the joins you are creating make sense. You will typically want to make sure you are joining on columns that have a Primary Key—Foreign Key relationship. If you join on non-key values you could end up with…



Spurious: false, bogus, not valid.
Tuple: row in a result set.

Wait a minute!

You said that when databases are normalized it does not help with performance. It seems to me that if you split the data in to multiple tables that have to be reconnected it would actually be a performance hit. Is there any way to help speed things up? Yes there is….INDEXES.

SQL Server Indexes

There are two types of indexes:
Clustered, these are created automatically when creating Primary Key. There is only one per table…actually it is the table sorted on the Index Key.
Non-clustered, these have to be manually created and you may have multiple per table.

Clustered Indexes

These are kind of like a phone book, you seek based on the index key and your data is right there. Usually indexed on the Primary Key but can be changed if needed. It is possible to drop the Clustered Index on the Primary Key, this will also drop the Primary Key, a Clustered Index on a different key can then be created and the Primary Key re-added which will generate a unique non-clustered index on the Primary Key. There are some cases where an auto-generated identity column is used as the Primary Key rather than natural key. Typically this is done to maintain Primary/Foreign key relations when the data in the natural key is likely to change over time. Usually clustered indexes are created on columns that are used in other tables as foreign keys. This is done in improve query performance for joins.

Non-clustered Indexes

These act like the index in the back of a book, you search the index for the key value you are looking for and it directs you to the page where the data is stored. Non-clustered indexes can contain several columns, if the query can be completed using only the data in the columns included in the index it is called a covering index. These can be very useful for increasing performance of some high frequency queries. “So let’s just index all the columns.” Not a good idea. More indexes do not mean better performance. There is overhead for maintaining indexes. Whenever data is inserted, updated or deleted the index needs to be updated. Indexes take storage space; it is possible that you could create indexes that are eating up disk space that are not being used. Not all data should be indexed for example Yes/No data (low cardinality).

Our Friend the WHERE Clause

What is it used for?
It restricts the rows returned in the result set through the use of predicates; it is used to extract only those records that fulfill a specified criterion. Basically it acts as a filter for your data.

The WHERE clause is not just for SELECT queries!

Where should I use WHERE?
In SELECT queries to limit the size of your results set to just the relevant data.
In UPDATE queries; did you REALLY mean to change that value for all record in the table? Remember there is no “Undo” button.
In DELETE queries; typically I prefer to mark a row as inactive rather than delete but if you MUST delete make sure you just delete the appropriate records. Remember there is no “Undo” button.

Relational-Operator Conditions You can craft very simple to very complex WHERE conditions using the following operators:
=, >, <, <>, =>, <=, IN, NOT IN, BETWEEN, LIKE ‘Somethi%’, NOT LIKE ‘Everythi%’, IS NULL, IS NOT NULL, AND, OR …

These are the one you are most likely to use; there may be a few others that are not on this list.

Remember those indexes? One of the biggest benefit will be experienced due to the WHERE clause. Try to filter on Primary Keys or Foreign keys as much as possible. Integer and CHAR/VARCHAR are good for filtering and can be indexed. BLOBS (Binary Large OBjectS) cannot be indexed and should not be used for a filter. Yes, I realize that there are times you MUST filter on a TEXT datatype but keep in mind that it will be SLOW. Just please, please, please don’t do the following:

WHERE SomeValue LIKE ‘%string%’

Even if SomeValue is a Varchar(50) and indexed the leading wildcard will mean that the index won’t be used.