My scenario was this:
I was pulling several million rows of data from a DB2 database into a SQL Server 2012 Enterprise Analysis Services setup. The data was basically transaction records keyed on several fields, such as location and job number. It came from a main system that was not developed or maintained by the devs I work with.
But one element of the data’s structure wasn’t completely rock solid: getting the most recent transaction of its type based on a max date, a max time (on the given date) and a max sequence number.
The problem was that on occasion there was no sequence number, and sometimes not even a time, which made it impossible to get the desired results accurately. This transactional data is used, from what I can tell, as a log that is compiled by a few different “sub programs” or modules within the main application.
This data was accessible through the legacy system, an ancient green screen program, which still worked quite well but was no longer being used. When viewing the data in this program you could see record-level keys, but these were not available as a field you could query.
The records were loaded in the correct sequence, so these keys would give the desired results (in my case) with the proper criteria.
At first we didn’t think it was possible to get these record numbers, which would not only fix the current problem but also simplify our update scripts.
After a little research (thanks Google!) I found that it was so very easy.
I found DB2’s RRN() function. All I had to do was pass the table name to the function in the query and the RRN function would return this Relative Record Number.
SELECT RRN(users) AS rID,FirstName,LastName
FROM users
FETCH FIRST 10 ROWS ONLY
Or, using an alias on the table name:
SELECT RRN(u) AS rID,FirstName,LastName
FROM users AS u
FETCH FIRST 10 ROWS ONLY
This query returns 10 records (the equivalent of TOP 10 in SQL Server) with rID as the Relative Record Number. Without an ORDER BY, which 10 rows come back is not guaranteed.
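As a rough sketch of how the record number could replace the unreliable date/time/sequence logic, the query below picks the last-loaded row for each key. The table name transactions and the columns Location, JobNumber and TranType are made up for illustration, not taken from the real system. It also assumes that a higher RRN always means a later record, which only holds if deleted rows are not reused and the file has not been reorganized.

```sql
-- Hypothetical table and column names; adjust to the real key fields.
-- Assumes RRN order matches load order (no reused deleted rows, no reorganize).
SELECT RRN(t) AS rID, t.*
FROM transactions AS t
WHERE RRN(t) = (
    -- highest record number among the rows sharing the same key
    SELECT MAX(RRN(x))
    FROM transactions AS x
    WHERE x.Location  = t.Location
      AND x.JobNumber = t.JobNumber
      AND x.TranType  = t.TranType
)
```

The correlated subquery is the easiest form to read. On a table with millions of rows it would be worth checking the access plan before relying on it.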
Thought I’d add this for future reference.
I was working with a column that was defined as a varchar in a database, but was intended to contain integer values. What the original purpose for this was I’m not sure, as I’m not the database’s author.
The program that is used to enter data also allows the input of non-numeric characters.
So, in my scenario, this could be a problem when joining onto that column in a query.
Using ISNUMERIC() in this way eliminated all non-numeric data for me:
ISNUMERIC(dbColumn + '.0e0') = 1
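Here is a small sketch of the check used in a join. The trick works because appending '.0e0' only leaves a valid number when the original value was a whole number: '42' becomes '42.0e0', while '1e5' or '3.14' turn into something that no longer parses. The table variables and the names JobID and JobName are invented for the example. The CASE wrapper is there because SQL Server does not promise to apply a filter before it attempts the conversion.

```sql
-- Hypothetical sample data standing in for the varchar column.
DECLARE @raw TABLE (dbColumn varchar(20));
INSERT INTO @raw (dbColumn)
VALUES ('42'),    -- whole number
       ('7'),     -- whole number
       ('1e5'),   -- bare ISNUMERIC() accepts this
       ('3.14'),  -- decimal, not an integer
       ('12a');   -- typed in by mistake

-- Hypothetical table with a real integer key to join against.
DECLARE @jobs TABLE (JobID int, JobName varchar(20));
INSERT INTO @jobs (JobID, JobName)
VALUES (42, 'Paving'), (7, 'Grading');

-- Convert only the values that pass the integer check; the CASE
-- yields NULL for everything else, so those rows never match.
SELECT j.JobID, j.JobName, r.dbColumn
FROM @raw AS r
JOIN @jobs AS j
  ON j.JobID = CASE WHEN ISNUMERIC(r.dbColumn + '.0e0') = 1
                    THEN CAST(r.dbColumn AS int)
               END;
```

A very long run of digits would still pass the check and overflow an int, so bigint may be the safer target type, and it is worth testing how empty strings in your own data behave. On SQL Server 2012 and later, TRY_CONVERT(int, dbColumn) is another option, since it returns NULL instead of raising an error.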
Greetings. It’s been so long since I’ve even touched this blog that I write this first line with a touch of shame.
Ok… I’m over it now. 😉
I’ve been working heavily with SQL Server 2008 and with DB2 for the last couple of years. I really love my job.
Lately we’ve been working on a data warehouse project which is structured for reporting. This dropped a new concept in my lap which at first I didn’t find too hard to comprehend (well, I still don’t really), but it threw a wrench into the wheel I’ve been accustomed to.
– Data denormalization –
Writing SQL code for many years, for me, has been about creating relational databases where data is divided up and joined together in queries. Denormalization brings forth the concept of duplicating data for the purpose of having it readily available. (Fast)
So the thought process has been different: writing SQL jobs that will suck in data in the format that we require to build our warehouse (CUBE – Ready). I guess what this method really comes down to is a space vs. time trade-off. You have much more data, doubled up, but the result is that you can query it very fast.
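To make the trade-off concrete, here is a minimal sketch in SQL Server syntax. The tables Transactions, Locations and Jobs and their columns are hypothetical, not the actual warehouse schema. The first statement pays the join cost once and stores the descriptive columns alongside every transaction row. The reporting query then reads one wide table with no joins.

```sql
-- Build the denormalized table once (for example in a nightly job).
-- All table and column names here are illustrative assumptions.
SELECT t.TransactionID,
       t.TransactionDate,
       t.Amount,
       l.LocationName,      -- repeated on every row for that location
       j.JobDescription     -- repeated on every row for that job
INTO   dbo.TransactionFlat
FROM   dbo.Transactions AS t
JOIN   dbo.Locations    AS l ON l.LocationID = t.LocationID
JOIN   dbo.Jobs         AS j ON j.JobID      = t.JobID;

-- Reporting queries no longer need the lookup tables.
SELECT LocationName, SUM(Amount) AS TotalAmount
FROM   dbo.TransactionFlat
GROUP BY LocationName;
```

The location name is now stored once per transaction instead of once per location. That is the extra space, and skipping the joins at query time is the speed you get in return.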
The cube my co-worker created was really, really fast, slicing and dicing the data in the blink of an eye.
Data is only as good as its usability.