Print Story Someone here knows SQL right?
Diary
By jayhawk88 (Wed May 07, 2008 at 10:56:32 AM EST) (all tags)
Let's see if I can explain this:


I have the following SQL statement:

select accounts.account_number, collection_activities.status, accounts.datetime_last_modified,
       operators.first_name, operators.last_name,
       collectors.first_name, collectors.last_name
       from collection_activities, accounts, collectors, operators
where collection_activities.account_id = accounts.account_id
and collection_activities.collection_work_list_id = collectors.collector_id
and accounts.modified_by_operator_id = operators.operator_id
and collection_activities.status = 'u'
and accounts.datetime_last_modified > dateadd("d", -14, getdate())
order by accounts.datetime_last_modified desc, collection_activities.collection_activity_ts desc

There are four tables in a vendors database that I'm trying to pull info out of:

Accounts: Contains the field Account_Number and Account_ID, as well as the DateTime formatted field Datetime_Last_Modified.

Collections_Activity: Also contains Account_ID, and the Status field which I am interested in searching on obviously. Also contains the field Collection_Work_List_ID

Collectors: Basically a lookup table to hold the name and some other info on the person who is set as a "Collector" for a particular record, using Collector_ID which = Collection_Work_List_ID in Collections_Activity.

Operators: Same Idea as Collectors except for associating a name with the Datetime_Last_Modified.

OK, so here's the situation I get with the above query. The Collection Activities table might have multiple entries for a particular Account_ID. For example you'd get some results that look like this:

Act # Status     Datetime          Name     Name
55555   U   2008-01-01 12:00:00  Person A Person B
55555   U   2008-01-01 12:00:00  Person A Person C
55555   U   2008-01-01 12:00:00  Person A Person C

Person A = Last person to modify the record. Person B = Last Collector to enter collection information of some kind that would trigger an entry into the Collection_Activities table. Person C = Employee who hasn't been here for 6 months, but is associated with older entries in the Collection_Activities table that are showing up in my query. Which is obviously confusing since I'm only able to display the Datetime_Last_Modified from the Accounts table, it looks like these older entries in Collection_Activities by Person C are recent activity.

The main problem here is that Collection_Activities doesn't have any kind of datetime field associated with when entries in this table were made. It has a timestamp field, but as I understand it that's not super useful in determining an actual date and time the entry was made. That collection_activities.collection_activity_TS field I'm doing the second DESC sort on is that timestamp field, which is just basically putting the earliest timestamps at the top. What I'm wanting to do is find some way to eliminate the "old" entries from the final query results.

Off hand I cannot think of any way to do this, since the vendor did not give me any kind of datetime reference in this table to rely upon. Help? Is this bastard-spawn-of-hell of an Inner Join I'm doing to blame, better way to do this?

< Hello Boris voters! | of MFC and men* >
Someone here knows SQL right? | 18 comments (18 topical, 0 hidden) | Trackback
How I tackle these problems by Herring (4.00 / 2) #1 Wed May 07, 2008 at 11:06:44 AM EST
Take the above query, read the data into a list in memory and then use a proper programming language to do the real processing.

Works for me.

I'm English, and as such I crave disappointment. - Bill Bailey


Could I guess by jayhawk88 (2.00 / 0) #4 Wed May 07, 2008 at 11:12:48 AM EST
What I was hoping to do though was to be able to take the results of this query, dump it to a text file, and have the person who wants this info just access those files via file share. Trying to avoid an "I need access to Query Analyzer" situation.

Getting this info via a web page is also a possibility I guess, and would in theory allow me to better manipulate the data.

[ Parent ]

you're working by sasquatchan (2.00 / 0) #2 Wed May 07, 2008 at 11:09:32 AM EST
for a collections agency ? A pox on you!



I'm not by jayhawk88 (2.00 / 0) #3 Wed May 07, 2008 at 11:10:58 AM EST
It's for our medical practice people who run billing for the clinics. Clinics who specialize in taking care of the uninsured.

[ Parent ]

"Taking care" Edwards style by Rogerborg (2.00 / 0) #6 Wed May 07, 2008 at 12:36:03 PM EST
Or Larry Niven style?

-
Metus amatores matrum compescit, non clementia.
[ Parent ]

The former by jayhawk88 (4.00 / 1) #9 Wed May 07, 2008 at 12:44:53 PM EST
Trust me, you wouldn't want the organs of the kind of patients our clinics see.

[ Parent ]

You're screwed by theboz (4.00 / 2) #5 Wed May 07, 2008 at 11:41:33 AM EST
While it returns results, what you are doing is no longer considered proper SQL.  In order to read it, I first converted it to modern SQL like this:

SELECT
    act.account_number,
    colact.status,
    act.datetime_last_modified,
    oprs.first_name,
    oprs.last_name,
    clrs.first_name,
    clrs.last_name
FROM collection_activities colact
    INNER JOIN accounts act ON
        act.account_id = colact.account_id
    LEFT JOIN collectors clrs ON
        clrs.collector_id = colact.collection_work_list_id
    LEFT JOIN operators oprs ON
        oprs.operator_id = act.modified_by_operator_id
where
colact.status = 'u'
and act.datetime_last_modified > dateadd("d", -14, getdate())
order by act.datetime_last_modified desc, colact.collection_activity_ts desc

The first problem I see is that it is entirely possible you're not getting all of the data as a result of using inner joins on everything.  I would start by only making fields that appear in the where clause part of an inner join as one of my best practices.

However, the more I look at this, the more I think you're fucked.  You are being asked to use data that simply doesn't exist.  Why isn't the timestamp in the datetime field useful?  Is it manually entered by the users via the application?  In situations where there are no good options, you have to go with the closest you have, and that field may be it.  If you really wanted to think out of the box, and I would have told my users no long before this, you can see if the SQL logs are available and if you can query them (if you're on SQL Server, you might need 2005 for this to work) then join them in based on some fucked up criteria I have no idea what it would be, and get timestamps from that.
- - - - -
That's what I always say about you, boz, you have a good memory for random facts about pussy. -- joh3n


I have NO idea what you just said by Rogerborg (2.00 / 0) #7 Wed May 07, 2008 at 12:37:07 PM EST
But I am graving images of you with my free hand right now.

-
Metus amatores matrum compescit, non clementia.
[ Parent ]

That's disturbing by theboz (4.00 / 1) #18 Wed May 07, 2008 at 03:17:25 PM EST
Although if a girl posted that, I would have posted some SQL for converting an interger field storing a date in epoch time to a date field.
- - - - -
That's what I always say about you, boz, you have a good memory for random facts about pussy. -- joh3n
[ Parent ]

Yes it's true by jayhawk88 (2.00 / 0) #8 Wed May 07, 2008 at 12:43:00 PM EST
I am a total SQL scrub, thx for the better code.

Maybe the timestamp field is useful? Keep in mind when I say "timestamp" I mean that it is formatted with the timestamp MSSQL field type. Everything I read says that timestamp is just a binary number chosen arbitrarily by the system and does not relate to an actual calendar date/time in any way, but possible I'm mistaken?

Actually this may all be moot. I have since discovered some information in another table, Collection_Activity_Details, that has a Contact Date field, which may or may not be what I'm looking for, still trying to determine this.

The best part about all of this is that we're asking the vendor for help in pulling this information, and they're sending back super useful stuff like:

Select * from accounts where account number = "XXXX"

Then find the Account ID in your results, then do the following query:

Select * from collection_activities where.....

On and on for like 6 simple queries. Gee thanks guys.

[ Parent ]

Correct by ad hoc (2.00 / 0) #10 Wed May 07, 2008 at 12:55:30 PM EST
MSSQL "timestamps" aren't what normal people would expect timestamps to be. It's just a number that means nothing more than "the smaller value is the earlier one". It's only useful for sorting and relative comparison. It's no good for getting and actual date or time.
--
The three things that make a diamond also make a waffle.
[ Parent ]

Oh...I see by theboz (2.00 / 0) #16 Wed May 07, 2008 at 03:15:07 PM EST
Then yeah, the timestamp is useless in that case.  Hopefully the details table has what you need.  It's too bad the vendor doesn't have a data dictionary they can send you, or make use of things like foreign keys so you can see how the tables relate.  Of course, based on your description, it sounds like the vendor are idiots anyway and probably don't know how to do that.
- - - - -
That's what I always say about you, boz, you have a good memory for random facts about pussy. -- joh3n
[ Parent ]

Thanks again for this by jayhawk88 (2.00 / 0) #13 Wed May 07, 2008 at 02:34:02 PM EST
The first problem I see is that it is entirely possible you're not getting all of the data as a result of using inner joins on everything.

Looks like you might be right. I took this and modified it somewhat to take this new table into account, and all of the sudden I have some new records showing up, specifically ones that had null values for the First and Last name fields.

[ Parent ]

They're probably new records by theboz (2.00 / 0) #17 Wed May 07, 2008 at 03:16:21 PM EST
Or records created in error or under a different set of code.  In any case, it's always good to be aware of them before changing your join or where clause to leave them out.  I've had trouble with that before, which is why I always use left joins now and clean it up later.
- - - - -
That's what I always say about you, boz, you have a good memory for random facts about pussy. -- joh3n
[ Parent ]

question by coryking (2.00 / 0) #11 Wed May 07, 2008 at 02:25:17 PM EST
First: use the SQL by theboz.  JOINs are your friends.

Second: I gather you are using a database that isn't MySQL and can thus handle real queries containing "enterprise" things like sub-queries, right?

Okay... what I gather is you are wanting "Show me all accounts that are open and show me the current operator and current collector", right?

The only way to get "current operator" and "current collector" is from the "activities done on this account" table, right?

It also sounds like there is another table "Collection Activity Details" that contains one or more details about each collection activity, right?  Is it one-one or one-many?

If it is "One activity has more than one detail", it might be collection activity is like a trouble ticket and the "detail" is every even on that trouble ticket, right?

You might not have to pull in the details.  You just need to get a GROUP BY on acct_id.  Here is a very crude sketch of what I'm thinking:


SELECT
    act.account_number,
    colact.status,
    act.datetime_last_modified,
    oprs.first_name,
    oprs.last_name,
    clrs.first_name,
    clrs.last_name
FROM (
        SELECT
            inner_coll.acct_id,
            inner_coll.status
        FROM collection_activities inner_coll
        WHERE inner_coll.status='u'
        GROUP BY inner_coll.acct_id, inner_coll.status
    ) AS colact
    INNER JOIN accounts act ON
        act.account_id = colact.account_id
    LEFT JOIN collectors clrs ON
        clrs.collector_id = colact.collection_work_list_id
    LEFT JOIN operators oprs ON
       oprs.operator_id = act.modified_by_operator_id
WHERE
    act.datetime_last_modified > dateadd("d", -14, getdate())
ORDER BY
    act.datetime_last_modified DESC

I already spent far to much time on this for you though :-)  I think i'm on the right track, but thinking about it more, you still need to hook up the details table... I think I'd be putting it in that subselect and getting the subselect to to return the "person id" for the operator and collector.

...Or something.  Bottom line is that provided you can get date on the account activity (or details) you can do what you ask in SQL.  Only wusses would do this in code :-)
We are Siamese if you please. We are Siamese if you don't please.


oh yeah by coryking (2.00 / 0) #12 Wed May 07, 2008 at 02:33:14 PM EST
But no matter what, if this query is going to be used a lot, make sure to profile the hell out of it.  I know MSSQL has some good query analysis tools, and you will need them. 

Since you make it sound like the designers of this schema are a bit clueless, odds are good it will be lacking the right indexes to make queries like the one I describe purr like a kitten.
We are Siamese if you please. We are Siamese if you don't please.
[ Parent ]

Actually by jayhawk88 (2.00 / 0) #14 Wed May 07, 2008 at 02:40:18 PM EST
Looking at the database structure some more I have discovered a "Collection_Activity_Details" table which seems to have more information about the "collection entries" users make. Quite why you would have two tables to store information on the same records I don't really know, and I'm not yet sure if this "Contact Date" is really recording a date that the "collection entry" was made, but feeling a little bit better about their dbase skills. Part of the problem here I think is I really don't understand anything about the workflow users are going through here.

It'll take me about 4 more reads to get through and start really comprehending what all you wrote there, but thx in advance for the effort. It is starting to look like I can get what I want out of this.

[ Parent ]

yeah by coryking (2.00 / 0) #15 Wed May 07, 2008 at 02:52:10 PM EST
You need to draw the fucker on the white board really.  Once you do that, it is a matter of SQL.

The main thing is to know you can do subselects like in my example.  You should also read up on GROUP BY and aggregate queries (I still have a hard time with them).

PostgreSQL has a HAVING clause, which goes after the ORDER BY and is like a filter on the results of your query.  I'm sure MSSQL has something like that too.   HAVING might come in handy for this query.

Odds are also good MSSQL has some fancy pants language extensions of this kind of stuff that i don't know about, so if database portability isn't a biggie, look into those as well.
We are Siamese if you please. We are Siamese if you don't please.
[ Parent ]

Someone here knows SQL right? | 18 comments (18 topical, 0 hidden) | Trackback