Medical analytics

An associate is doing medical records analysis and statistics. He’s found that nobody else in his state is doing this: he’s suddenly found that he is a ‘lead case’ for doing medical analytics. But he’s a statistician, not a data analyst, and he’s doing all the case matching with scripts and by hand.

It occurs to me that there might be a framework for matching case reports in different systems, or different records in the same system. Perhaps IBM Watson has this as a standard module. Or something. Has anybody heard or seen anything about medical analytics?

[quote=“Melbourne, post:1, topic:938203, full:true”]
An associate is doing medical records analysis and statistics. He’s found that nobody else in his state is doing this: he’s suddenly found that he is a ‘lead case’ for doing medical analytics. But he’s a statistician, not a data analyst, and he’s doing all the case matching with scripts[/quote]
Too bad your friend must crunch numbers with a calculator. That must be a terrible burden for a statistician.
Come on, isn’t data analytics part of a statistician’s job?
If all he wants to do is report that 2+2=4, then perhaps he needs to seek employment as an accountant.

Unnecessary snark aside, there are significant differences between a data scientist (analyst) and a statistician. Data scientists are ‘experts’ (I use that term loosely) in applying filtering, sorting, and pattern recognition algorithms to real world data to develop “analytics”, i.e. summary statistics that can be interpreted by executives, advertisers, finance analysts, et cetera. A data scientist might be called upon to cull data from online reviews and sales data, for instance, to find trends that could determine what areas might be most fruitful to introduce, say, a noodle bar. Data scientists, therefore, need to have a fairly thorough understanding of the phenomenon they are studying but generally have a relatively superficial grasp of statistics, and for the most part modern data science is dominated by Bayesian methodology, generally using empirically derived kernel density estimates to aggregate data in ways that do not require a deep understanding of statistical analysis and validation.

Statisticians are, strictly speaking, formal mathematicians with a focus on probability and statistical theory and methodology, both frequentist and Bayesian approaches. (I say ‘strictly speaking’ because many times someone is designated a ‘statistician’ who has simply turned the crank on some numbers; I’ve worked in reliability engineering and know a fair amount about applied survival analysis, but I am far from being a trained statistician.) To that end they have a deep understanding of the underpinning of statistical theory and how it applies to the analysis of data as a black box without specific reference to the source of the data. This is not to say that there are not interdisciplinary specialties within statistics; there are various areas of applied statistics such as biostatistics, demography, jurimetrics, reliability, actuarial science, et cetera, each of which has specialized methods for data analysis. A general statistician is not qualified to perform medical statistics any more than a civil engineer can design an aircraft or an electrical engineer is qualified to build a tunnel. So, before responding with a snidely superior reply, perhaps you should educate yourself first.

Melbourne, there are a number of textbooks on medical statistics, but this sounds like an issue where he probably needs the guidance of someone familiar with the jargon and codes used in medical case records, which are extensive and not easily picked up. There is a standard for medical billing codes in the United States referred to as the Healthcare Common Procedure Coding System (HCPCS), which comprises Current Procedural Terminology (CPT) codes and a separate list of national codes; unfortunately, different health systems may have subtly different ways of using these codes, and the code system is updated once a year to reflect changes in medical technology and how the codes are applied, as well as numerous systematic errors and ambiguities in how the codes are used. If your associate is referencing these codes he probably needs the assistance of someone who is actually an expert in medical billing, as well as someone who has knowledge of actual medical terminology (RN-BSN, physician’s assistant, or a physician) if trying to correlate signs & symptoms with pathologies or something like that. Trying to figure this out on his own is probably just going to result in statistical gibberish.

Stranger

Medical billing is simpler here, and he’s got the medical side ok. But a case may be recorded as a medical case, and then three days later as a surgical case, and then three weeks later as a clinic case. If the patient is in the pharmacy system getting anaesthetics on the same day as the patient was scheduled for surgery, then he can match up the use of anaesthetic drugs to surgery. Those linkages do not exist in the system, so he’s running scripts, and making individual judgements.
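To make the same-day pharmacy/surgery link concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the field names (`patient_id`, `scheduled`, `dispensed`, `case`, `drug`) and the sample records are made up, and it presumes both systems share a usable patient identifier, which may well not be true of the real data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: field names and values are invented for illustration.
surgeries = [
    {"patient_id": "P001", "scheduled": date(2023, 3, 14), "case": "S-17"},
    {"patient_id": "P002", "scheduled": date(2023, 3, 15), "case": "S-18"},
]
dispensings = [
    {"patient_id": "P001", "dispensed": date(2023, 3, 14), "drug": "propofol"},
    {"patient_id": "P002", "dispensed": date(2023, 3, 20), "drug": "propofol"},
]


def link_same_day(surgeries, dispensings):
    """Pair each dispensing with surgeries for the same patient on the same day.

    Returns (linked, unlinked): linked is a list of (case, drug) pairs,
    unlinked is the list of dispensing records left for manual judgement.
    """
    # Index surgeries by (patient, date) so each dispensing is a single lookup.
    by_key = defaultdict(list)
    for s in surgeries:
        by_key[(s["patient_id"], s["scheduled"])].append(s["case"])

    linked, unlinked = [], []
    for d in dispensings:
        cases = by_key.get((d["patient_id"], d["dispensed"]), [])
        if cases:
            linked.extend((case, d["drug"]) for case in cases)
        else:
            unlinked.append(d)
    return linked, unlinked


linked, unlinked = link_same_day(surgeries, dispensings)
```

The point of returning `unlinked` separately is that the clear cases get linked automatically, while the leftovers are exactly the ones that still need an individual judgement.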

That record matching is the first step in doing any kind of analysis, and anybody doing any kind of analytics must be doing the same thing. If it’s just the internet, that’s what tracking cookies are for. Medical patients don’t have tracking cookies (though providers certainly try). But surely there must be something that other people are doing?

(As an aside, he has to be very careful when asking pharmacy about drug records. Their default assumption is that somebody is looking for a scapegoat.)

This is only incidental to your real question, but yours is not a common definition of data scientist. Admittedly the term “data scientist” is historically amorphous, and has been trying unsuccessfully to find its way to a truly common definition, but the way it’s used in industry hiring and training programs includes people with a high level of mathematical training, including PhDs in statistics or applied math.

That said, a lot of colleges and universities are starting to dilute the term as they introduce masters programs that do indeed only require a superficial understanding of statistics, while taking advantage of the sexiness of the term. But those programs are generally crappy.

Melbourne, I’ve done a lot of work with medical records, can you provide a little more detail on what your associate is trying to match? Is this attempting to match single patient records to each other, or looking for ways to aggregate data to make it analyzable, or something else?

I’m not directly involved, but I think this is what he’s spent most of his time doing.

I’m not sure what this means? He can’t aggregate data until he’s singularized patient data, because until he does so, it’s just an amorphous mass.

Things that he’s mentioned to me are utilization, queue depth, and drug/diagnosis associations, which are all aggregate measures.

In fact, a good friend of mine, who was a senior data guy with IBM, spent several years working on a project to use Watson on medical analytics – in particular, on cancer diagnosis, and using Watson to understand the outcomes of different treatments, and optimize treatment recommendations.

That said, my friend is no longer at IBM, as that particular line of business was shut down this year.

That’s fine if that’s not what he’s looking for - there are lots of ways to aggregate/analyze medical data without associating patient records to each other, but it sounds like he’s after something different.

Assuming he doesn’t have SSN or some other unique identifier, that’s a pain. Is he trying to do “fuzzy matching” of name/address for the patients, or something like that? What kind of software is he working with? [I know, lots of questions]

He’s just scripting, with PHP. Which is working, but it’s a lot of work. Probably 95% of the records are clear, and the other 5% – I dunno if it’s patient names or drug names or doctor names or what. He only talks to me about the statistics.

Oof. Well, that’s similar to something we did a long while back writing conditional SAS scripts to create matches, but as he probably knows well, the problem with manual scripting is that people can always fuck up data entry in a new way. Eventually we stopped that, & moved it all over to Python, which has a lot of embedded matching capabilities.
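For what it’s worth, here is a rough sketch of the kind of fuzzy name matching that Python’s standard library allows, using `difflib.SequenceMatcher`. It is not the code from the project described above. The helper names (`normalise`, `similarity`, `classify`) and both thresholds are assumptions picked for illustration, and real thresholds would need tuning against records that have already been matched by hand.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, turn commas into spaces, and collapse runs of whitespace."""
    return " ".join(name.lower().replace(",", " ").split())


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def classify(a, b, accept=0.92, review=0.75):
    """Sort a candidate pair into 'match', 'review', or 'no match'.

    The accept/review thresholds are illustrative assumptions, not
    recommended values.
    """
    score = similarity(a, b)
    if score >= accept:
        return "match"
    if score >= review:
        return "review"
    return "no match"
```

The middle “review” band is the design choice that matters here: instead of forcing every pair into yes or no, uncertain pairs are set aside for a human to look at.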

But my guess is your associate is looking for something out of the box? Can’t help there…we used Watson for a while but found it expensive.

One thing to think about: how complete does the data have to be for his analysis? Is 95% enough?

Your friend needs to be looking into this: InterSystems Medical Analytics.

If you are in Australia, this is the parent company of the TrakCare medical electronic record system that originated there. Numerous medical institutions use this software worldwide, and they can provide access to data in a single coding format that has been transformed from the numerous different coding systems and protocols available. There are certainly other companies in this business, but if I have your location right I think your friend really wants to start there, and find out a lot more about the business he’s trying to participate in.

He’s not even looking :) It just seemed to me that there must be a lot of people interested in the related field of health economics, and I wondered if I could re-direct him.

That does look like the kind of thing I thought would probably exist.

As I mentioned above, there is nobody else in his state doing the stuff he is doing.