As mentioned, we have diverse backgrounds in this class. And lest there be any confusion, I am not talking about our ethnicities, home countries, or spoken languages. I’m talking about the academic spaces we each inhabit, which has me thinking of Data Science as a potential Lingua Franca or Translator between disciplines. Is Data Science what happens in the space between Big Data domains rather than within a single Big Data domain? When I say “Domain”, I just mean “academic discipline” or “field”. Even my use of the word “Domain” to mean “subject matter expertise” reflects my particular perspective as a statistician.
This post is divided into the following sections:
— Game played while hiking with S’s 5-year-old
— Domain Surfing from Analytical Sociology to BioInformatics
— Feeling Stupid
— Example using BioInformatics Department
Game played while hiking with S’s 5-year-old
Here’s a little anecdote to keep in mind as an analogy. One of my best friends, S, has a son, R (whom I love 🙂). I visited a few months ago and we went hiking. He was 5 at the time, and one of his favorite games went like this: you give him two 3-letter words, and he finds a path from the first word to the second by changing one letter at a time. For example, I’d give him “DOG” and “CAT” and he’d come back with “DOG” -> “COG” -> “COT” -> “CAT”. Notice that all the steps in between are legitimate words. There are probably multiple solutions, and some of them are really long if you feel like wandering off on a long chain, but we were aiming for the shortest path.
One of the challenges was that he wanted me to keep making up more and more of them. Before I gave him two words, I would check in my head whether a solution existed, and for 3-letter words I could usually do that fairly fast. Then he got to 4-, 5-, and even 6-letter words, and it got harder for me to figure out whether a mapping even existed. I only wanted to give him two words if a mapping existed, because otherwise he’d get frustrated. Why didn’t I just give him two words and let him figure out whether a mapping existed? I could have! That would have been a learning opportunity: mappings don’t always exist. But I wanted to give him problems with solutions. Plus, he’s 5! And he’s doing this! I’m not necessarily recommending it as a general parenting technique.
Keep this in mind as an analogy for what I’m about to talk about. Think about whether it is a good analogy, or whether there are aspects that don’t illustrate my points well.
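For the curious, the game itself can be treated as a shortest-path search: treat each word as a node, connect two words when they differ by exactly one letter, and run a breadth-first search from the first word. Here is a minimal sketch of that idea. The function name `shortest_ladder` and the tiny `TOY_WORDS` list are my own illustrative assumptions, not a real dictionary, so whether a mapping "exists" depends entirely on which words you decide are legitimate.

```python
from collections import deque
from string import ascii_lowercase


def shortest_ladder(start, goal, words):
    """Return a shortest chain of words from start to goal, changing one
    letter at a time, or None if no such chain exists in `words`."""
    words = {w.lower() for w in words}
    start, goal = start.lower(), goal.lower()
    if len(start) != len(goal) or goal not in words:
        return None

    # previous[w] records the word we came from, so the chain can be rebuilt.
    previous = {start: None}
    queue = deque([start])

    while queue:
        word = queue.popleft()
        if word == goal:
            chain = []
            while word is not None:
                chain.append(word)
                word = previous[word]
            return chain[::-1]
        # Try every single-letter change and keep those that are known words.
        for i in range(len(word)):
            for letter in ascii_lowercase:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in words and candidate not in previous:
                    previous[candidate] = word
                    queue.append(candidate)

    return None  # the search ran out of words: no mapping exists


# Hypothetical toy vocabulary, just for illustration.
TOY_WORDS = {"dog", "cog", "cot", "cat", "dot", "dig", "emu"}

print(shortest_ladder("dog", "cat", TOY_WORDS))
print(shortest_ladder("dog", "emu", TOY_WORDS))
```

With this toy vocabulary, the first call finds a four-word chain from “dog” to “cat” and the second returns `None`, which is exactly the check I was trying to do in my head before handing R a problem. Because breadth-first search explores all chains of length n before any chain of length n+1, the first chain it finds is a shortest one, though not necessarily the only one.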
Domain Surfing from Analytical Sociology to BioInformatics
Did you see Adam’s comment on the Big Data in My Blood post? I’ll excerpt some of it here because it’s what got me started on this whole train of thought:
One of my reasons for taking this class is my interest in so-called *analytical sociology*, of which Hedström is a prime proponent, and which embraces this micro-level explanation. How does this relate to Big Data? Well, actually, sociologists have had Big Data since (before?) they’ve had statistics. What we need now — and what we are beginning to get — is LittleBigData (apologies to Sony Computer Entertainment): large collections of detailed, longitudinal, accurate, and above all *individual-level* data about many people. Only with LittleBigData can we provide micro-explanations of macro-phenomena
Domain Expert vs Technical Expert
Now Adam is what I would call a “domain expert”. He’s spent a lot of time thinking about problems within Sociology. I would distinguish him from people like me, who you could say are “technical” [I need a better word!] experts. I’ve studied math, operations research, statistics, … I know lots of methods for solving problems in quantitative ways. But I don’t have a deep understanding of biology or sociology. Now Adam is actually also “technical” in his understanding of sociology, so the distinction between “domain expert” and “technical expert” may not be a good or fair one. I know of graduates from the Political Science Department at Columbia, for example, who know their area of political science well (domain expertise) but are also well-versed in advanced statistical methods (technical expertise), and who could probably in all fairness call themselves “statisticians” and get away with it if they wanted to. There are also people in sociology who would be considered domain experts with much less technical background than Adam, so it’s a spectrum.
When I read Adam’s thoughts, my instinct as a “technical expert” is that it would be possible to generalize his problem formulation mathematically, and that the mathematical generalization would then solve Adam’s problem as well as problems across other domains.
The first domain I thought of was BioInformatics. Now, recognize that I’m in over my head in both domains. But I have experts in both domains (Adam and the students sitting in from BioInformatics) who could help me figure out the mapping. So I’m saying approximately: Analytical Sociology LittleBigDataProblems are equivalent (in some “Data Sciencey” sense of the word) to BioInformatics Problems, and that I think it would be valuable to try to figure out how to translate between them.
Feeling Stupid
When you’re an “expert” in some area, which to some degree or another we all are, you feel confident in that space, and it’s more comfortable to inhabit it. Right after I got my PhD, I finally felt like I could call myself an “expert”, and then I started work and “felt stupid” again because my colleagues had their own language, systems for doing things, vocabulary, notation, assumptions, etc. Even reading Adam’s post, I feel a little stupid because I can’t claim to understand everything he’s saying. But am I supposed to? I mean, am I expected to be an expert in Everything in the World? If I let my impulse to feel stupid block me from trying to understand what he’s saying, even though we speak slightly different languages, then we may not solve some important problems! But I don’t think it’s easy to figure out how to talk to people who are experts in different things than you are. We’ve all “grown up” in different academic disciplines, and sometimes we don’t even realize how much the discipline we come from affects how we approach solving problems. We need to figure out how to talk across disciplines even if it means feeling stupid!
Example using BioInformatics Department
There are a few people from the bioinformatics department sitting in on the class. Aside from them, how many of you could give a reasonable description of what they do in BioInformatics departments? Without looking on Wikipedia or Google! I have some vague ideas, but I don’t think I really know very much about what they do there, or how they see their domain, or how they think of data in their domain. So we started an email thread, which I will excerpt here for illustrative purposes (with permission):
Rachel: how many people are there of you from bioinformatics [attending the class]?
Hojjat: To the best of my knowledge, there are two PhD students and one post-doc, and one MA student that take it for credit, and one PhD student who wants to audit. We have different perspectives, I believe, as some of us are more focused on Bioinformatics, while others are more focused on Clinical Informatics.
Rachel: Thanks! Well first maybe you could help me understand the distinction between those two fields as you perceive them.
Hojjat: Here is a try: Biomedical Informatics (aka Health Informatics) is the general discipline in which information technology and information theory (and data science?) is used on biomedical data and/or to improve healthcare. It has several subdomains in it:
* Bioinformatics: where the data is mostly at the cell/tissue level and encompasses genomics, proteomics, etc.
* Medical/Clinical informatics: where the data is mostly at the human/process level; this one frequently deals with data stored in electronic medical records.
* Other disciplines such as Systems Biology (again mostly cell level, but using a systems approach), Imaging Informatics (body organ level, ties tightly with physics, image processing, etc.), Public Health Informatics (population level data)
Now that he’s talking the language of “data”, I start seeing ways his domain problems map to some general “data” type problems, with structures that could perhaps map back to Adam’s problems. But I need to spend more time talking to the people in the department to really get how they think about their field. And when I reread the email, I realized I don’t know what all the words mean.
Rachel: what’s proteomics?
Hojjat: Proteomics is to proteins, what genomics is to genes.
Rachel: How do you think of protein data being structured?
Hojjat: I am adding more people to the list of recipients. They are also from our department.
Hojjat: Jonathan! You can answer this one more reliably.
Heather: Hojjat gave a great overview of the field, but I wanted to add my area of interest because it’s relevant to the NYT article mentioned on the class blog “Big Data in Your Blood” and also because this area is new to me too and I’m hoping this class will help me clarify things a little more.
I’m new to the world of Biomedical Informatics, so I’m still trying to find the right way to classify my area of interest. My background is in Epidemiology and Behavioral Science and I’m interested in understanding the way people/patients interact with “mobile” technology (phones, tablets, wearable sensors, etc.) and how those technologies can be leveraged to improve their everyday health, manage chronic diseases, and provide information (integrated into EHRs and clinician workflow) to help physicians and others on the healthcare team make more informed recommendations.
I’m planning to go into what I think the implications of this are for Data Science as a field and for communication between fields. But this is fairly long already, so maybe look back at the story about S’s son, R, and think about whether the analogy is making sense at all. You might even think of me as standing in for little R, trying to find the mapping between two words, or as my adult self, trying to decide whether the two words can even be mapped in the first place before I give little R the problem to solve.