
Welcome back. We're coming around the stretch. This is week six. Thanks for staying with me.

This week, we're going to talk about research designs, statistical inference, and related topics. It's my favorite topic, and I hope you enjoy it with me. There are five modules this week. This first one is background; I'm going to give you some background ideas. One point I want to make throughout this week is that one can do a PhD and a whole career in this kind of stuff. I spend a lot of my time thinking about it. It's difficult and nuanced. I want to share the basics with you, and I hope you can enjoy that with me.

There are three assigned readings this week. I hope you enjoy them. Two are written by me; I hope you can indulge me here. Methodology for social epidemiology is my primary interest. And there's a paper by a very distinguished scholar, friend, and former colleague of mine, John McKinlay, who also offers some background. I hope that'll be useful for you.

I wanted to share briefly some of the books that influenced me and my thinking about research designs, inference, and methodology for social epi. You can read these, or just know of them, if you wish. First is the work of William, or Bill, Cochran, one of the great statisticians working on observational data sets; I'll explain what that is in a little bit. Next is the great Don Rubin, the statistician from Harvard. He really innovated how we think about causal inference. The late David Freedman from Berkeley, a genius statistician, had a great way of articulating hard, difficult topics in statistical inference for those who are less mathematically inclined, so this might be a text for you if you're interested in that kind of thing. Two more books: the work of Paul Rosenbaum, a student of Don Rubin, talks about how to match, this idea

of propensity score matching. Don't worry about the details; it's fascinating work, really helpful for social epi. Work by Richard, or Dick, Berk, one of my favorite authors in the world, offers a critique of how we use regression models. If you don't know what that is, that's fine. If you do know what it is, it's worth evaluating, or looking into, Dick Berk's work. Then some history: of course the great historian of statistics, Stephen Stigler, talks about where all the stats stuff comes from, and that's really fascinating. I think knowing where we came from is really important when we think about where we're headed. Charles Manski, Chuck Manski, the great econometrician, has done innovative work on effect identification. We'll have a module on that this week. Really important stuff, critical for social epi, maybe worth exploring. Next, some recent work by some econometricians, [LAUGH] or economic statisticians: Josh Angrist and Pischke have this lovely new book, Mostly Harmless Econometrics, and it's well worth it if you want to go into this kind of work. And no discussion of causality or causal inference can be complete without a reference to Judea Pearl. His work is innovative, and this is a great book, particularly the last chapter, where he tells a great story of the history of causal inference. The rest of the book can be rather technical, just a heads up, but it's worth exploring if you want. If you can find that last chapter, and you're interested, it's well worth it. And finally, a shameless plug for my own work with my colleague Jay Kaufman. Our text, Methods in Social Epi, is the first and, I still think, only text about methodology for social epi, and as I think I've mentioned before, we hope to get a second edition out very shortly. So that's some background for you.

Okay, when it comes to methodology, here's one of my favorite quotes. It's from the great statistician

Cornfield "On being asked to talk on the principles of research, my first thought was to arise... and say, 'Be careful', and to sit down." And if you take nothing else from this week, That's what I'd like you to take. Be careful. These are very difficult, very nuanced issues, and social epidemiology is particularly vexing when it comes to whether x caused y or something like that. So this is the important message for the day, be careful. Background. So what is methodological research? Well, we're interested in scientific imprints. How we take some data and draw conclusions. We want to ask what conclusions can be drawn from some data And from assumptions. There's always data in assumptions. And the question is how do we balance those two. Do we have bigger stronger assumptions, and less great data. Or really great data, that requires fewer or testable assumptions. And together we put those two things together and we draw conclusions or make inferences. So that's one of the key points. What's science? We talked about this a little bit in week one. Science is a system that builds and organizes knowledge around the form of testable explanations and predictions about the world. So we're interested in things we can test and predict. And this is really important for advancing social epidemiology. What hypothesis can we test and what can we learn in or predict? Now the scientific method is just a way to have procedures to do science, if you will. You want to articulate some procedures for understanding what truth with a capital T is. We want to have transparency, and critically, replication. If we can't replicate a design or an experiment or a result, we want to be suspicious of it. Science can be a child-like process. It should be invigorating.

It should be one of wonder: looking into the world and finding new things, as a child does every day of his or her life.

I want to make a distinction between science and research, and this is my take. The important distinction is this: a scientist is not interested in the outcome of a study, whether it says this caused that or whether this is greater than that. The scientist is interested in the procedures or methods used to get to that conclusion. The outcome or conclusion itself need not be so important. By contrast, a researcher is someone who looks for data to firm up what they already know. You see, the process is reversed. Science is about learning things, whatever they might be. Research is about finding data to support what we already know. There can be great researchers who aren't scientists, so keep that in mind. This will come up when we talk about the critical issue of confirmation bias: looking for evidence to prove or demonstrate what we already know. But the scientific process entails more. The scientific process makes us ask: what else could it be? That's the key.

What is confirmation bias? Well, it is conducting research, often subconsciously, because this is a human endeavor, in a way that supports our current beliefs. This is troubling for science. We always want to leave room for alternative explanations. Now, confirmation bias may arise in many ways. It can be in the way we frame questions, the way we collect data, or, of course, in the way we analyze or interpret data. So be on guard for confirmation bias in your own work and in the work of others that you read.

Here's a quote that I like: "Good science is more than the mechanics of research and experimentation. Good science requires that scientists look inward to contemplate the origin of their thoughts. Failures of science do not begin with flawed evidence or fumbled statistics. They begin with personal self-deception

and an unjustified sense of knowing." This is what we must guard against. It's a huge threat in social epidemiology.

Now, briefly, I want to jump into some difficult waters: the distinction between correlation and causation. Why is there a rooster here? Well, the rooster is here to make the point that two things can be correlated, like the sun rising and the rooster crowing in the morning. Correlated, obviously, right? But the key distinction is, the farmer can train his rooster to sleep later. So the farmer can perhaps sleep later, but the sun will rise as it wishes. They are correlated but not causal: the rooster changing his wake-up time does not affect the rising of the sun.

Now, in terms of statistics, most researchers, forever, for as long as there have been statistics, formally about 100 years, have focused on correlations. Most formal statisticians have avoided the terms causal or causation unless they're doing a randomized clinical trial. We'll talk about this later. This is changing, however. In the last 20 or so years, epidemiologists, and especially economists, have focused intensely on causal inference. What do we need to know? What kind of evidence do we need to have to say that X causes Y? It is all the rage, and it has been important for 20-plus years.

Why are we looking at grapes? Well, I want to share with you, and remind you of, the fable of the fox and the sour grapes. Remember this story: there's a fox who wants to eat some lovely-looking grapes, perhaps much like these. But the fox can't get the grapes because there's a fence between him and the grapes. So what does the fox do? Since he can't get the grapes, he decides that the grapes must be sour and therefore he doesn't want them. This is the idea of sour grapes: if we can't get something we want, we decide it's no good. This has happened, in my opinion, in the way we do research and the way we think about causation, since causation is such a hard thing to demonstrate or prove.

So many of us researchers say, don't do it, it's not worth it. I think that's a bad trend, and I want to encourage you not to have that sour-grapes experience, but instead to go directly for causal effects. It's worth noting that the Wikipedia page on causality is well worth exploring. It's long, and it's technical in places, but if you want, take a look and take what you wish from that really interesting page.

Well, where does this all come from? In my view, there are two broad trunks of the tree that is modern epidemiologic thinking. One comes from the great Karl Pearson, and here's a picture of the man himself. Now, Pearson was alive and doing his work in the early 1900s. He was a student of Francis Galton, who worked on genes and genetics and all kinds of great mathematical things. Pearson, of course, derived the correlation coefficient, otherwise known as the Pearson correlation coefficient. That's why it's Pearson. He developed other things, contingency tables, and he worked on the law of large numbers. Now, what's important about Pearson, one of the founding fathers of modern statistics, is that he had no interest in causality. He didn't want to touch it; he didn't think it was useful. And a lot of people are still his, sort of, intellectual descendants today.

By contrast, for this course anyway, there is Ronald Fisher, another giant of the field, another father of modern statistics. He developed things like the analysis of variance. Interestingly, he turned down an offer to work in Pearson's lab. He worked in agriculture: he was trying to understand how crop yield differed by fertilizer and other environmental elements. So, in contrast to Pearson, Fisher was interested in doing things and seeing what happened, in running experiments. Fisher gave us randomization. He wanted to see, if we had more fertilizer or more water, what happened to the crop.

You see, it's a very different perspective from Pearson, who was saying this is correlated with that. Fisher was doing things, with randomization, and seeing the outcome. In terms of medicine or public health research, the next big name, or perhaps the last big name, is Austin Bradford Hill, a British scholar. Now, he was a student of Pearson, which you'd think would make him all about correlation. But what Hill did was to take Fisher's ideas of randomization and experiments and bring them into medical science. He helped develop the first randomized clinical trial, so this is really important to understand. And of course he did other work with Richard Doll on smoking, smoking and lung cancer, and heart disease; a very important scholar.

So what's the story here? Well, the story is that correlation, correlation statistics, thinking about correlations, comes from Karl Pearson and his intellectual offspring. Work in doing things, running experiments and seeing what happens, comes from Fisher and his offspring. In this course, and in my own work, I want to emphasize the approach of Fisher: doing things and running experiments. Hard to do in social epi, but critically important, in my opinion.
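For those who like to see ideas in code, here is a minimal, purely illustrative sketch in Python of the two themes of this module: correlation versus causation, and Fisher-style randomization. Everything in it, the numbers, the variable names, the "training" of the rooster, is made up for illustration; it is not an analysis of any real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 365  # one simulated year of mornings

# Correlation without causation: the rooster and the sun.
# Sunrise time (in hours) drifts with the season; the rooster crows a bit
# before sunrise because he responds to the growing light.
season = np.linspace(0, 2 * np.pi, n)
sunrise = 6.0 + 1.5 * np.sin(season) + rng.normal(0, 0.1, n)
rooster_crow = sunrise - 0.25 + rng.normal(0, 0.1, n)

r = np.corrcoef(rooster_crow, sunrise)[0, 1]
print(f"Correlation of crow time with sunrise: {r:.2f}")  # close to 1

# "Intervene" on the rooster: the farmer trains him to crow two hours later.
# The sunrise series is untouched, so the strong correlation told us nothing
# about any causal effect of the rooster on the sun.
trained_crow = rooster_crow + 2.0
r_trained = np.corrcoef(trained_crow, sunrise)[0, 1]
print(f"Correlation after training the rooster: {r_trained:.2f}")
print(f"Mean sunrise hour, unchanged by training: {sunrise.mean():.2f}")

# Fisher's alternative: do something and see what happens.
# Randomly assign plots to fertilizer; randomization balances soil quality
# (and everything else) on average, so a simple difference in mean yields
# estimates the causal effect of the fertilizer.
plots = 200
true_effect = 5.0                          # made-up effect, for illustration
soil_quality = rng.normal(50, 10, plots)   # unknown to the experimenter
treated = rng.random(plots) < 0.5          # coin-flip assignment
crop_yield = soil_quality + true_effect * treated + rng.normal(0, 5, plots)

estimate = crop_yield[treated].mean() - crop_yield[~treated].mean()
print(f"Randomized estimate of the fertilizer effect: {estimate:.1f}")
```

The point of the sketch is simply that a correlation near one says nothing about what happens when we intervene on the rooster, whereas random assignment lets a plain difference in means recover the causal effect of the fertilizer.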
