Learning Statistics with R, Part I

Written by: Cait Pickens

Primary Source: Computing Education

For my new job, I need to be doing some data analysis. While I love Python and the IPython Notebook, it is definitely time that I invest some effort in learning how to use R Stats. So, this blog post (maybe even a series?) documents what I am learning and how I believe it applies to learning science research.

I have started with this book, Learning Statistics with R: A tutorial for psychology students and other beginners. Why? Because it’s free, online, and I like the writing style. Plus, I think I was a cognitive psychologist in a past life (or will be one in a future life!), so I like readings that apply to the cognitive psych domain.

Disclaimer: The learning R part doesn’t start until Part II of the book!

Note: Bolded sections are quotes and ideas from the textbook. Subsequent text includes my commentary / thoughts / reactions / ideas.

Chapter 1: Why do we learn statistics?

  • If I’m a psychologist, why do I have to do statistics? If I wanted to be a statistician, I would be. I am going to trust that you are at the same point I am and know that you need statistics to do your research. If you aren’t there yet, read the introductory chapter!
  • Can’t someone else do the statistics? I’m going to be honest: this question crosses my mind a lot. Wouldn’t a full-time statistician be better trained to do these analyses than I am? The answer is: probably. But being proficient in stats allows me to have a lot more flexibility with my own research design because I know what is possible and what just isn’t. If I need help from a statistician somewhere down the road, then I can seek one out. For now, I’m looking to be decently self-sufficient in stats.

Chapter 2: A Brief Introduction to Research Design

  • Operationalisation is the process of taking a meaningful but somewhat vague concept and turning it into a precise measurement. Advice from the text: (1) be precise, (2) determine how you will measure the concept, (3) determine what the allowable values are that the measurement can take.
    • A theoretical construct is what you are trying to measure. (Examples: age, gender, opinion)
    • A measure is a method or tool used to make observations.
    • An operationalisation is the connection between the theoretical construct and the measure.
  • Different scales of measurement let us distinguish between types of variables (actual data):
    • Nominal scale: there is no relationship between different possibilities; none is “bigger” or “better” than another; examples are eye color and gender
    • Ordinal scale: a relationship exists between different possible values the data can take; there is a natural, meaningful order to the possibilities; an example is who finishes first, second, third in a race
    • Interval scale: the differences between possible data values are meaningful, but there is no true zero; an example is temperature in degrees Celsius
    • Ratio scale: the possibility “0” actually means zero; division and multiplication of data values are meaningful; an example is response time for students to answer questions
    • Continuous variables: for any two values you can think of, it is possible to have another value in between them
    • Discrete variables: not continuous; for some pairs of values, there is nothing in between them
    • Likert scale (Strongly Agree – Strongly Disagree) is considered quasi-interval
Figure: The relationship between scales of measurement and the discrete/continuous distinction
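
One way to make the four scales concrete is a small Python sketch. It is not from the textbook: the variable names and all the data values are made up for illustration, and it only shows which summaries make sense on each scale.

```python
from statistics import median, mode

# Hypothetical data, invented purely for illustration.
eye_color = ["brown", "blue", "brown", "green"]  # nominal
race_place = [2, 5, 1, 9, 4]                     # ordinal (finishing positions)
temp_celsius = [10.0, 20.0]                      # interval
response_time_s = [1.5, 3.0]                     # ratio

# Nominal: no order, so "most common value" is about all we can say.
most_common_eye_color = mode(eye_color)

# Ordinal: order is meaningful, so the median makes sense.
median_place = median(race_place)

# Interval: differences are meaningful...
temp_difference = temp_celsius[1] - temp_celsius[0]

# ...but ratios are not, because 0 degrees Celsius is an arbitrary zero.
# The same two temperatures give a very different "ratio" in Kelvin.
celsius_ratio = temp_celsius[1] / temp_celsius[0]
kelvin_ratio = (temp_celsius[1] + 273.15) / (temp_celsius[0] + 273.15)

# Ratio: zero really means zero, so "twice as long" is a fair statement.
time_ratio = response_time_s[1] / response_time_s[0]

print(most_common_eye_color, median_place, temp_difference)
print(celsius_ratio, round(kelvin_ratio, 3), time_ratio)
```

The Celsius and Kelvin ratios disagree (2.0 versus roughly 1.035), which is the practical reason 20 degrees is not "twice as hot" as 10 degrees. The response-time ratio has no such problem.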

  • Reliability. I already have a blog post about this, so I am going to gloss over it. The chapter briefly explains test-retest, inter-rater, parallel forms, and internal consistency reliability. All good things!

  • Experimental versus non-experimental research. The text explains that in experimental research, the experimenter must control the conditions and randomly assign participants to them. In learning science, that is pretty much impossible. Any time you’re working with students, “control” of an educational intervention is pretty difficult to achieve. I like the distinction of quasi-experimental research and case studies as a decent “middle ground” in psych research.
  • Validity. Again, I already have spent some blog time talking about how important validity is. Go read that. :] Or, read this part of the chapter, ’cause it gives a really nice description. It also has a good section on possible threats to validity.
  • Bias. There is a pretty extensive discussion of possible kinds of bias in a research design. They are very important to be aware of, but I am not going to detail them here.
Cait Pickens
Cait Sydney Pickens is a graduate student at Michigan State University and a Noogler. Her research is in computer science education, and is advised by Bill Punch. In her (free?) time she does a lot of work with an organization called Software Carpentry that teaches computing tools and skills to scientists.