To understand the use of statistics, one needs to know a little about
experimental design, that is, how a researcher conducts investigations. A little
knowledge about methodology gives us a place to hang our statistics. In other
words, statistics are not numbers that appear out of nowhere. Rather, the
numbers (data) are generated by research, and statistics are merely a tool to
help us answer research questions. As such, an understanding of methodology
will facilitate our understanding of basic statistics.
Validity
A key concept relevant to any discussion of research methodology is validity.
When someone asks, "Is this study valid?", they are questioning the validity of
at least one aspect of the study. There are four types of validity that can be
discussed in relation to research and statistics, so when discussing the
validity of a study, one must specify which type is under discussion. The answer
to the question above might therefore be that the study is valid with respect to
one type of validity but invalid with respect to another.
Each of the four types of validity is briefly defined and described below. Be
aware that this is a cursory discussion of the concept. Each type of validity
has many threats that can pose a problem in a research study; examples of
threats to each type are given, but the lists are not exhaustive. For a
comprehensive discussion of the four types of validity, the threats associated
with each, and additional validity issues, see Cook and Campbell (1979).
Statistical Conclusion Validity: Unfortunately, without a background in basic
statistics, this type of validity is difficult to understand. According to Cook
and Campbell (1979), "statistical conclusion validity refers to inferences about
whether it is reasonable to presume covariation given a specified alpha level
and the obtained variances" (p. 41). Essentially, the question being asked is
"Are the variables under study related?" or "Is Variable A correlated with (does
it covary with) Variable B?". If a study has good statistical conclusion
validity, we can be relatively confident that the answer we reach to these
questions (whether "yes" or "no") is correct.
Examples of problems that threaten statistical conclusion validity include
random heterogeneity of the research subjects (the subjects form a diverse
group, which increases statistical error) and small sample size (it is more
difficult to detect meaningful relationships with a small number of subjects).
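A small simulation can make the sample-size point concrete. The Python sketch below is not from Cook and Campbell; it assumes a population in which two variables truly correlate at .30, draws many simulated studies of a given size, and counts how often a two-tailed test at alpha = .05 (using the Fisher z approximation) detects the relationship. The values of `rho`, `n`, `reps`, and the seed are arbitrary choices for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def detection_rate(n, rho=0.3, reps=1000, seed=1):
    """Share of simulated studies of size n that detect a true correlation rho.

    Significance is judged with the Fisher z approximation,
    z = atanh(r) * sqrt(n - 3), two-tailed at alpha = .05 (|z| > 1.96).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # Each y shares variance with its x so the population correlation is rho.
        ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
        z = math.atanh(pearson_r(xs, ys)) * math.sqrt(n - 3)
        if abs(z) > 1.96:
            hits += 1
    return hits / reps


for n in (10, 30, 100):
    print(f"n = {n:3d}: relationship detected in {detection_rate(n):.0%} of studies")
```

With these settings, small studies miss the real relationship most of the time, while studies of about 100 subjects usually find it.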
Internal Validity: Once it has been determined that two variables (A and B) are
related, the next issue is causality: does A cause B? If a study lacks internal
validity, one cannot make cause-and-effect statements based on it; the study
would be descriptive but not causal. There are many potential threats to
internal validity. For example, if a study has a pretest, an experimental
treatment, and a follow-up posttest, history is a threat to internal validity.
If a difference is found between the pretest and posttest, it might be due to
the experimental treatment, but it might also be due to some other event that
the subjects experienced between the two testings (for example, a historical
event or a change in the weather).
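The history threat can be sketched with simulated data. In this hypothetical Python example, every subject experiences the same outside event between pretest and posttest, and only one group receives the treatment. The effect sizes, the score scale, and the untreated comparison group tested at the same times are all assumptions made for illustration.

```python
import random
import statistics


def mean_changes(n=200, treatment_effect=2.0, history_effect=3.0, seed=7):
    """Mean pretest-to-posttest change in a treated and an untreated group.

    Hypothetical model: every subject experiences the same outside event
    between testings (history_effect); only the treated group also gets
    the treatment. Scores are on an arbitrary scale.
    """
    rng = random.Random(seed)
    changes = {"treated": [], "untreated": []}
    for group, scores in changes.items():
        for _ in range(n):
            pre = rng.gauss(50, 10)
            post = pre + history_effect + rng.gauss(0, 5)
            if group == "treated":
                post += treatment_effect
            scores.append(post - pre)
    return {group: statistics.mean(scores) for group, scores in changes.items()}


means = mean_changes()
# Pretest-posttest with one group: the change mixes treatment and history.
print("treated group change:   ", round(means["treated"], 2))
# An untreated group tested at the same times absorbs the history effect.
print("treated minus untreated:", round(means["treated"] - means["untreated"], 2))
```

The treated group's change on its own comes out near 5 points, the treatment effect (2) plus the history effect (3); only the comparison with the untreated group recovers something close to the treatment effect.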
Construct Validity: One is examining construct validity when asking "Am I
really measuring the construct that I want to study?" or "Is my study
confounded (am I confusing constructs)?". For example, if I want to know whether
a particular drug (Variable A) is effective for treating depression (Variable
B), I will need at least one measure of depression. If that measure reflects
anxiety levels (Confounding Variable X) rather than depression levels, then my
study will lack construct validity. Good construct validity thus means we can be
relatively sure that Construct A is related to Construct B and that the
relationship is possibly causal. Other threats to construct validity include
subjects' apprehension about being evaluated, hypothesis guessing on the part of
subjects, and bias introduced by the experimenter's expectancies.
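To see why measuring the wrong construct matters, here is a hypothetical Python simulation of the drug example. It assumes, purely for illustration, that the drug lowers depression by 5 points and leaves anxiety untouched, and that the questionnaire score is a weighted mix of the two constructs; all numbers are invented.

```python
import random
import statistics


def observed_drug_effect(weight_on_depression, n=500, drug_effect=-5.0, seed=3):
    """Drug-minus-placebo difference in mean questionnaire score.

    Hypothetical model: the drug lowers depression only. The questionnaire
    score is weight * depression + (1 - weight) * anxiety, so weight 1.0 is a
    pure depression measure and weight 0.0 actually measures anxiety.
    """
    rng = random.Random(seed)
    w = weight_on_depression
    scores = {"drug": [], "placebo": []}
    for group, values in scores.items():
        for _ in range(n):
            depression = rng.gauss(30, 8) + (drug_effect if group == "drug" else 0.0)
            anxiety = rng.gauss(30, 8)  # untouched by the drug in this sketch
            values.append(w * depression + (1 - w) * anxiety)
    return statistics.mean(scores["drug"]) - statistics.mean(scores["placebo"])


print("measure taps depression:", round(observed_drug_effect(1.0), 2))
print("measure taps anxiety:   ", round(observed_drug_effect(0.0), 2))
```

When the questionnaire really taps anxiety, the drug appears to do nothing, even though in this sketch it works.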
External Validity: External validity addresses whether the results of your
study can be generalized to other times, places, and persons. For example, if
you conduct a study of heart disease in men, can the results be generalized to
women? To determine whether a threat to external validity exists, one needs to
ask the following questions: "Would I find the same results with a different
sample?", "Would I get the same results if I conducted my study in a different
setting?", and "Would I get the same results if I had conducted this study in
the past, or if I redo it in the future?" If I cannot answer "yes" to each of
these questions, then the external validity of my study is threatened.
Types of Research Studies
There are four major classifications of research designs. These include
observational research, correlational research, true experiments, and
quasi-experiments. Each of these will be discussed further below.
Observational research: Many types of studies can be defined as observational
research, including case studies, ethnographic studies, and ethological studies.
The primary characteristic of each of these types of studies is that phenomena
are observed and recorded. Often, the studies are qualitative in nature. For
example, a psychological case study would entail extensive notes based on
observations of and interviews with the client, and the detailed report and
analysis written from them would constitute the study of this individual case.
These studies may also be quantitative in nature or include quantitative
components. For example, an ethological study of primate behavior in the wild
may include measures of behavior duration, i.e., the amount of time an animal
engaged in a specified behavior. This measure of time would be quantitative.
Surveys are often classified as a type of observational research.
Correlational research: In general, correlational research examines the
covariation of two or more variables. For example, early research on cigarette
smoking examined the covariation of cigarette smoking and a variety of lung
diseases. These two variables, smoking and lung disease, were found to covary.
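Covariation can be quantified as a covariance and, rescaled to run from -1 to +1, as the Pearson correlation coefficient. The short Python sketch below computes both from scratch; the five pairs of numbers are invented for illustration and are not data from the smoking studies.

```python
import math


def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Invented numbers for five hypothetical people.
cigarettes_per_day = [0, 5, 10, 15, 20]
symptom_score = [2, 3, 5, 6, 9]

print(covariance(cigarettes_per_day, symptom_score))           # 21.25
print(round(pearson_r(cigarettes_per_day, symptom_score), 3))  # 0.981
```

A positive value means the two variables tend to rise together. Swapping the arguments gives the same r, which is one reason a correlation alone says nothing about which variable is the cause.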
Correlational research can be carried out with a variety of techniques that
involve the collection of empirical data. Correlational research is often
considered a type of observational research, as nothing is manipulated by the
experimenter or individual conducting the research. For example, the early
studies on cigarette smoking did not manipulate how many cigarettes were smoked.
The researchers only collected data on the two variables; nothing was
controlled by the researchers.
It is important to note that correlational research is not causal research.
In other words, we cannot make statements concerning cause and effect on the
basis of this type of research. There are two major reasons why. First, we
don't know the direction of the cause. Second, a third variable of which we are
not aware may be involved. An example may help clarify these points.
In major clinical depression, the neurotransmitters serotonin and/or
norepinephrine have been found to be depleted (Coppen, 1967; Schildkraut &
Kety, 1967). In other words, low levels of these two neurotransmitters have
been found to be associated with increased levels of clinical depression.
However, while we know that the two variables covary (a relationship exists),
we do not know whether a causal relationship exists. Thus, it is unclear whether
a depletion in serotonin/norepinephrine causes depression or whether depression
causes a depletion in neurotransmitter levels. This demonstrates the first
problem with correlational research: we don't know the direction of the cause.
Second, a third variable has been uncovered that may be affecting both of the
variables under study. The number of receptors on the postsynaptic neuron has
been found to be increased in depression (Segal, Kuczenski, & Mandell, 1974;
Vetulani, Stawarz, Dingell, & Sulser, 1976). Thus, it is possible that the
increased number of receptors on the postsynaptic neuron is actually
responsible for the relationship between neurotransmitter levels and
depression. As the discussion above shows, one cannot make a simple cause and
effect statement concerning neurotransmitter levels and depression based on
correlational research. To reiterate, it is inappropriate in correlational
research to make statements concerning cause and effect.
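The third-variable problem can be reproduced in a simulation. In this hypothetical Python sketch, a made-up variable z drives both x and y, while x and y have no direct link to each other; the coefficients are invented and are not a model of actual neurochemistry. The partial correlation used here (the correlation of x and y after removing z's linear influence from both) is a standard technique that the discussion above does not itself use.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation of x and y with the linear influence of z removed from both."""
    rxy, rxz, ryz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(42)
n = 2000
# Hypothetical third variable (think "receptor count"), standardized.
zs = [rng.gauss(0, 1) for _ in range(n)]
# x and y each depend on z, but neither depends on the other.
xs = [-0.8 * z + rng.gauss(0, 0.6) for z in zs]  # stand-in for transmitter level
ys = [0.8 * z + rng.gauss(0, 0.6) for z in zs]   # stand-in for depression score

print("r(x, y)         =", round(pearson_r(xs, ys), 2))
print("r(x, y given z) =", round(partial_r(xs, ys, zs), 2))
```

Here x and y come out strongly (negatively) correlated, yet the partial correlation is close to zero. Statistical control of this kind only works for a third variable that has been identified and measured; it cannot rescue a study from one nobody has thought of.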
Correlational research is often conducted as exploratory or beginning research.
Once variables have been identified and defined, experiments can be conducted.
True Experiments: The true experiment is often thought of as a laboratory
study, but this is not always the case. A true experiment is one in which an
effort is made to control all variables except the one under study. Because this
sort of control is often easier to impose in a laboratory setting, true
experiments have often been erroneously identified with laboratory studies.
To understand the nature of the experiment, we must first define a few terms:
- Experimental or treatment group - this is the group that receives the experimental treatment, manipulation, or is different from the control group on the variable under study.
- Control group - this group is used to produce comparisons. The treatment of interest is deliberately withheld or manipulated to provide a baseline performance with which to compare the experimental or treatment group's performance.
- Independent variable - this is the variable that the experimenter manipulates in a study. It can be any aspect of the environment that is empirically investigated for the purpose of examining its influence on the dependent variable.
- Dependent variable - the variable that is measured in a study. The experimenter does not control this variable.
- Random assignment - in a study, each subject has an equal probability of being assigned to either the treatment or the control group.
- Double blind - neither the subject nor the experimenter knows whether the subject is in the treatment or the control condition.
Now that we have these terms defined, we can examine the structure of the true
experiment further. First, every experiment must have at least two groups: an
experimental group and a control group. Each group receives a level of the
independent variable, and the dependent variable is measured to determine
whether the independent variable has an effect. As stated previously, the
control group provides a baseline for comparison. All subjects should be
randomly assigned to groups and tested as simultaneously as possible, and the
experiment should be conducted double blind. Perhaps an example will help
clarify these points.
Wolfer and Visintainer (1975) examined the effects of systematic preparation
and support on children who were scheduled for inpatient minor surgery. The
hypothesis was that such preparation would reduce the amount of psychological
upset and increase the amount of cooperation among these young patients. Eighty
children were selected to participate in the study and were randomly assigned
to either the treatment or the control condition. During their hospitalization,
the treatment group received the special program and the control group did not.
Care was taken that children in the treatment and control groups were not
roomed together. Measures included heart rates before and after blood tests,
ease of fluid intake, and self-report anxiety measures. The study demonstrated
that the systematic preparation and support reduced the difficulties of being in
the hospital for these children.
Let us now examine the features of the experiment described above. First,
there was a treatment group and a control group. If we had had only the
treatment group, we would have no way of knowing whether the reduced anxiety
was due to the treatment or to the weather, new hospital food, and so on. The
control group provides the basis for comparison. The independent variable in
this study was the presence or absence of the systematic preparation program.
The dependent variables were the heart rates, fluid intake, and anxiety
measures; the scores on these measures were influenced by, and depended on,
whether the child was in the treatment or control group. The children were
randomly assigned to either group. If the "friendly" children had been placed
in the treatment group, we would have no way of knowing whether they were less
anxious and more cooperative because of the treatment or because they were
"friendly". In theory, random assignment should balance the number of
"friendly" children between the two groups. The two groups were also tested at
about the same time; in other words, one group was not measured during the
summer and the other during the winter. By testing the two groups as
simultaneously as possible, we can rule out any bias due to time. Finally, the
children were unaware that they were participants in an experiment (the parents
had agreed to their children's participation in the research and the program),
making the study single blind. If the individuals responsible for the dependent
measures had also been unaware of whether each child was in the treatment or
control group, the experiment would have been double blind.
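Random assignment is easy to carry out by computer. The Python sketch below is hypothetical and is not the procedure Wolfer and Visintainer actually used: it shuffles 80 simulated children (about 30% of them arbitrarily flagged as "friendly") and splits them into two groups of 40. It also repeats the assignment many times to show what "in theory, random assignment should balance" means.

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]


def average_imbalance(subjects, trait, trials=2000):
    """Mean (treatment - control) count of subjects with `trait` over many assignments."""
    total = 0
    for seed in range(trials):
        treated, untreated = randomly_assign(subjects, seed=seed)
        total += sum(s[trait] for s in treated) - sum(s[trait] for s in untreated)
    return total / trials


# 80 hypothetical children; about 30% are "friendly" in this sketch.
rng = random.Random(2024)
children = [{"id": i, "friendly": rng.random() < 0.3} for i in range(80)]

treatment, control = randomly_assign(children, seed=5)
print("friendly children, treatment:", sum(c["friendly"] for c in treatment))
print("friendly children, control:  ", sum(c["friendly"] for c in control))
print("average imbalance over 2000 assignments:", average_imbalance(children, "friendly"))
```

Any single split can leave a few more friendly children in one group, but averaged over many possible assignments the imbalance is close to zero.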
A special case of the true experiment is the clinical trial. A clinical trial
is a carefully designed experiment that seeks to determine the clinical
efficacy of a new treatment or drug. Its design is very similar to that of a
true experiment. Once again, there are two groups: a treatment group (the group
that receives the therapeutic agent) and a control group (the group that
receives the placebo), which is often called the placebo group. The independent
variable in the clinical trial is the level of the therapeutic agent. Once
again, subjects are randomly assigned to groups, they are tested
simultaneously, and the experiment should be conducted double blind. In other
words, neither the patient nor the person administering the drug should know
whether the patient is receiving the drug or the placebo.
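Keeping a trial double blind is partly a bookkeeping problem. Here is a simplified, hypothetical Python sketch of a blinded allocation list: drug and placebo labels are shuffled and attached to sequential kit codes, and the code-to-arm key is kept away from patients and staff. Real trials usually use more elaborate schemes (for example, block randomization managed by an independent party); the function name and the code format are invented.

```python
import random


def make_allocation(n_patients, seed=None):
    """Hypothetical blinded allocation list for a two-arm drug/placebo trial.

    Returns (kit_codes, key). Patients and dispensing staff see only the kit
    codes; the key linking each code to its arm is held by someone outside the
    study team until the analysis.
    """
    rng = random.Random(seed)
    arms = ["drug", "placebo"] * (n_patients // 2) + ["drug"] * (n_patients % 2)
    rng.shuffle(arms)
    # Kit numbers follow enrolment order; because the arms were shuffled,
    # a kit number reveals nothing about what is inside the package.
    key = {f"KIT-{i:03d}": arm for i, arm in enumerate(arms, start=1)}
    return list(key), key


kit_codes, key = make_allocation(20, seed=11)
print(kit_codes[:4])  # all anyone sees during the trial
print(sum(arm == "drug" for arm in key.values()), "of", len(key), "kits contain the drug")
```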
Quasi-Experiments: Quasi-experiments are very similar to true experiments but
use naturally formed or pre-existing groups. For example, if we wanted to
compare young and old subjects on lung capacity, it would be impossible to
randomly assign subjects to the young or old group (naturally formed groups).
Therefore, this cannot be a true experiment. When one has naturally formed
groups, the variable under study is a subject variable (in this case, age) as
opposed to an independent variable. This also limits the conclusions we can
draw from such a study. If we were to conduct the quasi-experiment, we might
find that the older group had less lung capacity than the younger group and
conclude that old age results in less lung capacity. But other variables might
also account for this result. It might be that repeated exposure to pollutants,
rather than age, caused the difference in lung capacity. It could also be a
generational factor: perhaps more of the older group smoked in their early
years, before awareness of the hazards of cigarettes increased. The point is
that there are many differences between the groups that we cannot control and
that could account for differences in our dependent measures. Thus, we must be
careful about making statements of causality with quasi-experimental designs.
Quasi-experiments may result from studying the differences between naturally
formed groups (e.g., young and old; men and women). However, there are also
instances when a researcher designs a study as a traditional experiment only to
discover that random assignment to groups is restricted by outside factors, so
the researcher is forced to divide groups according to some pre-existing
criterion. For example, if a corporation wanted to test the effectiveness of a
new wellness program, it might implement the program at one site and use a
comparable site (with no wellness program) as a control. Because the employees
are not shuffled and randomly assigned to work at each site, the study has
pre-existing groups. After a few months of study, the researchers could see
whether the wellness site had less absenteeism and lower health costs than the
non-wellness site. The conclusions are again restricted by the
quasi-experimental nature of the study. Because the groups are pre-existing,
they may differ in more ways than just the presence or absence of a wellness
program. For example, the wellness program may be housed in a significantly
newer, more attractive building, or the manager from hell may work at the
non-wellness site. Either way, if a difference is found between the two sites,
it may or may not be due to the presence or absence of the wellness program.
To summarize, quasi-experiments may result either from studying naturally
formed groups or from using pre-existing groups. When the study includes
naturally formed groups, the variable under study is a subject variable. When a
study uses pre-existing groups that are not naturally formed, the variable that
is manipulated between the two groups is an independent variable (apart from
the lack of random assignment, the study looks similar in form to a true
experiment). Because a quasi-experiment has no random assignment, no causal
statements can be made based on its results.
Populations and Samples
When conducting research, one must often use a sample of the population rather
than the entire population. Before we go further into the reasons why, let us
first discuss what distinguishes a population from a sample.
A population can be defined as any set of persons/subjects having a common
observable characteristic. For example, all individuals who reside in the
United States make up a population, as do all pregnant women. A characteristic
of a population is called a parameter. A sample can be defined as any subset of
the population, and a characteristic of a sample is called a statistic.
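The parameter/statistic distinction fits in a few lines of code. In this hypothetical Python sketch, the population is 100,000 simulated measurements (mean 120, SD 15, values chosen arbitrarily); its mean is a parameter, and the mean of a random sample of 200 drawn from it is a statistic that estimates that parameter.

```python
import random
import statistics

rng = random.Random(10)

# Hypothetical population: one measurement for each of 100,000 people.
population = [rng.gauss(120, 15) for _ in range(100_000)]

# A parameter describes the population...
population_mean = statistics.fmean(population)

# ...while a statistic describes a sample drawn from it.
sample = rng.sample(population, 200)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The two numbers are close but not identical; a different sample would give a slightly different statistic, while the parameter stays fixed.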
Why Sample?
This brings us to the question of why we sample. Why should we not use the
entire population as the focus of study? There are at least four major reasons
to sample.
First, it is usually too costly to test the entire population. The United
States government spends millions of dollars to conduct the U.S. Census every
ten years. While the U.S. government may have that kind of money, most
researchers do not.
The second reason to sample is that it may be impossible to test the entire
population. For example, let us say that we wanted to test the 5-HIAA (a
serotonergic metabolite) levels in the cerebrospinal fluid (CSF) of depressed
individuals. There are far too many individuals who do not make it into the
mental health system to even be identified as depressed, let alone to have
their CSF tested.
The third reason to sample is that testing the entire population often
produces error, so sampling may be more accurate. Perhaps an example will help
clarify this point. Say researchers wanted to examine the effectiveness of a new
drug for Alzheimer's disease. One dependent variable that could be used is an
Activities of Daily Living Checklist, a measure of functioning on a day-to-day
basis. In this experiment, it would make sense to have as few people rating the
patients as possible. If one individual rates the entire sample, there will be
some consistency from one patient to the next. If many raters are used, this
introduces a source of error, because the raters may each use slightly
different criteria for judging Activities of Daily Living. Thus, as in this
example, it would be problematic to study an entire population.
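The rater problem can be sketched numerically. The hypothetical Python model below assumes each rater has a personal leniency offset (SD 3 points) plus random noise (SD 1 point) on every rating; all numbers are invented. It reports how much rating errors vary across 300 patients when one rater does all the ratings versus when 30 raters share them.

```python
import random
import statistics


def rating_error_sd(n_patients, n_raters, rater_bias_sd=3.0, noise_sd=1.0, seed=8):
    """Spread of (rated score - true score) across patients.

    Hypothetical model: each rater has a personal leniency offset, drawn once,
    plus random noise on every rating. Patients are handed to raters in rotation.
    """
    rng = random.Random(seed)
    offsets = [rng.gauss(0, rater_bias_sd) for _ in range(n_raters)]
    errors = [offsets[p % n_raters] + rng.gauss(0, noise_sd) for p in range(n_patients)]
    return statistics.stdev(errors)


print("1 rater:  ", round(rating_error_sd(300, 1), 2))
print("30 raters:", round(rating_error_sd(300, 30), 2))
```

A single rater's offset shifts every patient's score by the same amount, so it largely cancels out when groups are compared; offsets that differ from rater to rater add spread that no comparison removes.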
The final reason to sample is that testing may be destructive. It makes no
sense to lesion the lateral hypothalamus of all rats to determine whether it
has an effect on food intake; we can get that information by operating on a
small sample of rats. Also, you probably would not want to buy a car that had
had its door slammed five hundred thousand times or had been crash tested.
Rather, you would probably want to purchase a car that did not make it into
either of those samples.
Types of Sampling Procedures
As stated above, a sample consists of a subset of the population. Any member of
the defined population can be included in a sample. A theoretical list (an
actual list may not exist) of individuals or elements who make up a population
is called a sampling frame. There are five major sampling procedures.
The first sampling procedure is convenience sampling. Volunteers, members of a
class, and individuals in the hospital with the specific diagnosis being studied
are examples of frequently used convenience samples. This is by far the most
often used sampling procedure. It is also by far the most biased, as it is not
random (not everyone in the population has an equal chance of being selected to
participate in the study). For example, individuals who volunteer to
participate in an exercise study may be different from individuals who do not
volunteer.
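Volunteer bias can be illustrated with a simulation. The Python sketch below assumes, for illustration only, a population with a fitness score and a logistic rule under which fitter people are more likely to volunteer for an exercise study; the numbers and the volunteering rule are invented.

```python
import math
import random
import statistics

rng = random.Random(21)

# Hypothetical population: a fitness score for each of 50,000 people.
population = [rng.gauss(3.0, 1.5) for _ in range(50_000)]


def volunteers(pop, rng):
    """Assumed model: people who are already fitter are likelier to volunteer."""
    chosen = []
    for score in pop:
        p_volunteer = 1 / (1 + math.exp(-(score - 5)))  # logistic curve centred at 5
        if rng.random() < p_volunteer:
            chosen.append(score)
    return chosen


convenience = volunteers(population, rng)
random_sample = rng.sample(population, len(convenience))  # same size, drawn at random

print("population mean:   ", round(statistics.fmean(population), 2))
print("volunteer mean:    ", round(statistics.fmean(convenience), 2))
print("random-sample mean:", round(statistics.fmean(random_sample), 2))
```

The volunteers' mean overstates the population mean, while a random sample of the same size lands close to it.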
Another form of sampling is the simple random sample. In this method, all
subjects or elements have an equal probability of being selected. There are two
major ways of drawing a random sample: the first is to consult a random number
table, and the second is to have a computer select the sample.
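Having the computer do it usually means a pseudo-random number generator. This Python sketch uses the standard library's `random.Random.sample`, which draws without replacement; the sampling frame and seed are hypothetical. The second function checks empirically that every element has the same chance of being chosen.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every element has the same chance of inclusion."""
    return random.Random(seed).sample(frame, n)


def inclusion_rates(frame_size, n, trials=20_000, seed=0):
    """Empirical probability that each frame position ends up in the sample."""
    rng = random.Random(seed)
    counts = [0] * frame_size
    for _ in range(trials):
        for i in rng.sample(range(frame_size), n):
            counts[i] += 1
    return [c / trials for c in counts]


frame = [f"person-{i:04d}" for i in range(1, 1001)]  # hypothetical sampling frame
print(simple_random_sample(frame, 5, seed=99))
# With n = 5 drawn from 20, each position should be included about 5/20 = 25% of the time.
print([round(r, 2) for r in inclusion_rates(20, 5)])
```

Fixing the seed makes the draw reproducible, which is useful when someone needs to audit how the sample was selected.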
A systematic sample is drawn by randomly selecting a first case on a list of
the population and then selecting every Nth case from there until the sample is
complete. This is particularly useful if your list of the population is long.
For example, if your list was the phone book, it would be easiest to start at,
say, the 17th person and then select every 50th person from that point on.
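A systematic sample takes only a few lines to draw. In this hypothetical Python sketch, the sampling interval k is the frame size divided by the desired sample size (5,000 listings / 100 = 50, matching the every-50th-person example), and the starting point is chosen at random among the first k entries.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Random start within the first k entries, then every k-th entry.

    k, the sampling interval, is len(frame) // n.
    """
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    start = random.Random(seed).randrange(k)  # a random index in 0 .. k-1
    return frame[start::k][:n]


# Hypothetical phone book of 5,000 listings; 100 needed, so k = 50.
phone_book = [f"listing-{i:04d}" for i in range(5000)]
sample = systematic_sample(phone_book, 100, seed=3)
print(len(sample), sample[:3])
```

One caution: if the list has a cycle that lines up with k (say, every 50th listing is a business), a systematic sample can be badly biased.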
Stratified sampling makes up the fourth sampling strategy. In a stratified
sample, we sample either proportionately or equally to represent various strata
or subpopulations. For example, if our strata were states, we would make sure to
sample from each of the fifty states. If our strata were religious affiliation,
stratified sampling would ensure sampling from every religious block or
grouping. If our strata were gender, we would sample both men and women.
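Here is a hypothetical Python sketch of both allocation rules. It assumes the frame records each unit's stratum (here gender, in a made-up frame of 600 women and 400 men) and draws a simple random sample within each stratum.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(frame, stratum_of, n, proportional=True, seed=None):
    """Simple random sample drawn separately within each stratum.

    proportional=True: each stratum's share of the sample matches its share
    of the frame (sizes are rounded, so the total can differ slightly from n).
    proportional=False: n // (number of strata) from every stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        if proportional:
            size = round(n * len(members) / len(frame))
        else:
            size = n // len(strata)
        sample.extend(rng.sample(members, min(size, len(members))))
    return sample


def gender(person):
    return person[0]


# Hypothetical frame: 600 women and 400 men, stored as (gender, id) pairs.
people = [("woman" if i < 600 else "man", i) for i in range(1000)]

print(Counter(gender(p) for p in stratified_sample(people, gender, 100, seed=1)))
print(Counter(gender(p) for p in stratified_sample(people, gender, 100, proportional=False, seed=1)))
```

Proportional allocation yields 60 women and 40 men; equal allocation yields 50 of each, which gives the smaller stratum more precision at the cost of needing weights to estimate whole-population figures.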
Cluster sampling makes up the final sampling procedure. In cluster sampling,
we take a random sample of clusters (naturally occurring groups) and then
survey every member of each selected cluster. For example, if our clusters were
individual schools in the St. Louis Public School System, we would randomly
select perhaps 20 schools and then test all of the students within those
schools.
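In code, the defining step is that the random draw is made over clusters rather than over individuals. This hypothetical Python sketch uses a made-up district of 60 schools with invented enrollments; school and student names are placeholders.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical district: 60 schools, each with 200-500 students.
rng = random.Random(4)
schools = {
    f"school-{s:02d}": [f"school-{s:02d}/student-{i:03d}" for i in range(rng.randint(200, 500))]
    for s in range(60)
}

picked = cluster_sample(schools, 20, seed=12)
students = [student for roster in picked.values() for student in roster]
print(len(picked), "schools selected,", len(students), "students tested")
```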
Sampling Problems
There are several potential sampling problems. When designing a study, one
also develops a sampling procedure, including the potential sampling frame, and
several problems may exist within that frame. First, there may be missing
elements: individuals who should be on your list but for some reason are not.
For example, if my population consists of all individuals living in a
particular city and I use the phone directory as my sampling frame or list, I
will miss individuals with unlisted numbers or who cannot afford a phone.
Foreign elements make up the second sampling problem: elements that should not
be included in my population and sample appear on my sampling list. Thus, if I
were to use property records to create my list of individuals living within a
particular city, landlords who live elsewhere would be foreign elements. In this
case, renters would be missing elements.
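When both lists are available, missing and foreign elements are just set differences. The Python sketch below uses invented names for a tiny hypothetical city; in practice the true population list is rarely available, which is exactly why these errors go unnoticed.

```python
# Hypothetical city: who actually lives there versus who appears on the frame.
residents = {"Ana", "Ben", "Chen", "Dee", "Eli"}     # the target population
property_records = {"Ana", "Ben", "Chen", "Farid"}   # owners on file; Farid lives elsewhere


def frame_problems(population, frame):
    """Compare a sampling frame with the population it is meant to list."""
    return {
        "missing": sorted(population - frame),  # should be listed but are not
        "foreign": sorted(frame - population),  # listed but not in the population
    }


problems = frame_problems(residents, property_records)
print(problems)  # {'missing': ['Dee', 'Eli'], 'foreign': ['Farid']}
```

Here Dee and Eli play the renters (missing elements) and Farid the absentee landlord (a foreign element).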
Duplicates represent the third sampling problem. These are elements that
appear more than once on the sampling frame. For example, if I am a researcher
studying patient satisfaction with emergency room care, I may include the same
patient more than once in my study. If the patients are completing a patient
satisfaction questionnaire, I need to make sure that patients know that if they
have completed the questionnaire previously, they should not complete it again.
If they complete it more than once, their second set of data represents a
duplicate.
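If each respondent carries a stable identifier (an assumption; the example does not say how patients are identified), duplicates can also be screened out after the fact. This hypothetical Python sketch keeps each patient's first questionnaire and sets later ones aside; the field names are invented.

```python
def drop_duplicates(responses, key="patient_id"):
    """Keep each patient's first questionnaire; return (kept, duplicates)."""
    seen = set()
    kept, duplicates = [], []
    for response in responses:
        if response[key] in seen:
            duplicates.append(response)
        else:
            seen.add(response[key])
            kept.append(response)
    return kept, duplicates


responses = [
    {"patient_id": "P01", "satisfaction": 4},
    {"patient_id": "P02", "satisfaction": 2},
    {"patient_id": "P01", "satisfaction": 5},  # second ER visit, same patient
    {"patient_id": "P03", "satisfaction": 3},
]
kept, dups = drop_duplicates(responses)
print(len(kept), len(dups))  # 3 1
```

Keeping the first response is a design choice; a study could instead keep the most recent one, as long as the rule is decided before looking at the answers.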