Warwick and Lininger (1975)
point out that there are two basic goals in questionnaire design.
1. To obtain information
relevant to the purposes of the survey.
2. To collect this information
with maximal reliability and validity.
How can a researcher be sure
that the data gathering instrument being used will measure what it is supposed
to measure and will do this in a consistent manner? This is a question that can
only be answered by examining the definitions for and methods of establishing
the validity and reliability of a research instrument. These two very important
aspects of research design will be discussed in this module.
Validity
Validity can be defined as the
degree to which a test measures what it is supposed to measure. There are three
basic approaches to the validity of tests and measures as shown by Mason and
Bramble (1989). These are content validity, construct validity, and criterion-related
validity.
Content Validity
This approach measures the
degree to which the test items represent the domain or universe of the trait or
property being measured. In order to establish the content validity of a
measuring instrument, the researcher must identify the overall content to be
represented. Items that accurately represent the information in all areas must then be randomly chosen from this content. By using this method the
researcher should obtain a group of items which is representative of the
content of the trait or property to be measured.
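As a rough illustration only (the content areas, item pools, and sampling fraction below are invented for the example), a proportional random sample of items can be drawn from each area of a hypothetical content domain so that every area is represented:

# Illustrative sketch: stratified random sampling of items from a hypothetical
# content domain. Area names, item labels, and the 20% fraction are assumptions.
import random

domain = {
    "planning":    [f"plan_{i}" for i in range(30)],
    "instruction": [f"instr_{i}" for i in range(50)],
    "assessment":  [f"assess_{i}" for i in range(20)],
}

test_items = []
for area, items in domain.items():
    k = max(1, round(0.2 * len(items)))   # sample about 20% of each area
    test_items.extend(random.sample(items, k))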
Identifying the universe of
content is not an easy task. It is, therefore, usually suggested that a panel
of experts in the field to be studied be used to identify a content area. For
example, in the case of researching the knowledge of teachers about a new
curriculum, a group of curriculum and teacher education experts might be asked
to identify the content of the test to be developed.
Construct Validity
Cronbach and Meehl (1955)
indicated that, "Construct validity must be investigated whenever no
criterion or universe of content is accepted as entirely adequate to define the
quality to be measured" as quoted by Carmines and Zeller (1979). The term
construct in this instance is defined as a property that is offered to explain
some aspect of human behavior, such as mechanical ability, intelligence, or
introversion (Van Dalen, 1979). The construct validity approach concerns the
degree to which the test measures the construct it was designed to measure.
There are two parts to the
evaluation of the construct validity of a test. First and most important, the
theory underlying the construct to be measured must be considered. Second, the
adequacy of the test in measuring the construct is evaluated (Mason and
Bramble, 1989). For example, suppose that a researcher is interested in
measuring the introverted nature of first year teachers. The researcher defines
introverted as the overall lack of social skills such as conversing, meeting
and greeting people, and attending faculty social functions. This definition is
based upon the researcher’s own observations. A panel of experts is then asked
to evaluate this construct of introversion. The panel cannot agree that the
qualities pointed out by the researcher adequately define the construct of
introversion. Furthermore, the researcher cannot find evidence in the research
literature supporting the introversion construct as defined here. Using this
information, the validity of the construct itself can be questioned. In this
case the researcher must reformulate the previous definition of the construct.
Once the researcher has
developed a meaningful, useable construct, the adequacy of the test used to
measure it must be evaluated. First, data concerning the trait being measured
should be gathered and compared with data from the test being assessed. The
data from other sources should be similar or convergent. If convergence exists,
construct validity is supported.
After establishing convergence,
the discriminant validity of the test must be determined. This involves
demonstrating that the construct can be differentiated from other constructs
that may be somewhat similar. In other words, the researcher must show that the
construct being measured is not the same as one that was measured under a
different name.
Criterion-Related
Validity
This approach is concerned with
detecting the presence or absence of one or more criteria considered to
represent traits or constructs of interest. One of the easiest ways to test for
criterion-related validity is to administer the instrument to a group that is
known to exhibit the trait to be measured. This group may be identified by a
panel of experts. A wide range of items should be developed for the test with
invalid questions culled after the control group has taken the test. Items for which responses are drastically inconsistent among individual members of the group should be omitted. If the researcher has
developed quality items for the instrument, the culling process should leave
only those items that will consistently measure the trait or construct being
studied. For example, suppose one wanted to develop an instrument that would
identify teachers who are good at dealing with abused children. First, a panel
of unbiased experts identifies 100 teachers out of a larger group that they
judge to be best at handling abused children. The researcher develops 400
yes/no items that will be administered to the whole group of teachers,
including those identified by the experts. The responses are analyzed, and the items to which the expert-identified teachers and the other teachers respond differently are taken to be the questions that will identify teachers who are good at dealing with abused children.
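As an illustration of the culling step (all responses and group labels below are hypothetical), a simple Python sketch can rank yes/no items by how differently the expert-identified teachers and the remaining teachers answer them; items whose difference is near zero are candidates for removal:

# Illustrative sketch: rank yes/no items by the difference in the proportion of
# "yes" answers between the expert-identified group and everyone else.
def item_discrimination(responses, expert_flags):
    """responses: one list of 0/1 answers per respondent; expert_flags: one bool per respondent."""
    n_items = len(responses[0])
    ranked = []
    for i in range(n_items):
        expert = [r[i] for r, e in zip(responses, expert_flags) if e]
        others = [r[i] for r, e in zip(responses, expert_flags) if not e]
        diff = sum(expert) / len(expert) - sum(others) / len(others)
        ranked.append((i, diff))
    # large absolute differences separate the groups; values near zero mark items to cull
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)

answers = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]]   # hypothetical yes(1)/no(0) answers
flags = [True, True, False, False]                        # expert-identified respondents
print(item_discrimination(answers, flags))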
Reliability
The reliability of a research
instrument concerns the extent to which the instrument yields the same results
on repeated trials. Although unreliability is always present to a certain
extent, there will generally be a good deal of consistency in the results of a
quality instrument gathered at different times. The tendency toward consistency
found in repeated measurements is referred to as reliability (Carmines &
Zeller, 1979).
In scientific research, accuracy
in measurement is of great importance. Research in the physical sciences normally measures
physical attributes, which can easily be assigned a precise value. Many times
numerical assessments of the mental attributes of human beings are accepted as
readily as numerical assessments of their physical attributes. Although we may
understand that the values assigned to mental attributes can never be
completely precise, the imprecision is often looked upon as being too small to
be of any practical concern. However, the magnitude of the imprecision is much
greater in the measurement of mental attributes than in that of physical
attributes. This fact makes it very important that the researcher in the social
sciences and humanities determine the reliability of the data gathering
instrument to be used (Willmott & Nuttall, 1975).
Retest Method
One of the easiest ways to
determine the reliability of empirical measurements is by the retest method in
which the same test is given to the same people after a period of time. The
reliability of the test (instrument) can be estimated by examining the
consistency of the responses between the two tests.
If the researcher obtains the
same results on the two administrations of the instrument, then the reliability
coefficient will be 1.00. Normally, the correlation of measurements across time
will be less than perfect due to different experiences and attitudes that
respondents have encountered from the time of the first test.
The test-retest method is a
simple, clear cut way to determine reliability, but it can be costly and
impractical. Researchers are often only able to obtain measurements at a single
point in time or do not have the resources for multiple administrations.
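A minimal sketch of the retest estimate, assuming hypothetical total scores from two administrations, is simply the Pearson correlation between the two sets of scores:

# Illustrative sketch: test-retest reliability as the Pearson correlation between
# two administrations of the same instrument. Scores below are hypothetical.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

first_administration  = [12, 15, 9, 20, 14, 18, 11, 16]   # hypothetical total scores
second_administration = [13, 14, 10, 19, 15, 17, 10, 18]
print(pearson_r(first_administration, second_administration))  # close to, but below, 1.00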
Alternative Form
Method
Like the retest method, this
method also requires two testings with the same people. However, the same test
is not given each time. Each of the two tests must be designed to measure the
same thing and should not differ in any systematic way. One way to help ensure
this is to use random procedures to select items for the different tests.
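One rough way to carry out such a random assignment (the item pool below is hypothetical) is sketched here:

# Illustrative sketch: randomly assigning items from a common pool to two
# alternative forms so that neither form differs in any systematic way.
import random

item_pool = [f"item_{i}" for i in range(1, 41)]   # hypothetical 40-item pool
random.shuffle(item_pool)
form_a = sorted(item_pool[:20])
form_b = sorted(item_pool[20:])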
The alternative form method is
viewed as superior to the retest method because a respondent’s memory of test
items is not as likely to play a role in the data received. One drawback of
this method is the practical difficulty in developing test items that are
consistent in the measurement of a specific phenomenon.
Split-Halves Method
This method is more practical in
that it does not require two administrations of the same or an alternative form
test. In the split-halves method, the total number of items is divided into
halves, and a correlation is taken between the two halves. This correlation only
estimates the reliability of each half of the test. It is necessary then to use
a statistical correction to estimate the reliability of the whole test. This
correction is known as the Spearman-Brown prophecy formula (Carmines & Zeller, 1979):
Pxx" = 2Pxx' / (1 + Pxx')
where Pxx" is the reliability coefficient for the whole test and Pxx' is the split-half correlation.
Example
If the correlation between the
halves is .75, the reliability for the total test is:
Pxx" = [(2) (.75)]/(1 +
.75) = 1.5/1.75 = .857
There are many ways to divide
the items in an instrument into halves. The most typical way is to assign the
odd numbered items to one half and the even numbered items to the other half of
the test. One drawback of the split-halves method is that the correlation
between the two halves is dependent upon the method used to divide the items.
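A minimal Python sketch of the split-halves procedure, using hypothetical item responses and the odd/even division described above (Python 3.10 or later is assumed for statistics.correlation), looks like this:

# Illustrative sketch: split-half correlation plus the Spearman-Brown correction.
from statistics import correlation

# one row per respondent, one column per item (hypothetical 6-item instrument)
scores = [
    [4, 3, 5, 4, 2, 3],
    [2, 2, 1, 3, 2, 2],
    [5, 4, 4, 5, 4, 5],
    [3, 3, 2, 2, 3, 3],
    [1, 2, 2, 1, 1, 2],
]

odd_half  = [sum(row[0::2]) for row in scores]   # odd-numbered items (1st, 3rd, 5th)
even_half = [sum(row[1::2]) for row in scores]   # even-numbered items (2nd, 4th, 6th)

r_half = correlation(odd_half, even_half)        # split-half correlation (Pxx')
whole_test = 2 * r_half / (1 + r_half)           # Spearman-Brown estimate (Pxx")
print(round(r_half, 3), round(whole_test, 3))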
Internal Consistency
Method
This method requires neither the
splitting of items into halves nor the multiple administration of instruments.
The internal consistency method provides a unique estimate of reliability for
the given test administration. The most popular internal consistency
reliability estimate is given by Cronbach’s alpha. It is expressed as follows:
alpha = [N/(N-1)] × [1 - (Σσᵢ² / σₓ²)]
where N equals the number of items, Σσᵢ² equals the sum of the item variances, and σₓ² equals the variance of the total composite.
If one is using the correlation
matrix rather than the variance-covariance matrix then alpha reduces to the
following:
alpha = Np/[1+p(N-1)]
where N equals the number of
items and p equals the mean interitem correlation.
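A minimal Python sketch of the variance form of alpha, using hypothetical item scores and sample variances (the same N − 1 divisor used for S² in the table below), might look like this:

# Illustrative sketch: Cronbach's alpha from raw item scores via the variance form.
from statistics import variance

def cronbach_alpha(scores):
    """scores: one list of item scores per respondent."""
    n_items = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in scores])   # variance of total composite
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

scores = [          # hypothetical 4-item instrument, 5 respondents
    [4, 3, 5, 4],
    [2, 2, 1, 3],
    [5, 4, 4, 5],
    [3, 3, 2, 2],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(scores), 3))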
Example
If the average intercorrelation of a six-item scale is .5, then the alpha for the scale would be:
alpha = 6(.5)/[1+.5(6-1)]
= 3/3.5 = .857
An example of how alpha can be calculated can be given using the 10-item self-esteem scale developed by
Rosenberg (1965). (See table) The 45 correlations in the table are first
summed: .185+.451+.048+ . . . + .233= 14.487. Then the mean interitem
correlation is found by dividing this sum by 45: 14.487/45= .32. Now use this
number to calculate alpha:
alpha = 10(.32)/[1+.32(10-1)]
= 3.20/3.88
= .825
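To double-check this arithmetic, the standardized formula can be evaluated directly; the short sketch below simply recomputes the two results above:

# Re-checking the worked arithmetic with the standardized alpha formula.
def standardized_alpha(n_items, mean_r):
    return n_items * mean_r / (1 + mean_r * (n_items - 1))

print(round(standardized_alpha(6, 0.5), 3))    # six-item example: 0.857
print(round(standardized_alpha(10, 0.32), 3))  # Rosenberg scale example: about 0.825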
The coefficient alpha is an
internal consistency index designed for use with tests containing items that
have no right answer. This is a very useful tool in educational and social
science research because instruments in these areas often ask respondents to
rate the degree to which they agree or disagree with a statement on a
particular scale.
Cronbach’s Alpha
Example
Respondent    Q1    Q2    Q3    Q4    Q5    Q6    Q7    Q8    Q9    Q10
 1             2     2     2     3     4     5     2     1     2     4
 2             1     1     2     4     5     5     1     2     2     2
 3             1     2     2     5     5     4     1     2     2     1
 4             3     2     2     2     1     3     2     2     2     2
 5             5     5     5     4     4     3     3     2     3     4
 6             1     1     1     1     5     1     1     1     1     1
 7             2     2     2     2     2     2     2     2     2     2
 8             2     1     2     2     4     1     3     3     1     1
 9             5     5     1     1     1     2     1     2     5     4
10             4     3     3     3     1     2     1     1     3     4

N             10    10    10    10    10    10    10    10    10    10
ΣX            26    24    22    27    32    28    17    18    23    25
Mean          2.6   2.4   2.2   2.7   3.2   2.8   1.7   1.8   2.3   2.5
ΣX²           90    78    60    89   130    98    35    36    65    79
Σx²           22.4  20.4  11.6  16.1  27.6  19.6   6.1   3.6  12.1  16.5
S²            2.5   2.3   1.3   1.8   3.1   2.2   .68   .4    1.3   1.8

(Each row gives one respondent's answers to the ten questions. Σx² is the sum of squared deviations, ΣX² − (ΣX)²/n, and S² = Σx²/(n − 1) is the item variance, with n = 10 respondents per item.)
Interitem correlations for question 1 (with questions 2 through 10): .917, .467, .337, .455, .014, -.146, .512, -.06, .74
p = mean interitem correlation = .36
alpha = Np / [1 + p(N-1)]
      = (10)(.36) / [1 + .36(10-1)]
      = 3.6/4.24
      = .849
SELF ASSESSMENT
1. Name the three types of validity.
2. Name four ways to establish the reliability of an instrument.