Comparison of test scores of the same person taking the same test at different times

The test manual will usually provide detailed descriptions of the norm groups and the test norms. To ensure valid scores and meaningful interpretation of norm-referenced tests, make sure that your target group is similar to the norm group. Compare the educational level, the occupational, language and cultural backgrounds, and other demographic characteristics of the individuals making up the two groups to determine their similarity.

For example, consider an accounting knowledge test that was standardized on the scores obtained by employed accountants with at least 5 years of experience. This would be an appropriate test if you are interested in hiring experienced accountants. However, this test would be inappropriate if you are looking for an accounting clerk. You should look for a test normed on accounting clerks or a closely related occupation.

Criterion-referenced test interpretation. In criterion-referenced tests, the test score indicates the amount of skill or knowledge the test taker possesses in a particular subject or content area. The test score is not used to indicate how well the person does compared to others; it relates solely to the test taker's degree of competence in the specific area assessed. Criterion-referenced assessment is generally associated with educational and achievement testing, licensing, and certification.

A particular test score is generally chosen as the minimum acceptable level of competence. How is a level of competence chosen? The test publisher may develop a mechanism that converts test scores into proficiency standards, or the company may use its own experience to relate test scores to competence standards.

For example, suppose your company needs clerical staff with word processing proficiency. The test publisher may provide you with a conversion table relating word processing skill to various levels of proficiency, or your own experience with current clerical employees can help you to determine the passing score. You may decide that a minimum of 35 words per minute with no more than two errors per 100 words is sufficient for a job with occasional word processing duties. If you have a job with high production demands, you may wish to set the minimum at 75 words per minute with no more than 1 error per 100 words.
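
The passing standards in this example can be written down as an explicit check. The following is a minimal sketch, not a publisher's scoring procedure: the `TypingResult` class, the `meets_standard` function, and the sample candidate are hypothetical names and data introduced for illustration, while the two thresholds come from the example above.

```python
from dataclasses import dataclass


@dataclass
class TypingResult:
    """One candidate's word processing test result (hypothetical structure)."""
    words_per_minute: float
    errors: int
    words_typed: int


def meets_standard(result, min_wpm, max_errors_per_100):
    """Return True when both the speed and the accuracy criteria are met."""
    if result.words_typed <= 0:
        return False
    errors_per_100 = result.errors * 100 / result.words_typed
    return (result.words_per_minute >= min_wpm
            and errors_per_100 <= max_errors_per_100)


# Thresholds taken from the example in the text.
OCCASIONAL = {"min_wpm": 35, "max_errors_per_100": 2}
HIGH_PRODUCTION = {"min_wpm": 75, "max_errors_per_100": 1}

# Hypothetical candidate: 60 words per minute, 3 errors in 300 words.
candidate = TypingResult(words_per_minute=60, errors=3, words_typed=300)

print(meets_standard(candidate, **OCCASIONAL))       # True
print(meets_standard(candidate, **HIGH_PRODUCTION))  # False
```

The same candidate passes one standard and fails the other, which illustrates that a criterion-referenced score is judged against the requirement of the job, not against other test takers.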

It is important to ensure that all inferences you make on the basis of test results are well founded. Only use tests for which sufficient information is available to guide and support score interpretation. Read the test manual for instructions on how to properly interpret the test results. This leads to the next principle of assessment.

Principle of Assessment: Ensure that scores are interpreted properly.

Interpreting test results

Test results are usually presented in terms of numerical scores, such as raw scores, standard scores, and percentile scores. In order to interpret test scores properly, you need to understand the scoring system used.

Types of scores
- Raw scores. These refer to the unadjusted scores on the test. Usually the raw score represents the number of items answered correctly, as in mental ability or achievement tests. Some types of assessment tools, such as work value inventories and personality inventories, have no "right" or "wrong" answers. In such cases, the raw score may represent the number of positive responses for a particular trait. Raw scores do not provide much useful information. Consider a test taker who gets 25 out of 50 questions correct on a math test. It's hard to know whether "25" is a good score or a poor score. When you compare the results to all the other individuals who took the same test, you may discover that this was the highest score on the test. In general, for norm-referenced tests, it is important to see where a particular score lies within the context of the scores of other people. Adjusting or converting raw scores into standard scores or percentiles will provide you with this kind of information. For criterion-referenced tests, it is important to see what a particular score indicates about proficiency or competence.
- Standard scores. Standard scores are converted raw scores. They indicate where a person's score lies in comparison to a reference group. For example, if the test manual indicates that the average or mean score for the group on a test is 50, then an individual who gets a higher score is above average, and an individual who gets a lower score is below average. Standard scores are discussed in more detail below in the section on standard score distributions.
- Percentile scores. A percentile score is another type of converted score. An individual's raw score is converted to a number indicating the percent of people in the reference group who scored at or below the test taker's score. For example, a score at the 70th percentile means that the individual's score is the same as or higher than the scores of 70% of those who took the test. The 50th percentile is known as the median and represents the middle score of the distribution.
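
The conversion from a raw score to a percentile score can be sketched in a few lines. This is an illustrative sketch only: `percentile_rank` and the reference group are hypothetical, and the `inclusive` flag is an assumption added because some scoring conventions count only scores strictly below the test taker while others also count ties. A test manual will state which convention its norms use.

```python
def percentile_rank(raw_score, reference_scores, inclusive=True):
    """Percent of the reference group scoring at or below (or strictly below) raw_score."""
    if not reference_scores:
        raise ValueError("reference group must not be empty")
    if inclusive:
        count = sum(1 for s in reference_scores if s <= raw_score)
    else:
        count = sum(1 for s in reference_scores if s < raw_score)
    return 100 * count / len(reference_scores)


# Hypothetical raw scores (items correct out of 50) for a reference group of ten people.
reference_group = [12, 15, 18, 20, 21, 22, 23, 24, 24, 25]

print(percentile_rank(25, reference_group))                   # 100.0
print(percentile_rank(25, reference_group, inclusive=False))  # 90.0
```

With this made-up group, a raw score of 25 out of 50 turns out to be the highest score, which is the situation described under raw scores above.
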
Score distribution
- Normal curve. A great many human characteristics, such as height, weight, math ability, and typing skill, are distributed in the population at large in a typical pattern. This pattern of distribution is known as the normal curve and has a symmetrical bell-shaped appearance. The curve is illustrated in Figure 2. As you can see, a large number of individual cases cluster in the middle of the curve. The farther from the middle or average you go, the fewer the cases. In general, distributions of test scores follow the same normal curve pattern. Most individuals get scores in the middle range. As the extremes are approached, fewer and fewer cases exist, indicating that progressively fewer individuals get low scores (left of center) and high scores (right of center).
- Standard score distribution. There are two characteristics of a standard score distribution that are reported in test manuals. One is the mean, a measure of central tendency; the other is the standard deviation, a measure of the variability of the distribution.
  - Mean. The most commonly used measure of central tendency is the mean or arithmetic average score. Test developers generally assign an arbitrary number to represent the mean standard score when they convert from raw scores to standard scores. Look at Figure 2. Test A and Test B are two tests with different standard score means. Notice that Test A has a mean of 100 and Test B has a mean of 50. If an individual got a score of 50 on Test A, that person did very poorly. However, a score of 50 on Test B would be an average score.
  - Standard deviation. The standard deviation is the most commonly used measure of variability. It is used to describe the distribution of scores around the mean. Figure 2 shows the percent of cases 1, 2, and 3 standard deviations (sd) above the mean and 1, 2, and 3 standard deviations below the mean. As you can see, 34% of the cases lie between the mean and +1 sd, and 34% of the cases lie between the mean and -1 sd. Thus, approximately 68% of the cases lie between -1 and +1 standard deviations. Notice that for Test A, the standard deviation is 20, and 68% of the test takers score between 80 and 120. For Test B the standard deviation is 10, and 68% of the test takers score between 40 and 60.
- Percentile distribution. The bottom horizontal line below the curve in Figure 2 is labeled "Percentiles." It represents the distribution of scores in percentile units. Notice that the median is in the same position as the mean on the normal curve. By knowing the percentile score of an individual, you already know how that individual compares with others in the group. An individual at the 98th percentile scored the same as or better than 98% of the individuals in the group. This is equivalent to getting a standard score of 140 on Test A or 70 on Test B.
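
The relationships among the mean, the standard deviation, standard scores, and percentiles can be checked numerically. The sketch below assumes scores follow an exact normal curve, which real test data only approximate; the helper names are hypothetical, and the means and standard deviations are those given for Test A and Test B above. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Mean and standard deviation of each test's standard score scale, from the text.
TEST_A = (100, 20)
TEST_B = (50, 10)


def to_standard_score(z, mean, sd):
    """Convert a distance from the mean, in standard deviations, to a standard score."""
    return mean + z * sd


def percentile_from_standard_score(score, mean, sd):
    """Percent of a normal distribution falling at or below the given standard score."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)


# Two standard deviations above the mean on each scale.
print(to_standard_score(2, *TEST_A))  # 140
print(to_standard_score(2, *TEST_B))  # 70

# Both scores sit at about the 98th percentile.
print(round(percentile_from_standard_score(140, *TEST_A), 1))  # 97.7
print(round(percentile_from_standard_score(70, *TEST_B), 1))   # 97.7

# Share of test takers between -1 and +1 standard deviations on Test A (80 to 120).
dist_a = NormalDist(mu=100, sigma=20)
print(round(100 * (dist_a.cdf(120) - dist_a.cdf(80)), 1))      # 68.3
```

The exact figures (97.7% and 68.3%) are what the text rounds to the 98th percentile and "approximately 68%".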

Processing test results to make employment decisions: rank-ordering and cut-off scores

The rank-ordering of test results, the use of cut-off scores, or some combination of the two is commonly used to assess the qualifications of people and to make employment-related decisions about them. These are described below.

Rank-ordering is a process of arranging candidates on a list from highest score to lowest score based on their test results. In rank-order selection, candidates are chosen on a top-down basis.

A cut-off score is the minimum score that a candidate must have to qualify for a position. Employers generally set the cut-off score at a level which they determine is directly related to job success. Candidates who score below this cut-off generally are not considered for selection. Test publishers typically recommend that employers base their selection of a cut-off score on the norms of the test.
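
Rank-ordering and cut-off scores can also be combined, as the following sketch shows. The candidate names, scores, and cut-off value are hypothetical, and treating a score equal to the cut-off as qualifying is an assumption based on the definition of the cut-off as the minimum qualifying score.

```python
def apply_cutoff(scores, cutoff):
    """Keep only candidates whose score is at or above the cut-off."""
    return {name: score for name, score in scores.items() if score >= cutoff}


def rank_order(scores):
    """Return (name, score) pairs sorted from highest score to lowest."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidates and test scores.
candidates = {"A": 82, "B": 67, "C": 91, "D": 74}

qualified = apply_cutoff(candidates, cutoff=70)
print(rank_order(qualified))      # [('C', 91), ('A', 82), ('D', 74)]

# Top-down selection of two candidates from the qualified list.
print(rank_order(qualified)[:2])  # [('C', 91), ('A', 82)]
```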

Combining information from many assessment tools

Many assessment programs use a variety of tests and procedures in their assessment of candidates. In general, you can use a "multiple hurdles" approach or a "total assessment" approach, or a combination of the two, in using the assessment information obtained.

Multiple hurdles approach. In this approach, test takers must pass each test or procedure (usually by scoring above a cut-off score) to continue within the assessment process. The multiple hurdles approach is appropriate and necessary in certain situations, such as requiring test takers to pass a series of tests for licensing or certification, or requiring all workers in a nuclear power plant to pass a safety test. It may also be used to reduce the total cost of assessment by administering less costly screening devices to everyone, but having only those who do well take the more expensive tests or other assessment tools.

Total assessment approach. In this approach, test takers are administered every test and procedure in the assessment program. The information gathered is used in a flexible or counterbalanced manner. This allows a high score on one test to be counterbalanced with a low score on another. For example, an applicant who performs poorly on a written test, but shows great enthusiasm for learning and is a very hard worker, may still be an attractive hire.
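
The multiple hurdles approach described above can be sketched as a sequence of checks that stops at the first failed test. The test names, cut-off values, and applicants below are hypothetical, and the sketch assumes that a score equal to the cut-off passes.

```python
def multiple_hurdles(candidate_scores, hurdles):
    """Check tests in order; return (passed_all, name_of_first_failed_test)."""
    for test_name, cutoff in hurdles:
        score = candidate_scores.get(test_name)
        # A missing score means the candidate never reached or took this test.
        if score is None or score < cutoff:
            return False, test_name
    return True, None


# Hypothetical program: an inexpensive screening quiz first, then a costlier work sample.
hurdles = [("screening_quiz", 60), ("work_sample", 70)]

print(multiple_hurdles({"screening_quiz": 75, "work_sample": 82}, hurdles))  # (True, None)
print(multiple_hurdles({"screening_quiz": 75, "work_sample": 65}, hurdles))  # (False, 'work_sample')
print(multiple_hurdles({"screening_quiz": 55}, hurdles))                     # (False, 'screening_quiz')
```

The third applicant is screened out by the inexpensive quiz and never takes the work sample, which is how this approach can reduce the total cost of assessment.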

A key decision in using the total assessment approach is determining the relative weights to assign to each assessment instrument in the program.

Figure 3 is a simple example of how assessment results from several tests and procedures can be combined to generate a weighted composite score.
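
A weighted composite score of this kind can be computed as a weighted average. The sketch below does not reproduce the data in Figure 3: the instruments, weights, and applicant scores are hypothetical, and it assumes all scores have already been converted to a common scale so that they can be combined meaningfully.

```python
def weighted_composite(scores, weights):
    """Weighted average of scores; each weight reflects an instrument's relative importance."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same instruments")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in weights) / total_weight


# Hypothetical relative weights and one applicant's scores on a common 0-100 scale.
weights = {"interview": 2, "written_test": 1, "work_sample": 3}
applicant = {"interview": 80, "written_test": 50, "work_sample": 90}

print(weighted_composite(applicant, weights))  # 80.0
```

Here the low written test score is counterbalanced by strong results on the more heavily weighted instruments, which is the flexibility the total assessment approach is meant to provide.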

Which type of test gives the same score when different people score it?

A norm-referenced test is a type of standardized test (that is, a test that is identical for every test-taker). After the items on a norm-referenced test are scored, the scores are compared to those of a comparison group, or norming group.

What is the first modern intelligence test?

Alfred Binet developed the first modern intelligence test, the Binet-Simon Scale, in 1905. His intelligence scales have since inspired and influenced the development of intelligence tests across America and Europe [3, 7].

What are the two most widely used intelligence tests?

The most widely used intelligence tests include the Stanford-Binet Intelligence Scale and the Wechsler scales.

What are the main purposes of intelligence testing?

Intelligence tests are widely assumed to measure maximal intellectual performance, and predictive associations between intelligence quotient (IQ) scores and later-life outcomes are typically interpreted as unbiased estimates of the effect of intellectual ability on academic, professional, and social life outcomes.
