ANNEX B.2 Evaluation materials for testing with end-users


Table 17: HF assessment specifics – End-users

Table columns: High-Level Evaluation Objective | Key Indicators | Evaluation techniques/Measuring ways | Measuring tools | Success targets/thresholds

Each key indicator below is presented as a block with the fields Evaluation technique, Measuring tools, and Success targets.

High-Level Evaluation Objective: User experience

Usability

Learnability

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets: Identification of at least 5 core aspects of learnability:

a) Comprehension
b) Recognition
c) Retrieval
d) Real presence

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective:

  • Global: System Usability Scale (SUS)
  • Specific: tool/application-specific questions

Success targets: Global & Specific: above the 70th percentile for the mature state of development (see the SUS scoring sketch below)
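
Because SUS scoring is standardized, the percentile targets in this table can be checked as soon as raw scores are computed. The sketch below shows the standard SUS calculation in Python; the example responses are invented, and translating a raw score into a percentile still requires a published normative dataset (a raw score of about 68 is commonly cited as roughly the 50th percentile).

```python
def sus_score(responses):
    """Standard SUS scoring: ten items answered on a 1-5 scale.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # 0-based i: even i = odd item
                for i, r in enumerate(responses))
    return total * 2.5

# Invented answers from one participant to items 1..10.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```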

Evaluation technique: Focus Groups

Measuring tools: Qualitative/Subjective: open questions based on pre-defined topics/themes

Success targets: Identification of major recurrent topics covering the needs of end-users (content analysis)

Efficiency/Effectiveness

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools:

  • Quantitative/Objective: relevant measures: task success, prompt frequency (in facilitators' diaries)
  • Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets:

  • Success: more than 75% of tasks completed (see the tally sketch below)
  • 70% of users successfully complete their tasks
  • Low level of assistance (e.g. number of help requests is below 10% of activity)
  • Self-reported comments relevant to these two attributes for more than 70% of users
  • Inter-data-type reliability > .7 (above mediocre, where applicable)
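
A minimal sketch of how the objective targets above could be tallied from facilitators' diaries. The record layout and field names are invented for illustration, and per-user success is assumed here to mean completing at least 75% of that user's own tasks.

```python
# Invented per-user tallies from facilitators' diaries.
records = [
    {"user": "P01", "completed": 8, "total": 10, "help_requests": 2, "actions": 40},
    {"user": "P02", "completed": 10, "total": 10, "help_requests": 1, "actions": 38},
    {"user": "P03", "completed": 7, "total": 10, "help_requests": 3, "actions": 25},
]

# Success > 75% of tasks completed, pooled over all users.
task_success = sum(r["completed"] for r in records) / sum(r["total"] for r in records)

# 70% of users complete their tasks (assumed: >= 75% of their own tasks).
users_ok = sum(r["completed"] / r["total"] >= 0.75 for r in records) / len(records)

# Help requests stay below 10% of each user's logged activity.
assistance_ok = all(r["help_requests"] / r["actions"] < 0.10 for r in records)

print(f"pooled task success: {task_success:.0%}")
print(f"users meeting the per-user target: {users_ok:.0%}")
print(f"assistance below 10% for everyone: {assistance_ok}")
```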

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: System Usability Scale and tool/application-specific questions

Success targets: SUS above the 75th percentile for mature versions of tools, applications and services

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform; mixed approach with open- and close-ended questionnaire items)

*This will be put in force in the 3rd evaluation round, in addition to performance testing.

Success targets:

  • Efficiency: > 70% for the mature version
  • Effectiveness: > 75% for the mature version

Evaluation technique: Focus Groups/Content analysis

Measuring tools: Qualitative/Subjective: open questions – free discussion

Success targets: Congruence in 4 out of 5 topics/themes identified (see the sketch below)
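
The congruence target reduces to a set overlap between the themes a group actually raises and the pre-defined topics. A minimal sketch, with invented theme names:

```python
# Pre-defined discussion topics and the themes one group actually raised
# (both sets are invented for illustration).
predefined = {"navigation", "setup", "voice output", "pricing", "privacy"}
raised = {"navigation", "setup", "voice output", "privacy", "training"}

congruent = predefined & raised
print(f"congruence: {len(congruent)}/{len(predefined)} topics "
      f"-> {'met' if len(congruent) >= 4 else 'not met'}")
```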

Memorability

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets:

  • Performance testing: task retrieval in 4 out of 5 tasks
  • Task recognition for all 5 tasks

Evaluation technique: Usability testing/Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: tool/application-specific questions

Success targets: Perceived attribute: < 3 (on a five-point scale)

Evaluation technique: Focus Groups/Content analysis

Measuring tools: Qualitative/Subjective: open questions – free discussion

Success targets: Content analysis supports the increased memorability of tested applications or services (e.g. a much higher number of words related to memorability than to forgetfulness in coded themes and/or resulting word clouds; see the counting sketch below)
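
One way to operationalize the word-count comparison is to tally the coded labels assigned during content analysis and compare the memorability-related and forgetfulness-related counts. A minimal sketch with invented codes:

```python
from collections import Counter

# Invented codes assigned to transcript segments during content analysis.
codes = ["remembered", "remembered", "forgot", "recalled", "remembered",
         "recalled", "remembered", "forgot"]
memorability = {"remembered", "recalled"}
forgetfulness = {"forgot"}

counts = Counter(codes)
mem = sum(counts[c] for c in memorability)
forget = sum(counts[c] for c in forgetfulness)
print(f"memorability codes: {mem}, forgetfulness codes: {forget} "
      f"-> ratio {mem / forget:.1f}")
```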

Errors

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools:

  • Quantitative/Objective: relevant measures: errors (in facilitators' diaries)
  • Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets:

  • Quantity: error rate below 5%
  • Quality: no major errors that result in task failure
  • Self-reporting of errors is less than 5% of overall activity
  • Inter-data-type reliability > .7 (above mediocre, where applicable)

Evaluation technique: Focus Groups/Content analysis

Measuring tools: Qualitative/Subjective: open questions – free discussion

Success targets: At least one topic covering user performance related to error-prone behaviour (e.g. frustration, mistakes, actual error reporting, why errors occurred, why they did not), with a positive outcome for the majority of the group

Satisfaction

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools:

  • Quantitative/Subjective: task difficulty (in facilitators' diaries)
  • Quantitative/Subjective: close-ended questionnaire items (e.g. Likert scale)
  • Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets:

  • Positive comments from 70% of users
  • 70% of users score 5 or above (on a 7-point Likert scale; see the sketch below)
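
The Likert targets that recur throughout this table are all the same computation: the share of users at or above the cut-off. A minimal sketch with invented ratings:

```python
# Invented satisfaction ratings, one per user, on a 7-point Likert scale.
ratings = [6, 5, 7, 4, 6, 5, 3, 6, 7, 5]

share = sum(r >= 5 for r in ratings) / len(ratings)
print(f"{share:.0%} of users scored 5 or above "
      f"-> {'met' if share >= 0.70 else 'not met'}")
```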

Evaluation technique: Usability testing/Questionnaires/Interviews

Measuring tools: Quantitative/Subjective:

  • Global: System Usability Scale (SUS)
  • Specific: tool/application-specific questions

Success targets: SUS above the 75th percentile for mature versions of tools, applications and services

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 3rd evaluation round, in addition to performance testing.

Success targets: 70% of users score 5 or above (on a 7-point Likert scale)

Evaluation technique: Focus Groups

Measuring tools: Qualitative/Subjective: open questions – free discussion

Success targets: At least one topic yields a positive outcome for the majority of the group members

Confidence

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools:

  • Quantitative/Objective: relevant measures: task completion/task failure (indirect impact; only used for correlational analysis)
  • Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets:

  • Task completion of more than 80% of tasks by over 70% of users
  • Task failure occurs in less than 5% of tasks for less than 10% of users
  • Content analysis shows confidence as a strongly coded theme (e.g. in word clouds)

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: System Usability Scale and tool/application-specific questions

Success targets:

  • Self-reported confidence above 5 (on a 7-point Likert scale) for over 70% of users
  • A positive correlation (> .7) between the subjective assessment and the indirect measures (task completion/failure) should be investigated if and wherever applicable, because confidence is an indicator affected by other factors (see the sketch below)
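
The correlational check can be run directly on the paired per-user measures. A minimal sketch using Pearson's r from the standard library (Python 3.10+); all values are invented:

```python
from statistics import correlation  # Pearson's r; Python 3.10+

# Invented paired measures per user: self-reported confidence (1-7) and
# task completion rate (0-1).
confidence = [6, 5, 7, 4, 6, 3, 5, 6]
completion = [0.9, 0.8, 1.0, 0.6, 0.9, 0.5, 0.7, 0.8]

r = correlation(confidence, completion)
print(f"r = {r:.2f} -> {'consistent' if r > 0.7 else 'investigate further'}")
```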

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 3rd evaluation round, in addition to performance testing.

Success targets: Self-reported confidence above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Focus Groups/Content analysis

Measuring tools: Qualitative/Subjective: open questions – free discussion (identification of topics/themes)

Success targets: Content analysis shows confidence as a strongly coded theme (e.g. in word clouds)

Desirability

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets: Content analysis shows hedonic topics/attributes as strongly coded themes (e.g. in word clouds)

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective:

  • Global: System Usability Scale (SUS)
  • Specific: tool/application-specific questions

Success targets: Self-reported desirability above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 3rd evaluation round, in addition to performance testing.

Success targets: Self-reported desirability above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Focus Groups/Content analysis

Measuring tools: Qualitative/Subjective: open questions – free discussion (identification of themes/topics)

Success targets: Content analysis shows hedonic topics/attributes as strongly coded themes (e.g. in word clouds)

Findability

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets: Content analysis shows good navigation attributes as strongly coded themes (e.g. in word clouds)

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: tool/application-specific questions (close-ended)

Success targets: Self-reported findability above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 3rd evaluation round, in addition to performance testing.

Success targets: Self-reported findability above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Focus Groups/Content analysis

Measuring tools: Qualitative/Subjective: open questions – free discussion (identification of topics/themes)

Success targets: Content analysis shows good navigation attributes as strongly coded themes (e.g. in word clouds)

Accessibility

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets: Content analysis shows good accessibility attributes as strongly coded themes (e.g. in word clouds); cross-check with standards

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: tool/application-specific questionnaire items

Success targets: Self-reported accessibility above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 3rd evaluation round, in addition to performance testing.

Success targets: Self-reported accessibility above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Focus Groups/Content analysis

Measuring tools: Qualitative/Subjective: open questions – free discussion (identification of topics/themes)

Success targets: Content analysis shows good accessibility attributes as strongly coded themes (e.g. in word clouds); cross-check with standards

Credibility

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets: Content analysis shows high-credibility attributes as strongly coded themes (e.g. in word clouds); cross-tabulate with confidence and task completion (if applicable)

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: tool/application-specific questionnaire items

Success targets: Self-reported credibility above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 3rd evaluation round, in addition to performance testing.

Success targets: Self-reported credibility above 5 (on a 7-point Likert scale) for over 70% of users

Evaluation technique: Focus Groups/Content analysis

Measuring tools: Qualitative/Subjective: open questions – free discussion (topics/themes)

Success targets: Content analysis shows high-credibility attributes as strongly coded themes (e.g. in word clouds); cross-tabulate with confidence and task completion (if applicable)

High-Level Evaluation Objective: User Acceptance

Ease of use

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questionnaire items

Success targets:

  • Substantial reporting of ease of handling and use (comments & facilitator reporting)
  • Low level of required assistance (comments & facilitator reporting)

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: TAM, SUS, tool/application-specific questionnaire items

Success targets: 70% of participants score above 5 on a 7-point Likert scale

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will probably be put in force in the 2nd evaluation round, in addition to performance testing.

Success targets: 70% of participants score above 5 on a 7-point Likert scale

Usefulness

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets: 70% of participants score above 5 on a 7-point Likert scale

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: TAM, tool/application-specific questions

Success targets: 70% of participants score above 5 on a 7-point Likert scale

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 3rd evaluation round, in addition to performance testing.

Success targets: 70% of participants score above 5 on a 7-point Likert scale

Attitudes towards use

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective: Think Aloud/Co-discovery Protocol/open questions

Success targets: Content analysis shows a higher probability of future use as a strongly coded theme (e.g. in word clouds); cross-tabulate with satisfaction, confidence, and task completion (if applicable)

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: TAM, tool/application-specific questions

Success targets: 70% of participants score above 5 on a 7-point Likert scale

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 2nd evaluation round, in addition to performance testing.

Success targets: 70% of participants score above 5 on a 7-point Likert scale

Intention of use

Evaluation technique: Performance testing combined with Naturalistic Observations and Contextual Inquiry

Measuring tools: Qualitative/Subjective & Quantitative/Subjective: Think Aloud/Co-discovery Protocol/open questionnaire items

Success targets: Content analysis shows a higher probability of future use as a strongly coded theme (e.g. in word clouds); cross-tabulate with satisfaction, attitude towards use (above), and task completion (if applicable)

Evaluation technique: Questionnaires/Interviews

Measuring tools: Quantitative/Subjective: TAM, tool/application-specific questionnaire items (cost-benefit related, WTH/WTP)

Success targets: 70% of participants score above 5 on a 7-point Likert scale

Evaluation technique: Field testing

Measuring tools: Qualitative/Subjective: built-in online feedback forms* (in the context of the multi-sided platform)

*This will be put in force in the 2nd evaluation round, in addition to performance testing.

Success targets: 70% of participants score above 5 on a 7-point Likert scale