Week 8 - Game Evaluation Tech Challenges

March 4, 2025 · 6 minutes

Capoo Quantitative Evaluation Report

Workload and Usability Analysis of Difficulty Levels L1 and L2

Abstract

This report evaluates the workload and usability of the platformer puzzle game “Capoo” at two difficulty levels (L1 and L2) using NASA TLX and SUS. Ten classmates played both levels, and the paired scores were analyzed with the Wilcoxon signed-rank test. L2 produced a higher mean workload than L1 (28.75 vs 24.58). The increase is small, but eight participants reported a higher workload at L2 and none reported a lower one, so the difference is statistically significant (W = 0, n = 8, p < 0.05). L2 also had a lower mean SUS score than L1 (43.5 vs 46.0), but this difference is not significant (W = 9, n = 8, p > 0.05). This suggests that L2 reliably but only modestly increases workload, while any effect on perceived usability is too small to detect with this sample.

Introduction

Objective

This report uses quantitative methods to evaluate the user experience of the platformer puzzle game “Capoo” at low difficulty (L1) and high difficulty (L2), comparing workload and usability between the two levels.

Background

NASA TLX is a tool for measuring subjective workload across six dimensions (Hart & Staveland, 1988). SUS is a reliable usability assessment tool (Brooke, 1986). This study combines both methods to analyze the impact of Capoo’s difficulty on player experience.

Goals

  • Quantify the workload (NASA TLX) and usability (SUS) of Capoo at L1 and L2.
  • Use statistical tests to determine the significance of differences.

Methodology

Participants

  • Number: 10 volunteers.
  • Characteristics: Classmates; no prior gaming experience was required.
  • Selection Method: Random recruitment.

Experimental Design

  • Difficulty Levels: Capoo includes L1 (low difficulty) and L2 (high difficulty).
  • Testing Order:
    • 5 users played L1 first, then L2.
    • 5 users played L2 first, then L1.
    • This counterbalancing balances out order (learning) effects across the two levels.
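The report does not state how participants were allocated to the two orders. A minimal sketch, assuming random allocation into two equal groups; the function name `assign_orders` and the seed are illustrative, not part of the original study:

```python
import random


def assign_orders(participant_ids, seed=None):
    """Counterbalance play order across two equal groups (AB/BA design).

    Participants are shuffled, then the first half plays L1 before L2 and
    the second half plays L2 before L1.
    """
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {pid: ("L1", "L2") for pid in ids[:half]}
    orders.update({pid: ("L2", "L1") for pid in ids[half:]})
    return orders


if __name__ == "__main__":
    participants = [f"V{i}" for i in range(1, 11)]
    for pid, order in assign_orders(participants, seed=2025).items():
        print(pid, " then ".join(order))
```

With ten participants this always yields five in each order, matching the design above; which participants land in which group depends on the seed.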

Data Collection

  • Tools:
    • NASA TLX: 6 dimensions (raw scores).
    • SUS: 10-question survey.
  • Procedure: Each participant played one difficulty level and completed the NASA TLX and SUS forms, then played the other level and completed both forms again, yielding four scores per participant.

Scoring Method

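  • NASA TLX (raw, unweighted): each dimension is rated from 1 to 5, so Dimension score = (Rating - 1) × 25 ranges from 0 to 100; Total score = (∑ Dimension scores) / 6.
  • SUS: each item is rated from 1 to 5; odd-numbered items contribute Rating - 1, even-numbered items contribute 5 - Rating, and Total score = (∑ Score contributions) × 2.5, ranging from 0 to 100.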
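The two scoring rules can be written as small functions. This is a minimal sketch that follows the formulas above and assumes ratings on a 1-5 scale; the function names are illustrative, and `sus_score` expects all ten SUS items in questionnaire order.

```python
def tlx_raw_score(ratings):
    """Raw (unweighted) NASA TLX total from six 1-5 ratings.

    Each rating maps to 0-100 via (rating - 1) * 25, and the six
    dimension scores are averaged.
    """
    if len(ratings) != 6:
        raise ValueError("NASA TLX needs exactly six dimension ratings")
    return sum((r - 1) * 25 for r in ratings) / 6


def sus_score(ratings):
    """SUS total from ten 1-5 ratings, item 1 first.

    Odd-numbered items contribute rating - 1, even-numbered items
    contribute 5 - rating; the sum is scaled by 2.5 to a 0-100 range.
    """
    if len(ratings) != 10:
        raise ValueError("SUS needs exactly ten item ratings")
    total = 0
    for item_number, r in enumerate(ratings, start=1):
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


if __name__ == "__main__":
    # V1's L1 ratings from the appendix, ordered Mental, Physical,
    # Temporal, Performance, Effort, Frustration.
    print(tlx_raw_score([2, 2, 1, 1, 1, 2]))  # 12.5, as in the results table
```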

Data Analysis

Results

Data Overview

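User ID | L1 NASA TLX | L2 NASA TLX | L1 SUS | L2 SUS
V1 | 12.5 | 16.67 | 55 | 50
V2 | 20.83 | 20.83 | 45 | 35
V3 | 29.17 | 33.33 | 55 | 55
V4 | 20.83 | 25 | 45 | 42.5
V5 | 29.17 | 29.17 | 52.5 | 50
V6 | 37.5 | 41.67 | 37.5 | 40
V7 | 33.33 | 37.5 | 42.5 | 45
V8 | 8.33 | 16.67 | 37.5 | 40
V9 | 37.5 | 45.83 | 35 | 35
V10 | 16.67 | 20.83 | 55 | 42.5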
  • Averages:
    • L1 NASA TLX: 24.58, L2 NASA TLX: 28.75.
    • L1 SUS: 46.0, L2 SUS: 43.5.
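The averages and the mean paired change can be recomputed from the table with Python's standard `statistics` module. The list names below are illustrative; the values are copied from the Data Overview table.

```python
from statistics import mean

# Per-participant scores from the Data Overview table, V1..V10.
L1_TLX = [12.5, 20.83, 29.17, 20.83, 29.17, 37.5, 33.33, 8.33, 37.5, 16.67]
L2_TLX = [16.67, 20.83, 33.33, 25, 29.17, 41.67, 37.5, 16.67, 45.83, 20.83]
L1_SUS = [55, 45, 55, 45, 52.5, 37.5, 42.5, 37.5, 35, 55]
L2_SUS = [50, 35, 55, 42.5, 50, 40, 45, 40, 35, 42.5]


def summarize(label, l1, l2):
    """Print both means and the mean paired change (L2 minus L1)."""
    diffs = [b - a for a, b in zip(l1, l2)]
    print(f"{label}: L1 = {mean(l1):.2f}, L2 = {mean(l2):.2f}, "
          f"mean change = {mean(diffs):+.2f}")


if __name__ == "__main__":
    summarize("NASA TLX", L1_TLX, L2_TLX)  # L1 = 24.58, L2 = 28.75, +4.17
    summarize("SUS", L1_SUS, L2_SUS)       # L1 = 46.00, L2 = 43.50, -2.50
```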

Statistical Analysis

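  • NASA TLX:
    • Wilcoxon signed-rank test (W+ and W- are the rank sums of the positive and negative L2 - L1 differences): W+ = 36, W- = 0 (n = 8 after excluding the two zero differences), so W = min(W+, W-) = 0.
    • Critical value (n = 8, two-tailed α = 0.05): 3.
    • Conclusion: W ≤ 3, so the workload increase from L1 to L2 is statistically significant.
  • SUS:
    • Wilcoxon signed-rank test: W+ = 9, W- = 27 (n = 8 after excluding the two zero differences), so W = min(W+, W-) = 9.
    • Critical value (n = 8, two-tailed α = 0.05): 3.
    • Conclusion: W > 3, so the SUS difference is not statistically significant.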
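The rank sums for both tests can be checked with a short script. This sketch implements the Wilcoxon signed-rank test directly: it drops zero differences, gives tied absolute differences the average of their ranks, and computes an exact two-sided p-value by enumerating all 2^n sign patterns, which is feasible for n = 8. For NASA TLX it uses each participant's summed (Rating - 1) points from the appendix instead of the rounded table values. Every TLX total equals these points × 25/6, a common positive factor that leaves the ranks unchanged, whereas the two-decimal rounding in the table (e.g., 4.17 vs 4.16) would break genuine ties. Function and variable names are illustrative.

```python
from itertools import product

# NASA TLX: summed (rating - 1) points per participant, V1..V10 (score = points * 25 / 6).
TLX_L1 = [3, 5, 7, 5, 7, 9, 8, 2, 9, 4]
TLX_L2 = [4, 5, 8, 6, 7, 10, 9, 4, 11, 5]
# SUS totals from the Data Overview table, V1..V10.
SUS_L1 = [55, 45, 55, 45, 52.5, 37.5, 42.5, 37.5, 35, 55]
SUS_L2 = [50, 35, 55, 42.5, 50, 40, 45, 40, 35, 42.5]


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test with an exact sign-flip p-value.

    Returns (W_plus, W_minus, n, p_value) for the differences after - before.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zeros
    n = len(diffs)
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied absolute differences starting at i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t_obs = min(w_plus, w_minus)
    total = w_plus + w_minus
    # Under H0 every sign pattern is equally likely: count patterns at least as extreme.
    extreme = 0
    for signs in product((1, -1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s > 0)
        if min(wp, total - wp) <= t_obs:
            extreme += 1
    return w_plus, w_minus, n, extreme / 2 ** n


if __name__ == "__main__":
    for name, l1, l2 in [("NASA TLX", TLX_L1, TLX_L2), ("SUS", SUS_L1, SUS_L2)]:
        wp, wm, n, p = wilcoxon_signed_rank(l1, l2)
        print(f"{name}: W+ = {wp}, W- = {wm}, n = {n}, W = {min(wp, wm)}, p = {p:.4f}")
```

For NASA TLX all eight non-zero differences are positive, so W+ takes its maximum of 36 and the exact two-sided p-value is 2/256 ≈ 0.0078; for SUS, W+ = 9 and W- = 27, giving p well above 0.05.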

Discussion

Interpretation of Results

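  • Workload: L2 NASA TLX (28.75) is higher than L1 (24.58). In the raw ratings, the entire increase comes from two dimensions, Physical Demand (e.g., jumping) and Effort; Mental Demand, Temporal Demand, Performance, and Frustration ratings are identical at both levels. The increase is small (about 4 points on a 0-100 scale) but consistent: eight participants reported a higher workload at L2 and none reported a lower one, which is why the difference is significant.
  • Usability: L1 SUS (46.0) is slightly higher than L2 (43.5), suggesting that L2's higher difficulty may slightly reduce perceived usability, but the difference is not significant.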
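A per-dimension breakdown of the appendix ratings shows where the workload change comes from. This sketch averages the change in each dimension's 0-100 score, using the report's (Rating - 1) × 25 mapping, across the ten participants; the dictionary and function names are illustrative.

```python
# Raw NASA TLX ratings (1-5) from the appendix, participants V1..V10.
TLX_L1 = {
    "Mental Demand":   [2, 3, 2, 2, 1, 2, 3, 3, 3, 1],
    "Physical Demand": [2, 1, 3, 3, 2, 3, 3, 1, 3, 3],
    "Temporal Demand": [1, 1, 3, 2, 3, 1, 2, 1, 3, 2],
    "Performance":     [1, 1, 1, 1, 3, 3, 1, 1, 1, 2],
    "Effort":          [1, 3, 1, 2, 2, 3, 3, 1, 3, 1],
    "Frustration":     [2, 2, 3, 1, 2, 3, 2, 1, 2, 1],
}
TLX_L2 = {
    "Mental Demand":   [2, 3, 2, 2, 1, 2, 3, 3, 3, 1],
    "Physical Demand": [2, 1, 3, 4, 2, 4, 3, 2, 4, 4],
    "Temporal Demand": [1, 1, 3, 2, 3, 1, 2, 1, 3, 2],
    "Performance":     [1, 1, 1, 1, 3, 3, 1, 1, 1, 2],
    "Effort":          [2, 3, 2, 2, 2, 3, 4, 2, 4, 1],
    "Frustration":     [2, 2, 3, 1, 2, 3, 2, 1, 2, 1],
}


def dimension_shift(l1, l2):
    """Mean change (L2 - L1) in each dimension's 0-100 score."""
    return {
        dim: sum((b - a) * 25 for a, b in zip(l1[dim], l2[dim])) / len(l1[dim])
        for dim in l1
    }


if __name__ == "__main__":
    for dim, shift in dimension_shift(TLX_L1, TLX_L2).items():
        print(f"{dim:<16} {shift:+.1f}")
```

Only Physical Demand and Effort change (+12.5 each). Averaging the six dimension shifts gives 25/6 ≈ 4.17, which matches the gap between the L2 and L1 NASA TLX means.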

Comparison with Expectations

  • L2 was expected to have a higher workload and lower usability. Both observed differences point in the expected direction, but only the workload difference is statistically significant, and both differences are small, possibly because the difficulty gap between L1 and L2 is modest.

Design Insights

  • Increase the difficulty of L2 by making jumps and puzzles more challenging to amplify workload differences.
  • Optimize L2’s control smoothness (SUS Q1 and Q6 had lower scores) to reduce inconsistencies.

Limitations

  • Small sample size (10 participants) limits statistical power.
  • The difficulty difference between L1 and L2 may not be large enough to fully reflect puzzle and platforming challenges.

Conclusion

Capoo’s L2 workload is higher than L1’s (28.75 vs 24.58), and its SUS score is lower (43.5 vs 46.0). The workload difference is small but statistically significant (NASA TLX: W = 0, n = 8, p < 0.05), whereas the usability difference is not (SUS: W = 9, n = 8, p > 0.05). It is recommended to enhance L2’s difficulty and refine its controls to make the two levels more distinct and improve player immersion.

Appendix

Game Design Updates

  1. Enhance L2 Jumping Difficulty: Increase platform height and introduce moving obstacles to heighten physical demand.
  2. Optimize Puzzle Consistency: Standardize puzzle hint styles to improve SUS Q4 (consistency) scores.
  3. Implement Dynamic Difficulty Adjustment: Adjust jumping and puzzle complexity based on player performance.
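The report proposes dynamic difficulty adjustment without specifying a rule. A minimal sketch, assuming a simple threshold rule driven by deaths per attempt; the `Difficulty` knobs, their bounds, and the thresholds are all hypothetical and not taken from Capoo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Difficulty:
    # Hypothetical tuning knobs; the report does not list Capoo's real parameters.
    platform_gap: float    # horizontal distance between platforms, in tiles
    obstacle_speed: float  # speed of moving obstacles, in tiles per second
    puzzle_steps: int      # number of moves needed to solve a puzzle


# Hypothetical bounds that keep adjustments inside a designed range.
EASIEST = Difficulty(platform_gap=2.0, obstacle_speed=1.0, puzzle_steps=2)
HARDEST = Difficulty(platform_gap=5.0, obstacle_speed=4.0, puzzle_steps=8)


def clamp(value, low, high):
    return max(low, min(high, value))


def adjust(current, deaths, target_deaths=3, step=0.1):
    """Return the settings for the next attempt.

    More deaths than the target eases jumps and obstacles by `step` (10%)
    and removes one puzzle move; fewer than half the target does the
    opposite; anything in between leaves the settings unchanged.
    """
    if deaths > target_deaths:
        direction = -1
    elif deaths < target_deaths / 2:
        direction = 1
    else:
        return current
    factor = 1 + direction * step
    return Difficulty(
        platform_gap=clamp(current.platform_gap * factor,
                           EASIEST.platform_gap, HARDEST.platform_gap),
        obstacle_speed=clamp(current.obstacle_speed * factor,
                             EASIEST.obstacle_speed, HARDEST.obstacle_speed),
        puzzle_steps=clamp(current.puzzle_steps + direction,
                           EASIEST.puzzle_steps, HARDEST.puzzle_steps),
    )


if __name__ == "__main__":
    level = Difficulty(platform_gap=3.0, obstacle_speed=2.0, puzzle_steps=4)
    for deaths in [0, 0, 5, 2]:
        level = adjust(level, deaths)
        print(deaths, level)
```

The dead band between the two thresholds keeps the game from oscillating after every attempt, and clamping prevents the controller from drifting outside the range the designers tested.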

Raw Data

L1 NASA TLX

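Dimension | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10
Mental Demand | 2 | 3 | 2 | 2 | 1 | 2 | 3 | 3 | 3 | 1
Physical Demand | 2 | 1 | 3 | 3 | 2 | 3 | 3 | 1 | 3 | 3
Temporal Demand | 1 | 1 | 3 | 2 | 3 | 1 | 2 | 1 | 3 | 2
Performance | 1 | 1 | 1 | 1 | 3 | 3 | 1 | 1 | 1 | 2
Effort | 1 | 3 | 1 | 2 | 2 | 3 | 3 | 1 | 3 | 1
Frustration | 2 | 2 | 3 | 1 | 2 | 3 | 2 | 1 | 2 | 1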

L2 NASA TLX

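Dimension | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10
Mental Demand | 2 | 3 | 2 | 2 | 1 | 2 | 3 | 3 | 3 | 1
Physical Demand | 2 | 1 | 3 | 4 | 2 | 4 | 3 | 2 | 4 | 4
Temporal Demand | 1 | 1 | 3 | 2 | 3 | 1 | 2 | 1 | 3 | 2
Performance | 1 | 1 | 1 | 1 | 3 | 3 | 1 | 1 | 1 | 2
Effort | 2 | 3 | 2 | 2 | 2 | 3 | 4 | 2 | 4 | 1
Frustration | 2 | 2 | 3 | 1 | 2 | 3 | 2 | 1 | 2 | 1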

L1 SUS

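Question | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10
1. The system is easy to use. | 3 | 3 | 3 | 4 | 2 | 3 | 3 | 3 | 3 | 2
2. The system components are well-coordinated. | 3 | 2 | 2 | 4 | 3 | 4 | 3 | 3 | 3 | 3
3. Most people can quickly learn to use the system. | 3 | 2 | 2 | 2 | 4 | 3 | 3 | 4 | 2 | 3
4. The system has too many inconsistencies. | 3 | 2 | 2 | 2 | 2 | 2 | 3 | 4 | 3 | 4
5. Does not require much assistance. | 3 | 2 | 4 | 2 | 3 | 3 | 4 | 2 | 3 | 3
6. Encountered many difficulties while using the system. | 3 | 3 | 2 | 4 | 2 | 4 | 3 | 3 | 4 | 2
7. Felt confident while using the system. | 4 | 2 | 3 | 3 | 2 | 2 | 4 | 3 | 2 | 4
8. Requires learning many things before starting. | 3 | 3 | 2 | 2 | 2 | 3 | 4 | 4 | 4 | 2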

L2 SUS

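Question | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10
1. The system is easy to use. | 2 | 2 | 2 | 3 | 1 | 2 | 2 | 2 | 2 | 1
2. The system components are well-coordinated. | 2 | 1 | 1 | 3 | 2 | 3 | 2 | 2 | 2 | 2
3. Most people can quickly learn to use the system. | 2 | 1 | 1 | 1 | 3 | 2 | 2 | 3 | 1 | 2
4. The system has too many inconsistencies. | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 2 | 3
5. Does not require much assistance. | 2 | 1 | 3 | 1 | 2 | 2 | 3 | 1 | 2 | 2
6. Encountered many difficulties while using the system. | 2 | 2 | 1 | 3 | 1 | 3 | 2 | 2 | 3 | 1
7. Felt confident while using the system. | 3 | 1 | 2 | 2 | 1 | 1 | 3 | 2 | 1 | 3
8. Requires learning many things before starting. | 2 | 2 | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 1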