Related Work Interactive Human Evaluation

Example Implementation Video.

You will compare two models’ generated Related Work sections over multiple rounds using a dedicated evaluation interface. Each round includes:

  1. Initial Evaluation: Assess both models' drafts on the following criteria:
    • Coherence of Citation Sentences: Are the claims grounded in the cited papers?
    • Positioning of the Main Paper: Does the draft clearly highlight the main paper's contributions and novelty?
    • Instruction Following: Has the model followed the task instructions and incorporated your feedback effectively?
  2. Feedback Phase: Provide feedback to each model via the chat panels. Each model will revise its draft based on your input.
  3. Re-Evaluation: Compare the updated drafts and repeat the process for a total of three rounds.

After the final round, click the Finish button to save your evaluations.

Interface Overview

Estimated Time: 25-30 minutes

After you provide your feedback to the models, please wait until they complete their generations.

For detailed instructions, please refer to this document.

Very rarely, you may encounter errors or longer waiting times (no longer than 3 minutes) during inference or when submitting evaluations, due to concurrent user requests. In such cases, you can start over from the beginning.

For any questions or feedback, please contact Furkan Sahinuc (furkan.sahinuc@tu-darmstadt.de). Thank you for your participation and patience. Click Begin to start.

Instruction illustration