INDEX
Explanations
comparisons or contrasts between different situations or entities
situations that contrast expectations versus reality
Negative Logits
Flavoring             -0.76
actionGroup           -0.73
Viol                  -0.62
istics                -0.62
guiActiveUnfocused    -0.62
ACTIONS               -0.61
ZIP                   -0.60
Leilan                -0.60
ario                  -0.60
Absent                -0.59
Positive Logits
feared                1.11
claimed               1.10
hoped                 1.08
imagined              1.03
advertised            1.00
assumed               0.98
portrayed             0.94
implied               0.93
predicted             0.93
anticipated           0.92
Activations Density: 0.195%