INDEX
Explanations
phrases indicating problems or challenges in various contexts
Head Attr Weights
0:0.05
1:0.02
2:0.08
3:0.45
4:0.03
5:0.07
6:0.02
7:0.05
8:0.03
9:0.01
10:0.12
11:0.02
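
The `head:weight` pairs above can be read programmatically to see which attention head dominates the attribution. The following is only a minimal sketch: the variable names are hypothetical, and it assumes each line has the form `index:weight` exactly as listed.

```python
# Head attribution weights copied verbatim from the listing above.
RAW_HEAD_WEIGHTS = """0:0.05
1:0.02
2:0.08
3:0.45
4:0.03
5:0.07
6:0.02
7:0.05
8:0.03
9:0.01
10:0.12
11:0.02"""

# Parse each "index:weight" line into a {head_index: weight} mapping.
head_weights = {}
for line in RAW_HEAD_WEIGHTS.splitlines():
    index_text, weight_text = line.split(":")
    head_weights[int(index_text)] = float(weight_text)

# The head with the largest weight, and the sum of all listed weights.
top_head = max(head_weights, key=head_weights.get)
total_weight = sum(head_weights.values())

print(f"top head: {top_head} (weight {head_weights[top_head]:.2f})")
print(f"sum of listed weights: {total_weight:.2f}")
```

On these values head 3 carries the largest weight (0.45), with head 10 second (0.12). The listed weights sum to 0.95 rather than 1, which may simply reflect rounding to two decimals.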
Negative Logits
united      -2.52
Peace       -2.51
united      -2.42
Nobel       -2.35
Emmanuel    -2.34
Liberation  -2.29
�           -2.16
Peace       -2.13
�           -2.13
Together    -2.13
Positive Logits
cumbers       4.04
annoying      3.72
sloppy        3.70
distractions  3.65
clutter       3.62
glitches      3.52
awkward       3.51
tedious       3.50
delays        3.48
confusion     3.43
Activations Density 1.525%
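
An activation density like the one above is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the function name, the zero threshold, and the synthetic activation list are all assumptions, not the dashboard's actual computation or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumed definition: a token counts as "active" when the feature's
    activation on it exceeds the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active_count = sum(1 for value in activations if value > threshold)
    return active_count / len(activations)


# Synthetic example (not real data): 61 active tokens out of 4000 sampled.
synthetic_activations = [0.0] * 3939 + [1.0] * 61
density = activation_density(synthetic_activations)
print(f"{density:.3%}")
```

With this synthetic input, 61 active tokens out of 4000 give a density of 1.525%, matching the figure shown. The real sample size behind the dashboard value is not stated in the source.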