INDEX
Explanations
phrases related to physical struggle or frustration
Head Attr Weights
0:0.01
1:0.02
2:0.12
3:0.04
4:0.26
5:0.03
6:0.04
7:0.14
8:0.04
9:0.05
10:0.10
11:0.10
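
As a minimal sketch of how the head attribution weights above might be read, the snippet below parses the `head:weight` pairs, ranks the heads by weight, and sums the weights. The names `raw`, `parse_weights`, and `top_heads` are hypothetical helpers introduced for illustration only and are not part of the dashboard or of any library.

```python
# Head attribution weights copied from the listing above, as "head:weight" pairs.
raw = "0:0.01 1:0.02 2:0.12 3:0.04 4:0.26 5:0.03 6:0.04 7:0.14 8:0.04 9:0.05 10:0.10 11:0.10"


def parse_weights(text):
    """Turn whitespace-separated 'head:weight' items into {head_index: weight}."""
    weights = {}
    for item in text.split():
        head, value = item.split(":")
        weights[int(head)] = float(value)
    return weights


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


weights = parse_weights(raw)
ranked = top_heads(weights)
total = sum(weights.values())

print(ranked)            # heads ordered by attribution weight
print(round(total, 2))   # sum of the listed weights
```

On the listed values, heads 4, 7, and 2 carry the largest weights, and the twelve weights sum to 0.95 rather than 1.00, presumably because each value is rounded to two decimals.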
Negative Logits
ngth          -1.77
ipment        -1.57
geries        -1.45
iencies       -1.42
ience         -1.41
iture         -1.39
issance       -1.38
indal         -1.38
utherford     -1.38
Planes        -1.38
Positive Logits
forcefully    1.87
knowingly     1.64
boldly        1.61
vigorously    1.61
loudly        1.58
angrily       1.54
silently      1.52
cautiously    1.49
softly        1.48
passionately  1.47
Activations Density 0.061%