Explanations
parentheses in the text
Head Attr Weights (head index: attribution weight)
0: 0.03
1: 0.02
2: 0.19
3: 0.05
4: 0.06
5: 0.03
6: 0.04
7: 0.16
8: 0.04
9: 0.04
10: 0.11
11: 0.18
Negative Logits
astered  -1.92
alon     -1.73
hyde     -1.63
ogun     -1.60
olics    -1.58
joy      -1.57
itia     -1.56
hner     -1.53
livion   -1.52
bern     -1.50
Positive Logits
ACTION    1.72
rab       1.71
Fighting  1.65
Range     1.59
urnal     1.59
Poké      1.56
href      1.54
ambush    1.53
Brave     1.52
Hidden    1.50
Activations Density: 0.001%