INDEX
Explanations
punching, kicking, passing, cooking, gardening
Negative Logits
anik           -0.09
coun           -0.09
Wen            -0.09
Us             -0.09
...            -0.09
switching      -0.08
jog            -0.08
withholding    -0.08
impr           -0.08
iner           -0.08
Positive Logits
iscing         0.14
angu           0.12
oning          0.11
anch           0.11
asing          0.11
arsing         0.11
ňování         0.11
lawy           0.10
arching        0.10
merch          0.10
Activations Density: 0.117%