INDEX
Explanations
references to physical actions involving hands or arms
Negative Logits
erson      -0.18
oloj       -0.17
å¹         -0.17
orr        -0.15
ensburg    -0.15
ume        -0.14
trop       -0.14
eparator   -0.14
irm        -0.13
skins      -0.13
Positive Logits
icer       0.19
vinc       0.17
pler       0.15
pole       0.15
å¾Ĵ        0.14
asio       0.14
numeral    0.14
Ø¢Ùħد      0.14
304        0.14
Sen        0.14
Activations Density: 0.039%