Explanations
references to physical or metaphorical 'hands' and their associated actions or consequences
Negative Logits
ÙIJب     -0.17
onaut    -0.16
brero    -0.15
936      -0.15
ovo      -0.15
797      -0.14
isch     -0.14
ovit     -0.14
ida      -0.14
UPS      -0.14
Positive Logits
prote      0.15
ÙĨÙħاÛĮ    0.14
arlar      0.14
sibling    0.14
axy        0.14
ÑĢой       0.14
èŃī        0.14
елÑĮзÑı    0.14
koy        0.13
WE         0.13
Activations Density 0.141%