INDEX
Explanations
references to central locations or the concept of "center."
Negative Logits
ARGIN       -0.17
enheim      -0.16
omu         -0.15
sid         -0.14
dar         -0.14
corner      -0.14
es          -0.14
orners      -0.14
unist       -0.14
ERCHANT     -0.14
Positive Logits
most        0.23
line        0.22
pieces      0.20
央          0.19
-center     0.17
fold        0.17
/top        0.17
-most       0.17
-central    0.17
/end        0.17
Activations Density 0.039%