Explanations
references to actions involving measurement or evaluation related to entities and individuals
Negative Logits
odor      -0.16
oden      -0.15
esc       -0.15
ovy       -0.15
ÅĦ        -0.14
ldr       -0.14
quin      -0.14
ffer      -0.14
singles   -0.14
tension   -0.14
Positive Logits
holm      0.18
pis       0.15
Macron    0.14
uide      0.14
äl        0.14
pencil    0.13
apa       0.13
ergisi    0.13
zure      0.13
åıĬåħ¶    0.13
Activations Density 0.209%