INDEX
Explanations
phrases indicating location or presence within sentences
Negative Logits
Token           Logit
(not captured)  -0.17
iki             -0.16
eson            -0.16
ly              -0.16
ai              -0.16
elia            -0.16
allet           -0.15
aring           -0.15
per             -0.15
lo              -0.15
Positive Logits
Token           Logit
voor            0.19
least           0.19
assis           0.17
ENA             0.16
orthand         0.15
azzo            0.14
inati           0.14
664             0.14
oldown          0.14
onet            0.14
Activations Density 0.009%