INDEX
Explanations
elements related to Slovenian culture or language
Negative Logits
ovah      -0.19
ecz       -0.16
úÄįin     -0.15
ke        -0.15
kj        -0.14
ky        -0.14
elin      -0.14
alc       -0.14
ôm        -0.14
emb       -0.14
Positive Logits
vog       0.18
uchen     0.17
podpor    0.16
last      0.15
znam      0.15
neob      0.15
URING     0.14
pode      0.14
stav      0.14
_iters    0.14
Activations Density 0.008%