Explanations
Polish nouns and adjectives related to culture and identity
Negative Logits
eding     -0.16
ascar     -0.16
заяв      -0.15
ーロ      -0.15
abilia    -0.15
uggy      -0.14
чу        -0.14
patch     -0.14
Patch     -0.14
itung     -0.14
Positive Logits
dre           0.26
kami          0.24
monument      0.21
mau           0.21
fas           0.21
monumental    0.20
mural         0.20
fres          0.20
oz            0.20
got           0.20
Activations Density 0.005%