Explanations
Polish words associated with social and collective themes
Negative Logits
overl     -0.16
vester    -0.16
besides   -0.15
iky       -0.15
Class     -0.15
erv       -0.15
lav       -0.14
GY        -0.14
andle     -0.14
Bav       -0.14
Positive Logits
że        0.21
nie       0.21
ują       0.21
ję        0.20
ł         0.20
acz       0.20
ją        0.20
ów        0.19
ż         0.19
pow       0.19
Activations Density 0.315%
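As a rough illustration of how a density figure like this can be read, the sketch below assumes that "density" means the fraction of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the synthetic sample are assumptions made for the example; they are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Synthetic data: 315 active tokens out of 100,000 (illustrative only).
    sample = [0.0] * 99_685 + [0.7] * 315
    density = activation_density(sample)
    print(f"Activation density: {density:.3%}")
```

Under this reading, a density of 0.315% would mean the feature fires on roughly 3 tokens in every 1,000.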