INDEX
Explanations: references to Poland or Polish culture
Negative Logits
Token       Logit
orial       -0.15
icap        -0.15
iles        -0.15
esen        -0.14
anker       -0.14
azine       -0.14
editable    -0.14
ieval       -0.14
ownt        -0.13
hdl         -0.13
Positive Logits
Token       Logit
olu          0.20
elik         0.17
enta         0.17
ych          0.16
itical       0.16
itics        0.16
r            0.15
mav          0.15
uter         0.15
indr         0.15
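Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch below assumes that method; it is not confirmed as the procedure used for this dashboard. The names `feature_logits`, `top_k`, `decoder_dir`, and `unembed` are hypothetical, and the toy vectors are made up for illustration rather than taken from this feature.

```python
from typing import Dict, List, Sequence, Tuple


def feature_logits(decoder_dir: Sequence[float],
                   unembed: Sequence[Sequence[float]],
                   vocab: Sequence[str]) -> Dict[str, float]:
    """Project a (hypothetical) feature decoder direction onto every vocab token.

    decoder_dir has length d_model; unembed is d_model x vocab_size, i.e.
    rows are residual-stream dimensions and columns are tokens.
    """
    scores: Dict[str, float] = {}
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        scores[tok] = sum(decoder_dir[i] * unembed[i][j]
                          for i in range(len(decoder_dir)))
    return scores


def top_k(scores: Dict[str, float], k: int,
          largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k highest (largest=True) or lowest (largest=False) scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Toy example: d_model = 2, vocabulary of 3 placeholder tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.0],
           [0.0, 0.2, -0.3]]
vocab = ["tok_a", "tok_b", "tok_c"]

scores = feature_logits(decoder_dir, unembed, vocab)
positive = top_k(scores, 2)                 # most promoted tokens
negative = top_k(scores, 1, largest=False)  # most suppressed token
```

Sorting the whole score dictionary keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.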
Activation density: 0.024%
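Activation density is usually read as the percentage of tokens on which the feature fires. At 0.024%, that is roughly 24 tokens in every 100,000. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0. Both the threshold and the function name `activation_density` are illustrative assumptions, not details taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# 24 active tokens out of 100,000 gives a density of 0.024%.
acts = [0.0] * 99_976 + [1.3] * 24
density = activation_density(acts)
```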