Explanations
specific historical or cultural references, particularly in a Polish context
Negative Logits
contres   -0.17
serter    -0.15
esda      -0.14
PLIC      -0.14
sert      -0.14
IFIC      -0.14
udic      -0.14
estic     -0.14
าง        -0.13
thờ       -0.13
Positive Logits
анк       0.14
conflict  0.14
926       0.14
adden     0.14
Conflict  0.14
829       0.14
ittel     0.14
uten      0.14
ocene     0.14
dumps     0.14
Activations Density 0.053%