Explanations
mentions of the city of Warsaw
Negative Logits
ös       -0.16
øre      -0.16
reon     -0.16
ึง       -0.15
Lion     -0.15
umbed    -0.15
teness   -0.15
enet     -0.15
cloak    -0.15
reo      -0.14
Positive Logits
Duty     0.18
aw       0.18
awa      0.17
hausen   0.16
duty     0.16
Aw       0.16
Aw       0.16
cly      0.15
mam      0.15
mann     0.15
Activations Density 0.007%