INDEX
Explanations
references to political figures and their affiliations
Negative Logits
zes       -0.15
uet       -0.15
humor     -0.14
ucz       -0.14
iske      -0.14
utsch     -0.14
ôt        -0.14
pole      -0.14
ursal     -0.13
ius       -0.13
Positive Logits
opyright  0.15
çī        0.14
èĩ        0.14
à¹Ģล     0.14
clich     0.14
edn       0.14
roid      0.13
Surre     0.13
olen      0.13
аÑĤÑĸ    0.13
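Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the tokens with the most negative and most positive values. The sketch below illustrates that idea only. It assumes a row-major unembedding of shape [d_model][vocab_size]. `logit_effects` and `top_and_bottom` are hypothetical helpers. This dump does not say whether the dashboard applies any normalization or layer-norm folding first.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
) -> List[float]:
    """Dot the feature's decoder direction with every vocabulary column of the unembedding."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (values are illustrative only).
decoder_dir = [1.0, 0.5]
unembed = [[0.1, -0.2, 0.3, 0.0],
           [0.2, 0.0, -0.4, 0.1]]
vocab = ["a", "b", "c", "d"]
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=1)
print(neg, pos)
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other. With a real vocabulary of tens of thousands of tokens, a vectorized matrix product would be the practical choice.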
Activation Density: 0.022%
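Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. `activation_density` is a hypothetical helper, and the exact threshold and dataset behind the 0.022% figure are not given here. At that density, a feature is active on roughly 22 of every 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 22 active positions out of 100,000 tokens.
acts = [1.0] * 22 + [0.0] * 99_978
print(activation_density(acts))
```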