Explanations
terms related to political or societal groups and their dynamics
Negative Logits
.decorate     -0.17
erus          -0.16
зас           -0.15
Klopp         -0.14
gnore         -0.14
au            -0.14
>",           -0.14
iked          -0.14
Friedrich     -0.14
aepernick     -0.14
Positive Logits
isko          0.17
ettel         0.15
nda           0.15
ges           0.15
iral          0.15
/misc         0.14
zig           0.14
ploy          0.14
.RemoveAll    0.14
enga          0.14
Activations Density 0.010%