Explanations
references to individuals in political contexts
Negative Logits
itsu     -0.18
elsey    -0.17
Canter   -0.17
çª       -0.17
itter    -0.17
ero      -0.17
itch     -0.17
tsky     -0.16
ahun     -0.16
utr      -0.16

Positive Logits
tin      0.19
aryana   0.18
asm      0.17
ema      0.17
iral     0.17
UDA      0.17
iss      0.17
iran     0.17
uda      0.16
iday     0.16
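
Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix (W_U) and ranking the resulting per-token logit effects. The following is a minimal sketch under that assumption; `top_logit_effects`, the toy `W_U`, the decoder direction `d`, and the four-token vocabulary are hypothetical and are not taken from this dashboard.

```python
def top_logit_effects(decoder_direction, unembedding, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Hypothetical inputs:
      decoder_direction: list of d_model floats (the feature's decoder row).
      unembedding: d_model rows, each a list of vocab_size floats (W_U).
      vocab: list of vocab_size token strings.
    """
    d_model = len(decoder_direction)
    # Logit effect on each token t is the dot product of the direction
    # with column t of W_U, i.e. (direction @ W_U)[t].
    effects = [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Rank tokens from most negative to most positive effect.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: d_model = 2, vocab_size = 4 (all values made up).
vocab = ["itsu", "tin", "ero", "iran"]
W_U = [
    [-0.2, 0.1, 0.0, 0.3],
    [0.1, 0.2, -0.1, 0.0],
]
d = [1.0, 0.5]
negative, positive = top_logit_effects(d, W_U, vocab, k=2)
for token, effect in negative + positive:
    print(f"{token:8s} {effect:+.2f}")
```

Sorting the whole vocabulary is fine for a toy example; for a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` avoid a full sort.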

Activation density: 0.027%
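
Activation density is typically the fraction of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; `activation_density` and the toy activation list are hypothetical, not this dashboard's code or data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample: 10,000 token positions, of which 3 activate the feature.
acts = [0.0] * 9997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"Activation density: {100 * density:.3f}%")
```

At 0.027%, the feature in this dashboard would fire on roughly 27 of every 100,000 tokens, which marks it as a sparse feature.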