INDEX
Explanations
references to political ideologies and their implications
Negative Logits
Token          Logit
UMAN           -0.16
omet           -0.15
/preferences   -0.15
ë¥´ê³ł         -0.14
ISC            -0.13
άνÏī           -0.13
hape           -0.13
ulumi          -0.13
bsolute        -0.13
_CALLBACK      -0.13
Positive Logits
Token          Logit
anti           0.25
comb           0.22
pro            0.21
defensive      0.21
inflammatory   0.20
confront       0.20
particip       0.20
flammatory     0.20
protective     0.19
inclusive      0.19
Activations Density: 0.332%