INDEX
Explanations
references to controversial historical figures and symbols
Negative Logits
Token       Logit
çģ          -0.14
íķ          -0.14
icas        -0.14
hus         -0.14
ighest      -0.14
antium      -0.13
ritis       -0.13
ãĥĨãĥ«      -0.13
RELEASE     -0.13
Ú©ÛĮÙĦ      -0.13
Positive Logits
Token        Logit
statue       0.34
statues      0.32
Confederate  0.30
symbols      0.29
symbol       0.26
Symbols      0.26
conf         0.26
monuments    0.25
symbols      0.24
removal      0.24
Activations Density: 0.076%