Explanations
phrases related to controversy and opinion on societal issues
Negative Logits
rok         -0.16
iland       -0.16
vt          -0.16
erton       -0.14
sheets      -0.14
Mickey      -0.13
fors        -0.13
Sink        -0.13
ud          -0.13
tem         -0.13
Positive Logits
utherland    0.15
chick        0.14
withhold     0.14
kening       0.14
едак         0.14
Vak          0.14
ickt         0.14
emailer      0.14
Ñīина        0.14
çak          0.14
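Lists like the two above are commonly produced by projecting the feature's decoder direction onto every token's unembedding vector and keeping the most negative and most positive scores. The sketch below assumes that approach and a d_model x vocab_size unembedding layout; the function name `top_logit_effects` and the toy data are hypothetical and not taken from the dashboard's own code.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding column (assumed layout:
    unembed[i][t] = residual dimension i, token t).

    Returns (most negative first, most positive first), k tokens each.
    """
    d_model = len(decoder_direction)
    scores = []
    for t, token in enumerate(vocab):
        # Direct effect of one unit of the feature on token t's logit.
        s = sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]         # most suppressed tokens
    positive = ranked[::-1][:k]   # most boosted tokens
    return negative, positive


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 3-token vocabulary.
    neg, pos = top_logit_effects(
        [1.0, -0.5],
        [[0.2, -0.1, 0.0], [0.4, 0.2, -0.2]],
        ["a", "b", "c"],
        k=1,
    )
    print("negative:", neg)
    print("positive:", pos)
```

Ranking the raw dot products, rather than normalizing them, keeps the scores on the same scale as the values shown in the lists.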
Activation density: 0.275%
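Activation density is usually read as the share of token positions on which the feature fires at all. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0; the function name `activation_density` is hypothetical. At 0.275%, the feature would be active on roughly 11 of every 4,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumption: strictly positive counts as "active"
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # 11 active positions out of 4,000 tokens.
    acts = [0.5] * 11 + [0.0] * 3989
    print(f"{activation_density(acts):.3f}%")
```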