INDEX
Explanations
references to political positions or roles, especially those related to opposition or alternative leadership
references to "shadow" roles or positions in political contexts
Negative Logits
urses        -0.79
ickr         -0.78
ktop         -0.72
keye         -0.72
artney       -0.72
renheit      -0.70
anchester    -0.69
aii          -0.69
OPLE         -0.68
TAIN         -0.68
Positive Logits
boxing       1.01
moon         1.01
loo          0.95
runners      0.88
fax          0.84
flame        0.78
fell         0.77
shadow       0.76
wra          0.76
runner       0.76
Activations Density: 0.044%