INDEX
Explanations
mentions of political partisanship
references to partisanship and partisan issues
Negative Logits
uras        -0.84
ternally    -0.84
uran        -0.82
ofi         -0.81
ivia        -0.78
aryn        -0.76
eon         -0.76
iva         -0.75
ivas        -0.73
ea          -0.72
Positive Logits
partisans      1.03
partisan       1.03
affiliation    0.84
psy            0.83
persuasion     0.78
affili         0.76
political      0.75
strugg         0.75
aggregation    0.74
propaganda     0.74
Activations Density 0.009%