Explanations
references to nonpartisan entities or concepts
terms associated with neutrality and bipartisanship in political contexts
Negative Logits
angers   -0.82
KT       -0.78
Stage    -0.76
Pause    -0.74
thur     -0.73
uman     -0.70
gars     -0.70
VL       -0.70
Pac      -0.70
Mist     -0.70
Positive Logits
vernment        1.02
nonpartisan     0.88
impartial       0.86
watchdog        0.75
partisan        0.75
artisan         0.74
Congressional   0.72
sonian          0.72
observer        0.70
irection        0.68
Activation Density: 0.015%