Explanations
discussions surrounding political controversies
Negative Logits
bipartisan  -0.16
bias        -0.16
Bias        -0.15
ugo         -0.15
Bias        -0.15
biased      -0.15
.gov        -0.14
biased      -0.14
Paz         -0.14
ยก          -0.14
Positive Logits
party      0.22
internal   0.21
intra      0.20
内部       0.18
.internal  0.18
Party      0.17
internal   0.17
306        0.17
Party      0.17
party      0.16
Activations Density 0.138%