Explanations
references to media outlets or individuals associated with controversial or right-wing views
Negative Logits
Token            Logit
AI               -0.17
Fir              -0.17
ysi              -0.15
imer             -0.14
Rog              -0.14
Marc             -0.14
LR               -0.14
Hidden           -0.14
Huntington       -0.14
ivil             -0.14
Positive Logits
Token            Logit
Vectorizer       0.16
UIT              0.15
AMIL             0.15
æģ¯              0.15
-Ta              0.15
bek              0.14
ationToken       0.14
_Params          0.14
.SuspendLayout   0.14
uits             0.14
Activations Density 0.002%