Explanations
references to political affiliations and actions
Negative Logits
Token          Logit
lyn            -0.16
285            -0.16
OC             -0.14
IE             -0.14
Bark           -0.14
gia            -0.14
AT             -0.14
zer            -0.13
spr            -0.13
ames           -0.13
Positive Logits
Token          Logit
usch            0.17
fx              0.16
furt            0.15
Schwartz        0.15
addCriterion    0.15
rana            0.15
ERGE            0.14
istik           0.14
aje             0.14
.sax            0.14
Activation Density: 0.869%