INDEX
Explanations
references to political matters and complex social issues
Negative Logits
agency      -0.16
610         -0.15
([{         -0.14
/address    -0.14
Attention   -0.14
Architect   -0.14
.defer      -0.13
ancestor    -0.13
obot        -0.13
AXB         -0.13
Positive Logits
aff     0.83
Aff     0.79
aff     0.79
Aff     0.73
AFF     0.68
-aff    0.68
'aff    0.62
_aff    0.60
af      0.60
AFF     0.57
Activations Density 0.090%
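
The density figure above can be read as the share of token positions on which the feature is active. The page does not state how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature fires at all. The dashboard itself does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 9 of 10,000 token positions.
sample = [0.0] * 9991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.090%"
```

Under this reading, a density of 0.090% corresponds to roughly 9 active positions per 10,000 tokens, which would make this a sparse feature.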