INDEX
Explanations
negatives or criticisms related to political figures or events
Negative Logits
exceptions   -0.70
execut       -0.67
footnote     -0.67
exponent     -0.66
bree         -0.66
(>           -0.66
overd        -0.64
braces       -0.64
(<           -0.64
relapse      -0.63
Positive Logits
friendly      1.19
themed        1.15
branded       1.09
owned         1.07
controlled    1.07
centric       1.06
related       1.02
aligned       1.02
induced       1.01
centered      1.01
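The two lists above are the ten tokens shown at each extreme of this feature's logit effect. The following is a minimal sketch of that ranking, assuming the values can be treated as a plain token-to-score mapping. On SAE dashboards these scores are commonly the feature's decoder direction projected through the model's unembedding, but this page does not say so, so treat that as an assumption. `LOGIT_EFFECTS` and `top_k` are hypothetical names used only for illustration.

```python
import heapq

# Hypothetical mapping built from the values listed on this page.
# Only the ten most negative and ten most positive tokens are known,
# so this is a partial view of the feature's full logit effect.
LOGIT_EFFECTS = {
    "exceptions": -0.70, "execut": -0.67, "footnote": -0.67,
    "exponent": -0.66, "bree": -0.66, "(>": -0.66,
    "overd": -0.64, "braces": -0.64, "(<": -0.64, "relapse": -0.63,
    "friendly": 1.19, "themed": 1.15, "branded": 1.09,
    "owned": 1.07, "controlled": 1.07, "centric": 1.06,
    "related": 1.02, "aligned": 1.02, "induced": 1.01, "centered": 1.01,
}


def top_k(effects, k, positive=True):
    """Return the k (token, value) pairs with the largest values when
    positive=True, or the most negative values when positive=False."""
    if positive:
        return heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    print(top_k(LOGIT_EFFECTS, 3))
    print(top_k(LOGIT_EFFECTS, 3, positive=False))
```

Because the mapping holds only the listed tokens, it can reproduce the ordering shown here but says nothing about tokens beyond the top ten in either direction. Tied values such as `execut` and `footnote` (both -0.67) keep their insertion order, since `heapq.nsmallest` and `heapq.nlargest` behave like a stable sort.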
Activation Density: 0.026%