Explanations
References to specific individuals, particularly those involved in politics or public affairs.
Negative Logits
Token            Logit
o                -0.20
er               -0.20
opt              -0.16
HandlerContext   -0.16
consenting       -0.16
ney              -0.15
ois              -0.15
owi              -0.15
ores             -0.14
oxy              -0.14
Positive Logits
Token            Logit
ipeg             0.23
sylvania         0.19
edy              0.18
igans            0.17
ovation          0.17
nun              0.17
igan             0.16
ery              0.16
lw               0.16
elho             0.16
Activations Density: 0.014%