INDEX
Explanations
negative emotions and themes of resistance
Negative Logits
beyond    -0.21
Beyond    -0.20
enz       -0.19
illard    -0.17
eyond     -0.17
enough    -0.16
Beyond    -0.15
iset      -0.14
Unless    -0.14
iamond    -0.14
Positive Logits
nor       0.35
Nor       0.29
nor       0.27
Nor       0.26
NOR       0.21
EITHER    0.19
either    0.18
Either    0.17
Either    0.17
Norris    0.17
Activations Density 0.274%