INDEX
Explanations
instances of the word "nor" indicating negation or contrast
Negative Logits
Token     Logit
McCart    -0.16
rchive    -0.15
enko      -0.15
ing       -0.15
urs       -0.15
agle      -0.14
392       -0.14
itious    -0.14
sch       -0.14
ric       -0.14
Positive Logits
Token     Logit
tamp      0.17
theless   0.17
shalt     0.16
deen      0.16
lify      0.16
licht     0.15
ãĤīãģĦ    0.15
thern     0.15
лади      0.15
thing     0.15
Activations Density: 0.022%