Explanations
phrases indicating a shift or change in perspective or action
Negative Logits
Token    Logit
CFG      -0.15
urm      -0.14
enet     -0.14
ebo      -0.14
xon      -0.14
ittle    -0.14
enity    -0.14
AGO      -0.14
enia     -0.14
elm      -0.14
Positive Logits
Token    Logit
ÙĨØ´     0.15
ç¾       0.14
Dame     0.14
ordes    0.14
instead  0.14
((__     0.14
emez     0.13
oner     0.13
lobal    0.13
Instead  0.13
Activations Density: 0.036%
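
A minimal sketch of how a density figure like this one could be computed, assuming it is the fraction of token positions on which the feature's activation is above zero. The page does not define the metric, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not spell this out.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical, not taken from the dashboard):
# 25,000 token positions, 9 of which have a nonzero activation.
sample = [0.0] * 24991 + [1.2, 0.3, 2.5, 0.7, 0.1, 0.9, 3.0, 0.4, 0.6]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: it fires on only a few positions per ten thousand tokens, which fits a feature tied to specific words such as "instead".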