INDEX
Explanations
phrases relating to capability and decision-making
Negative Logits
lobal      -0.15
scramble   -0.14
ANE        -0.14
.          -0.14
d          -0.14
global     -0.13
TES        -0.13
Sle        -0.13
re         -0.13
(          -0.13
Positive Logits
.inflate   0.15
bay        0.15
çĪ         0.15
Ñģа        0.14
ullo       0.14
Tomorrow   0.14
alk        0.14
ếp         0.14
اÙĪÛĮ      0.14
andro      0.13
Activations Density 0.001%