Explanations
phrases indicating a state of perception or awareness
Negative Logits
데이트      -0.15
vice        -0.15
Vice        -0.15
.Display    -0.14
811         -0.14
Falcon      -0.14
ăr          -0.14
حو          -0.14
ernet       -0.14
jes         -0.14
Positive Logits
oulos       0.16
habi        0.15
-disable    0.15
uzey        0.14
gap         0.14
[of         0.13
arda        0.13
.dsl        0.13
__':\r\n    0.13
adora       0.13
Activations Density 0.042%