Explanations
expressions of surprise or the notion of unexpected experiences
Negative Logits
ambi        -0.15
Actual      -0.15
oyal        -0.14
cott        -0.14
Said        -0.14
pillar      -0.14
κολ         -0.14
ÙĨداÙĨ      -0.13
emek        -0.13
ocker       -0.13
Positive Logits
otherwise   0.36
previously  0.34
otherwise   0.29
elsewhere   0.27
dream       0.26
Previously  0.25
Otherwise   0.24
dreamed     0.24
Otherwise   0.23
Dream       0.22
Activations Density 0.091%
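
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that activation density means the fraction of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, 9 of which activate the feature.
sample = [0.0] * 9991 + [1.5] * 9
density = activation_density(sample)

# Format as a percentage with three decimals, matching the style above.
print(f"{density:.3%}")
```

Under this reading, a density of 0.091% would mean the feature fires on roughly 9 out of every 10,000 tokens.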