INDEX
Explanations
expressions of surprise or unanticipated experiences
Negative Logits
pillar     -0.17
ambi       -0.15
eah        -0.15
aleb       -0.15
istingu    -0.14
emek       -0.14
ocker      -0.14
κολ        -0.14
oyal       -0.13
Trust      -0.13
Positive Logits
otherwise   0.29
dream       0.28
dreamed     0.25
Dream       0.24
previously  0.24
dream       0.24
even        0.23
otherwise   0.23
Dream       0.22
梦          0.22
Activations Density: 0.104%