INDEX
Explanations
emotions and feelings associated with experiences
Negative Logits
Token       Logit
udies       -0.17
leground    -0.16
bes         -0.16
inae        -0.16
ndx         -0.15
rik         -0.14
sbin        -0.14
ıģı         -0.14
ortal       -0.13
plorer      -0.13
Positive Logits
Token       Logit
having      0.31
being       0.28
hearing     0.28
knowing     0.28
having      0.23
seeing      0.22
watching    0.22
actually    0.21
realizing   0.21
being       0.21
Activations Density: 0.124%