Explanations
emotional responses or sentiments towards experiences and relationships
Negative Logits
Token       Logit
hol         -0.16
bes         -0.15
by          -0.15
roker       -0.15
374         -0.15
ics         -0.15
leground    -0.15
↵           -0.15
oya         -0.14
(blank)     -0.14
Positive Logits
Token       Logit
having      0.25
knowing     0.23
hearing     0.22
having      0.22
seeing      0.22
watching    0.21
how         0.18
Hearing     0.17
Having      0.16
Having      0.16
Activations Density: 0.194%