Explanations
expressions of personal feelings and emotional responses
Negative Logits
emez            -0.15
ída             -0.15
vailability     -0.15
eyin            -0.15
MOTE            -0.15
rema            -0.14
regor           -0.14
ALSE            -0.14
ONO             -0.14
pering          -0.14
Positive Logits
feel            0.23
uncomfortable   0.23
aware           0.23
want            0.23
sad             0.21
question        0.20
proud           0.20
gig             0.20
laugh           0.19
comfortable     0.19
Activations Density 0.037%