Explanations
references to emotional states and psychological concepts
Negative Logits
Token        Logit
iman         -0.17
ogle         -0.15
afa          -0.14
etí          -0.14
uchs         -0.14
enz          -0.14
imen         -0.14
니스         -0.14
šem          -0.14
ingers       -0.13
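In the raw dashboard export, some tokens appear as strings like "ëĭĪìĬ¤" or "etÃŃ". This is typical of GPT-2-family tokenizers, which store each raw byte as a printable Unicode stand-in character. The page does not name its tokenizer, so the sketch below assumes the GPT-2 bytes_to_unicode scheme and inverts it to recover readable text. It also makes a second, heuristic assumption: any character outside the byte map is text a scraper had already decoded, so it is re-encoded as UTF-8. The helper name decode_byte_level_token is hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 style byte -> printable-character table."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    # Every other byte (control chars, space, etc.) is shifted to U+0100 and up.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level token string back into text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Assumption: characters outside the byte map were already
            # decoded upstream, so re-encode them as UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")
```

For example, "ëĭĪìĬ¤" decodes to the Korean syllables "니스", and "Ñģли" decodes to the Cyrillic fragment "сли".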
Positive Logits
Token        Logit
μπο          0.17
bour         0.15
exclus       0.14
ριν          0.14
might        0.14
bunu         0.14
сли          0.14
haps         0.13
responsible  0.13
483          0.13
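Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction onto the unembedding matrix, then ranking vocabulary entries by the result. The page does not say how its numbers were computed. The following is therefore a minimal sketch under that assumption, and it ignores any final LayerNorm. The function names and the toy 2 x 4 matrix are hypothetical, not data from this feature.

```python
from typing import Sequence


def feature_logit_effects(
    decoder_dir: Sequence[float], unembed: Sequence[Sequence[float]]
) -> list[float]:
    """Dot the feature's decoder direction (length d_model) with each
    vocabulary column of the unembedding matrix (d_model x vocab)."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, a four-token placeholder vocabulary.
    vocab = ["a", "b", "c", "d"]
    unembed = [[0.5, 0.25, -0.5, -0.25], [9.0, 9.0, 9.0, 9.0]]
    effects = feature_logit_effects([1.0, 0.0], unembed)
    print(top_logits(effects, vocab, k=2))
```

Because the toy decoder direction is orthogonal to the second row, only the first row of the unembedding affects the ranking.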
Activation Density: 0.003%
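Activation density is usually the fraction of evaluated tokens on which the feature fires, meaning its activation is above zero. Under that assumption, 0.003% works out to roughly 3 firing tokens per 100,000. The sketch below computes the statistic; the function name and the default zero threshold are assumptions, not settings taken from this dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`
    (assumed to be 0.0, i.e. the feature is 'on' at all)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # 3 firing tokens out of 100,000 gives the density shown above.
    acts = [0.0] * 99_997 + [1.2, 0.4, 2.0]
    print(f"{activation_density(acts):.3%}")
```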