Explanations
emotional responses and reactions to societal issues
Negative Logits
anzi   -0.16
الا   -0.15
poss   -0.15
благодаря ("thanks to")   -0.15
Heb   -0.14
thanks   -0.14
[[]   -0.14
_prog   -0.13
ine   -0.13
poss   -0.13
Positive Logits
how   0.29
how   0.23
considering   0.23
cómo ("how")   0.20
多少 ("how many")   0.19
hearing   0.18
indeed   0.17
HOW   0.17
كيف ("how")   0.16
арат   0.16
Activation density: 0.082%
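The dashboard does not state how these figures are produced. Below is a minimal sketch that assumes common SAE-dashboard conventions. Under those conventions, density is the fraction of token positions where the feature's activation is above zero. The logit lists are the lowest and highest entries of the feature's per-token logit effect. The function names `activation_density` and `top_logit_effects` and the zero threshold are hypothetical. Effects are passed as (token, value) pairs rather than a dict because the same surface string can appear twice, as "how" and "poss" do above. Those duplicates likely differ only in an invisible leading space.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when activation > threshold (default 0).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def top_logit_effects(effects: Sequence[Pair], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k lowest and the k highest (token, value) pairs.

    A list of pairs (not a dict) keeps distinct tokens that render identically.
    """
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by value
    lowest = ranked[:k]                                  # most negative first
    highest = ranked[::-1][:k]                           # most positive first
    return lowest, highest


if __name__ == "__main__":
    print(activation_density([0.0] * 9999 + [1.3]))
    print(top_logit_effects([("how", 0.29), ("thanks", -0.14), ("anzi", -0.16), ("cómo", 0.20)], k=2))
```

Under this reading, a density of 0.082% means the feature fires on roughly 1 in 1,200 tokens.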