Explanations
expressions of concern or anxiety
Negative Logits
chy        -0.18
isz        -0.17
ery        -0.16
_FLUSH     -0.16
kits       -0.15
erville    -0.15
orp        -0.14
agr        -0.14
ãģĦãģŁ     -0.14
езда       -0.13
Positive Logits
about      0.24
ingly      0.22
_about     0.19
tentang    0.19
worry      0.18
about      0.17
oldt       0.16
875        0.16
ABOUT      0.16
fret       0.16
Activations Density: 0.017%