Explanations
expressions of emotional states or feelings
Negative Logits
Token      Logit
ule        -0.15
ula        -0.14
esan       -0.14
pg         -0.14
isu        -0.14
.simple    -0.14
Mary       -0.14
unh        -0.13
aga        -0.13
L          -0.13
Positive Logits
Token      Logit
like       0.26
像是       0.19
_like      0.18
như        0.18
ingly      0.18
Like       0.18
Like       0.18
LIKE       0.17
seperti    0.16
bern       0.16
Activations Density 0.023%