Explanations
verbs and actions related to engagement and social interaction
Negative Logits
égor        -0.16
omor        -0.16
reator      -0.15
itre        -0.15
ilet        -0.15
yx          -0.14
many        -0.14
atan        -0.14
.vertx      -0.14
à¸Ńà¸ĩ      -0.14
Positive Logits
away        0.38
Away        0.30
Away        0.28
-away       0.28
away        0.25
harder      0.16
till        0.16
lk          0.15
sobie       0.15
mad         0.15
Activations Density 0.162%