INDEX
Explanations
action verbs related to significant experiences or changes
Negative Logits
Ñİк      -0.16
ahat     -0.15
yper     -0.15
aub      -0.15
rine     -0.15
tero     -0.15
267      -0.14
asal     -0.14
roker    -0.14
одо      -0.14
Positive Logits
iday     0.17
irket    0.16
zan      0.16
umbs     0.15
sit      0.14
oler     0.14
angelo   0.14
ermal    0.14
çĿ£      0.14
libs     0.14
Activations Density 0.110%