Explanations
verbs related to movement or change of position
verbs indicating an action or participation in an event
Negative Logits
Token      Logit
rone       -0.80
zo         -0.67
âĹ¼        -0.64
Koreans    -0.64
ria        -0.63
ricular    -0.60
ignor      -0.58
rome       -0.57
ole        -0.57
street     -0.56
Positive Logits
Token      Logit
hirt       0.89
paces      0.88
ometimes   0.83
atform     0.82
hift       0.81
ilver      0.81
omething   0.77
hement     0.77
anamo      0.77
psey       0.75
Activations Density: 0.349%