Explanations
phrases related to initiating actions or events
Negative Logits
Token       Logit
hed         -0.14
TORT        -0.14
ths         -0.14
cki         -0.14
tort        -0.14
iap         -0.14
azen        -0.14
enne        -0.14
indsight    -0.13
hea         -0.13
Positive Logits
Token       Logit
kicked      0.23
boxing      0.22
Kick        0.21
kick        0.20
kicking     0.19
Kick        0.18
starter     0.17
kick        0.17
kicks       0.17
start       0.17
Activations Density: 0.014%