INDEX
Explanations
phrases indicating movement or action
past tense verbs and certain action-related phrases
NEGATIVE LOGITS
avery  -0.70
mong   -0.67
hov    -0.66
eway   -0.65
vier   -0.65
WER    -0.64
icist  -0.64
bold   -0.64
gart   -0.63
ateg   -0.62
POSITIVE LOGITS
join     0.66
Chimera  0.61
THEM     0.59
strap    0.58
Leopard  0.58
ipeg     0.57
them     0.56
Scorp    0.56
æĸ¹      0.56
VIDEOS   0.55
Activations Density 0.273%