INDEX
Explanations
verbs related to physical actions or outcomes
Negative Logits
Token        Logit
wont         -0.59
lihood       -0.57
requiring    -0.56
unfolding    -0.55
applied      -0.55
preventing   -0.55
unfolded     -0.54
specifying   -0.54
assisting    -0.53
namely       -0.53
Positive Logits
Token        Logit
oneself      0.86
yourselves   0.84
yourself     0.83
ourselves    0.75
ンジ         0.73
toes         0.70
ulate        0.68
noses        0.68
kered        0.65
tune         0.65
Activations Density: 0.811%