INDEX
Explanations
verbs related to taking action or making decisions
phrases indicating goals or objectives
Negative Logits
abad         -0.74
aspers       -0.72
IQ           -0.65
okia         -0.65
ashtra       -0.64
JP           -0.64
é¾įå¥ij士    -0.63
ccording     -0.63
dependent    -0.63
¿½           -0.62
Positive Logits
liest        0.93
ggles        0.70
Ney          0.65
ked          0.64
elight       0.64
rium         0.63
nearest      0.62
oneself      0.61
gery         0.61
]),          0.61
Activations Density 0.313%