Explanations
verbs related to taking action or making changes
Negative Logits
starter   -0.63
abet      -0.62
sa        -0.61
ija       -0.61
raltar    -0.61
spot      -0.61
den       -0.60
ahoo      -0.60
mad       -0.59
topped    -0.59
Positive Logits
ulate       0.96
them        0.92
orously     0.88
uate        0.87
ively       0.86
their       0.85
ibly        0.84
our         0.81
oneself     0.81
themselves  0.78
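The Negative Logits and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below shows one common way such lists are produced, assuming each token's score is the dot product of the feature's decoder direction with that token's unembedding column. The helper `logit_effects`, the toy vocabulary, and all numbers are hypothetical, and the dashboard may apply extra steps (for example layer normalization) that are not shown here.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: ASSUMES score(token) = decoder_dir . W_U[:, token],
# i.e. the feature's direct effect on each output logit.
def logit_effects(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (negative, positive): the k lowest- and k highest-scoring tokens.

    decoder_dir has length d_model; w_u has d_model rows and len(vocab) columns.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with this token's unembedding column.
        s = sum(d * row[t] for d, row in zip(decoder_dir, w_u))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy data, made up for illustration: d_model = 2, four tokens.
vocab = ["ulate", "them", "starter", "abet"]
decoder_dir = [1.0, 0.5]
w_u = [
    [0.8, 0.6, -0.5, -0.4],
    [0.3, 0.6, -0.3, -0.4],
]
negative, positive = logit_effects(decoder_dir, w_u, vocab, k=2)
print("Negative:", [(t, round(s, 2)) for t, s in negative])
print("Positive:", [(t, round(s, 2)) for t, s in positive])
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; with a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the Python loop.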
Activation Density: 3.061%
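A density figure like this is typically read as the share of sampled tokens on which the feature fires. The minimal sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of 0. The helper `activation_density` and the 100,000-token sample are hypothetical, and the dashboard's actual sample size and threshold are not stated here.

```python
from typing import Sequence


# Hypothetical helper: ASSUMES density = percentage of sampled tokens whose
# feature activation is strictly above `threshold` (0 by default).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up sample: the feature fires on 3,061 of 100,000 tokens.
acts = [0.7] * 3061 + [0.0] * (100_000 - 3061)
print(f"Activation density: {activation_density(acts):.3f}%")
```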