INDEX
Explanations
words related to taking action or responsibility
instances of the word "step" and its variations
Negative Logits
selage     -0.78
ecause     -0.69
rontal     -0.66
orsche     -0.63
pport      -0.62
ores       -0.60
raid       -0.60
覚醒       -0.60
Reasons    -0.59
herent     -0.58
Positive Logits
forth      0.99
frog       0.97
aside      0.97
forward    0.91
ashore     0.87
up         0.84
foot       0.84
up         0.80
out        0.79
toe        0.77
Activations Density: 0.029%