INDEX
Explanations
phrases or words related to actions or directives
occurrences of the word "action" in various contexts
Negative Logits
gling      -0.68
YL         -0.62
thening    -0.62
aez        -0.61
mining     -0.61
KER        -0.60
Affairs    -0.60
Äį         -0.57
gins       -0.57
arton      -0.56
Positive Logits
able       0.95
ivated     0.93
Replay     0.83
verbs      0.81
quit       0.81
ives       0.80
figure     0.79
hero       0.79
igraph     0.76
iveness    0.75
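Dashboards of this kind commonly build the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the lowest and highest entries. The source does not say how these lists were produced, so treat this as a sketch under that assumption; `w_dec_row`, `w_u`, and `feature_logit_lists` are hypothetical names, and the toy matrices are made up for illustration only.

```python
import numpy as np


def feature_logit_lists(w_dec_row, w_u, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    w_dec_row: decoder direction of one feature, shape (d_model,)
    w_u:       unembedding matrix, shape (d_model, d_vocab)
    vocab:     list of token strings, length d_vocab
    """
    # Project the feature direction into vocabulary space.
    logits = w_dec_row @ w_u
    order = np.argsort(logits)  # ascending
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
w_dec_row = np.array([1.0, 0.0])
w_u = np.array([[0.9, -0.5, 0.1, 0.3],
                [0.0, 1.0, 2.0, -3.0]])
bottom, top = feature_logit_lists(w_dec_row, w_u, ["a", "b", "c", "d"], k=2)
print(bottom)  # [('b', -0.5), ('c', 0.1)]
print(top)     # [('a', 0.9), ('d', 0.3)]
```

Some dashboards normalize the decoder row or the unembedding before projecting; that choice changes the scale of the numbers but not which tokens appear at the extremes for a single feature.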
Activation Density: 0.050%
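An activation density of 0.050% means the feature fires on roughly 1 in every 2,000 tokens of the text it was evaluated on. A minimal sketch, assuming density is the fraction of token positions whose activation is strictly above zero (the threshold, the `activation_density` name, and the toy activations are assumptions, not taken from the source):

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Toy data: 10 active positions out of 20,000 tokens.
acts = np.zeros(20_000)
acts[:10] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.050%
```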