Explanations
phrases related to actions and behaviors
instances of the word "acted" in various contexts
Negative Logits
pled      -0.77
occupied  -0.74
issue     -0.73
fer       -0.71
mare      -0.71
uns       -0.65
abol      -0.65
moon      -0.64
mark      -0.64
olen      -0.63
Positive Logits
uary         0.94
acebook      0.88
uated        0.81
accordingly  0.77
Replay       0.76
acted        0.75
SourceFile   0.75
therap       0.73
inic         0.72
ecided       0.72
Activations Density: 0.008%