INDEX
Explanations
instances of encouragement and motivation in various contexts
Negative Logits
Token          Logit
achten         -0.15
ound           -0.15
cket           -0.15
enton          -0.15
orc            -0.14
_initialize    -0.14
ake            -0.13
aroo           -0.13
atars          -0.13
ARED           -0.13
Positive Logits
Token          Logit
towards         0.21
toward          0.20
encouraged      0.20
oward           0.18
ambient         0.18
emb             0.17
to              0.16
pson            0.16
encourage       0.16
Confidence      0.16
Activations Density: 0.097%