INDEX
Explanations
phrases related to decisive or impactful actions
instances of the word "the" in various contexts
Negative Logits
Token       Logit
ttp         -0.80
ornings     -0.76
HAEL        -0.76
ossal       -0.75
-->         -0.75
ogyn        -0.74
deen        -0.73
nces        -0.72
>:          -0.69
oine        -0.69
Positive Logits
Token       Logit
envelope     1.25
blame        1.23
brakes       1.22
needle       1.13
ball         1.07
curtain      1.03
lid          1.02
screws       1.02
reins        0.99
curtains     0.97
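The two lists above are the tokens with the largest and smallest logit scores for this feature. The page does not say how those scores are computed. A common choice is to project the feature's decoder direction onto the model's unembedding matrix, but treat that as an assumption. The sketch below shows only the ranking step. It uses a few scores copied from the lists, and the helper name `top_logits` is hypothetical.

```python
import heapq

def top_logits(scores: dict[str, float], k: int) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k highest- and k lowest-scoring (token, score) pairs.

    Hypothetical helper: in a real dashboard `scores` would cover the whole
    vocabulary (assumed to come from decoder direction @ unembedding).
    """
    positive = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    return positive, negative

# A small subset of the scores listed above.
scores = {
    "envelope": 1.25, "blame": 1.23, "brakes": 1.22, "needle": 1.13,
    "ttp": -0.80, "ornings": -0.76, "HAEL": -0.76,
}
pos, neg = top_logits(scores, k=3)
print(pos)  # [('envelope', 1.25), ('blame', 1.23), ('brakes', 1.22)]
print(neg)  # [('ttp', -0.8), ('ornings', -0.76), ('HAEL', -0.76)]
```

Ties such as `ornings` and `HAEL` at -0.76 keep their input order, because `heapq.nsmallest` behaves like a stable `sorted(...)[:k]`.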
Activation Density: 0.161%
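Activation density is read here as the percentage of token positions on which the feature fires, meaning its activation is above zero. That reading is an assumption, since the page only gives the number. The sketch below computes it for a made-up activation array. The function name `activation_density` and the data are hypothetical. At 0.161%, the feature would be active on roughly 161 of every 100,000 tokens.

```python
def activation_density(activations: list[float]) -> float:
    """Percentage of positions with a positive activation (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)

# Hypothetical data: 161 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(161):
    acts[i * 600] = 1.5  # made-up activation value
print(f"{activation_density(acts):.3f}%")  # 0.161%
```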