INDEX
Explanations
Instances of the word "drop" and its variations, often in contexts of relinquishing or letting go of something.
Negative Logits
ume      -0.18
iring    -0.18
leness   -0.16
Up       -0.15
ffects   -0.15
jury     -0.15
irl      -0.14
ifers    -0.14
ating    -0.14
Up       -0.14
Positive Logits
anchor    0.31
kick      0.27
trou      0.27
hints     0.26
anchor    0.26
-anchor   0.24
-down     0.23
dead      0.23
bombs     0.22
-off      0.22
Activations Density 0.029%
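
A density figure like this is usually read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only. It assumes "density" means the percentage of tokens with an activation greater than zero; the dashboard does not state its exact definition, and the function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations):
    """Return the percentage of tokens whose activation is greater than zero.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires. The dashboard's own definition is not given.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 29 of 100,000 tokens.
sample = [1.0] * 29 + [0.0] * 99_971
print(f"{activation_density(sample):.3f}%")  # prints 0.029%
```

Under this reading, a density of 0.029% corresponds to roughly 29 active tokens per 100,000 sampled.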