Explanations
instances of the word "shame" or related forms in various contexts
Negative Logits
im      -0.16
ended   -0.16
anche   -0.16
ekt     -0.15
sin     -0.15
324     -0.15
are     -0.15
ges     -0.14
acio    -0.14
Parm    -0.14
Positive Logits
sh                0.43
(sh               0.17
.sh               0.17
rew               0.17
enan              0.17
sh                0.17
-sh               0.16
ogan              0.15
rou               0.15
.scalablytyped    0.15
Activations Density 0.017%