INDEX
Explanations
exclamatory statements and commands
phrases or expressions indicating strong emotional reactions
Negative Logits
Token       Logit
reclaimed   -0.74
metic       -0.74
conj        -0.71
preserved   -0.70
featured    -0.68
recomb      -0.68
tangled     -0.67
fused       -0.67
targeted    -0.67
entangled   -0.67
Positive Logits
Token       Logit
Hey         1.09
Everybody   1.04
everyone    1.04
why         1.04
please      1.03
Look        1.01
Okay        1.00
nob         0.98
there       0.96
everything  0.96
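Feature dashboards of this kind commonly score each vocabulary token by projecting the feature's decoder direction through the model's unembedding matrix, then list the highest- and lowest-scoring tokens as positive and negative logits. The sketch below assumes that method; the decoder vector, unembedding matrix, and five-token vocabulary are hypothetical toy values, not data from this feature, and `logit_effects` / `top_and_bottom` are illustrative helper names.

```python
# Hypothetical toy setup: a 4-dimensional residual stream and a 5-token vocabulary.
# In a real dashboard, decoder_direction would be this feature's row of the SAE
# decoder and unembedding would be the model's W_U (d_model x vocab). Both are
# assumptions here, filled with made-up numbers.

def logit_effects(decoder_direction, unembedding, vocab):
    """Return (token, score) pairs where score = decoder_direction . W_U[:, token]."""
    scores = []
    for j, token in enumerate(vocab):
        score = sum(d * unembedding[i][j] for i, d in enumerate(decoder_direction))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative


vocab = ["Hey", "please", "the", "fused", "tangled"]
decoder_direction = [1.0, 0.5, 0.0, -1.0]
unembedding = [  # rows: residual dimensions; columns: tokens in vocab order
    [1.0, 0.8, 0.0, -0.5, -0.4],
    [0.2, 0.4, 0.1, 0.0, -0.2],
    [0.3, 0.3, 0.3, 0.3, 0.3],
    [0.0, 0.0, 0.1, 0.3, 0.4],
]

scores = logit_effects(decoder_direction, unembedding, vocab)
positive, negative = top_and_bottom(scores, k=2)
for token, score in positive:
    print(f"+ {token:<10} {score:.2f}")
for token, score in negative:
    print(f"- {token:<10} {score:.2f}")
```

Real dashboards may also fold in normalization (e.g. a final layer norm) before the unembedding; this sketch omits that.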
Activation Density: 0.047%
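Assuming activation density means the share of sampled tokens on which the feature fires (activation above zero), 0.047% corresponds to roughly 47 active tokens per 100,000. A minimal sketch of that calculation; the threshold of 0.0 and the token counts are assumptions chosen for illustration, not the dashboard's actual sampling setup.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 47 activate the feature.
sample = [0.0] * 99_953 + [1.0] * 47
print(f"{activation_density(sample):.3f}%")
```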