INDEX
Explanations
phrases related to constructing or assembling something, particularly in a setting involving imitation or approximation
negative phrases related to deception or trickery
Negative Logits
unfocusedRange  -0.81
ourses          -0.79
clerosis        -0.76
GOODMAN         -0.76
ibel            -0.72
actionGroup     -0.72
RELEASE         -0.70
ourse           -0.70
IRC             -0.69
uve             -0.68
Positive Logits
wra             0.72
Hats            0.68
balls           0.68
caric           0.67
pas             0.65
pants           0.64
excuse          0.63
hide            0.62
whisk           0.62
boot            0.61
Activations Density: 0.215%