Explanations
sentences or phrases indicating a problem or issue
the phrase "there's something wrong" or variations of it
Negative Logits
incinn        -0.74
NetMessage    -0.73
cit           -0.72
herer         -0.70
xit           -0.67
weeney        -0.66
aukee         -0.66
pole          -0.65
è¦ļéĨĴ        -0.64
achev         -0.64
Positive Logits
headed        0.78
eous          0.74
fully         0.71
behaviour     0.71
mouth         0.70
wrong         0.69
havoc         0.69
aligned       0.67
align         0.66
doing         0.65
Activations Density: 0.012%