INDEX
Explanations
words related to suggestions or recommendations
phrases questioning or challenging the status quo
Negative Logits
photo          -0.74
ELD            -0.72
digit          -0.70
utch           -0.68
iple           -0.66
Reviewed       -0.65
chairs         -0.64
vor            -0.63
iHUD           -0.61
ItemTracker    -0.61
Positive Logits
indulge        0.87
?]             0.86
emulate        0.81
itia           0.79
unleash        0.77
succumb        0.74
abolish        0.74
invoke         0.74
intervene      0.74
recreate       0.73
Activations Density: 0.028%