INDEX
Explanations
instructions or reminders
phrases emphasizing the importance of remembering or not forgetting something
Negative Logits
hook          -0.73
elle          -0.69
folk          -0.68
framework     -0.67
wom           -0.65
cheat         -0.65
oreal         -0.64
ullah         -0.64
law           -0.64
Released      -0.64
Positive Logits
heny           0.70
vation         0.64
elbows         0.63
Fraz           0.62
additions      0.62
sweets         0.62
tainment       0.60
theless        0.60
pieces         0.59
classics       0.58
Activation Density: 0.025%