Explanations
phrases related to comparison or evaluation
phrases that emphasize negation or denial
Negative Logits
Token       Logit
Almighty    -0.64
LIN         -0.63
Naples      -0.59
WT          -0.59
Condition   -0.56
catentry    -0.56
multipl     -0.56
motif       -0.55
TEXTURE     -0.54
soType      -0.53
Positive Logits
Token       Logit
anymore     0.69
vae         0.68
nor         0.64
Enough      0.64
bother      0.63
dfx         0.61
REDACTED    0.61
Cheong      0.61
userc       0.60
cffff       0.59
Activations Density 0.182%