INDEX
Explanations
comparisons and evaluations in text, focusing on expressions related to superiority or advancement
Negative Logits
raq        -1.09
autions    -1.09
Sit        -1.01
uctions    -1.01
imity      -0.97
oyer       -0.96
eeper      -0.95
negie      -0.92
oking      -0.92
Remove     -0.91
Positive Logits
average      1.15
usual        1.05
ours         1.04
theirs       1.00
average      0.99
ordinary     0.97
anticipated  0.93
Gore         0.91
anything     0.90
usual        0.89
Activations Density 1.989%