INDEX
Explanations
comparisons indicating improvement or superiority
instances of the word "better."
Negative Logits
iasco        -0.70
lies         -0.69
Contents     -0.63
endant       -0.61
ital         -0.60
lig          -0.60
opsis        -0.60
aline        -0.59
achusetts    -0.59
achi         -0.59
Positive Logits
better       3.42
better       2.85
Better       2.07
worse        2.06
Better       2.06
nicer        2.05
smarter      1.95
safer        1.94
stronger     1.81
wiser        1.76
Activations Density: 0.032%