INDEX
Explanations
instances of the word "Better" and its variations
Negative Logits
trl       -0.77
essee     -0.72
ettes     -0.69
Nob       -0.68
mberg     -0.68
ette      -0.65
cano      -0.64
gemony    -0.61
warts     -0.61
wk        -0.59
Positive Logits
Than      0.97
Faster    0.88
ment      0.87
than      0.84
behaved   0.82
than      0.81
iation    0.81
suited    0.77
ments     0.72
Ideas     0.69
Activations Density 0.020%