INDEX
Explanations
instances of the word 'Better'
references to the word "better"
Negative Logits
Dresden       -0.68
NetMessage    -0.65
idon          -0.63
Pione         -0.63
Straw         -0.62
heter         -0.61
trl           -0.58
Pavilion      -0.58
wk            -0.58
gemony        -0.58
Positive Logits
suited        1.04
than          1.01
behaved       0.97
acquainted    0.85
seller        0.82
ment          0.80
than          0.79
Than          0.77
quality       0.75
iating        0.74
Activations Density 0.045%