Explanations
comparisons stating that two things are the same
the phrase "the same as."
Negative Logits
uca      -0.70
uci      -0.63
uto      -0.62
endale   -0.62
ulent    -0.61
ocy      -0.60
ople     -0.59
eker     -0.59
uld      -0.59
raq      -0.58
Positive Logits
ylum         0.84
regards      0.80
bestos       0.67
ours         0.67
Horton       0.67
ocial        0.67
usual        0.66
pects        0.65
pire         0.64
advertised   0.63
Activations Density: 0.046%