INDEX
Explanations
phrases indicating similarity or equivalence
phrases that include the word "as."
Negative Logits
YD        -0.74
Whe       -0.67
uld       -0.66
ople      -0.66
Plex      -0.65
LA        -0.65
Kent      -0.64
endale    -0.64
loe       -0.64
uned      -0.63
Positive Logits
usual         0.80
pects         0.79
ours          0.78
regards       0.75
ylum          0.75
advertised    0.74
bestos        0.72
follows       0.70
opposed       0.69
ocial         0.66
Activations Density 0.042%