INDEX
Explanations
phrases indicating similarity or comparison
Negative Logits
es        -0.68
dymyr     -0.61
sto       -0.60
cupa      -0.60
aphne     -0.59
ate       -0.58
o         -0.57
ste       -0.57
sphinct   -0.56
arbox     -0.56
Positive Logits
Similar           1.27
Similar           1.25
SIMILAR           1.23
similar           1.22
RectangleBorder   1.21
similar           1.18
nahilalakip       1.12
Похо              1.10
iliar             1.08
simil             1.01
Activations Density 0.101%
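
The density figure above can be read as the share of sampled tokens on which the feature fires at all. The following is a minimal sketch under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard, which may compute the value differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature is active, expressed as a percentage.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Hypothetical sample: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3f}%")  # prints 0.100%
```

A density near 0.1% means the feature is sparse, which fits a feature that responds mainly to a narrow family of tokens such as the "similar" variants listed above.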