INDEX
Explanations
phrases that convey comparisons or similarities
Negative Logits
Token          Logit
また           -0.17   (Japanese, "also, again")
usercontent    -0.15
둘             -0.13   (Korean, "two")
ange           -0.13
_then          -0.13
ERSHEY         -0.13
tır            -0.13
igo            -0.13
orno           -0.13
ycopg          -0.13
Positive Logits
Token          Logit
ones            0.34
ones            0.30
those           0.30
ours            0.24
those           0.22
Ones            0.22
:               0.21
например        0.21   (Russian, "for example")
např            0.20   (Czech, abbreviation of "například", "for example")
yours           0.19
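The two logit lists can be read as the feature's direct effect on the output vocabulary: tokens it pushes up and tokens it pushes down. A minimal sketch, assuming (as is common for SAE feature dashboards, though not stated here) that each score is the feature's decoder vector multiplied by the model's unembedding matrix. `top_bottom_logits`, `w_dec_row`, and `W_U` are hypothetical names, and the toy weights below are invented for illustration, not this feature's real weights.

```python
import numpy as np

def top_bottom_logits(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary.

    Assumed shapes (hypothetical):
      w_dec_row: (d_model,) decoder vector for a single feature
      W_U:       (d_model, d_vocab) unembedding matrix
      vocab:     list of d_vocab token strings
    Returns (negative, positive) lists of (token, score) pairs.
    """
    scores = w_dec_row @ W_U          # (d_vocab,) direct logit effect per token
    order = np.argsort(scores)        # token indices, lowest score first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: identity unembedding, so each score equals one weight.
vocab = ["ones", "those", "ours", "usercontent", "ange"]
W_U = np.eye(5)
w = np.array([0.34, 0.30, 0.24, -0.15, -0.13])
neg, pos = top_bottom_logits(w, W_U, vocab, k=2)
print(pos)  # highest-scoring tokens first
print(neg)  # lowest-scoring tokens first
```

Real dashboards may also fold in a final layer norm or rescale the scores, so the numbers above should not be compared directly with the table.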
Activation density: 0.152% (the feature is active on roughly 1.5 of every 1,000 tokens)
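Activation density is the share of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly greater than zero and that `feature_acts` (a hypothetical name) holds the feature's activation at every token position of some evaluation corpus; the dashboard's exact corpus and threshold are not given here.

```python
import numpy as np

def activation_density(feature_acts):
    """Percentage of token positions where the feature's activation is > 0."""
    acts = np.asarray(feature_acts)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# Toy example: 2 active positions out of 1,000 tokens.
acts = np.zeros(1000)
acts[:2] = 1.5
print(activation_density(acts))  # percentage of active positions
```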