Explanations
phrases relating to equivalence and comparison
Negative Logits
Token       Logit
yon         -0.14
ella        -0.14
ãĢĪ         -0.14
Ramos       -0.14
olith       -0.14
eron        -0.14
_simps      -0.14
oret        -0.13
ÙĬج         -0.13
yms         -0.13
Positive Logits
Token       Logit
Equivalent  0.18
ypsy        0.16
823         0.16
asti        0.15
equivalent  0.15
GOODMAN     0.15
iez         0.15
metatable   0.15
ivalent     0.15
onymous     0.14
Activations Density: 0.015%