INDEX
Explanations
comparative phrases emphasizing the degree of similarity or difference
NEGATIVE LOGITS
既            -0.17
inals         -0.15
astle         -0.14
dup           -0.14
onder         -0.14
istrovství    -0.14
isse          -0.14
etik          -0.14
Feels         -0.14
gew           -0.14
POSITIVE LOGITS
anything      0.29
anything      0.25
Anything      0.21
Anything      0.20
παρά          0.16
versus        0.16
anywhere      0.16
vice          0.15
ori           0.14
chine         0.14
Activations Density 0.050%