INDEX
Explanations
phrases indicating contrast or difference
Negative Logits
atura    -0.16
sto      -0.14
erton    -0.14
ADOW     -0.14
¦        -0.14
N        -0.13
Gle      -0.13
ÙĨج      -0.13
kort     -0.13
strup    -0.13
Positive Logits
unlike       0.19
Unlike       0.16
rene         0.15
Unlike       0.15
[assembly    0.15
vester       0.14
sdale        0.14
Trit         0.14
pch          0.14
ìĦŃ          0.14
Activations Density 0.019%