Explanations
phrases that convey comparisons or contrasts
Negative Logits
yna      -0.19
mand     -0.17
yn       -0.17
rought   -0.16
ynes     -0.16
mand     -0.15
mes      -0.15
uw       -0.15
odium    -0.14
calar    -0.14
Positive Logits
iver         0.14
'options     0.14
insky        0.14
istrovství   0.14
erras        0.13
leo          0.13
Undefined    0.13
hậu          0.13
ợi           0.13
片           0.13
Activations Density 0.075%