INDEX
Explanations
phrases indicating scarcity or deficiency
Negative Logits
ーダ      -0.15
ेटर      -0.15
uala    -0.15
mt      -0.14
reo     -0.14
ntl     -0.13
Mrs     -0.13
тери    -0.13
antan   -0.13
éro     -0.13
Positive Logits
nor      0.22
nor      0.19
anymore  0.19
als      0.16
epad     0.16
bane     0.15
Nor      0.15
айте     0.15
ays      0.15
sondern  0.15
Activations Density: 0.114%