Explanations
phrases that express positivity, particularly those containing the word "good."
Negative Logits
Token         Logit
ament         -0.15
ount          -0.15
бол           -0.15
upert         -0.15
butt          -0.14
agt           -0.14
isLoggedIn    -0.14
åł            -0.14
Educ          -0.14
ÙģÙĪ          -0.14
Positive Logits
Token         Logit
yms           0.17
HRESULT       0.16
698           0.16
_Reference    0.15
ymi           0.15
Sor           0.15
lander        0.15
ẽ             0.14
Fé            0.14
_wo           0.14
Activations Density 0.045%
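
The density figure above can be read as the share of sampled tokens on which the feature is active. The sketch below illustrates that reading, assuming "active" means an activation strictly above zero. The function name, the threshold parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 9 active tokens out of 20,000.
sample = [0.0] * 19_991 + [1.3] * 9
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.045%"
```

A density this low would mean the feature fires on roughly one token in every 2,200, which fits a feature tied to a narrow set of phrases.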