Explanations
expressions and mentions of happiness
feeling happy
Negative Logits
Token             Logit
Roof              -0.39
disponibilités    -0.36
Độ                -0.36
Становништво      -0.35
NameInMap         -0.35
strokeStyle       -0.35
zieher            -0.35
wyżs              -0.34
-------           -0.33
<bos>             -0.33
Positive Logits
Token             Logit
happy             0.79
Happy             0.74
HAPPY             0.71
HAPPY             0.71
Happy             0.68
happy             0.63
happ              0.60
felices           0.57
makeText          0.57
للمعارف           0.57
Activations Density: 0.009%