INDEX
Explanations
expressions of gladness or happiness
Negative Logits
frau     -0.16
er       -0.16
etics    -0.16
izzo     -0.15
677      -0.15
werk     -0.14
eway     -0.14
ansas    -0.14
ikki     -0.13
เค       -0.13
Positive Logits
stone    0.21
ys       0.19
fully    0.17
wyn      0.17
tid      0.17
win      0.17
dest     0.17
lıkla    0.16
indow    0.16
stones   0.16
Activations Density 0.006%