INDEX
Explanations
references to smiling or enjoyment
Negative Logits
Token     Logit
amarin    -0.17
anager    -0.15
onn       -0.15
emer      -0.14
æ´        -0.14
.yy       -0.14
asje      -0.14
zcze      -0.14
inator    -0.14
inated    -0.14
Positive Logits
Token     Logit
Sm        0.30
sm        0.29
aller     0.26
.Sm       0.23
(sm       0.23
smo       0.22
/sm       0.21
.SM       0.21
arth      0.21
.sm       0.21
Activations Density 0.012%