INDEX
Explanations
expressions of happiness and positive feelings
Negative Logits
Token             Logit
erner             -0.17
ndata             -0.16
leans             -0.15
etting            -0.15
------+------+    -0.14
ÌĤ                -0.14
plevel            -0.14
ered              -0.14
ÄĽr               -0.14
bsite             -0.14
Positive Logits
Token             Logit
-go               0.20
/light            0.16
yyyy              0.16
-looking          0.15
ogo               0.15
(er               0.14
fully             0.14
dest              0.14
acket             0.14
lic               0.14
Activations Density: 0.034%