INDEX
Explanations
expressions of love and positive feelings towards experiences or objects
Negative Logits
unas
-0.16
ump
-0.15
sect
-0.15
372
-0.15
бор
-0.15
QUE
-0.15
um
-0.15
.xz
-0.15
uman
-0.15
real
-0.14
Positive Logits
plung
0.15
full
0.15
plain
0.14
fat
0.14
birds
0.14
.cbo
0.14
uí
0.14
Volk
0.14
ably
0.14
-filled
0.14
Activations Density 0.028%