INDEX
Explanations
expressions of positive surprise or unexpected enjoyment
Negative Logits
anneer    -0.15
egt       -0.15
future    -0.15
пÑĢави     -0.14
flo       -0.14
Joh       -0.14
buie      -0.14
ļ         -0.14
sund      -0.14
ene       -0.13
Positive Logits
dedim     0.18
СÐŀ       0.17
decided   0.17
ainen     0.16
داش       0.16
olis      0.15
ÄĽÅĻ      0.15
_macros   0.14
_PS       0.14
.openg    0.14
Activations Density: 0.299%