INDEX
Explanations
words associated with contrasts in emotional states or qualities
Negative Logits
ows             -0.19
ÑģÑĤан          -0.16
richt           -0.15
ylon            -0.14
ç¨ĭ度           -0.14
ifications      -0.14
uality          -0.14
3               -0.14
_define         -0.14
ÅĻ              -0.14
Positive Logits
spir            0.17
angled          0.16
angan           0.16
incl            0.16
astic           0.16
eming           0.16
iegel           0.16
ł               0.15
¼               0.15
assed           0.15
Activations Density: 0.203%