Explanations
concepts related to feelings of shame and expressions of discontent
Negative Logits
Token       Logit
ãģĬãĤĬ      -0.19
ehir        -0.17
çħ§         -0.17
_shader     -0.16
sic         -0.16
sen         -0.16
lod         -0.15
neh         -0.15
extremes    -0.15
ation       -0.15
Positive Logits
Token       Logit
pherd       0.19
peare       0.19
cro         0.17
akespeare   0.16
ampoo       0.16
ppard       0.16
ered        0.15
tember      0.15
/bl         0.15
orthand     0.15
Activations Density: 0.183%
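
The density figure can be read as the share of tokens on which the feature fires at all. The sketch below illustrates that reading only; the page does not state how the number was computed, so the function name activation_density, the strictly-greater-than-zero threshold, and the sample values are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries strictly above `threshold`.

    Assumption: "density" means the fraction of tokens with a nonzero
    activation. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for one feature over 8 tokens (not real data).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(activation_density(sample))  # 12.5
```

Under this reading, a density of 0.183% would correspond to roughly 2 active tokens out of every 1,000 sampled.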