INDEX
Explanations
expressions of humility and related qualities
Negative Logits
thumbs    -0.16
villa     -0.14
aldo      -0.14
ainless   -0.14
ONO       -0.14
onte      -0.13
wire      -0.13
zilla     -0.13
term      -0.13
helm      -0.13
Positive Logits
kker      0.16
ardy      0.16
idata     0.16
ERRU      0.16
kins      0.15
hum       0.15
anka      0.15
Hum       0.15
Hum       0.14
isle      0.14
Activations Density 0.014%