INDEX
Explanations
phrases relating to self-belief and confidence
Negative Logits
нÑı        -0.17
rib        -0.15
yy         -0.15
yny        -0.15
quel       -0.15
arakter    -0.14
pg         -0.14
utz        -0.14
stell      -0.14
yyy        -0.14
Positive Logits
worthy     0.18
ance       0.17
fulness    0.16
enco       0.15
bel        0.15
Bel        0.15
strongly   0.15
vably      0.15
252        0.15
ances      0.14
Activations Density: 0.040%