Explanations
words that convey assertiveness or confidence
Negative Logits
Token      Logit
rete       -0.16
ever       -0.15
vez        -0.15
èµĦæĸĻ     -0.15
ctor       -0.15
acro       -0.15
ÏĦαν       -0.15
plode      -0.15
htable     -0.14
trinsic    -0.14
Positive Logits
Token      Logit
ness       0.33
face       0.28
-faced     0.25
-face      0.24
ly         0.23
enough     0.22
ened       0.21
speaker    0.20
faced      0.20
symbol     0.20
Activations Density: 0.026%
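
The density figure can be read as the share of token positions on which the feature fires at all. The sketch below shows one way such a percentage might be computed. It assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: a feature "fires" on a token when its activation
    exceeds the threshold (0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 10,000 token positions, with the feature firing on 3.
sample = [0.0] * 9997 + [1.2, 0.4, 2.7]
print(f"{activation_density(sample):.3f}%")  # prints 0.030%
```

Under this reading, a density of 0.026% would correspond to roughly 26 firing positions per 100,000 tokens.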