Explanations
descriptors that imply courage, assertiveness, or a strong stance in various contexts
Negative Logits
ταν         -0.16
iten        -0.15
rete        -0.15
orro        -0.14
acro        -0.14
torture     -0.14
iggs        -0.14
ительно     -0.14
ctor        -0.14
dup         -0.14
Positive Logits
ness         0.33
face         0.26
-faced       0.24
ly           0.22
enough       0.21
speaker      0.21
statement    0.21
-face        0.21
statements   0.21
faced        0.19
Activations Density: 0.028%