INDEX
Explanations
terms related to leadership positions and roles
Negative Logits
bak      -0.16
unga     -0.15
Bak      -0.15
mess     -0.15
umble    -0.14
orget    -0.14
ijing    -0.14
uls      -0.14
amoto    -0.14
ation    -0.14
Positive Logits
yor      0.18
erten    0.15
defe     0.14
640      0.14
lor      0.14
бер      0.13
ë¡       0.13
oric     0.13
ونی      0.13
ерш      0.13
Activations Density: 0.008%