Explanations
references to leadership roles
Negative Logits
Token     Logit
anni      -0.15
sonian    -0.15
plum      -0.15
JR        -0.15
éĿ©       -0.14
orex      -0.14
دة        -0.14
ucz       -0.14
bird      -0.14
aval      -0.14
Positive Logits
Token     Logit
ãĥªãĤ«    0.16
oldown    0.16
ç¹Ķ       0.15
hower     0.14
zap       0.14
ãĥ¬ãĤ¹    0.14
spacer    0.14
è         0.14
yles      0.14
人åĵ¡     0.13
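Several tokens in the logit lists (for example ãĥªãĤ«) look garbled because they appear to be shown in GPT-2's byte-level BPE notation, where each raw byte is mapped to a printable character. Assuming the underlying model uses the GPT-2 tokenizer (the page does not name the model), the minimal sketch below rebuilds that byte-to-character table and decodes a token string back to text. `decode_token` is a hypothetical helper name, and tokens containing characters outside the table (such as دة) are assumed to be already decoded and are returned unchanged.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2's reversible mapping from each of the 256 byte values to a printable character."""
    # Printable Latin-1 bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level token string into readable text."""
    if not all(ch in BYTE_DECODER for ch in token):
        # Contains characters outside the table, so assume it is already decoded.
        return token
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A token may hold only part of a multi-byte UTF-8 character; replace the fragment.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥªãĤ«", "ãĥ¬ãĤ¹", "è", "anni"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `decode_token("ãĥªãĤ«")` gives "リカ" and `decode_token("ãĥ¬ãĤ¹")` gives "レス", while a lone fragment such as "è" is an incomplete UTF-8 sequence and decodes to the replacement character U+FFFD.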
Activation Density: 0.165%
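Activation density is commonly defined as the fraction of sampled tokens on which the feature's activation is above zero. The page does not state the sample size or threshold, so the sketch below assumes a flat list of per-token activations and a default threshold of 0; `activation_density` is a hypothetical helper name, and the 100,000-token example is purely illustrative.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: share of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Illustrative only: a feature firing on 165 of 100,000 sampled tokens.
    acts = [1.0] * 165 + [0.0] * 99_835
    print(f"{activation_density(acts):.3%}")
```

With these illustrative numbers the density is 165 / 100,000 = 0.00165, which prints as 0.165%.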