Explanations
references to leadership positions and roles
Negative Logits
unca      -0.17
apy       -0.15
ucc       -0.15
berman    -0.14
лагод     -0.14
uela      -0.14
eday      -0.14
omed      -0.14
uzz       -0.14
anism     -0.14
Positive Logits
lining      0.23
hunt        0.23
master      0.23
quartered   0.22
ache        0.21
hon         0.20
shot        0.20
strong      0.20
ship        0.19
count       0.19
Activations Density: 0.010%