Explanations
words related to demotion
references to the concept of 'demotion' or its variations
Negative Logits
Token     Logit
WAYS      -0.90
akeru     -0.67
lihood    -0.66
olkien    -0.65
20439     -0.64
orship    -0.64
hetti     -0.64
ORY       -0.64
tip       -0.63
tis       -0.63
Positive Logits
Token           Logit
agogue          1.03
agog            0.98
dem             0.97
ographically    0.92
onym            0.92
ploy            0.90
ilit            0.85
igration        0.84
onse            0.84
igmat           0.81
Activations Density 0.005%