Explanations
references to merit and meritocracy
Negative Logits
nette     -0.16
strup     -0.16
alar      -0.15
Carbon    -0.15
herent    -0.14
Cleaner   -0.14
URRE      -0.14
ункт      -0.14
795       -0.14
794       -0.14

Positive Logits
maid      0.29
idian     0.27
maids     0.27
cedes     0.25
cur       0.25
oving     0.23
itor      0.23
ienda     0.22
ced       0.21
etric     0.21

Activation Density: 0.010%