Explanations
terms related to superiority and authority
Negative Logits
toa       -0.16
ÄĻk       -0.16
ü         -0.15
ÏĮ        -0.15
øy        -0.15
dehyde    -0.15
zes       -0.15
blas      -0.15
tar       -0.14
blem      -0.14
Positive Logits
sup       0.23
erv       0.22
posing    0.21
posed     0.21
erville   0.20
erc       0.19
à¹Ģà¸Ľà¸Ńร   0.19
ervisor   0.19
Sup       0.19
erset     0.18
Activations Density: 0.013%