Explanations
words related to authority and supervision
Negative Logits
onec    -0.19
ęk      -0.17
segue   -0.16
üs      -0.16
Hayes   -0.16
zes     -0.15
ypad    -0.15
ture    -0.15
bilt    -0.15
ü       -0.15
Positive Logits
sup     0.25
posed   0.20
posing  0.19
Sup     0.19
stit    0.18
เปอร    0.18
pling   0.17
lub     0.16
reme    0.16
خام     0.16
Activations Density: 0.011%