INDEX
Explanations
References to authority or hierarchy, particularly words such as "super" or "supreme."
Negative Logits
Token    Logit
toa      -0.16
üst      -0.15
zes      -0.15
ü        -0.15
ęk       -0.14
ture     -0.14
tar      -0.14
tier     -0.14
tail     -0.14
zing     -0.14
Positive Logits
Token    Logit
erv      0.23
posing   0.22
sup      0.22
posed    0.22
erville  0.21
reme     0.20
ervisor  0.20
เปอร     0.20
erset    0.20
erval    0.19
Activations Density 0.012%