INDEX
Explanations
phrases expressing feelings of privilege and honor
Negative Logits
Token     Logit
lagen     -0.15
jud       -0.14
ily       -0.14
smouth    -0.14
.ag       -0.14
ook       -0.14
æĺŁ       -0.13
ohl       -0.13
Thunk     -0.13
inn       -0.13
Positive Logits
Token         Logit
privilege     0.62
privileged    0.48
priv          0.43
Priv          0.42
prive         0.42
priv          0.41
privileges    0.41
privileged    0.40
Priv          0.38
privile       0.36
Activations Density: 0.133%
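
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 133 of 100,000 token positions.
sample = [1.0] * 133 + [0.0] * (100_000 - 133)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.133%
```

Under this reading, a density of 0.133% means the feature is active on roughly 1 in 750 token positions, which fits a feature that responds to a narrow family of words.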