Explanations
words related to various professions and social roles
Negative Logits
uye       -0.17
jiang     -0.15
ła        -0.14
rang      -0.14
abal      -0.14
ulet      -0.14
was       -0.14
τέ        -0.14
ighton    -0.13
Bà        -0.13
Positive Logits
often         0.23
generally     0.23
often         0.19
Generally     0.17
souvent       0.17
typically     0.17
usually       0.16
Often         0.16
Often         0.16
notoriously   0.15
Activations Density: 0.173%