Explanations
Words and phrases describing influential and legendary figures
Negative Logits
atrice     -0.17
olg        -0.15
Actress    -0.15
ohana      -0.15
ehir       -0.14
اون        -0.14
жар        -0.14
packed     -0.14
iedo       -0.14
ندگان      -0.14
Positive Logits
figure                   0.28
person                   0.25
人物 (person, figure)     0.22
guy                      0.22
man                      0.20
figura (figure)          0.20
himself                  0.20
someone                  0.20
homme (man)              0.19
somebody                 0.18
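Dashboards like this one typically build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then sorting tokens by the resulting score. The sketch below assumes exactly that and nothing more: it omits any layer-norm folding or rescaling a real dashboard may apply. The function `logit_effects` and its inputs (`decoder_direction`, `unembed`, `vocab`) are hypothetical placeholders, not real model weights or a real library API.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical, shape [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    d_model = len(decoder_direction)
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with column t of the unembedding.
        s = sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a four-token vocabulary and a 2-dimensional model.
vocab = ["figure", "person", "atrice", "packed"]
unembed = [[1.0, 0.5, -1.0, 0.0],
           [0.0, 0.5, 0.0, -1.0]]
pos, neg = logit_effects([0.3, 0.1], unembed, vocab, k=2)
```

In this toy setup "figure" and "person" come out on top and "atrice" and "packed" at the bottom, mirroring the shape (not the values) of the lists above.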
Activation Density: 0.430%
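Activation density is usually read as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the dashboard's actual sample of text and threshold are not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 43 active positions out of 10,000 sampled tokens gives a density of about 0.43%.
sample = [0.0] * 9957 + [1.0] * 43
density = activation_density(sample)
```

At 0.430%, a feature like this one would be active on roughly 43 of every 10,000 tokens, under the assumptions above.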