INDEX
Explanations
descriptions of influential historical figures and their characteristics
Negative Logits
stery     -0.17
atrice    -0.16
ạng       -0.15
esture    -0.15
スレ      -0.15
Blades    -0.15
Flour     -0.14
sert      -0.14
estone    -0.14
');");↵   -0.14
Positive Logits
figure    0.28
person    0.27
人物      0.25
figura    0.23
someone   0.23
guy       0.22
somebody  0.22
homme     0.22
man       0.22
uomo      0.21
Activations Density 0.271%