INDEX
Explanations
references to historical figures and leadership roles in a cultural context
Negative Logits
Sabha      -0.15
Victorian  -0.15
lah        -0.14
ồi         -0.14
149        -0.14
Laure      -0.13
183        -0.13
svob       -0.13
French     -0.13
ull        -0.13
Positive Logits
Hou        0.20
Tang       0.20
Han        0.20
Jose       0.19
Han        0.18
éļ         0.18
Yellow     0.18
Yellow     0.17
Bian       0.17
Hàn        0.17
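The two lists are the ten tokens whose logits the feature raises most and lowers most. Below is a minimal sketch of how such lists can be ranked. It assumes a hypothetical input `logit_effects` that maps each vocabulary token to the feature's effect on its logit. The source does not say how the dashboard computes those effects; projecting the feature's decoder direction onto the unembedding is one common choice. The function name `top_and_bottom_logits` is illustrative only.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_and_bottom_logits(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and k most negative (token, effect) pairs.

    `logit_effects` is a hypothetical mapping from token string to the
    feature's effect on that token's logit (an assumption, not the
    dashboard's documented input).
    """
    items = list(logit_effects.items())
    # Python's sort is stable, so tied values keep their insertion order
    # in both directions (reverse=True also preserves stability).
    positive = sorted(items, key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(items, key=lambda kv: kv[1])[:k]
    return positive, negative


# Usage with a few values copied from the lists above.
effects = {
    "Hou": 0.20, "Tang": 0.20, "Jose": 0.19,
    "Sabha": -0.15, "Victorian": -0.15, "lah": -0.14,
}
pos, neg = top_and_bottom_logits(effects, k=2)
print(pos)
print(neg)
```

Tokens that look repeated in the dashboard (for example the two `Han` and `Yellow` entries) are most likely distinct vocabulary items, such as variants with and without a leading space, so a real implementation would key on token IDs rather than on display strings.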
Activations Density 0.010%
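An activation density of 0.010% means the feature fires very rarely, on roughly 1 token position in 10,000. A minimal sketch of the statistic follows. It assumes that density is the percentage of token positions whose activation exceeds a threshold of 0. The source does not state the threshold or the token sample used, and the function name `activation_density` is hypothetical.

```python
import math
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not a documented value.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# One firing position out of 10,000 gives a density of about 0.010%.
sample = [0.0] * 9_999 + [1.3]
print(f"{activation_density(sample):.3f}%")
```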