INDEX
Explanations
information related to historical figures and their roles
Negative Logits
ολ            -0.15
calar         -0.15
erge          -0.15
ocratic       -0.15
addCriterion  -0.14
downt         -0.14
oden          -0.14
imer          -0.14
ımızda        -0.14
hatt          -0.14
Positive Logits
ige           0.19
stell         0.17
Initi         0.17
borderTop     0.17
Che           0.16
idge          0.16
quier         0.15
ạ             0.14
Refer         0.14
lang          0.14
Activations Density: 0.028%