Explanations
entities related to historical figures and their relationships
Negative Logits
iram      -0.15
reater    -0.15
quirrel   -0.15
ři        -0.14
rů        -0.14
Earn      -0.14
niej      -0.14
Kral      -0.14
erca      -0.14
klu       -0.14
Positive Logits
akh        0.20
zh         0.20
Volk       0.19
Push       0.19
ugin       0.18
achen      0.18
enin       0.18
Tro        0.17
istrat     0.17
agina      0.16
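The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The page does not say how these scores were computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the most positive and most negative entries. This is a minimal sketch in plain Python; the function names `logit_effects` and `top_and_bottom` are hypothetical, not part of any dashboard API.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    unembedding column. `unembed` is laid out as [d_model][vocab_size]."""
    d_model = len(decoder_dir)
    effects: Dict[str, float] = {}
    for t, token in enumerate(vocab):
        # Assumed definition: effect = sum_i decoder_dir[i] * W_U[i][t]
        effects[token] = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
    return effects


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most promoted tokens (largest first)
    and the k most suppressed tokens (most negative first).
    Assumes the vocabulary has at least 2k tokens so the lists do not overlap."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = sorted(ranked[-k:], key=lambda kv: kv[1])
    return positive, negative


if __name__ == "__main__":
    # Toy two-dimensional "model" with a four-token vocabulary (made-up numbers).
    effects = logit_effects([1.0, 2.0],
                            [[0.5, -1.0, 0.25, 0.25],
                             [0.25, 0.0, -0.75, 0.5]],
                            ["a", "b", "c", "d"])
    pos, neg = top_and_bottom(effects, 2)
    print(pos)  # [('d', 1.25), ('a', 1.0)]
    print(neg)  # [('c', -1.25), ('b', -1.0)]
```

A real dashboard would run this over the full vocabulary, most likely with a tensor library and a top-k routine rather than Python loops; the ranking logic is the same.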
Activation Density: 0.088%
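Activation density is the share of sampled tokens on which the feature is active, so 0.088% corresponds to roughly 88 activations per 100,000 tokens. The page does not state the sampling corpus or the activation threshold. This sketch assumes a token counts as active when its activation is strictly above zero, and `activation_density` is a hypothetical helper name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed to be 0.0 here; the dashboard's value is not stated)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Made-up sample: 88 active tokens out of 100,000.
    acts = [1.0] * 88 + [0.0] * (100_000 - 88)
    print(activation_density(acts))
```

With the made-up sample above the function returns about 0.088, matching the order of magnitude shown for this feature; a very low density like this indicates a sparse, narrowly triggered feature.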