INDEX
Explanations
geographic locations and historical context related to cultural figures
Negative Logits
_WRONG   -0.17
ares     -0.17
bbw      -0.15
山市     -0.15
outil    -0.15
683      -0.14
σσ       -0.14
ätt      -0.14
Deniz    -0.14
vra      -0.14
Positive Logits
Mem        0.17
Kash       0.16
Lithuania  0.16
Packers    0.15
keh        0.15
brick      0.15
Polish     0.15
ł          0.15
ddb        0.14
Pr         0.14
Activations Density: 0.015%