INDEX
Explanations
mentions of significant cultural events or figures
Negative Logits
anch            -0.17
wheel           -0.15
Reds            -0.15
628             -0.14
hani            -0.14
multinational   -0.14
ino             -0.13
Buckley         -0.13
annies          -0.13
eth             -0.13
Positive Logits
essim           0.19
iani            0.19
emann           0.15
ermann          0.15
omba            0.14
abbo            0.14
лами            0.14
-addons         0.14
erman           0.14
trecht          0.14
Activations Density: 0.664%