Explanations
proper nouns referencing sports teams and their affiliations
Negative Logits
lessness      -0.16
нев           -0.15
apsed         -0.15
umbed         -0.14
Institutes    -0.14
ourcem        -0.14
ureau         -0.14
icho          -0.14
hausen        -0.14
ogui          -0.14
Positive Logits
faithful      0.20
ettes         0.18
们            0.17
urs           0.15
ies           0.15
birds         0.15
themselves    0.15
gs            0.15
Bias          0.15
les           0.14
Activations Density: 0.035%