Explanations
references to competitive sports events and their outcomes
Negative Logits
Collapse  -0.16
tom       -0.15
izen      -0.15
paren     -0.15
afort     -0.14
eree      -0.14
gord      -0.14
ضا        -0.14
جام       -0.13
dater     -0.13
Positive Logits
Brut   0.17
andi   0.15
iyon   0.15
peon   0.14
adan   0.13
ibu    0.13
seg    0.13
elry   0.13
Medic  0.13
ooks   0.13
Activations Density 0.005%