INDEX
Explanations
references to different sports
mentions of various sports
Negative Logits
ignt        -0.74
idges       -0.69
sidx        -0.64
andering    -0.63
ppelin      -0.62
defective   -0.62
sunshine    -0.62
inki        -0.62
ading       -0.60
inois       -0.60
Positive Logits
sw          1.25
scar        1.08
nell        0.98
sc          0.85
spe         0.83
manship     0.81
club        0.81
Sport       0.80
enegger     0.78
nel         0.78
Activations Density 0.010%