INDEX
Explanations
words related to sports
references to a specific sport
Negative Logits
ignt        -0.66
abetic      -0.63
sidx        -0.62
Khe         -0.60
pts         -0.60
idges       -0.59
ppelin      -0.57
defective   -0.57
tubes       -0.57
chords      -0.56
Positive Logits
sw          1.32
scar        1.15
nell        1.11
sc          0.90
manship     0.89
iest        0.88
bike        0.87
ive         0.84
nels        0.84
enegger     0.83
Activations Density 0.020%