Explanations
references to a sports team named "Lions."
references to a specific sports team, the Lions
Negative Logits
Token        Logit
mble         -0.99
elsius       -0.93
PDATE        -0.92
Seym         -0.92
ntil         -0.85
ccording     -0.84
DonaldTrump  -0.83
ulates       -0.79
srf          -0.79
lly          -0.74
Positive Logits
Token        Logit
Lions        0.95
Tigers       0.85
shire        0.78
Pistons      0.78
burg         0.75
berger       0.75
Packers      0.74
Hots         0.72
OGR          0.72
izont        0.70
Activations Density 0.032%