Explanations
phrases indicating temporal sequence or order
phrases indicating time or sequential events
Negative Logits
Token       Logit
Policy      -0.74
bill        -0.73
alth        -0.72
opian       -0.71
Federal     -0.70
payer       -0.70
onym        -0.70
insula      -0.70
ument       -0.68
olitical    -0.67
Positive Logits
Token       Logit
teammate    1.14
halftime    1.07
averaging   1.00
juries      1.00
teammates   0.97
Jere        0.95
rookies     0.93
Reggie      0.92
Kyle        0.92
finishing   0.90
Activations Density: 0.223%