INDEX
Explanations
specific mentions of the number of games or trips taken
phrases indicating recent performance or outcomes
Negative Logits
hyde            -0.71
conom           -0.58
Background      -0.58
issance         -0.57
mund            -0.53
EStreamFrame    -0.52
Terrorism       -0.52
clerosis        -0.51
Moder           -0.51
oller           -0.50
Positive Logits
two             1.25
four            1.25
three           1.24
eight           1.20
five            1.19
seven           1.18
nine            1.16
six             1.14
eleven          1.04
twelve          1.03
Activations Density 0.089%