INDEX
Explanations
references to the TV show "WWE Monday Night Raw."
references to the WWE show "Raw."
Negative Logits
etheless   -0.84
uate       -0.84
repre      -0.80
uated      -0.79
ortium     -0.73
anwhile    -0.71
raints     -0.70
mercial    -0.70
arios      -0.69
therap     -0.69
Positive Logits
lings      1.32
lins       1.20
hide       1.19
ls         1.17
leigh      0.98
kus        0.90
esome      0.88
burn       0.86
cliffe     0.86
ford       0.85
Activations Density: 0.004%