INDEX
Explanations
mentions of "Dodger" related terms
references to a specific sports team and related entities
Negative Logits
ropolitan    -0.80
UGE          -0.74
OURCE        -0.73
SHARE        -0.70
prick        -0.69
HAHA         -0.68
xual         -0.67
ASE          -0.66
oteric       -0.66
ISTER        -0.64
Positive Logits
s            1.05
oslov        1.05
sworth       0.94
gments       0.91
erick        0.91
son          0.90
eous         0.89
iots         0.87
gement       0.85
icative      0.85
Activations Density: 0.033%