INDEX
Explanations
mentions of sporting events or achievements, particularly in baseball
references to "major league" and "major" in the context of sports
Negative Logits
minist        -0.76
tein          -0.73
uber          -0.69
Prompt        -0.68
merga         -0.66
Mim           -0.66
Rosenberg     -0.65
Canaver       -0.64
slaughtered   -0.63
vag           -0.62
Positive Logits
league        0.99
leagues       0.91
itized        0.88
league        0.88
tournaments   0.87
disadvant     0.83
League        0.83
depressive    0.81
League        0.78
liga          0.78
Activation Density: 0.019%
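The page does not say how these figures were computed. The following is a minimal sketch, assuming that activation density is the percentage of tokens on which the feature's activation exceeds zero, and that each logit list holds the k largest and k smallest entries of a per-token logit-effect vector. Under that reading, 0.019% means the feature fires on roughly 19 of every 100,000 tokens. The function names `activation_density` and `top_and_bottom_logits` are hypothetical and do not come from the dashboard. The logits are passed as (token, value) pairs instead of a dict because the lists above repeat surface strings such as "league" and "League". These repeats are presumably distinct token ids, for example with and without a leading space.

```python
from __future__ import annotations


def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


def top_and_bottom_logits(
    logit_effects: list[tuple[str, float]], k: int = 10
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: return the k most positive and k most negative pairs.

    A list of (token, value) pairs is used so that repeated surface strings
    (assumed to be different token ids) are kept as separate entries.
    """
    ranked = sorted(logit_effects, key=lambda pair: pair[1])  # ascending by value
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Usage: 19 firing tokens out of 100,000 gives a density of 0.019%.
acts = [0.0] * 99_981 + [1.0] * 19
print(f"{activation_density(acts):.3f}%")
```

Sorting the whole vector is fine for a sketch. For a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid a full sort.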