INDEX
Explanations
names of sports teams and their respective cities or players
references to specific sports teams and athletes
Negative Logits
rodu          -0.67
arters        -0.66
antry         -0.61
hea           -0.60
ubb           -0.59
amily         -0.59
vironments    -0.58
ivation       -0.58
eworthy       -0.57
pee           -0.57
Positive Logits
counterpart   0.83
playbook      0.73
connection    0.71
glove         0.70
teammate      0.69
counterparts  0.69
dictator      0.68
analogy       0.65
colleague     0.65
merger        0.65
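The negative and positive logit lists are the two ends of a single ranking of per-token logit effects for this feature. The minimal Python sketch below shows that ranking. It assumes the effects are already available as a token-to-value mapping (how they are computed is not stated here), the helper name top_logits is hypothetical, and only a few of the values listed above are used as sample input.

```python
import heapq


def top_logits(effects: dict[str, float], k: int = 10):
    """Return the k most positive and the k most negative (token, effect) pairs.

    Hypothetical helper: `effects` is assumed to map each token to the
    feature's logit effect on it.
    """
    # Largest effects first (descending order).
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    # Smallest (most negative) effects first (ascending order).
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


# Sample input taken from a few of the values listed above.
effects = {
    "counterpart": 0.83,
    "playbook": 0.73,
    "teammate": 0.69,
    "rodu": -0.67,
    "arters": -0.66,
    "antry": -0.61,
}
pos, neg = top_logits(effects, k=2)
print(pos)  # [('counterpart', 0.83), ('playbook', 0.73)]
print(neg)  # [('rodu', -0.67), ('arters', -0.66)]
```

Using heapq.nlargest and heapq.nsmallest avoids fully sorting the mapping, which matters when it covers an entire vocabulary rather than a handful of tokens.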
Activation Density: 0.494%
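The density figure can be read as the share of tokens on which the feature is active. The minimal sketch below assumes that "active" means an activation strictly above a threshold of zero, measured over some evaluation set of tokens. Neither the threshold nor the dataset is given here, and activation_density is a hypothetical helper.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: the zero threshold is an assumption, not a value
    stated by the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature fires.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic example (not real data): 494 of 100,000 tokens fire.
acts = [1.0] * 494 + [0.0] * (100_000 - 494)
print(f"{activation_density(acts):.3f}%")  # 0.494%
```

A density this low means the feature fires rarely, which fits the narrow sports-related explanations above.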