Explanations
names of sports teams
references to sports teams
Negative Logits
ctors      -0.82
ctor       -0.75
uters      -0.74
ricks      -0.64
ivably     -0.64
ted        -0.64
ographer   -0.63
ilus       -0.63
ntax       -0.62
oxic       -0.62
Positive Logits
'          1.09
hift       1.08
mith       0.95
peed       0.89
layer      0.89
boro       0.88
warm       0.84
heet       0.83
hip        0.81
burg       0.80
Activation Density: 0.132%