INDEX
Explanations
words related to sports (specifically hockey) and government-related words
Negative Logits
assian    -0.76
Helpful   -0.76
ussian    -0.72
#$        -0.70
pring     -0.70
Brow      -0.65
igious    -0.62
Proced    -0.61
enge      -0.60
enthal    -0.59
Positive Logits
circles   0.91
plates    0.91
tracks    0.90
planes    0.85
wat       0.84
stores    0.84
lance     0.83
warts     0.83
codes     0.80
eworks    0.76
Activations Density 0.327%