INDEX
Explanations
words related to various universities, possibly associated with sports teams or news articles
abbreviations or abbreviatory forms of male names or terms
Negative Logits
00007        -0.83
WD           -0.63
blame        -0.63
estyles      -0.61
precincts    -0.61
NK           -0.58
paw          -0.58
fiat         -0.58
sidx         -0.58
Mechdragon   -0.57
Positive Logits
pillar        0.86
heim          0.79
leigh         0.75
lectic        0.73
ertain        0.73
otropic       0.71
zona          0.71
ritz          0.70
ENA           0.69
ylum          0.69
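In SAE feature dashboards of this kind, the positive and negative logit lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes that convention. It does not model details such as a final LayerNorm, which a given dashboard may or may not apply. The names `logit_effects`, `decoder_row`, and `unembed` are hypothetical, and plain Python lists stand in for a tensor library.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) (token, score) pairs.

    Assumed inputs (hypothetical):
      decoder_row: the feature's decoder vector, length d_model.
      unembed:     the unembedding matrix W_U, shape d_model x vocab_size.
      vocab:       token strings, one per column of W_U.
    """
    d_model = len(decoder_row)
    # score[t] = sum_i decoder_row[i] * W_U[i][t]  (decoder direction times W_U)
    scores = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort tokens from most negative to most positive effect on the logits.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative
```

With a toy 2-dimensional model and a 3-token vocabulary, `logit_effects([1.0, -0.5], [[0.2, -0.4, 0.6], [0.4, 0.2, -0.2]], ["a", "b", "c"], k=1)` ranks `c` (about 0.7) as the top positive token and `b` (about -0.5) as the top negative one.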
Activation Density: 0.054%
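Activation density is usually reported as the share of tokens in a sample dataset on which the feature's activation is above zero. The dataset and exact cutoff behind the figure above are not stated here. The sketch below assumes a simple "greater than a threshold" rule, with a default threshold of 0.0. The function name `activation_density` and its parameters are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature fires (activation > threshold).

    Assumed inputs (hypothetical):
      activations: one feature activation per token in a sample of text.
      threshold:   firing cutoff; 0.0 counts any positive activation as firing.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)
```

As a purely hypothetical example, a feature that fires on 54 of 100,000 sampled tokens would have a density of 0.054%. Such a low density means the feature is active only rarely.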