INDEX
Explanations
proper nouns related to political figures, sports teams, and locations
Negative Logits
Token    Logit
ACTED    -0.80
Lt       -0.78
Redd     -0.73
Spy      -0.72
UGC      -0.71
CG       -0.70
ulhu     -0.68
slave    -0.67
HI       -0.66
Si       -0.66
Positive Logits
Token    Logit
Barron   0.96
otyp     0.75
alon     0.75
asso     0.75
abad     0.75
cloth    0.74
agus     0.71
xual     0.70
mares    0.70
agraph   0.70
Activations Density: 0.263%