Explanations
references to celebrities
occurrences and references to celebrities
Negative Logits
choes     -0.92
hematic   -0.85
anus      -0.82
THER      -0.76
¼         -0.75
doors     -0.74
tered     -0.74
¾         -0.72
atives    -0.71
¸         -0.70
Positive Logits
rities        1.06
endorsements  1.03
endors        0.97
chef          0.95
wcs           0.89
gossip        0.89
chefs         0.88
celebrities   0.83
idols         0.82
celebrity     0.79
Activations Density 0.028%