INDEX
Explanations
mentions of celebrities
Negative Logits
choes       -0.84
anus        -0.81
hematic     -0.80
tered       -0.76
nea         -0.73
doors       -0.73
nerg        -0.72
sterdam     -0.71
THER        -0.70
¼           -0.68
Positive Logits
rities          1.16
endors          1.01
endorsements    1.00
celebrities     0.90
chef            0.88
wcs             0.87
chefs           0.85
gossip          0.83
idols           0.80
celebrity       0.79
Activations Density 0.021%