INDEX
Explanations
references to celebrities
mentions of celebrities
Negative Logits
doors     -0.71
choes     -0.70
hematic   -0.69
Wind      -0.69
nerg      -0.66
uv        -0.66
atives    -0.66
empty     -0.65
nda       -0.64
rt        -0.64
Positive Logits
rities         1.14
wcs            1.05
celebrities    1.00
endors         0.97
celebrity      0.93
endorsements   0.89
celeb          0.85
gossip         0.81
superstar      0.78
Celebrity      0.77
Activations Density: 0.012%