INDEX
Explanations
mentions of things being or becoming popular
references to the concept of popularity
Negative Logits
Token        Logit
¯¯           -0.86
erm          -0.82
ibur         -0.79
abol         -0.76
alk          -0.74
cise         -0.74
holes        -0.71
htaking      -0.69
gans         -0.69
rib          -0.68
Positive Logits
Token        Logit
popularity   1.11
popular      0.91
Popular      0.86
yip          0.84
ratings      0.80
ubiqu        0.77
unpopular    0.75
renown       0.75
龍契士       0.75
diffusion    0.74
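
Logit lists like these are commonly produced by projecting a feature's output direction through the model's unembedding matrix and ranking tokens by the resulting score. The source does not say how its values were computed, so the following is only a minimal sketch under that assumption. The names `feature_direction`, `unembed`, `logit_effects`, and `top_tokens` are hypothetical, and the three-dimensional vectors are made-up toy data.

```python
def logit_effects(feature_direction, unembed):
    """Dot the feature direction with each token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(feature_direction, vec))
        for token, vec in unembed.items()
    }


def top_tokens(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy data (made up for illustration, not taken from any real model).
feature_direction = [1.0, 0.5, 0.0]
unembed = {
    "popularity": [1.0, 0.2, 0.0],
    "popular": [0.8, 0.2, 0.3],
    "erm": [-0.9, 0.1, 0.5],
    "holes": [-0.6, -0.2, 0.0],
}

effects = logit_effects(feature_direction, unembed)
positive = top_tokens(effects, k=2, positive=True)
negative = top_tokens(effects, k=2, positive=False)
```

With this toy data, `positive` ranks "popularity" ahead of "popular", and `negative` ranks "erm" ahead of "holes", mirroring the layout of the two lists.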
Activations Density 0.009%
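
An activation density is usually read as the fraction of sampled tokens on which the feature fires. Assuming that definition (the source does not define it), a density of 0.009% corresponds to roughly 9 active tokens per 100,000. The sketch below uses a hypothetical `activation_density` helper and made-up activations.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 9 active positions out of 100,000.
sample = [0.0] * 99_991 + [1.5] * 9
density = activation_density(sample)
density_percent = density * 100
```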