INDEX
Explanations
mentions of things or actions being popular
the concept of popularity in various contexts
Negative Logits
thur         -0.88
Aviv         -0.74
heed         -0.73
aca          -0.70
ouls         -0.69
ander        -0.69
Kear         -0.68
aul          -0.65
Centauri     -0.64
cule         -0.64
Positive Logits
popular      1.20
popular      1.11
Popular      0.85
rities       0.80
iatus        0.73
favourite    0.72
ity          0.71
unpopular    0.70
ised         0.69
circulation  0.68
Activations Density: 0.016%