INDEX
Explanations
references to celebrity culture and public figures
Negative Logits
pta               -0.15
\OptionsResolver  -0.15
igans             -0.14
ntag              -0.14
independents      -0.14
NÄĽm              -0.14
arbeit            -0.14
izu               -0.14
aise              -0.14
iap               -0.13
Positive Logits
celebrities       0.40
celebrity         0.39
cele              0.36
stars             0.35
star              0.30
stars             0.28
famous            0.27
cele              0.27
-ce               0.26
superstar         0.26
Activations Density: 0.210%