Explanations
names of celebrities
mentions of popular celebrities, particularly Kanye West
Negative Logits
Token          Logit
suff           -0.72
Downloadha     -0.71
ettlement      -0.70
NetMessage     -0.67
dayName        -0.67
nesota         -0.66
thia           -0.66
Proced         -0.65
ajor           -0.64
llah           -0.64
Positive Logits
Token          Logit
Kanye          0.91
anye           0.81
Kardashian     0.79
ipedia         0.75
mson           0.72
efully         0.70
pants          0.70
weather        0.69
reth           0.67
cé             0.66
Activations Density: 0.008%