INDEX
Explanations
names of celebrities or public figures
mentions of proper names and surnames
Negative Logits
Token           Logit
agonist         -0.63
yip             -0.61
Eater           -0.59
ersive          -0.58
allery          -0.55
orney           -0.54
ymes            -0.54
netflix         -0.54
yton            -0.54
vertisements    -0.53
Positive Logits
Token           Logit
anc              0.68
ANC              0.64
ans              0.59
ois              0.53
cest             0.53
apt              0.51
Going            0.48
cia              0.48
ousse            0.47
ahi              0.47
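The two lists above are the tokens with the most negative and most positive logit effects for this feature. The sketch below shows how such top-k lists can be selected. It assumes each vocabulary token already has one scalar logit effect (a common choice in sparse-autoencoder dashboards is the feature's decoder direction projected through the model's unembedding, but that step is not shown here). The function name `top_logits` is hypothetical. The example input reuses a few values from the lists above.

```python
def top_logits(effects: dict[str, float], k: int) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: `effects` maps each token to a precomputed scalar logit effect.
    """
    # Sort ascending by effect: most negative first, most positive last.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # k lowest effects, most negative first
    positive = ranked[::-1][:k]  # k highest effects, most positive first
    return negative, positive


# Small subset of the values listed above, used only for illustration.
sample = {"agonist": -0.63, "yip": -0.61, "anc": 0.68, "ANC": 0.64, "ans": 0.59}
neg, pos = top_logits(sample, k=2)
print(neg)  # [('agonist', -0.63), ('yip', -0.61)]
print(pos)  # [('anc', 0.68), ('ANC', 0.64)]
```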
Activation Density: 0.208%
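Activation density is usually read as the share of tokens on which the feature fires. The sketch below computes it as a percentage. It assumes "fires" means an activation strictly above a threshold of 0, which is an assumption rather than something this page states. The function name `activation_density` is hypothetical. A density of 0.208% corresponds, for example, to 208 active tokens out of 100,000.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: a token counts as active when its activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 208 active tokens out of 100,000 gives the density reported above.
acts = [0.0] * 99_792 + [1.0] * 208
print(activation_density(acts))
```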