INDEX
Explanations
words related to advertisements like "poster" and "bragging"
references to posters and related imagery
Negative Logits
estial       -0.81
Ago          -0.78
%]           -0.77
ESSION       -0.74
hews         -0.73
owship       -0.71
IELD         -0.67
Liberties    -0.67
IVE          -0.67
efe          -0.66
Positive Logits
poster       1.07
posters      0.99
iors         0.88
flyer        0.84
onymous      0.81
Poster       0.79
ieu          0.79
pillar       0.77
ity          0.73
flyers       0.72
Activations Density 0.014%