Explanations
references to posting or sharing content on social media platforms
instances of the word "posted" in various contexts
Negative Logits
Token         Logit
osi           -0.72
kos           -0.70
glers         -0.69
schild        -0.67
fts           -0.64
wing          -0.63
ppo           -0.63
Iv            -0.63
mentation     -0.62
Flavoring     -0.62
Positive Logits
Token         Logit
hum           0.96
ulate         0.94
online        0.87
mortem        0.84
postings      0.83
onymous       0.81
pics          0.79
prominently   0.78
anonymously   0.77
gres          0.76
Activations Density: 0.046%