Explanations
social media platform names, with a particular focus on Pinterest
mentions of social media platforms, particularly Pinterest
Negative Logits
Token          Logit
ppo            -0.79
Luxem          -0.72
perse          -0.71
phas           -0.69
Antar          -0.66
arbon          -0.65
neys           -0.65
enegger        -0.65
ysis           -0.64
lves           -0.64
Positive Logits
Token          Logit
(unrecovered)  1.16
(unrecovered)  1.00
(unrecovered)  0.87
avascript      0.86
(unrecovered)  0.82
Tumblr         0.80
vine           0.77
ished          0.75
umblr          0.75
oké            0.75
Activation density: 0.006%