INDEX
Explanations
social media platform mentions
references to social media platforms, particularly Pinterest and Twitter
Negative Logits
thro      -0.81
pill      -0.69
iren      -0.68
hid       -0.67
pill      -0.67
spoiler   -0.66
©¶æ¥µ     -0.65
liner     -0.63
pled      -0.62
lie       -0.62
Positive Logits
[token missing]   0.85
PHOTO             0.84
Photograph        0.84
Images            0.80
IMAGES            0.77
atoon             0.74
Sergeant          0.71
iewicz            0.69
Painting          0.67
Pic               0.66
Activations Density 0.014%