INDEX
Explanations
references to images with accompanying text descriptions
the presence of the word "Hide" in the context of content presentation
Negative Logits
etheless    -0.85
eele        -0.78
ammy        -0.77
ilingual    -0.77
odic        -0.74
ounty       -0.73
rontal      -0.70
issance     -0.70
enegger     -0.68
confir      -0.68
Positive Logits
Caption     1.08
away        0.98
Hide        0.93
Hide        0.89
ously       0.85
Streamer    0.74
hide        0.74
Pic         0.71
Emb         0.70
Track       0.70
Activations Density: 0.014%