INDEX
    Explanations

    references to images with accompanying text descriptions

    the presence of the word "Hide" in the context of content presentation

    Negative Logits
    "etheless"    -0.85
    "eele"        -0.78
    "ammy"        -0.77
    "ilingual"    -0.77
    "odic"        -0.74
    "ounty"       -0.73
    "rontal"      -0.70
    "issance"     -0.70
    "enegger"     -0.68
    " confir"     -0.68

    Positive Logits
    " Caption"     1.08
    "away"         0.98
    "Hide"         0.93
    " Hide"        0.89
    "ously"        0.85
    "Streamer"     0.74
    "hide"         0.74
    "Pic"          0.71
    "Emb"          0.70
    "Track"        0.70

    Act Density 0.014%

    No Known Activations