INDEX
    Explanations

    captions in images

    mentions of image captions and their corresponding attributes

    Negative Logits
    Token             Logit
    "<|endoftext|>"   -0.74
    "aturdays"        -0.69
    "akespeare"       -0.68
    "onest"           -0.67
    " bom"            -0.61
    " recl"           -0.61
    " territ"         -0.60
    "ury"             -0.60
    "stab"            -0.59
    " reborn"         -0.59

    Positive Logits
    Token             Logit
    " GOODMAN"         0.92
    " Gladiator"       0.75
    "Phys"             0.75
    "UTERS"            0.73
    " Javascript"      0.68
    " IMAGES"          0.65
    " Mandatory"       0.65
    " Immun"           0.64
    "Ability"          0.63
    "itars"            0.63

    Activation Density: 0.094%

    No Known Activations