    Explanations

    references to visual representations or mental images

    Negative Logits
        Token            Logit
        "eners"          -0.58
        "olites"         -0.52
        " málo"          -0.51
        "onato"          -0.50
        "たまに"          -0.50
        " validators"    -0.49
        "onomous"        -0.48
        "validators"     -0.48
        "ophones"        -0.48
        "onomy"          -0.48
    Positive Logits
        Token            Logit
        " picture"        1.91
        " Picture"        1.74
        "picture"         1.73
        "Picture"         1.63
        " PICTURE"        1.59
        "PICTURE"         1.41
        "icture"          0.94
        " pic"            0.91
        " pictured"       0.82
        " imagen"         0.82
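    One common way lists like the logit lists above are produced, assumed here rather than confirmed by this page, is direct logit attribution: multiply the feature's decoder direction by the model's unembedding matrix and sort the per-token scores. The sketch below illustrates that idea with NumPy; `decoder_dir`, `W_U`, `vocab`, and `top_logits` are hypothetical names, not part of any specific tool's API.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by the feature's direct effect on logits.

    decoder_dir: (d_model,) decoder direction of the feature (assumed input)
    W_U:         (d_model, d_vocab) unembedding matrix (assumed input)
    vocab:       list of d_vocab token strings
    """
    logits = decoder_dir @ W_U            # (d_vocab,) per-token logit contribution
    order = np.argsort(logits)            # ascending order of scores
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

    With a toy two-dimensional model, the most negative and most positive tokens fall out of a single sort over the projected scores.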
    Activation Density: 0.004%

    No Known Activations