    Explanations

    phrases that describe appearances or visual characteristics

    Negative Logits
    "roker"     -0.15
    "874"       -0.15
    "utra"      -0.14
    "ADX"       -0.14
    " گیری"     -0.14
    "iet"       -0.14
    "истра"     -0.14
    "looking"   -0.14
    "津"        -0.14
    "oeff"      -0.13
    Positive Logits
    " like"     0.43
    " Like"     0.36
    "like"      0.35
    "Like"      0.34
    " LIKE"     0.31
    "LIKE"      0.30
    "-like"     0.28
    "_like"     0.27
    " likes"    0.25
    ".like"     0.24
    Activation Density: 0.007%

    No Known Activations