    Explanations

    references to physical appearances and comparisons

    Negative Logits
    (tokens quoted; a leading space is part of the token)
    Token          Logit
    "lal"          -0.15
    "evin"         -0.13
    "tron"         -0.13
    "oppel"        -0.13
    " aud"         -0.13
    "ipers"        -0.13
    "lion"         -0.13
    "ailand"       -0.13
    " Jacobs"      -0.13
    "subcategory"  -0.13

    Positive Logits
    Token          Logit
    " like"         0.53
    " Like"         0.42
    " likes"        0.41
    " LIKE"         0.39
    "like"          0.38
    "Like"          0.38
    ".like"         0.35
    " seperti"      0.34
    "_like"         0.32
    " como"         0.32
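The logit lists above rank vocabulary tokens by how strongly this feature pushes each token's output logit up or down. The page does not state how the scores are computed; the sketch below assumes the common approach of taking the dot product of the feature's decoder vector with each column of the model's unembedding matrix. The names `decoder_vector`, `unembed`, and `vocab`, and the toy numbers in the usage note, are hypothetical.

```python
from typing import List, Tuple

def logit_effects(decoder_vector: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> List[Tuple[str, float]]:
    """Score each token by the dot product of the feature's decoder vector
    with that token's unembedding column.

    Assumption: unembed is laid out as d_model rows x vocab_size columns.
    """
    d_model = len(decoder_vector)
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vector[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    return scores

def top_and_bottom(scores: List[Tuple[str, float]], k: int):
    """Return the k most positive and the k most negative (token, score) pairs."""
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative
```

With a toy two-dimensional model, `logit_effects([1.0, 0.5], [[0.4, -0.1, 0.2], [0.2, -0.1, 0.2]], [" like", "lal", " como"])` scores `" like"` highest and `"lal"` lowest, mirroring the shape (not the data) of the lists above.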
    Activation Density: 0.086%
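Activation density is read here as the share of tokens on which the feature fires. The sketch below assumes it is counted over every token in some sample corpus with an activation threshold of zero; both the corpus and the threshold (`threshold`) are assumptions, since the page does not define them.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density is measured over all tokens in a sample corpus.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

Under this reading, 0.086% corresponds to roughly 86 active tokens per 100,000, so the feature is very sparse.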

    No Known Activations