INDEX
    Explanations

    words indicating perception or appearance

    Negative Logits
    "BOOLE"        -0.15
    "*dt"          -0.14
    "aben"         -0.14
    "taire"        -0.13
    "andes"        -0.13
    "anela"        -0.13
    "etten"        -0.13
    ".Suppress"    -0.13
    "abant"        -0.13
    "theValue"     -0.13
    Positive Logits
    " like"        0.52
    " Like"        0.47
    "Like"         0.41
    "like"         0.39
    " LIKE"        0.36
    "_like"        0.34
    ".like"        0.33
    " likes"       0.30
    " wie"         0.30
    " como"        0.29
    Act Density 0.010%

    No Known Activations