    Explanations

    phrases indicating change or transition

    Negative Logits
    Token         Logit
    "faf"         -0.15
    "_cpp"        -0.14
    "ิศ"           -0.14
    "Ù쨧ÙĤ"      -0.14
    "issan"       -0.14
    "cxx"         -0.14
    "arkin"       -0.13
    "enheim"      -0.13
    "elle"        -0.13
    "istar"       -0.13
    Positive Logits
    Token         Logit
    " follow"     1.23
    "follow"      1.17
    " Follow"     1.16
    " follows"    1.13
    " followed"   1.09
    "Follow"      1.09
    " FOLLOW"     0.99
    "-follow"     0.97
    "_follow"     0.93
    ".follow"     0.92
    Activation Density 0.293%
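    An activation density of 0.293% means the feature fires on roughly 3 of every 1,000 tokens. A minimal sketch of that arithmetic, assuming density is the percentage of token positions whose activation exceeds a threshold of 0 (the dashboard's exact threshold and sample are not stated). The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    is strictly greater than `threshold` (assumed to be 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy sample: 3 firing positions out of 1,000 tokens.
    sample = [0.0] * 997 + [0.8, 1.2, 0.3]
    print(f"{activation_density(sample):.3f}%")
```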

    No Known Activations