    Explanations

    phrases indicating transformation or change in a state or condition

    Negative Logits
    Token          Logit
    "noop"         -0.18
    "umba"         -0.15
    "itics"        -0.15
    " recent"      -0.14
    " older"       -0.14
    " çİ"          -0.14
    ".hm"          -0.14
    " previous"    -0.14
    "¯"            -0.14
    "ее"           -0.14
    Positive Logits
    Token          Logit
    " full"        0.36
    " fully"       0.25
    " bona"        0.25
    "(full"        0.25
    "full"         0.24
    " actual"      0.24
    "/full"        0.24
    " something"   0.23
    " mini"        0.23
    "_full"        0.22
    Activation Density: 0.249%

    No Known Activations