INDEX
    Explanations

    instances of specific verbs indicating actions or states

    NEGATIVE LOGITS
    nosis          -0.18
    eyse           -0.18
    ecko           -0.17
    íĹĪ            -0.16
    eph            -0.16
    ï¼             -0.16
    пе             -0.15
    wy             -0.15
    sis            -0.15
    istrovstvÃŃ    -0.14
    POSITIVE LOGITS
    onto           0.16
    581            0.15
     ben           0.14
    brick          0.14
    ering          0.14
    ÏĦαν           0.14
     Om            0.14
     Saint         0.14
     deeper        0.14
    upt            0.13
    Activation Density 0.010%

    No Known Activations