INDEX
    Explanations

    verbs that indicate actions or changes

    Negative Logits
    "679"         -0.17
    "reesome"     -0.15
    " боль"       -0.15
    ".alibaba"    -0.15
    "699"         -0.14
    "aval"        -0.14
    "iddles"      -0.13
    "elson"       -0.13
    "izo"         -0.13
    "bout"        -0.13
    Positive Logits
    " something"   0.57
    "something"    0.48
    " another"     0.43
    " perhaps"     0.42
    "Something"    0.42
    " yet"         0.42
    " Something"   0.42
    "yet"          0.38
    "another"      0.38
    "perhaps"      0.34
    Act Density 0.059%

    No Known Activations