INDEX
    Explanations

    phrases indicating personal statements or feelings of the speaker

    Negative Logits
    Token          Logit
    "idor"         -0.19
    "zew"          -0.16
    "MOTE"         -0.15
    "ãĤĥ"          -0.15
    " neust"       -0.15
    "ERSION"       -0.15
    "ymi"          -0.15
    "RTOS"         -0.14
    "pedia"        -0.14
    "ERCHANT"      -0.14
    Positive Logits
    Token          Logit
    " going"       0.23
    " done"        0.19
    " Going"       0.19
    " gonna"       0.18
    " finished"    0.18
    " sorry"       0.18
    " tired"       0.18
    "-going"       0.17
    "Going"        0.16
    " coming"      0.16
    Activation Density: 0.207%

    No Known Activations