INDEX
    Negative Logits
    Token          Logit
    "orc"          -0.09
    "adena"        -0.09
    " riot"        -0.09
    "¶Į"           -0.08
    " contrary"    -0.08
    " ump"         -0.08
    "braco"        -0.08
    "ITT"          -0.08
    " arist"       -0.08
    "Ħĸ"           -0.08

    Positive Logits
    Token          Logit
    " oh"          0.28
    " Oh"          0.23
    " yet"         0.18
    "Oh"           0.17
    " Yet"         0.17
    "oh"           0.17
    "Yet"          0.16
    " ah"          0.15
    " OH"          0.14
    " Ah"          0.12
    Activation Density: 0.076%

    No Known Activations