    Negative Logits
     someone        -0.08
     jemanden       -0.07
     qualifying     -0.07
     Canary         -0.07
    คู่              -0.07
     직원           -0.07
     Bruno          -0.07
     Essence        -0.07
     oven           -0.07
     Principles     -0.07
    Positive Logits
     rhetorical     0.08
    itura           0.08
    336             0.08
     honors         0.08
     Oakland        0.08
    ాడ              0.08
     capitalization 0.08
     casing         0.08
     stopwatch      0.07
    726             0.07
    Activation Density: 0.012%

    No Known Activations