    Negative Logits
    Token                 Logit
    " Nek"                -0.79
    "with"                -0.79
    " anomalous"          -0.73
    "פורט"                -0.73
    "Totals"              -0.73
    "ji"                  -0.72
    "infall"              -0.71
    "十六"                -0.71
    "Hun"                 -0.71
    "HANDLER"             -0.69
    Positive Logits
    Token                 Logit
    "::_"                 0.69
    "%%%%%%%%%%%%%%%%"    0.68
    " lidí"               0.68
    (token not shown)     0.67
    "ONE"                 0.67
    "🚬"                  0.66
    (token not shown)     0.66
    (token not shown)     0.61
    "بش"                  0.61
    "entamiento"          0.60
    Activation Density: 0.342%

    No Known Activations