    Explanations

    occurrences of specific types of words or phrases in a mixture of languages

    Negative Logits
    Token        Logit
    "alling"     -0.17
    "redi"       -0.15
    "ALIGN"      -0.14
    "ofilm"      -0.14
    " ragaz"     -0.14
    "auty"       -0.14
    " потрап"    -0.14
    "elib"       -0.14
    "耗"         -0.14
    "ulent"      -0.14
    Positive Logits
    Token          Logit
    " penn"        0.16
    ".sdk"         0.16
    " Glock"       0.15
    "iterations"   0.15
    " Pit"         0.15
    " truncate"    0.15
    "MSN"          0.15
    " gle"         0.15
    "meric"        0.14
    " unarmed"     0.14
    Activation Density: 0.025%

    No Known Activations