INDEX
    Explanations
    Negative Logits
    [color      -0.07
    ІІ          -0.07
     dedic      -0.07
    fout        -0.06
     aval       -0.06
     dissert    -0.06
     Celsius    -0.06
    มหาว        -0.06
    ження       -0.06
                -0.06

    Positive Logits
    roupe       0.08
     payday     0.07
    :",         0.06
    便           0.06
     least      0.06
    endant      0.06
     squ        0.06
    YEAR        0.06
    чива        0.06
    وار         0.06

    Act Density 0.058%

    No Known Activations