INDEX
    NEGATIVE LOGITS
    "firma"         -0.08
    "disk"          -0.07
    "fir"           -0.07
    " destinado"    -0.07
    " trill"        -0.07
    "운데"          -0.07
    "lij"           -0.07
    " flirt"        -0.07
    "jam"           -0.07
    " destinada"    -0.07
    POSITIVE LOGITS
    " Expertise"     0.09
    " expertise"     0.08
    "(input"         0.08
    " мая"           0.07
    " storyboard"    0.07
    " stove"         0.07
    " تقل"           0.07
    "ান্ত"            0.07
    "regar"          0.07
    " ideology"      0.07
    Activation Density: 0.001%

    No Known Activations