INDEX
    Negative Logits
     sight              -0.08
     skal               -0.08
     Pel                -0.07
     oprav              -0.07
    [token missing]     -0.07
     pel                -0.07
     plat               -0.07
     cho                -0.07
     weed               -0.07
     ene                -0.07
    Positive Logits
     honestly           0.09
     importantly        0.08
     Kam                0.08
    Chen                0.07
    Cole                0.07
    Employ              0.07
     ومع                0.07
     Katie              0.07
     Kalam              0.07
    ın                  0.07
    Activation Density: 0.025%

    No Known Activations