INDEX
    Explanations
    NEGATIVE LOGITS
    "star"            0.51
    "water"           0.50
    "therapy"         0.49
    "ogast"           0.48
    "wine"            0.47
    " W"              0.47
    " Medications"    0.47
    " Wisconsin"      0.46
    "wick"            0.45
    "thorn"           0.45
    POSITIVE LOGITS
    "یم"          0.49
    "نګه"         0.48
    " بسیاری"     0.46
    " گراف"       0.46
    "یر"          0.46
    " درباره"     0.46
    " konuş"      0.45
    " منفی"       0.43
    "क्षित"        0.43
    " بانک"       0.43
    Activation Density: 0.012%

    No Known Activations