INDEX
    NEGATIVE LOGITS
    1.32  ના
    1.14   are
    1.05
    0.98   болезнь
    0.97
    0.95  ς
    0.95
    0.93  ला
    0.93  یاء
    0.92  ामध्ये
    POSITIVE LOGITS
    1.70  0
    1.53
    1.35
    1.21   écriv
    1.20  }$
    1.16  that
    1.16  ור
    1.16  to
    1.13
    1.12  ip
    Activation Density 0.027%

    No Known Activations