    Negative Logits
    Token       Logit
    "is"        2.22
    "ت"         1.91
    "as"        1.83
    "ع"         1.77
    "en"        1.73
    "u"         1.71
    "т"         1.68
    "on"        1.66
    "t"         1.58
    "ur"        1.54
    Positive Logits
    Token       Logit
    " hero"     1.14
    "9"         1.11
    "ája"       1.06
    " trois"    0.89
    " cinco"    0.88
    " Hero"     0.87
    " olimp"    0.86
    "Hero"      0.85
    (not shown) 0.85
    "hero"      0.85
    Activation Density: 0.010%

    No Known Activations