INDEX
    Explanations
    Negative Logits
    ع  0.86
    у  0.78
    ا  0.70
    о  0.57
    g  0.57
     revolt  0.57
    е  0.54
    ע  0.54
    і  0.53
    اك  0.52
    Positive Logits
    ert  0.60
    radiative  0.56
    nier  0.55
     andre  0.54
    arker  0.52
     at  0.51
     andra  0.51
     for  0.50
    ake  0.50
    ্ষিকী  0.49
    Activation Density 0.002%

    No Known Activations