INDEX
    Explanations
    NEGATIVE LOGITS
    "ن"                1.04
    (token not shown)  0.80
    "ك"                0.74
    "ت"                0.73
    (token not shown)  0.72
    "н"                0.67
    (token not shown)  0.64
    "nak"              0.60
    "т"                0.57
    (token not shown)  0.56
    POSITIVE LOGITS
    "𝕡"                0.54
    " competencia"     0.54
    (token not shown)  0.53
    (token not shown)  0.53
    "йна"              0.53
    "하다"             0.53
    "üğünüz"           0.53
    "ის"               0.52
    " textual"         0.51
    "の中心"           0.51
    Activation Density: 0.000%

    No Known Activations