INDEX
    Explanations
    Negative Logits
    ре              0.96
    مان             0.81
                    0.80
    achusetts       0.78
    رفة             0.77
    می              0.77
    는데            0.76
    ها              0.74
    льнявыя         0.74
     mágico         0.74
    Positive Logits
    in              1.31
    י               1.20
    t               1.15
    т               1.09
    ت               1.07
    ת               1.05
    ר               1.01
    تهم             0.99
     K              0.98
    the             0.97
    Activation Density: 0.022%

    No Known Activations