INDEX
    Explanations
    NEGATIVE LOGITS
    Token                    Logit
    "ς"                      1.34
    "ן"                      1.18
    "ні"                     1.01
    "eli"                    1.00
    (token text missing)     1.00
    "ani"                    0.97
    "য়ের"                    0.96
    "dır"                    0.94
    "ри"                     0.94
    "ва"                     0.92
    POSITIVE LOGITS
    Token                    Logit
    "to"                     1.73
    "可以"                   1.19
    (token text missing)     1.18
    "ول"                     1.13
    " a"                     1.09
    " Flight"                1.05
    " you"                   1.04
    "O"                      0.98
    "中国"                   0.97
    "},"                     0.97
    Act Density 0.006%

    No Known Activations