    Negative Logits
        Token        Logit
        "ри"         1.25
        "ن"          1.16
        "ли"         1.13
        "но"         1.06
        "lendi"      0.98
        "ပါသည်။"     0.97
        "n"          0.95
        "ري"         0.94
        "ви"         0.93
        "ен"         0.92

    Positive Logits
        Token        Logit
        "u"          0.99
        "I"          0.88
        "DA"         0.81
        " filosof"   0.80
        "那种"        0.78
        "gence"      0.74
        " diskut"    0.73
        "cs"         0.73
        "V"          0.72
        " siden"     0.70
    Activation density: 0.009%

    No known activations.