INDEX
    Explanations

    Overall followed by an evaluation

    Negative Logits
    Token        Logit
    " as"        1.24
    " "          1.23
    "ل"          1.09
    "ll"         1.02
    "মতো"        1.01
    "d"          0.96
    " sposób"    0.91
    " out"       0.90
    " waż"       0.88
    " arise"     0.88
    Positive Logits
    Token        Logit
    "ва"         1.21
    "ческих"     1.13
                 1.07
    "<0x80>"     1.03
    "ών"         1.03
    "ной"        0.96
    "щий"        0.95
    "ρι"         0.92
    "сти"        0.92
    "ческий"     0.92
    Activation Density: 0.017%

    No Known Activations