    Negative Logits
    Token          Logit
    "ere"          1.09
    "s"            1.08
    "ی"            1.05
    "ind"          1.00
    (not shown)    0.98
    "out"          0.96
    "ars"          0.94
    "ni"           0.91
    "are"          0.89
    "aving"        0.89
    Positive Logits
    Token           Logit
    "ج"             1.28
    "ر"             1.13
    "ق"             1.12
    " Evolution"    1.10
    (not shown)     0.98
    " эволю"        0.98
    " evolution"    0.94
    (whitespace)    0.93
    "ksiyon"        0.93
    " evolved"      0.93
    Activation Density: 0.032%

    No Known Activations