    Negative Logits
    Token            Logit
    "s"              1.97
    " with"          1.17
    " was"           1.10
    "y"              1.05
    "d"              1.04
    "r"              1.01
    " ="             0.97
    " from"          0.96
    " as"            0.94
    "ی"              0.94
    Positive Logits
    Token            Logit
    "𝒆"              1.06
    "на"             1.02
    "การ"            1.00
    " prohibitions"  0.93
    "ban"            0.92
    "</strong>"      0.91
    " banning"       0.90
    "側の"           0.87
    "esses"          0.86
    " मदत"           0.86
    Activation Density: 0.013%

    No known activations.