INDEX
    Explanations
    Negative Logits
        "ن"          2.17
        (missing)    1.88
        (missing)    1.76
        "н"          1.71
        "ه"          1.64
        (missing)    1.63
        "n"          1.55
        "an"         1.52
        "ان"         1.49
        "r"          1.48
    Positive Logits
        " by"        1.62
        " as"        1.52
        " la"        1.45
        " c"         1.34
        " to"        1.33
        " thirty"    1.23
        " o"         1.15
        " f"         1.12
        " le"        1.11
        " $"         1.09
    Activation Density: 0.000%

    No Known Activations