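    NEGATIVE LOGITS
    Token                 Value
    "l"                   1.07
    "s"                   1.05
    "n"                   1.00
    "r"                   0.99
    (token not shown)     0.99
    "t"                   0.96
    (token not shown)     0.87
    (token not shown)     0.87
    (token not shown)     0.86
    (token not shown)     0.85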
    POSITIVE LOGITS
    Token            Value
    " worthy"        1.25
    " to"            1.23
    "ة"              1.05
    "are"            1.02
    " by"            0.99
    "hana"           0.92
    " unworthy"      0.92
    "ches"           0.88
    "worthy"         0.86
    "amente"         0.83
    Activation Density 0.001%

    No Known Activations