    Negative Logits
    Token        Value
    " as"        1.38
    " ("         1.34
    "ا"          1.12
    " assassin"  1.06
    ","          1.02
    " acacia"    1.00
    " can"       0.96
    "ல்"         0.95
    "тся"        0.95
    " enzyme"    0.95
    Positive Logits
    Token        Value
    "w"          1.36
    "om"         1.27
    "1"          1.25
    "2"          1.20
    (blank)      1.20
    "t"          1.19
    "negl"       1.18
    "d"          1.18
    "ne"         1.16
    (blank)      1.16
    Act Density 0.543%

    No Known Activations