INDEX
    Explanations
    Negative Logits
    Token             Value
    " "               1.23
    " \"              1.08
    (token missing)   1.02
    " insanely"       0.99
    " imperialism"    0.99
    "lerin"           0.96
    " that"           0.96
    ".\"              0.96
    "lerden"          0.91
    "spannung"        0.91
    POSITIVE LOGITS
    Token             Value
    (token missing)   1.55
    "ul"              1.26
    "il"              1.22
    (token missing)   1.21
    "ه"               1.21
    "a"               1.15
    "ו"               1.12
    "са"              1.10
    "in"              1.08
    (token missing)   1.03
    Act Density 0.250%

    No Known Activations