INDEX
    Explanations
    Negative Logits
    " "          1.45
    " on"        1.38
    " is"        1.18
    " instala"   1.18
    " an"        1.16
    " \"         1.16
    " är"        1.15
    "ח"          1.14
    "²"          1.14
    "')."        1.12

    Positive Logits
    "at"         1.75
    "as"         1.57
    "i"          1.57
    "ad"         1.52
    "the"        1.45
    "in"         1.40
    "d"          1.36
    "ar"         1.31
    "ab"         1.28
    "م"          1.27

    Act Density 0.018%

    No Known Activations