    Negative Logits
        " of"            1.23
        " I"             1.17
        " A"             1.16
        " ("             1.15
        "ETT"            1.15
        " unavoidable"   1.14
        "ا"              1.14
        " H"             1.13
        " and"           1.12
        " or"            1.12
    Positive Logits
        " savaş"         1.48
        "до"             1.42
        " klassischen"   1.34
        "funk"           1.30
        "idze"           1.26
        "ка"             1.25
        " dünyanın"      1.25
        "evším"          1.25
        " tão"           1.24
        "zelfde"         1.24
    Activation Density 0.272%

    No Known Activations