INDEX
    Explanations
    Negative Logits
    Token     Value
    " to"     1.12
    " in"     0.99
    "ה"       0.96
              0.93
    "c"       0.89
    "ING"     0.88
    "NING"    0.87
    " h"      0.87
    " r"      0.86
    " l"      0.85
    Positive Logits
    Token     Value
    "ar"      0.89
    "ορ"      0.85
    "ac"      0.82
    "un"      0.81
    "ну"      0.80
    "αλ"      0.79
              0.76
    "ων"      0.75
    " rétr"   0.75
              0.75
    Act Density 0.975%

    No Known Activations