    NEGATIVE LOGITS
    Token          Logit
    "c"            1.41
    "ه"            1.29
    "a"            1.27
    " contamin"    1.12
    "лся"          1.02
    "ה"            1.01
    " venezol"     0.97
    "τής"          0.96
    "ول"           0.94
    " desenvolv"   0.94
    POSITIVE LOGITS
    Token          Logit
    " whom"        1.36
    " Whom"        1.23
    "whom"         1.18
    "-"            1.09
    "na"           1.04
                   1.02
    " selben"      1.02
    ">"            0.91
    "ue"           0.91
    "'"            0.90
    Activation Density: 0.001%

    No Known Activations