INDEX
    Explanations
    NEGATIVE LOGITS
    Token            Value
    "ER"             0.75
    "3"              0.73
    "لا"             0.70
    "س"              0.69
    "2"              0.68
    "5"              0.67
    "4"              0.66
    "يت"             0.66
    "До"             0.66
    " и"             0.66
    POSITIVE LOGITS
    Token            Value
    ","              0.79
                     0.77
    "↵↵"             0.75
    " hedon"         0.71
    " Pearce"        0.68
    " perpetuated"   0.67
    " Ironically"    0.65
    " desses"        0.64
    " (="            0.63
    " verwend"       0.62
    Activation Density: 0.462%

    No Known Activations