    Negative Logits
    Token       Logit
     Majefty    -0.85
    ally        -0.80
    ſelves      -0.75
     moschino   -0.75
    ſelf        -0.75
    ostante     -0.70
     Reſ        -0.70
     bershka    -0.69
     purpoſe    -0.69
     omnia      -0.67
    Positive Logits
    Token       Logit
    e           1.62
    s           1.16
    a           1.04
    i           1.01
    o           0.96
    ه           0.87
    ی           0.84
    y           0.82
    t           0.79
    ete         0.69
    Activation Density: 0.103%

    No Known Activations