INDEX
    Explanations
    NEGATIVE LOGITS
    Token            Logit
    "a"              1.08
    (blank)          1.04
    (blank)          0.89
    "se"             0.89
    "r"              0.89
    "t"              0.89
    "ся"             0.88
    "ж"              0.84
    "ند"             0.83
    "ше"             0.82
    POSITIVE LOGITS
    Token            Logit
    " piers"         0.91
    " pier"          0.90
    (blank)          0.80
    "Β"              0.79
    " tentacles"     0.76
    "ദു"             0.75
    (blank)          0.75
    " tecnológica"   0.75
    "Это"            0.73
    " énergie"       0.73
    Act Density 0.001%

    No Known Activations