    NEGATIVE LOGITS
    Token    Value
    "o"      1.48
    "AM"     1.43
    "AN"     1.25
    "IT"     1.25
    "IF"     1.24
    "U"      1.24
    "AW"     1.22
    "IP"     1.21
    " in"    1.20
    "AF"     1.17

    POSITIVE LOGITS
    Token           Value
    "<0x80>"        0.87
    "<0xBB>"        0.84
    "ţin"           0.84
    "ков"           0.80
    " fashioned"    0.77
    "ći"            0.76
    "angas"         0.75
    " SBOM"         0.73
    "ubah"          0.73
    "ationen"       0.73

    Activation Density: 0.004%

    No Known Activations