INDEX
    Explanations

    starting to or improving performance

    Negative Logits
    "ancies"    0.53
    "as"        0.51
    "ata"       0.47
    "ates"      0.47
    "otas"      0.46
    "atino"     0.44
    "igail"     0.44
    "ikli"      0.44
    " linge"    0.43
    "cik"       0.43
    Positive Logits
    " hairy"    0.50
    "τι"        0.49
    "כ"         0.48
    "וב"        0.46
    "נ"         0.46
    "תי"        0.45
    (blank)     0.44
    " ن"        0.43
    " הצ"       0.43
    " करणे"     0.43
    Act Density 0.000%

    No Known Activations