INDEX
    Explanations

    words containing specific special characters like Arabic or similar characters

    Negative Logits
     coff          -0.69
     asleep        -0.67
     gas           -0.66
     adaptation    -0.62
     blitz         -0.61
     McC           -0.61
     jail          -0.61
     Hamm          -0.60
     sleep         -0.60
     bro           -0.60
    Positive Logits
    Ĵ     4.49
    ĵ     2.07
    IJ     1.96
    Ķ     1.88
    ij     1.82
    İ     1.82
    ¢     1.78
    ı     1.76
    ¡     1.72
    ĸ     1.69
    Act Density 0.011%

    No Known Activations