    Explanations

    letters and symbols related to non-English languages like Turkish

    Negative Logits
    Token         Logit
    " Perkins"    -0.74
    "enegger"     -0.72
    "WARD"        -0.71
    " Binary"     -0.67
    " Mandela"    -0.66
    " Stard"      -0.66
    "ifying"      -0.65
    "mort"        -0.65
    " mutual"     -0.64
    " Silk"       -0.64
    Positive Logits
    Token         Logit
    "ĥ"           1.75
    "ķ"           1.53
    "Ĵ"           1.53
    "Ĺ"           1.51
    "Ń"           1.50
    "Ģ"           1.50
    "ī"           1.49
    "İ"           1.49
    "Į"           1.48
    "ħ"           1.46
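
The positive-logit tokens above look like single characters from the Latin Extended-A block. One possible reading, offered here only as an assumption, is that the model uses a GPT-2 style byte-level BPE vocabulary, in which raw bytes that have no printable form are displayed as stand-in characters starting at U+0100. The page itself does not name the tokenizer, so the sketch below is illustrative rather than a description of how this dashboard was built. It rebuilds the publicly known GPT-2 byte-to-character table and inverts it to recover the raw byte behind each displayed character. The names `token_char_to_byte`, `is_utf8_continuation`, and `top_positive` are hypothetical helpers introduced for this example.

```python
def bytes_to_unicode():
    """Byte -> display character table in the style of GPT-2's byte-level BPE."""
    # Bytes that already have a visible glyph are displayed as themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte gets the next unused code point from U+0100 upward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_char_to_byte(ch):
    """Return the raw byte that a single display character stands for."""
    return UNICODE_TO_BYTE[ch]


def is_utf8_continuation(b):
    """True if the byte has the form 10xxxxxx, i.e. a UTF-8 continuation byte."""
    return 0x80 <= b <= 0xBF


# The ten positive-logit tokens listed above, written as escapes
# (ĥ ķ Ĵ Ĺ Ń Ģ ī İ Į ħ) to avoid any normalization ambiguity.
top_positive = ["\u0125", "\u0137", "\u0134", "\u0139", "\u0143",
                "\u0122", "\u012B", "\u0130", "\u012E", "\u0127"]

for ch in top_positive:
    b = token_char_to_byte(ch)
    print(f"U+{ord(ch):04X} -> byte 0x{b:02X}, "
          f"continuation={is_utf8_continuation(b)}")
```

Under that assumption, each of the ten tokens maps to a byte in the range 0x80 to 0xBF, which is the range UTF-8 reserves for the trailing bytes of multi-byte characters. That would be consistent with the explanation given for this feature, since accented and other non-ASCII letters are encoded as multi-byte sequences. If the model uses a different tokenizer, the mapping does not apply and the characters should be read literally.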
    Activation Density: 0.008%

    No Known Activations