INDEX
    Explanations

    special characters or symbols primarily used in non-Latin scripts

    Negative Logits
    Token             Logit
    "ters"            -0.88
    "rette"           -0.83
    " bluff"          -0.80
    "verts"           -0.78
    "sett"            -0.78
    "fet"             -0.77
    "raviolet"        -0.76
    "zees"            -0.75
    "iannopoulos"     -0.72
    "anooga"          -0.72
    Positive Logits
    Token             Logit
    "ĩ"               1.25
    "ī"               1.20
    "Į"               1.13
    "ĥ"               1.10
    "į"               1.06
    "ĭ"               1.04
    "ا"               1.03
    "Ĥ"               1.02
    "à¤"              0.96
    "IJ"              0.95
    Activation Density: 0.005%

    No Known Activations