INDEX
    Explanations

    non-English characters, potentially in a specific language or encoding

    Negative Logits
    "illas"                -0.69
    "illa"                 -0.68
    "aic"                  -0.67
    "eers"                 -0.67
    " logger"              -0.66
    "olves"                -0.64
    "stadt"                -0.63
    " ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³"   -0.61
    " chau"                -0.59
    " levers"              -0.58
    Positive Logits
    "ħ"      0.94
    "¾"      0.85
    "¼"      0.84
    "Į"      0.84
    "ãģį"    0.83
    "ttp"    0.81
    "İ"      0.78
    "α"      0.76
    "Ĩ"      0.76
    "ŀ"      0.75
    Activation Density: 9.747%

    No Known Activations