INDEX
    Explanations

    words in a foreign language that are not part of the standard English alphabet

    specific non-Latin characters or symbols

    NEGATIVE LOGITS
    Token             Logit
    "iture"           -0.68
    " Pers"           -0.66
    " model"          -0.66
    " respectively"   -0.65
    " Pere"           -0.64
    " net"            -0.64
    " roll"           -0.64
    " Cly"            -0.62
    " PE"             -0.61
    " Blank"          -0.61
    POSITIVE LOGITS
    Token             Logit
    "Ń"               4.48
    "¬"               2.32
    "®"               2.09
    "¯"               1.95
    "«"               1.86
    "µ"               1.79
    "º"               1.73
    "³"               1.72
    "¸"               1.70
    "ī"               1.69
    Activation Density: 0.001%

    No Known Activations