INDEX
    Explanations

    Words in a foreign language, potentially indicating a specific language or pattern in the text.

    Negative Logits
        Token              Logit
        " Franch"          -0.80
        " panels"          -0.75
        " charm"           -0.71
        "theless"          -0.68
        " inev"            -0.65
        " comparisons"     -0.64
        " admission"       -0.64
        " concede"         -0.63
        " responsibility"  -0.62
        " cooler"          -0.62

    Positive Logits
        Token              Logit
        "º"                1.48
        "¾"                1.47
        "²"                1.47
        "´"                1.37
        "¸"                1.35
        "¢"                1.31
        "¼"                1.30
        "©¶æ¥µ"            1.30
        "½"                1.30
        "Ĩ"                1.30
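Lists like the positive and negative logits above are typically produced by projecting the feature's decoder direction through the model's unembedding matrix, then sorting tokens by the resulting score. The sketch below illustrates that idea with toy vectors. The helpers `logit_effects` and `top_logits` are hypothetical, and this is an assumption about how such a dashboard computes its values. Real tools may also scale or normalize the scores.

```python
from typing import Dict, List, Tuple

def logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding column to get that token's direct logit effect."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembedding.items()
    }

def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most positive and the k most
    negative (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]          # smallest scores first
    positive = ranked[::-1][:k]    # largest scores first
    return positive, negative

# Toy example: a 2-dimensional model with three tokens (made-up numbers).
effects = logit_effects(
    [1.0, -0.5],
    {"º": [1.0, 0.0], " Franch": [-0.5, 0.5], "½": [0.5, -0.5]},
)
positive, negative = top_logits(effects, k=2)
print(positive)
print(negative)
```

Sorting once and slicing from both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.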
    Activation Density: 0.015%
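Activation density is usually the percentage of sampled token positions on which the feature fires. At 0.015%, that works out to about 1.5 firings per 10,000 tokens. The sketch below rests on two assumptions: a position counts as "firing" when its activation is strictly greater than a threshold of 0, and the density is taken over a flat list of per-token activations. The function name `activation_density` is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed to be 0 by default)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy example: 20,000 token positions, of which 3 have a positive activation.
sample = [0.0] * 19997 + [0.4, 1.1, 2.3]
print(f"{activation_density(sample):.3f}%")
```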

    No Known Activations