INDEX
    Explanations

    non-English characters or gibberish symbols

    occurrences of a specific character or symbol in various contexts

    Negative Logits
        Token           Logit
        " Gap"          -0.85
        "ierrez"        -0.74
        " Extrem"       -0.68
        "aneers"        -0.67
        "raints"        -0.65
        "htaking"       -0.64
        "itches"        -0.63
        " similarity"   -0.63
        "ativity"       -0.63
        "gewater"       -0.62
    Positive Logits
        Token               Logit
        "İ"                 1.17
        "姫" (å§«)          1.11
        "士"                1.11
        "エル" (ãĤ¨ãĥ«)     1.09
        "──" (âĶĢâĶĢ)       0.98
        "Ü"                 0.97
        "¯¯¯¯"              0.89
        "░░" (âĸijâĸij)     0.87
        "女"                0.86
        "¯¯"                0.84
    Activation Density 0.001%
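    Activation density is commonly reported for sparse-autoencoder features as the percentage of token positions on which the feature's activation is above zero. A density of 0.001% is a fraction of 10^-5, or about 10 tokens per million, which may explain why no activation examples are listed. The minimal sketch below assumes a threshold of 0 and uses hypothetical activations rather than data from this feature.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold` (assumed 0)."""
    return 100.0 * float(np.mean(acts > threshold))

# Hypothetical activations: the feature fires on 10 of 1,000,000 tokens.
acts = np.zeros(1_000_000)
acts[:10] = 1.0
print(activation_density(acts))
```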

    No Known Activations