    Explanations

    non-English characters or symbols often used in specific cultural and contextual discussions

    Negative Logits
     ly
    -0.16
    ÙIJÙĥ
    -0.15
    heimer
    -0.14
    æľīåħ³
    -0.14
     rats
    -0.14
     Ú¯ÛĮرد
    -0.14
     bott
    -0.14
     fib
    -0.14
    ÌĢ
    -0.14
    are
    -0.14
    Positive Logits
    etas
    0.17
     Jeg
    0.17
    лагод
    0.16
    âĢĮ
    0.16
    ÃĽ
    0.16
    çe
    0.15
    Ú¯
    0.15
    inx
    0.15
     بÙĩ
    0.15
    .ops
    0.15
    Activation Density: 0.003%

    No Known Activations