INDEX
    Explanations

    Occurrences of specific punctuation and whitespace characters, likely tied to the formatting or structure of the text

    Negative Logits
    SharedDtor
    -0.92
     queſta
    -0.81
     transfieras
    -0.78
    хьтан
    -0.75
     ویکی‌آمباردا
    -0.75
     صوتيه
    -0.73
     चीज़ों
    -0.73
    Jeografia
    -0.71
     ſind
    -0.70
    Бахар
    -0.69
    Positive Logits
     hipó
    0.43
    UC
    0.40
    Bob
    0.40
    1
    0.40
    0.39
    Service
    0.38
     Bob
    0.38
    ↵↵
    0.38
    The
    0.37
    Exit
    0.37
    Act Density 0.356%

    No Known Activations