    Explanations

    Activates strongly on punctuation marks, especially periods

    Negative Logits
    Token           Logit
    "ugin"          -0.18
    "unn"           -0.15
    "@Id"           -0.15
    " maduras"      -0.14
    " sober"        -0.14
    " universal"    -0.14
    "plan"          -0.14
    "Ãłu"           -0.13
    "annel"         -0.13
    "ative"         -0.13

    Positive Logits
    Token           Logit
    "krit"          0.16
    "oka"           0.15
    "forman"        0.14
    "781"           0.14
    "andle"         0.14
    "inu"           0.14
    "vig"           0.14
    "gary"          0.13
    "ppe"           0.13
    "fault"         0.13

    Act Density 0.185%
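
A minimal sketch of how a figure like the activation density above might be computed, assuming it means the percentage of sampled token positions on which the feature fires at all. That definition, the function name `activation_density`, the `threshold` parameter, and the sample counts are all assumptions for illustration; the page itself does not say how the number was produced.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumed definition: a position counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 20,000 token positions, 37 of them with non-zero activation.
# The counts are chosen only so the result matches the figure shown on the page.
sample = [0.0] * 19963 + [1.0] * 37
print(f"Act Density {activation_density(sample):.3f}%")
```

Under this reading, a density of 0.185% means the feature is silent on the vast majority of tokens, which would be consistent with a sparse feature.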

    No Known Activations