    Explanations

    specific characters or symbols in the text

    Negative Logits
        Token            Logit
        "——"             -0.16
        " fuck"          -0.16
        "ÑijÑĢ"           -0.15
        " FUCK"          -0.15
        " shitty"        -0.14
        "âĢIJ"            -0.14
        " valueForKey"   -0.14
        " fucking"       -0.14
        (blank)          -0.14
        (blank)          -0.14
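
Two entries in the negative-logit list (`ÑijÑĢ` and `âĢIJ`) look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The page does not name the model or tokenizer, so the following is only a minimal sketch that assumes the GPT-2 style byte-to-character table; `byte_level_alphabet` and `decode_token` are hypothetical helper names introduced here for illustration.

```python
def byte_level_alphabet():
    """Rebuild a GPT-2 style byte -> character table (assumed, not confirmed by the page)."""
    # Bytes that are kept as their own Latin-1 character.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    codepoints = printable[:]
    shift = 0
    # Every remaining byte is mapped to the next free code point from 256 upward.
    for b in range(256):
        if b not in printable:
            printable.append(b)
            codepoints.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(printable, codepoints)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_level_alphabet().items()}


def decode_token(token):
    """Turn a byte-level vocabulary string back into text.

    Strings containing characters outside the table are returned unchanged,
    since they were presumably already decoded by the page.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences are possible, so replace rather than raise.
    return raw.decode("utf-8", errors="replace")


for tok in ["ÑijÑĢ", "âĢIJ"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, `ÑijÑĢ` would correspond to the bytes `D1 91 D1 80` (a two-letter Cyrillic fragment) and `âĢIJ` to `E2 80 90` (the Unicode hyphen U+2010). With a different tokenizer the mapping may not apply.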
    Positive Logits
        Token            Logit
        " Explorer"      0.17
        "--"             0.16
        " elsewhere"     0.16
        "Else"           0.15
        " Privacy"       0.14
        "trimmed"        0.14
        "enegro"         0.14
        " jinak"         0.14
        " Experts"       0.14
        " Flores"        0.14
    Activation Density: 0.003%

    No Known Activations