INDEX
    Explanations

    numbers and special characters mixed within text

    special characters or symbols in the text

    Negative Logits
    Token            Logit
    "raints"         -0.93
    " manif"         -0.85
    " Instr"         -0.84
    " horizont"      -0.82
    " philos"        -0.79
    " womb"          -0.76
    "ngth"           -0.75
    " tentacles"     -0.74
    " condem"        -0.73
    " symp"          -0.73
    Positive Logits
    Token                          Logit
    "âĶĢâĶĢ"                         1.07
    "âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ"       1.04
    "ishable"                      1.00
    "ãĥ¼ãĥ"                        0.99
    "à©"                           0.92
    "cffffcc"                      0.90
    "ļ"                            0.87
    "ãĥ¼ãĥ«"                       0.85
    "à¨"                           0.85
    "ा"                            0.85
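Several tokens in the positive-logit list look garbled because they appear to be raw byte-level BPE token strings rather than decoded text. The sketch below assumes the underlying tokenizer uses the GPT-2 style byte-to-unicode table (the source does not name the model, so this is an assumption); `decode_token` is a hypothetical helper, not part of any dashboard API. Tokens that contain characters outside that table are returned unchanged, and tokens that cover only part of a multi-byte character decode to the Unicode replacement character.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level token string back to text (assumes a GPT-2 style table)."""
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        # Not a pure byte-level string (already decoded, perhaps); leave it alone.
        return token
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĶĢâĶĢ", "ãĥ¼ãĥ«", "ishable", "à©"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the top positive tokens would be runs of the box-drawing horizontal line character (U+2500) and fragments of Katakana and Indic scripts, which fits the explanations about special characters and symbols in the text.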
    Activation Density: 0.059%

    No Known Activations