INDEX
    Explanations

    mentions of negative events or allegations

    dialogue or quotes from various speakers in a text

    Negative Logits
    ¬¼          -0.82
    ²¾          -0.80
    ħĭ          -0.80
    etheless    -0.79
    ĻĤ          -0.69
    anmar       -0.69
    ©¶æ         -0.68
    ļéĨĴ        -0.65
    ĪĴ          -0.64
    Ĭ           -0.63
    Positive Logits
     writes      1.28
     wrote       1.22
     reads       1.18
     explains    1.13
     recalls     1.12
     according   1.05
     says        1.05
     recalled    1.05
     observes    1.04
     explained   1.03
    Activation Density: 0.101%

    No Known Activations