    Explanations

    words related to negative attributes or consequences

    instances of "ill" and its variants, in contexts relating to poor health or faulty logic

    Negative Logits
    Token             Logit
    "ļéĨĴ"            -0.75
    "*/("             -0.75
    "EStream"         -0.74
    "uyomi"           -0.73
    "kefeller"        -0.73
    "âĹ¼"             -0.67
    " compr"          -0.67
    "©¶æ¥µ"           -0.66
    " derog"          -0.65
    "EStreamFrame"    -0.64
    Positive Logits
    Token             Logit
    "uminati"         1.30
    "inois"           1.12
    "ogical"          1.11
    "umin"            1.08
    "awar"            1.08
    "iberal"          1.06
    "igan"            0.99
    "nesses"          0.98
    "ison"            0.98
    "usive"           0.96
    Activation Density: 0.009%

    No Known Activations