INDEX
    Explanations

    phrases related to causality or consequences

    instances of emotional or evaluative language

    Negative Logits
    çīĪ          -0.75
     STATS       -0.75
    è£ħ          -0.70
     gad         -0.69
    omorphic     -0.68
    racuse       -0.66
    quished      -0.64
    ãĥīãĥ©       -0.63
     cyan        -0.63
     messenger   -0.62
    Positive Logits
    º    0.87
    ¡    0.85
    Ĵ    0.80
    ł    0.79
    ĵ    0.79
    £    0.78
    ¬    0.73
    ¼    0.72
    ¢    0.70
    Ķ    0.69
    Activation Density: 0.372%

    No Known Activations