    Explanations

    emotional reactions and states

    Negative Logits
    Token              Value
    "ặp"               0.95
    " controversial"   0.81
    " contentious"     0.78
    " scary"           0.75
    "irstyle"          0.75
    " intimidating"    0.72
    "افه"              0.72
    " erred"           0.71
    " Scary"           0.70
    " memungkinkan"    0.70
    Positive Logits
    Token              Value
    "ingly"            1.08
    "不已"             0.99
    " watching"        0.98
    "스러운"           0.93
    "Watching"         0.92
    "Knowing"          0.89
    " Watching"        0.89
    " speechless"      0.89
    " knowing"         0.88
    " thoughts"        0.88
    Activation Density: 0.294%

    No Known Activations