    Explanations

    emotional reactions or opinions expressed in written text

    expressions of disappointment or dissatisfaction

    Negative Logits
     Mirage      -0.65
     pyramid     -0.62
     accomp      -0.62
     shadow      -0.61
     specialist  -0.61
     crus        -0.60
     Camel       -0.60
     Amon        -0.59
     presumed    -0.59
     trainer     -0.59
    Positive Logits
    onto          1.01
    ï¸ı (U+FE0F)  0.99
    ationally     0.95
    gently        0.94
    him           0.93
    efe           0.89
    against       0.89
    early         0.87
    selves        0.85
    oward         0.84
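    The two logit lists read like the k most negative and k most positive entries of a per-token score vector. A minimal sketch of that ranking step follows. It assumes the scores are the feature's direct effect on each vocabulary token's logit, which this page does not state. `top_bottom_logits` is a hypothetical helper, and the toy `scores` dict reuses a few values from the lists above rather than a full vocabulary.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_bottom_logits(logit_effect: Dict[str, float], k: int = 10) -> Tuple[Ranked, Ranked]:
    """Hypothetical helper: return (positive, negative) token lists.

    `logit_effect` is assumed to map each token string to the feature's
    effect on that token's logit. Leading spaces are kept because they are
    part of the token (e.g. " Mirage" differs from "Mirage").
    """
    ranked = sorted(logit_effect.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


# Toy scores copied from the dashboard lists (not a full vocabulary).
scores = {
    "onto": 1.01,
    "gently": 0.94,
    "him": 0.93,
    " Mirage": -0.65,
    " pyramid": -0.62,
    " shadow": -0.61,
}
pos, neg = top_bottom_logits(scores, k=2)
print(pos)  # [('onto', 1.01), ('gently', 0.94)]
print(neg)  # [(' Mirage', -0.65), (' pyramid', -0.62)]
```

    With a real model, `logit_effect` would cover the whole vocabulary and k would be 10 to match the lists shown.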
    Activation Density: 0.243%
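    Act Density most likely reports the percentage of token positions on which the feature activates. A minimal sketch under that assumption follows; `activation_density` is a hypothetical helper, and the threshold of zero for counting a position as "active" is also an assumption.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)  # count firing positions
    return 100.0 * active / len(activations)


# Toy example: 3 active positions out of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```

    Under this reading, 0.243% means the feature fires on roughly 243 of every 100,000 tokens.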

    No Known Activations