INDEX
    Explanations

    negative statements or self-criticism

    Negative Logits
    Token            Logit
    " simultane"     -0.67
    " mathemat"      -0.65
    " bicy"          -0.62
    " CY"            -0.61
    " ascending"     -0.60
    " blanket"       -0.60
    " retirees"      -0.59
    " networking"    -0.58
    " Scarlet"       -0.58
    " ANGEL"         -0.57
    Positive Logits
    Token            Logit
    "t"              1.53
    "tion"           1.21
    "tions"          1.16
    "ti"             1.12
    "tis"            1.10
    "tre"            1.07
    "tar"            1.06
    "nt"             1.04
    "td"             1.04
    "tu"             1.02
    Activation Density: 0.150%

    No Known Activations