INDEX
    Explanations

    words related to negative outcomes or distressing experiences

    Negative Logits
    inoa         -0.74
     Iw          -0.74
    taboola      -0.73
     Tsukuyomi   -0.70
     Debor       -0.68
    eva          -0.68
     Ezek        -0.67
     Niet        -0.66
     Sonia       -0.65
    âĵĺ          -0.64
    Positive Logits
    icultural    0.81
    aday         0.78
    quarters     0.73
    angs         0.70
    lights       0.70
    abee         0.66
    sters        0.66
    dog          0.66
    acan         0.65
    ogeneous     0.65
    Activation Density: 0.033%

    No Known Activations