INDEX
    Explanations

    words related to grievances or complaints

    Negative Logits
    aqu
    -0.19
    eut
    -0.15
    tains
    -0.15
    xfa
    -0.15
    nero
    -0.14
     Hab
    -0.14
    onso
    -0.14
    izard
    -0.14
    aç
    -0.14
     habit
    -0.14
    Positive Logits
    ovsky
    0.17
    asje
    0.16
     Giang
    0.16
     тверд
    0.15
    терн
    0.15
    .mob
    0.14
    stell
    0.14
    اÙī
    0.14
    igo
    0.14
    .bias
    0.14
    Act Density 0.018%

    No Known Activations