    Explanations

    gaslighting

    Negative Logits
     pcl          -0.09
     плот         -0.08
     consumir     -0.08
     pinterest    -0.08
     пить         -0.07
    ovirus        -0.07
     grate        -0.07
     плит         -0.07
     ضرورت        -0.07
     svm          -0.07
    Positive Logits
    0.08
     torture
    0.07
    cred
    0.07
     włas
    0.07
    cac
    0.07
     dre
    0.07
    foreground
    0.07
     repression
    0.07
    alter
    0.07
    0.07
    Activation Density: 0.005%

    No Known Activations