    Explanations

    words related to suppression or control of information or actions

    terms related to suppression or repression

    Negative Logits
    Token         Logit
    "sen"         -0.83
    " Journey"    -0.73
    "rians"       -0.72
    "deals"       -0.71
    "ser"         -0.70
    "Step"        -0.69
    "ubuntu"      -0.69
    "Sky"         -0.69
    "PART"        -0.69
    "ature"       -0.69
    Positive Logits
    Token             Logit
    " suppress"       1.27
    " suppression"    1.17
    " suppressing"    1.16
    " suppressed"     1.08
    " muzzle"         0.87
    "ively"           0.83
    "encing"          0.75
    " inhib"          0.70
    "ences"           0.70
    " apparatus"      0.67
    Activation Density: 0.012%

    No Known Activations