INDEX
    Explanations

    terms related to control and repression of emotions or dissent

    Negative Logits
    andra           -0.14
    /use            -0.14
    bee             -0.14
    enou            -0.14
    lut             -0.14
    ifu             -0.13
    ismu            -0.13
     kinh           -0.13
    ìĭ              -0.13
     Ihnen          -0.13
    Positive Logits
    /mit            0.24
    ä½ı             0.20
     ä½ı            0.19
     expectations   0.19
    /null           0.17
    /pre            0.17
     potential      0.16
     further        0.15
     expected       0.14
    645             0.14
    Activation Density 0.106%

    No Known Activations