INDEX
    Explanations

    phrases that denote monitoring and observation

    Negative Logits
    Token    Logit
    ivol     -0.16
    228      -0.15
    666      -0.15
    weise    -0.15
    369      -0.14
    PR       -0.14
    706      -0.14
    409      -0.14
    ips      -0.14
    own      -0.14
    Positive Logits
    Token    Logit
    eil      0.15
    eut      0.15
    osit     0.14
    iel      0.14
    fang     0.14
     sight   0.14
    erset    0.14
    åde      0.14
    enan     0.14
    ffen     0.14
    Activation Density: 0.033%

    No Known Activations