INDEX
    Explanations

    instances of the word "act" in various forms and contexts

    Negative Logits
    Token       Logit
    ahan        -0.19
    anke        -0.17
    /fw         -0.17
    theless     -0.17
    jet         -0.16
    attern      -0.16
     Bu         -0.15
    rieg        -0.15
    ied         -0.15
    otti        -0.15
    Positive Logits
    Token       Logit
    uator       0.28
    UAL         0.28
    ual         0.25
    uar         0.25
    uality      0.23
    uelle       0.23
    uated       0.22
    uation      0.21
    ively       0.20
    uary        0.20
    Activation Density: 0.012%
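
The density figure above is usually read as the fraction of sampled tokens on which the feature fires. The page does not state its exact definition, so the sketch below assumes "activation strictly greater than zero" counts as firing. The function name `activation_density` and the sample data are hypothetical, chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: the source page does not say how density is computed.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, 12 of which activate the feature.
sample = [0.0] * 99_988 + [1.5] * 12
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.012%"
```

At this density the feature fires on roughly 12 tokens in every 100,000, which fits a feature tied to a narrow set of word fragments.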

    No Known Activations