    Explanations

    the word "prior" and its variations, indicating a focus on previous experience or events

    Negative Logits
    chin       -0.15
    832        -0.15
    empo       -0.15
    avn        -0.15
    oton       -0.14
    cheon      -0.14
    elah       -0.14
    v          -0.14
    leme       -0.14
    packing    -0.14
    Positive Logits
    /current    0.22
    itize       0.16
    imes        0.15
    itized      0.15
    itar        0.15
    ileges      0.14
    à¹īà¸ĩ      0.14
    umat        0.14
    Argb        0.14
    Ŀ           0.14
    Act Density 0.010%

    No Known Activations