INDEX
    Explanations

    references to group dynamics and collective experiences

    Negative Logits
    Token        Logit
    "of"         -0.16
    "asant"      -0.14
    "elcome"     -0.14
    "azu"        -0.14
    "oi"         -0.14
    " itself"    -0.14
    " ne"        -0.14
    "ovu"        -0.14
    " of"        -0.13
    "elf"        -0.13
    Positive Logits
    Token        Logit
    "addon"      0.17
    "semb"       0.15
    "maal"       0.15
    "anon"       0.15
    "cuts"       0.14
    "glich"      0.14
    "gos"        0.14
    "opc"        0.14
    " jadx"      0.14
    "astype"     0.13
    Act Density 0.046%

    No Known Activations