INDEX
    Explanations

    reflections and thoughts about personal experiences and feelings

    NEGATIVE LOGITS
    "onn"     -0.15
    "rone"    -0.15
    "ime"     -0.14
    "uc"      -0.14
    "hap"     -0.14
    "алом"    -0.14
    "els"     -0.13
    "nell"    -0.13
    "von"     -0.13
    " ith"    -0.13
    POSITIVE LOGITS
    ":"                     0.32
    (token not rendered)    0.31
    " oh"                   0.23
    " ok"                   0.23
    (token not rendered)    0.22
    " hey"                  0.21
    " `"                    0.21
    " «"                    0.21
    " '"                    0.20
    " Oh"                   0.20
    Activation Density: 0.263%

    No Known Activations