INDEX
    Explanations

    discussions about perspective-taking and empathy

    Negative Logits
    "Strict"        -0.16
    "ernals"        -0.15
    "AVA"           -0.15
    " Chow"         -0.15
    "engu"          -0.15
    "estead"        -0.15
    "raud"          -0.14
    "orden"         -0.14
    " Strict"       -0.14
    "aternity"      -0.14
    Positive Logits
    "Ñıб"            0.15
    " pres"          0.15
    " Others"        0.15
    "canf"           0.14
    " Cunningham"    0.14
    " backs"         0.14
    "gel"            0.14
    "pres"           0.14
    "aupt"           0.14
    "Others"         0.13
    Activation Density: 0.233%

    No Known Activations