    Explanations

    phrases related to social interactions and personal identity

    Negative Logits
    Token         Logit
    "idata"       -0.16
    "imu"         -0.16
    "imers"       -0.15
    "eka"         -0.15
    "edith"       -0.15
    "ESIS"        -0.15
    "angi"        -0.15
    "ysa"         -0.15
    ")NULL"       -0.15
    "jom"         -0.14
    Positive Logits
    Token            Logit
    " rather"        0.24
    " Rather"        0.21
    "rather"         0.21
    "Rather"         0.20
    " instead"       0.20
    " Freund"        0.16
    " universal"     0.16
    " abstract"      0.15
    " Instead"       0.15
    " determined"    0.15
    Activation Density: 0.258%

    No Known Activations