INDEX
    Explanations

    discussions about values and the inconsistency in human behavior

    Negative Logits
    νÏĮ       -0.17
    dech      -0.16
    Biz       -0.16
    itag      -0.15
    ltk       -0.15
    ibold     -0.15
    ITTE      -0.14
     archae   -0.14
    apiro     -0.14
    ÑĤÑĢа     -0.14
    Positive Logits
     Charity        0.17
     GPI            0.16
     util           0.16
     interventions  0.15
     Prison         0.15
    elen            0.15
     charity        0.15
     EA             0.14
    ocale           0.14
     Slate          0.14
    Activation Density: 0.023%

    No Known Activations