    Explanations

    concepts related to empathy and social welfare

    NEGATIVE LOGITS
     method             -0.49
    Erreferentziak      -0.49
     Colin              -0.47
    Наводи              -0.46
    ENCES               -0.46
     model              -0.46
    האם                 -0.46
                        -0.45
     círculo            -0.45
    esub                -0.45
    POSITIVE LOGITS
     AttributeSet       0.85
     pleaſure           0.81
     wellbeing          0.80
     welfare            0.80
     sake               0.74
     gainera            0.72
    +#+#                0.69
    脚注の使い方        0.69
     purpoſe            0.67
    ViewFeatures        0.67
    Activation Density 0.236%

    No Known Activations