INDEX
    Explanations

    themes of social justice and moral judgment

    Negative Logits
        Token        Logit
        "ulis"       -0.14
        ".shared"    -0.14
        "enus"       -0.14
        "akov"       -0.13
        ".Shared"    -0.13
        " Laud"      -0.13
        "agem"       -0.13
        " shared"    -0.13
        " bonne"     -0.13
        "zz"         -0.13

    Positive Logits
        Token        Logit
        "fü"          0.14
        "ingham"      0.14
        "ÑĩеÑĢ"       0.14
        "vat"         0.14
        "ä»Ķ"         0.14
        "·"           0.14
        "CRET"        0.14
        "510"         0.14
        "ÑĹ"          0.14
        "ضÙĬ"         0.13
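
    Several tokens in these lists (for example "ä»Ķ" and "ÑĹ") look like byte-level BPE vocabulary strings, in which every raw byte is shown as a printable stand-in character, as in GPT-2's tokenizer. Assuming the model uses that GPT-2-style byte mapping (the page does not say which tokenizer is involved), a sketch like the one below turns such a string back into readable text. The helper names `bytes_to_unicode`, `BYTE_DECODER`, and `decode_token` are illustrative, not part of any dashboard API.

```python
from typing import Dict, Optional


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2-style reversible table mapping each byte (0-255) to a printable character."""
    # Printable Latin-1 bytes ("!".."~", "¡".."¬", "®".."ÿ") stand for themselves.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # The remaining bytes (control characters, space, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable stand-in character -> original byte value.
BYTE_DECODER: Dict[str, int] = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> Optional[str]:
    """Turn a byte-level BPE token string back into UTF-8 text (assumed GPT-2 mapping).

    Returns None if a character lies outside the byte alphabet, or if the
    bytes do not form complete UTF-8 on their own (e.g. half of a character).
    """
    try:
        raw = bytes(BYTE_DECODER[ch] for ch in token)
        return raw.decode("utf-8")
    except (KeyError, UnicodeDecodeError):
        return None


for tok in ["ä»Ķ", "ÑĹ", "ضÙĬ", "fü"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

    Under this assumed mapping, "ä»Ķ" decodes to the CJK character 仔 (U+4ED4) and "ÑĹ" to the Cyrillic letter ї (U+0457). "ضÙĬ" and "fü" return None because they contain a character outside the byte alphabet or bytes that are not valid UTF-8 by themselves, so those entries may already be partly decoded on the page.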
    Activation Density: 0.342%
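
    Activation density is usually read as the fraction of token positions on which the feature's activation is above zero, so 0.342% corresponds to roughly 3.4 active positions per 1,000 tokens. A minimal sketch, assuming the feature's activations over a token sample are available as a flat array and that "active" means strictly greater than a threshold of zero (the dashboard's exact threshold and sample are not stated here); `activation_density` is a hypothetical helper name:

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`.

    `acts` is assumed to be a 1-D sequence of this feature's activations over a
    sample of tokens; the zero threshold is an assumption, not a dashboard setting.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Toy example: 2 of 8 positions fire.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(sample):.3%}")  # 25.000%
```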

    No Known Activations