INDEX
    Explanations

    instances of assessing moral judgments or character evaluations

    NEGATIVE LOGITS
    Token (quoted)    Logit
    "enco"            -0.16
    "hwnd"            -0.15
    "inea"            -0.15
    "zim"             -0.15
    "earch"           -0.14
    "osaur"           -0.14
    "UNUSED"          -0.14
    " somehow"        -0.14
    " Gn"             -0.14
    "Liked"           -0.13

    POSITIVE LOGITS
    Token (quoted)    Logit
    " except"         0.56
    "except"          0.48
    " Except"         0.41
    "Except"          0.40
    " apart"          0.39
    " кроме"          0.36
    " aside"          0.34
    "_except"         0.34
    "除了"            0.32
    "\texcept"        0.32
    Act Density 0.228%

    No Known Activations