INDEX
    Explanations

    words related to ethical and philosophical discussions

    Negative Logits
    Token                   Logit
    " externalToEVAOnly"    -0.65
    " seiz"                 -0.64
    " Mub"                  -0.61
    "olver"                 -0.60
    " stride"               -0.59
    "ilogy"                 -0.59
    "dfx"                   -0.58
    " disg"                 -0.58
    "anchez"                -0.56
    " submar"               -0.56
    Positive Logits
    Token                   Logit
    "ments"                 1.62
    "ment"                  1.46
    "MENT"                  1.33
    " Yourself"             1.28
    "able"                  1.27
    "ables"                 1.25
    "ings"                  1.22
    "MENTS"                 1.14
    "ABLE"                  1.10
    "ability"               1.10
    Activation Density: 0.178%

    No Known Activations