INDEX
    Explanations

    phrases related to advising or urging against certain actions

    phrases related to abstaining or refraining from actions

    Negative Logits
    ammy          -1.01
    NetMessage    -0.75
    immer         -0.70
    rations       -0.69
    ramid         -0.69
    ovies         -0.67
    neau          -0.67
    odes          -0.67
    oğ            -0.66
    onomy         -0.65
    Positive Logits
     refrain      1.18
    rences        0.90
     abst         0.86
    SourceFile    0.78
    ////////      0.69
    ministic      0.69
     stren        0.67
     answering    0.67
     acknow       0.66
    swer          0.66
    Act Density 0.008%

    No Known Activations