INDEX
    Explanations

    affirmative actions and expressions of preference or support

    Negative Logits
    "InputBorder"       -0.63
    "transQ"            -0.61
    " themselves"       -0.56
    "发表于"            -0.56
    " their"            -0.56
    "OGND"              -0.56
    " utafitiHapana"    -0.54
    "pcm"               -0.52
    " JTable"           -0.51
    "washingtonpost"    -0.51
    Positive Logits
    " myself"     0.93
    " myſelf"     0.87
    "myself"      0.77
    " Myself"     0.68
    " minhas"     0.67
    "我自己"      0.65
    " my"         0.59
    " मैं"         0.57
    "LookAnd"     0.57
    " خودم"       0.56
    Activation Density: 0.906%

    No Known Activations