    Explanations

    negations and warnings against specific actions

    Negative Logits
    Token (quoted to show leading spaces)    Logit
    " allAfrica"      -0.53
    " متحده"          -0.52
    " otomatig"       -0.49
    "matchCondition"  -0.47
    " تانيه"          -0.47
    " насељу"         -0.47
    "tvguidetime"     -0.46
    " Exactos"        -0.46
    "IContainer"      -0.45
    "MLLoader"        -0.45
    Positive Logits
    Token (quoted to show leading spaces)    Logit
    " Jangan"         0.73
    " 不要"           0.64
    "Jangan"          0.63
    " đừng"           0.63
    " jangan"         0.60
    "Dont"            0.59
    "DoNot"           0.57
    "你不要"          0.57
    " Dont"           0.57
    " avoid"          0.54
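
    Lists like the ones above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and sorting the resulting per-token logits. The page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_tokens`, the two-dimensional `direction`, and the `unembed` weights are hypothetical toy values, not the real numbers behind this feature.

```python
def top_logit_tokens(direction, unembed, vocab, k=3):
    """Return (most_negative, most_positive) lists of (token, logit) pairs.

    direction: feature output direction, length d_model (assumed input)
    unembed:   d_model rows, each of length len(vocab) (assumed layout)
    """
    # Logit for token j is the dot product of the direction with column j.
    logits = [
        sum(d * row[j] for d, row in zip(direction, unembed))
        for j in range(len(vocab))
    ]
    # Token indices sorted from lowest to highest logit.
    order = sorted(range(len(vocab)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in order[::-1][:k]]
    return negative, positive


# Toy example: made-up weights, token strings borrowed from the lists above.
vocab = [" Jangan", " avoid", "MLLoader", " allAfrica"]
unembed = [
    [1.0, 0.5, -0.5, -1.0],
    [0.0, 0.0, 0.0, 0.0],
]
direction = [0.7, 0.0]
negative, positive = top_logit_tokens(direction, unembed, vocab, k=2)
```

    With these toy weights, `positive` starts with " Jangan" and `negative` starts with " allAfrica". Note that tokens with and without a leading space are separate vocabulary entries, which is why both " Jangan" and "Jangan" can appear in the same list.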
    Activation Density: 0.209%
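
    The page does not define the density figure. A minimal sketch, assuming it is the share of sampled tokens on which the feature fires (activation above zero), could look like this. The function name `activation_density`, the `threshold` parameter, and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: the feature fires on 3 of 1000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.209% would mean the feature is active on roughly 2 of every 1000 sampled tokens.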

    No Known Activations