INDEX
    Explanations

    negative judgment or disapproval

    NEGATIVE LOGITS
    /    0.45
    大型    0.43
     lifecycle    0.40
    üksek    0.40
     ከፍተኛ    0.39
    ↵↵    0.38
    including    0.38
    进行    0.38
    OD    0.38
     काफी    0.38
    POSITIVE LOGITS
     forbids    0.47
     denunci    0.46
     отрица    0.45
     despised    0.44
     diminution    0.43
     amenaza    0.43
     scorn    0.43
     sufrimiento    0.43
     あなた    0.43
     ненави    0.42
    Activation Density 0.054%

    No Known Activations