INDEX
    Explanations

    words that denote various forms of criticism or negativity towards entities or behaviors

    Head Attr Weights
    0:0.04
    1:0.02
    2:0.16
    3:0.04
    4:0.18
    5:0.09
    6:0.03
    7:0.03
    8:0.11
    9:0.17
    10:0.05
    11:0.02
    Negative Logits
    ��:-1.64
    ��:-1.62
    ufact:-1.52
    acea:-1.49
    QL:-1.46
    ドラ:-1.45
    Si:-1.40
    omnia:-1.34
    theless:-1.32
    ña:-1.32
    Positive Logits
     Brom:1.38
     Cle:1.36
     Lamar:1.32
     Hank:1.30
     Mer:1.29
     Mort:1.21
     Klu:1.20
    agall:1.19
     Clay:1.19
     Louis:1.19
    Act Density 0.006%

    No Known Activations