INDEX
    Explanations

    words related to criticism or negative descriptions

    Head Attr Weights
    0:0.08
    1:0.03
    2:0.31
    3:0.09
    4:0.15
    5:0.04
    6:0.02
    7:0.02
    8:0.05
    9:0.06
    10:0.05
    11:0.02
    Negative Logits
    "uddin"           -1.38
    "yip"             -1.34
    "john"            -1.22
    "terness"         -1.21
    "ollah"           -1.19
    "htaking"         -1.13
    "lik"             -1.13
    " underestimate"  -1.12
    " envelope"       -1.12
    "agine"           -1.11
    Positive Logits
    "GES"              1.31
    " mach"            1.27
    "ombat"            1.26
    "ゼウス"           1.25
    " Tire"            1.23
    "pter"             1.17
    ">>>>"             1.16
    "cised"            1.15
    "aband"            1.15
                       1.13
    Act Density 0.003%

    No Known Activations
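    As a rough illustration of the density figure above, here is a minimal Python sketch. It assumes that "Act Density" means the percentage of tokens on which the feature's activation is strictly positive; the function name activation_density and the token counts are hypothetical and do not come from this dashboard.

```python
# Hypothetical sketch: activation density taken as the share of tokens on
# which a feature's activation is strictly positive, reported as a percentage.
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Return the percentage of entries in `activations` that are > 0."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Illustrative numbers only: 3 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    acts[i] = 1.0

print(f"{activation_density(acts):.3f}%")  # prints 0.003%
```

    Under this assumption, a density of 0.003% corresponds to roughly 3 firing tokens per 100,000, which shows how rarely the feature is active.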