INDEX
    Explanations

    negative terms associated with competition or conflict

    terms related to conflict or opposition

    Negative Logits
    ples         -0.67
    ination      -0.64
     effected    -0.63
    inated       -0.61
    acting       -0.59
    informed     -0.58
    izations     -0.58
    romy         -0.58
    val          -0.56
    uter         -0.56
    Positive Logits
    bilt         0.71
    mares        0.67
    pole         0.65
    emonic       0.64
    cliffe       0.64
    ngth         0.64
    ãĤ¤ãĥĪ       0.64
    ¯¯¯¯         0.62
    rawler       0.61
    weed         0.61
    Act Density 0.188%

    No Known Activations