INDEX
    Explanations

    words related to opposition or contradiction

    terms related to counter-arguments and counterproductive actions

    Negative Logits
    ©¶æ¥µ       -0.71
    ogo         -0.63
    é¾įå        -0.63
     Bonds      -0.62
     Vide       -0.61
     FANTASY    -0.61
    weeney      -0.61
    ahime       -0.60
     Likes      -0.60
     livest     -0.59
    Positive Logits
    measures    0.80
    dict        0.78
    attack      0.76
    xual        0.75
    intuitive   0.75
    arya        0.73
    ctive       0.72
    atives      0.71
    argument    0.71
    rad         0.71
    Activation Density 0.078%

    No Known Activations