INDEX
    Explanations

    mentions of negative aspects or consequences

    Negative Logits
    rouse       -0.74
    glas        -0.73
    ynthesis    -0.72
     Swords     -0.70
    orthy       -0.70
    aeda        -0.69
    aukee       -0.68
     Collider   -0.67
    arya        -0.66
    ILA         -0.66
    Positive Logits
    der         0.86
     plag       0.80
    fully       0.79
     havoc      0.78
    ulent       0.78
     inflicted  0.77
     Clown      0.77
    asses       0.76
     heap       0.76
    ged         0.76
    Act Density 4.465%

    No Known Activations