    Explanations

    the word "des" followed by a single character

    terms related to the concept of "destruction" or "damaging" actions

    Negative Logits
    OWS          -0.77
    glers        -0.74
    DAY          -0.74
    Reviewer     -0.73
    ancial       -0.72
    hetti        -0.72
    regor        -0.71
    razil        -0.70
    intendent    -0.67
    ONY          -0.66

    Positive Logits
    ync           0.96
    ugar          0.91
    ktop          0.90
    erve          0.89
    plet          0.85
    perate        0.85
    semb          0.84
    irable        0.82
    viron         0.81
    erving        0.81

    Activation Density: 0.005%

    No Known Activations