INDEX
    Explanations

    words related to attacks or negative events, particularly those involving physical harm

    terms related to 'ter' and written works or documents

    Negative Logits
    overs        -0.69
    OWN          -0.65
    upid         -0.63
    ushi         -0.62
    ooth         -0.62
    raction      -0.61
    EGA          -0.61
    icably       -0.61
     Shiv        -0.60
    UNE          -0.58

    Positive Logits
    mic           1.00
    pher          0.93
    ping          0.91
    borgh         0.89
    mes           0.82
    ciating       0.76
    pers          0.76
    gins          0.76
    thing         0.75
    rior          0.75

    Activation Density: 0.100%

    No Known Activations