INDEX
    Explanations

    profane words or phrases

    linguistic patterns related to specific suffixes and potentially notable keywords

    Negative Logits
    Token       Logit
    "rift"      -0.80
    "iry"       -0.75
    " Wond"     -0.74
    "MQ"        -0.74
    "romy"      -0.70
    "dq"        -0.69
    "DOM"       -0.67
    "IR"        -0.67
    "Soc"       -0.66
    "ilk"       -0.64
    Positive Logits
    Token         Logit
    " banter"     0.82
    " insult"     0.78
    " gluc"       0.74
    " fres"       0.74
    "plet"        0.74
    " heck"       0.73
    "eah"         0.72
    " boo"        0.66
    "onsense"     0.66
    " outburst"   0.66
    Activation Density: 0.055%

    No Known Activations