    Explanations
    No Explanations Found
    Negative Logits
    Token        Logit
    " Seym"      -0.85
    "swick"      -0.80
    "ilib"       -0.79
    " Volunte"   -0.77
    "cair"       -0.75
    "KER"        -0.70
    "mercial"    -0.69
    "pron"       -0.68
    "Correct"    -0.67
    " DOC"       -0.64
    Positive Logits
    Token        Logit
    "thora"       0.74
    " harass"     0.73
    " frank"      0.71
    " grav"       0.69
    "riz"         0.69
    " masturb"    0.67
    " rampage"    0.67
    " morale"     0.65
    " inciner"    0.65
    " wrath"      0.65
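The two logit lists above name the tokens this feature pushes down and up most strongly. A common way SAE dashboards produce such lists is to project the feature's decoder direction through the model's unembedding matrix and keep the bottom-k and top-k entries; whether this page used exactly that method, or applied any normalization, is an assumption. The sketch below uses a made-up 3-dimensional toy model, and the names `logit_effects` and `top_and_bottom` are hypothetical.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding column.

    Assumption: the dashboard's logit values are direct (linear) effects of this kind.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}

def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens with their effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Hypothetical toy data: a 3-dimensional residual stream and a 4-token vocabulary.
decoder_dir = [1.0, -0.5, 0.0]
unembed = {
    "a": [0.2, 0.4, 1.0],
    "b": [0.9, 0.0, 0.3],
    "c": [-0.4, 0.6, 0.0],
    "d": [0.1, -0.2, 0.5],
}

neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(neg)
print(pos)
```

In a real model the vocabulary has tens of thousands of entries and the projection is a single matrix-vector product, but the ranking step is the same.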
    Activation density: 0.000%
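Activation density is usually reported as the share of sampled tokens on which a feature fires. The page does not state its threshold or sample set, so the sketch below assumes "fires" means an activation strictly above zero; `activation_density` is a hypothetical helper. Under that assumption, a feature that never fires on the sample reports 0.000%, consistent with the "no known activations" note below.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of sampled tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        return 0.0  # no samples: report zero rather than divide by zero
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# A feature that never fires over 10,000 sampled tokens.
print(f"{activation_density([0.0] * 10_000):.3f}%")  # prints 0.000%
```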

    No Known Activations

    This feature has no known activations.