    Explanations

    terms related to power dynamics or performances

    Negative Logits
    Token          Logit
    "romeda"       -0.72
    " Niet"        -0.68
    "roit"         -0.67
    "eret"         -0.67
    " Bei"         -0.67
    "OOL"          -0.67
    "eryl"         -0.65
    "algia"        -0.64
    " Von"         -0.63
    "olson"        -0.63

    Positive Logits
    Token          Logit
    "houses"       1.05
    "stroke"       1.03
    "lifting"      0.97
    " outage"      0.90
    "puff"         0.89
    "train"        0.85
    "lessness"     0.82
    "full"         0.80
    " chords"      0.80
    "Reviewer"     0.80

    Activation Density: 0.035%

    No Known Activations