INDEX
    Explanations

    phrases related to comparison or evaluation

    phrases that emphasize negation or denial

    Negative Logits
     Almighty    -0.64
    LIN          -0.63
     Naples      -0.59
    WT           -0.59
    Condition    -0.56
    catentry     -0.56
    multipl      -0.56
     motif       -0.55
    TEXTURE      -0.54
    soType       -0.53
    Positive Logits
     anymore     0.69
    vae          0.68
     nor         0.64
     Enough      0.64
     bother      0.63
    dfx          0.61
    REDACTED     0.61
     Cheong      0.61
    userc        0.60
    cffff        0.59
    Activation Density: 0.182%

    No Known Activations