INDEX
    Explanations

    negative responses or refusals

    negative affirmations or words expressing refusal

    Negative Logits
    Token        Logit
    "RAFT"       -0.76
    "lycer"      -0.75
    "iership"    -0.74
    "ulative"    -0.66
    "romeda"     -0.66
    "endish"     -0.65
    "IUM"        -0.64
    "assies"     -0.64
    "ual"        -0.64
    "rious"      -0.63

    Positive Logits
    Token        Logit
    "xious"       1.12
    "zzle"        0.99
    " matter"     0.93
    "except"      0.92
    "obs"         0.88
    " longer"     0.86
    "oses"        0.86
    "ct"          0.84
    "ises"        0.83
    "AH"          0.81
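    A common way dashboards like this derive the negative and positive logit lists is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive entries. The sketch below assumes that method and ignores any final layer norm; the toy matrix, vocabulary, and function names (logit_effects, top_and_bottom) are hypothetical and are not taken from the dashboard.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each token by the dot product of the feature's decoder direction
    with that token's column of the unembedding matrix (d_model x vocab)."""
    scores = []
    for tok_idx, token in enumerate(vocab):
        column = [row[tok_idx] for row in unembed]
        scores.append((token, sum(d * w for d, w in zip(decoder_dir, column))))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy model: 3-dimensional residual stream, 5-token vocabulary.
vocab = ["a", "b", "c", "d", "e"]
unembed = [
    [0.2, -0.4, 1.0, 0.0, 0.5],
    [0.6, 0.2, -1.0, 0.4, 0.0],
    [0.9, 0.9, 0.9, 0.9, 0.9],
]
decoder_dir = [1.0, -0.5, 0.0]

negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("Negative:", [tok for tok, _ in negative])  # Negative: ['b', 'd']
print("Positive:", [tok for tok, _ in positive])  # Positive: ['c', 'e']
```

    A real dashboard would use the model's full vocabulary and k = 10, which matches the ten entries shown in each list above.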
    Activation Density: 0.087%
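    Activation density is usually read as the share of tokens on which the feature is active, so 0.087% corresponds to roughly 87 firing tokens per 100,000. A minimal sketch, assuming "active" means an activation strictly above zero; the function name and the synthetic activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical example: the feature fires on 87 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(87):
    acts[i * 1000] = 1.5  # arbitrary positive activation
print(activation_density(acts))  # 0.087
```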

    No Known Activations