INDEX
    Explanations

    expressions of irony and hypocrisy

    Negative Logits
    lah         -0.15
    lator       -0.14
    .bundle     -0.14
     nonatomic  -0.14
    ering       -0.14
    .Glide      -0.14
    ingo        -0.14
    -grade      -0.13
     åIJ        -0.13
    ignon       -0.13
    Positive Logits
    ickt        0.17
    ikat        0.17
    TEGER       0.17
    bero        0.15
    eras        0.15
    etta        0.15
    quals       0.14
    aland       0.14
    emez        0.14
    odel        0.14
    Activation Density: 0.041%

    No Known Activations