INDEX
    Explanations

    negations and phrases indicating resistance or refusal

    Negative Logits
    ndon        -0.15
    -avatar     -0.14
    assis       -0.14
     nowhere    -0.14
    ัà¸ģà¸Ĺ      -0.14
    599         -0.14
     fend       -0.14
    ì¹Ļ         -0.13
     actually   -0.13
    rios        -0.13
    Positive Logits
     sugar      0.24
     waiver     0.24
     settling   0.21
     settle     0.20
     Sugar      0.20
    Sugar       0.20
     gloss      0.19
     rest       0.19
     coast      0.19
    succ        0.18
    Act Density 0.248%

    No Known Activations