INDEX
    Explanations

    negations and expressions of denial or refusal

    NEGATIVE LOGITS
    endet      -0.15
    å©Ĩ        -0.15
    hurst      -0.15
    stras      -0.15
    onia       -0.15
    ipar       -0.15
    ë¡Ń        -0.14
    akra       -0.14
    astr       -0.14
    oksen      -0.14
    POSITIVE LOGITS
    tingham    0.18
    te         0.15
    ye         0.15
    lus        0.15
    ori        0.15
    ches       0.15
    rac        0.14
    laz        0.14
     Dess      0.14
    cher       0.14
    Act Density 0.075%

    No Known Activations