    Explanations

    phrases indicating uncertainty or speculation about future outcomes

    the expression of negation or refusal

    Negative Logits
        "illin"       -0.69
        "OTOS"        -0.67
        "bian"        -0.65
        "gypt"        -0.62
        "mens"        -0.61
        " Traps"      -0.60
        "MEN"         -0.60
        "assies"      -0.59
        "angering"    -0.59
        " sqor"       -0.58

    Positive Logits
        "'t"           1.22
        "itive"        0.95
        "stall"        0.78
        "now"          0.74
        "geon"         0.70
        "rar"          0.70
        "ª"            0.69
        "kish"         0.67
        "ners"         0.67
        "ald"          0.65

    Activation density: 0.027%

    No Known Activations