    Explanations

    phrases with contrasting language, often involving limitations or exceptions

    Negative Logits
    erenn   -0.63
    tnc     -0.59
    edu     -0.58
    代      -0.57
    ¢       -0.56
    ADA     -0.55
    oire    -0.55
    pet     -0.55
    oufl    -0.54
    abre    -0.53
    Positive Logits
     alas           1.08
     nevertheless   0.94
     secondly       0.93
     nonetheless    0.91
     unfortunately  0.90
     luckily        0.90
     beware         0.90
     fortunately    0.88
    tons            0.84
     interestingly  0.82
    Act Density 0.178%

    No Known Activations