INDEX
    Explanations

    terms expressing contradiction or exception

    Negative Logits
    AnchorTagHelper      -0.71
     Himo                -0.64
    (token not shown)    -0.60
    AndEndTag            -0.60
    OuterAlt             -0.58
     kasarigan           -0.58
    llary                -0.56
     kaarangay           -0.54
    IntoConstraints      -0.54
    Aid                  -0.52
    Positive Logits
    (token not shown)    1.96
    (token not shown)    1.80
    但却                 1.23
    却不                 1.20
    却是                 1.13
    却没有               1.11
    却被                 1.09
    却又                 1.04
     però                0.81
     justru              0.74
    Activation Density: 0.002%

    No Known Activations