INDEX
    Explanations

    expressions of contradiction or clarification in statements

    Negative Logits
    ÙĪÙĪ
    -0.18
    uelle
    -0.16
    uel
    -0.15
    isel
    -0.15
    elin
    -0.14
    oldt
    -0.14
     certainly
    -0.14
    ogn
    -0.14
    Äĥr
    -0.14
     misdemean
    -0.14
    Positive Logits
     actually
    0.22
    actually
    0.22
    actual
    0.21
     actual
    0.20
     Actually
    0.18
    Actually
    0.17
    Actual
    0.17
     Actual
    0.17
    _actual
    0.16
    (actual
    0.16
    Act Density 0.140%

    No Known Activations