INDEX
    Explanations

    negations and expressions of doubt or uncertainty in statements

    Negative Logits
    mazon     -0.17
    unas      -0.15
    ijke      -0.14
    رÙĪØ³     -0.14
    amburg    -0.14
    inet      -0.14
    .ravel    -0.14
     çĤ       -0.13
    Trap      -0.13
    OOM       -0.13

    Positive Logits
    VF        0.17
    ãĤ        0.15
     powder   0.15
    ey        0.14
    issing    0.14
    BU        0.14
    rif       0.14
    iç        0.14
    CTR       0.14
    ãĤĥ       0.13

    Act Density 0.372%

    No Known Activations