INDEX
    Explanations

    negations and expressions of uncertainty or doubt

    Negative Logits
    Token        Logit
    " really"    -0.20
    " Really"    -0.19
    "inski"      -0.18
    " only"      -0.18
    "coni"       -0.17
    " never"     -0.17
    "really"     -0.17
    "ruh"        -0.17
    "Really"     -0.16
    " NOT"       -0.16
    Positive Logits
    Token          Logit
    " already"     0.31
    "already"      0.30
    "Already"      0.28
    " Already"     0.27
    "已经"         0.23
    " otherwise"   0.21
    " yet"         0.19
    " 已"          0.19
    "otherwise"    0.19
    " bereits"     0.19
    Activation Density 0.179%

    No Known Activations