INDEX
    Explanations

    phrases indicating time or duration

    Negative Logits
    Token       Logit
    "onis"      -0.16
    "776"       -0.15
    "UCE"       -0.14
    "stanov"    -0.14
    "ritz"      -0.14
    "krom"      -0.14
    "cef"       -0.14
    "irim"      -0.14
    "ÏĦεÏħ"     -0.14
    "RYPT"      -0.14
    Positive Logits
    Token       Logit
    " wh"       0.55
    "wh"        0.55
    "WH"        0.43
    "-wh"       0.43
    " Wh"       0.41
    "Wh"        0.38
    " WH"       0.36
    ".wh"       0.34
    "_wh"       0.34
    "_WH"       0.31
    Activation Density: 0.110%

    No Known Activations