INDEX
    Explanations

    indicators of existence and presence in statements

    Negative Logits
    entin
    -0.18
    igor
    -0.15
    イド
    -0.15
    forge
    -0.15
     errs
    -0.15
    .gdx
    -0.14
    έλ
    -0.14
    Beginning
    -0.14
     Sever
    -0.14
    rippling
    -0.14
    Positive Logits
     happened
    0.18
     having
    0.17
     helt
    0.17
     being
    0.15
     done
    0.15
    933
    0.15
     existed
    0.15
     gonna
    0.15
    rani
    0.14
    лит
    0.14
    Activation Density: 0.267%

    No Known Activations