    NEGATIVE LOGITS
     CreateTagHelper     -0.59
    septic               -0.57
    ArrowToggle          -0.56
    TintMode             -0.55
     للمعارف             -0.54
     Wikimedijinoj       -0.54
    Enders               -0.53
    tops                 -0.53
    Становништво         -0.53
    ="@+                 -0.52
    POSITIVE LOGITS
     the                 1.20
     a                   1.02
     an                  0.85
     some                0.78
     those               0.78
     this                0.75
     their               0.73
     your                0.72
     up                  0.72
     its                 0.72
    Activation Density: 0.048%

    No Known Activations