    Explanations

    equals sign

    Negative Logits
    " acted"          -0.08
    " continually"    -0.08
    "Enter"           -0.08
    " acts"           -0.08
    "represent"       -0.07
    "------------------------------------------------------------------------------------------------"  -0.07
    " اقدامات"        -0.07
    " enters"         -0.07
    "enter"           -0.07
    " corridors"      -0.07

    Positive Logits
    " infatti"        0.10
    " indeed"         0.09
    " ntabwo"         0.08
    " opravdu"        0.08
    (blank)           0.08
    " :-)"            0.08
    " இல்லை"          0.08
    " Indeed"         0.08
    " невер"          0.08
    " zählen"         0.08

    Activation Density: 0.062%

    No Known Activations