    Negative Logits
    Token     Logit
    " I"      -0.57
    " and"    -0.54
    "out"     -0.48
    "OUT"     -0.47
    " is"     -0.47
    " has"    -0.43
    " out"    -0.42
    " Le"     -0.42
    "se"      -0.40
    " Out"    -0.40
    Positive Logits
    Token                 Logit
    " CreateTagHelper"    0.99
    "expandindo"          0.96
    " Russians"           0.91
    " дописавши"          0.90
    " beginnetje"         0.90
    " Paglinawan"         0.88
    " Roskov"             0.86
    " Arabs"              0.85
    " Frenchmen"          0.84
    " Spaniards"          0.84
    Activation Density: 0.039%

    No Known Activations