INDEX
    Explanations

    negation and restriction

    Negative Logits
    "erece"       -0.09
    " впол"       -0.09
    "amac"        -0.09
    "olley"       -0.09
    "ilde"        -0.09
    "endl"        -0.09
    " Sik"        -0.08
    "å¬"          -0.08
    ".appspot"    -0.08
    "owi"         -0.08
    Positive Logits
    " too"        0.16
    " direct"     0.16
    " directly"   0.14
    "too"         0.14
    "direct"      0.14
    "太"          0.14
    "缴æİ¥"       0.13
    " Direct"     0.13
    "Direct"      0.13
    " TOO"        0.13
    Act Density 0.119%

    No Known Activations